Tuesday, March 30, 2004

[strike out]

[strike out]: "This is not an exciting story: I happened to be browsing aimlessly through case studies and other publications released by Microsoft as a part of their 'Get the facts' initiative. At one point, I stumbled upon a Word file I wanted to read - and as soon as I ran it through wvWare, I noticed there was a good deal of amusing change tracking information still recorded within the document. Naturally, publishing documents with 'collaboration' data is not unheard of in the corporate world, but the fact that Microsoft had become a victim of their own technology, and had failed to run their own tools against these publications, makes it more entertaining. On a more serious note, it serves as a good warning that this is really difficult to manage, and that inline filtering tools on SMTP gateways and in web publishing systems may be necessary in some corporate environments.

A pointless idea came to my mind that instant: why not run a gentle web spider against all Microsoft sites in English, specifically looking for other instances of tracking data not removed from documents? I coded a bunch of scripts and let them run through the night, fetching approximately 10,000 unique documents; over 10% were identified as containing change tracking records. I decided to collect only those with deleted text still present, yielding a crop of over 5% of all documents. Quite impressive. Below, you will find a brief (and rest assured, incomplete) list of the most entertaining samples I've run into, along with some speculation (and only speculation) as to the reasons we see them."
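For a rough idea of what "deleted text still present" means in practice, here is a minimal sketch of such a check. It is not the author's script: the quoted write-up used wvWare on binary Word files, whereas this sketch assumes the newer .docx (Office Open XML) format, where tracked deletions are stored as `w:delText` elements inside `word/document.xml`. The function names `deleted_text` and `make_sample` are made up for illustration, and `make_sample` builds only a bare-bones archive for testing, not a file Word would necessarily open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_text(docx_bytes: bytes) -> list[str]:
    """Return text fragments still stored in tracked deletions.

    Assumption: the input is a .docx archive; tracked deletions keep
    their text in <w:delText> elements in the main document part.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [node.text for node in root.iter(f"{{{W_NS}}}delText") if node.text]


def make_sample(paragraph_xml: str) -> bytes:
    """Hypothetical helper: wrap a paragraph body in a minimal archive."""
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        f"{paragraph_xml}"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()


# One document with a tracked deletion left in, one without.
tracked = make_sample(
    "<w:r><w:t>kept</w:t></w:r>"
    '<w:del w:id="1" w:author="someone">'
    "<w:r><w:delText>removed</w:delText></w:r>"
    "</w:del>"
)
clean = make_sample("<w:r><w:t>kept</w:t></w:r>")

print(deleted_text(tracked))
print(deleted_text(clean))
```

A spider like the one described would run a check of this kind over every fetched file and keep only the documents for which the returned list is non-empty.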


Edward A. Villarreal. Powered by Blogger.
