David Cancel, the CTO of the web market research firm Compete Incorporated, raised eyebrows at the Open Data 2007 Conference in New York when he revealed that many Internet service providers sell the clickstream data of their users. Clickstream data records every web site each user visits and the order in which the sites were visited.
The data is not sold with accompanying user names or other identifying information; each user appears only as a numerical value. However, it is still theoretically possible to tie this information to a specific ISP account. Cancel told Ars that his company licenses the data from ISPs for millions of dollars. He did not say how much that works out to per ISP user, although someone in the audience estimated that it was in the range of 40¢ per user per month. (Some reports on the event erroneously attributed this estimate to Cancel himself.) Cancel said that this clickstream data is “much more comprehensive” than data that is normally gleaned through analyzing search queries.
The revelation brings to mind the minor scandal that erupted when AOL was found to be giving away its users’ search data to researchers, a practice that came to light only after a large sample of the data was accidentally released to the public. Clickstream data is, as Cancel admitted, much more interesting to marketers than search engine data.
There is some evidence that the data traditionally gathered by market research firms such as Compete, which comes from user-installed toolbars, may not be as accurate as the companies had hoped. Establishing “ISP relationships” helps to augment that data but may not be, strictly speaking, an ethical practice.
Of course, it’s an established fact that if you surf the web, your surfing habits can be tracked by any site you happen to visit. The expectation of privacy while surfing isn’t necessarily a reasonable one, but ISPs that expect to profit from this data should at the very least inform their users that they are doing so. Some US Congressmen are calling for new laws to be enacted to protect the privacy of Internet users.
For his part, David Cancel told Ars that he “strongly supports an increase in the methods and degree to which disclosure is communicated,” not only for clickstream data but for any kind of data collected on users’ personal surfing habits. He stated that “all users should be informed explicitly when their data can be sold to a third party.”