Real-world Web browser history detection results (WARNING: not updated since 2010)

For the last six months, this website has served as a tool to teach Internet users about Web browser history detection, which lets any website on the Internet check whether given sites and pages appear in the browsing history of most of its visitors.

At the same time, we have been analyzing the problem in more detail to determine how many of our visitors are affected by this attack, how difficult it is to scan browsers' histories for visited sites and resources, and how much information can be gathered about most of us in this way. We're pleased to announce that we'll be presenting our results at the Web 2.0 Security and Privacy 2010 workshop on May 20th in Oakland. You can view the full paper, or read on for the highlights.

Main results

  • How many people are affected?
    • We analyzed the results from over a quarter of a million people who ran our tests in the last few months and found that we could detect browsing histories for over 76% of them. All major browsers allow their users' history to be detected, but users of more modern browsers such as Safari and Chrome appear to be more affected: we detected visited sites for 82% of Safari users and 94% of Chrome users.
    • Visitors with JavaScript turned off are just as vulnerable to history detection as visitors with JavaScript enabled. We detected histories for 77% of such users; for some tests, users without JavaScript had more visited sites detected than JavaScript-enabled users. Read our details page to see why turning off JavaScript or installing NoScript won't help in this case.
  • How much information can be gathered?
    • Although our tests were quite limited, in our test of the 5000 most popular websites we detected an average of 63 visited locations per user (13 sites and 50 subpages on those sites); the medians were 8 sites and 17 subpages.
    • Almost 10% of our visitors had more than 30 visited sites and 120 subpages detected; heavy Internet users who don't protect themselves are affected more than others.
    • We also detected zipcodes that 9.8% of users had typed into online forms on sites such as Yahoo! Movies.
  • How easy is it to detect browsing histories?
    • Detecting visitors' browsing history requires just a few lines of code. Armed with a list of websites to check, a malicious webmaster can scan over 25 thousand links per second (1.5 million links per minute) in almost every recent browser.
    • Most websites and pages you view in your browser can be detected as long as they are kept in your history. Almost every address that appeared in your browser's address bar can be detected, including pages retrieved over https and some form submissions that carry potentially private information such as your zipcode or search query. Pages can no longer be detected once they expire from your history (usually after a month or two) or after you clear it manually.
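The scan described in the list above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the code our tests used: it assumes the page styles visited links with `a:visited { color: rgb(255, 0, 0) }` and that the browser reports :visited styles through `getComputedStyle`, as unpatched browsers of this period did; it will not work in browsers that no longer expose :visited styles to scripts. The names `StyleProbe`, `browserProbe`, `scanHistory`, and `estimateScanSeconds` are hypothetical. The style lookup is passed in as a function so the scanning logic can also run outside a browser.

```typescript
// Sketch of :visited-based history detection in browsers of that era.
// The style lookup is injected (StyleProbe) so the scan logic can run
// without a DOM. All names here are hypothetical.

/** Returns the computed text colour the page observes for a link to `url`. */
type StyleProbe = (url: string) => string;

/**
 * DOM-based probe. Assumes the stylesheet contains
 * `a:visited { color: rgb(255, 0, 0) }` and that the browser exposes
 * :visited styles through getComputedStyle (unpatched browsers only).
 */
function browserProbe(url: string): string {
  const doc = (globalThis as any).document;
  const link = doc.createElement("a");
  link.href = url;
  doc.body.appendChild(link); // the link must be in the document to be styled
  const color: string = (globalThis as any).getComputedStyle(link).color;
  doc.body.removeChild(link);
  return color;
}

/** Returns the candidate URLs whose links are styled as visited. */
function scanHistory(
  candidates: readonly string[],
  probe: StyleProbe,
  visitedColor = "rgb(255, 0, 0)",
): string[] {
  return candidates.filter((url) => probe(url) === visitedColor);
}

/** Rough scan time for `n` links at the ~25,000 links per second reported above. */
function estimateScanSeconds(n: number, linksPerSecond = 25_000): number {
  return n / linksPerSecond;
}

// Usage outside a browser: a fake probe that pretends two URLs are in history.
const fakeHistory = new Set(["https://example.com/", "https://example.org/a"]);
const fakeProbe: StyleProbe = (url) =>
  fakeHistory.has(url) ? "rgb(255, 0, 0)" : "rgb(0, 0, 238)";
console.log(scanHistory(
  ["https://example.com/", "https://example.net/", "https://example.org/a"],
  fakeProbe,
)); // the two URLs in fakeHistory
console.log(estimateScanSeconds(5000)); // → 0.2
// In a browser: scanHistory(candidateUrls, browserProbe)
```

At the quoted rate, checking all 5000 sites of a test like our top5k list takes about a fifth of a second, which is why a visitor is unlikely to notice the scan.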
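Visitors without JavaScript can be probed with CSS alone. One common way to do this, shown here as an illustrative assumption rather than our exact implementation, is for the server to emit one style rule per candidate link whose `:visited` variant loads a background image; in browsers of that era only visited links triggered the request, so the server's access log reveals the hits. The `/visited?id=N` logging path and the helpers `escapeHtml`, `buildCssProbePage`, and `decodeHits` are hypothetical.

```typescript
// Sketch of JavaScript-free detection: one CSS rule per candidate link whose
// :visited variant loads a background image from the server. Only visited
// links trigger the request, so the server's access log reveals the hits.
// The "/visited?id=N" path and all function names are hypothetical.

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

/** Builds a page that probes `urls` using only HTML and CSS. */
function buildCssProbePage(urls: readonly string[], logPath = "/visited"): string {
  const rules = urls
    .map((_, i) => `#p${i}:visited { background-image: url("${logPath}?id=${i}"); }`)
    .join("\n");
  const links = urls
    .map((url, i) => `<a id="p${i}" href="${escapeHtml(url)}">${escapeHtml(url)}</a>`)
    .join("\n");
  return `<!DOCTYPE html>\n<html><head><style>\n${rules}\n</style></head>\n<body>\n${links}\n</body></html>`;
}

/** Maps logged request paths back to the candidate URLs they stand for. */
function decodeHits(
  requestedPaths: readonly string[],
  urls: readonly string[],
  logPath = "/visited",
): string[] {
  const prefix = `${logPath}?id=`;
  const hits = new Set<string>();
  for (const path of requestedPaths) {
    if (!path.startsWith(prefix)) continue; // unrelated request, e.g. a favicon
    const rest = path.slice(prefix.length);
    if (!/^\d+$/.test(rest)) continue; // ignore malformed ids
    const id = Number(rest);
    if (id < urls.length) hits.add(urls[id]);
  }
  return Array.from(hits);
}
```

Because the browser itself reports the hits by fetching images, disabling scripts or using NoScript does not stop this variant.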
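The detection rates, averages, and medians in the list above are per-visitor aggregates. The sketch below shows one way to compute such figures; the `ScanResult` record shape, the sample data, and the function names are hypothetical and do not reflect our actual data format.

```typescript
// Sketch of summarising per-visitor scan results into rates, means, and
// medians. The ScanResult shape and sample data are hypothetical.

interface ScanResult {
  browser: string;          // e.g. "Chrome", "Safari"
  sitesDetected: number;    // distinct sites found in the visitor's history
  subpagesDetected: number; // subpages found on those sites
}

function mean(xs: readonly number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function median(xs: readonly number[]): number {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

/** Fraction of visitors, per browser, with at least one detected location. */
function detectionRateByBrowser(results: readonly ScanResult[]): Map<string, number> {
  const totals = new Map<string, { all: number; hit: number }>();
  for (const r of results) {
    const t = totals.get(r.browser) ?? { all: 0, hit: 0 };
    t.all += 1;
    if (r.sitesDetected + r.subpagesDetected > 0) t.hit += 1;
    totals.set(r.browser, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, browser) => rates.set(browser, t.hit / t.all));
  return rates;
}

// Hypothetical sample: one heavy user pulls the mean above the median.
const visitors: ScanResult[] = [
  { browser: "Chrome", sitesDetected: 12, subpagesDetected: 40 },
  { browser: "Chrome", sitesDetected: 0, subpagesDetected: 0 },
  { browser: "Safari", sitesDetected: 3, subpagesDetected: 9 },
];
console.log(detectionRateByBrowser(visitors)); // Chrome → 0.5, Safari → 1
const sites = visitors.map((v) => v.sitesDetected);
console.log(mean(sites), median(sites)); // → 5 3
```

A mean well above the median, as with 13 versus 8 detected sites in our data, indicates a right-skewed distribution: a minority of visitors with many detected sites raises the average.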
More results and system design details are available in the full paper. If you'd like to learn about the steps browser vendors are taking to address the problem in next-generation browsers, read David Baron's proposal and the associated discussions in the Mozilla forums.

Who visits the FBI most wanted list?

In addition to global comparisons, we also performed a more targeted examination of users who had visited the FBI's Ten Most Wanted Fugitives page. The data was again taken from the top5k test; in total, 178 users had visited the page. The raw list of other websites visited by those users is available here. Several interesting observations can be made. For example, only 44% of these users had visited the main FBI page, which suggests that the rest arrived from another site or had the specific page bookmarked. In addition, some less popular Internet locations were visited by a large share of this group; the averages for torrent sites, adult sites, and 4chan are significantly higher than for the rest of our data set.
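The targeted analysis above amounts to conditioning on one URL: keep only visitors whose detected history contains the target page, then compare how often every other URL appears in that group and in the whole data set. A minimal sketch, assuming each visitor's results are available as a `Set` of detected URLs (a hypothetical format); `compareGroup`, `shareWith`, and `SiteComparison` are illustrative names.

```typescript
// Sketch of the targeted comparison: condition on visitors whose detected
// history contains `target`, then compare each other URL's visit rate in that
// group with its rate across all visitors. Input format and names are
// hypothetical.

interface SiteComparison {
  url: string;
  groupRate: number;   // share of the target group whose history contains url
  overallRate: number; // share of all visitors whose history contains url
}

function shareWith(users: readonly Set<string>[], url: string): number {
  if (users.length === 0) return 0;
  return users.filter((h) => h.has(url)).length / users.length;
}

function compareGroup(histories: readonly Set<string>[], target: string): SiteComparison[] {
  const group = histories.filter((h) => h.has(target));
  const urls = new Set<string>();
  for (const h of group) {
    h.forEach((url) => {
      if (url !== target) urls.add(url);
    });
  }
  return Array.from(urls)
    .map((url) => ({ url, groupRate: shareWith(group, url), overallRate: shareWith(histories, url) }))
    // Largest over-representation in the group comes first.
    .sort((a, b) => (b.groupRate - b.overallRate) - (a.groupRate - a.overallRate));
}
```

In this framing, the 44% figure is the `groupRate` of the main FBI page, and entries with a large gap between `groupRate` and `overallRate` are the ones that stand out, as the torrent sites, adult sites, and 4chan did in our data.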

End of the project

As of today, the project is finished. Two final reports were produced; their dumps are shown below.

A report produced after a scan could take the following form, followed by a more detailed list of secondary resources: