ML-CB
Can websites track you without telling you about it or giving you an easy way to see what they're doing? Yes. Can we stop it? Yes!
With the aim of increasing online privacy, we present a novel, machine-learning-based approach to blocking one of the three main ways website visitors are tracked online: canvas fingerprinting. Because canvas fingerprinting is, at its core, carried out by a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites with a custom Google Chrome extension that stores information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a visibly distinct appearance, allowing us to manually classify files based on the images drawn. We take this approach one step further and train our classifiers not on the malleable images themselves, but on the harder-to-change underlying source code that generates the images. As a result, ML-CB allows for more accurate tracker blocking.
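To make the idea of classifying the source code rather than the rendered image concrete, here is a minimal toy sketch. It is not the ML-CB pipeline: the identifier-token features, the Naive Bayes model, the label names, and the four hand-written training snippets are all assumptions chosen for illustration, standing in for the semantic representation and the models described above.

```python
import math
import re
from collections import Counter

# Assumption: identifier tokens are a crude stand-in for a semantic representation.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def tokenize(js_source):
    """Return the identifier-like tokens found in a JavaScript snippet."""
    return TOKEN_RE.findall(js_source)


class TokenNaiveBayes:
    """Multinomial Naive Bayes over tokens with add-one smoothing (illustrative only)."""

    def fit(self, programs, labels):
        self.doc_counts = Counter(labels)  # number of programs per label
        self.token_counts = {}             # label -> Counter of tokens
        self.vocab = set()
        for source, label in zip(programs, labels):
            tokens = tokenize(source)
            self.token_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, js_source):
        tokens = tokenize(js_source)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.token_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training programs; real data would come from a crawl.
TRAIN = [
    ("var c = document.createElement('canvas'); var ctx = c.getContext('2d');"
     " ctx.fillText('Cwm fjordbank glyphs', 2, 15); send(c.toDataURL());", "fingerprint"),
    ("var ctx = canvas.getContext('2d'); ctx.textBaseline = 'top';"
     " ctx.fillText('abc', 4, 17); hash(canvas.toDataURL());", "fingerprint"),
    ("var ctx = chart.getContext('2d'); ctx.beginPath(); ctx.moveTo(0, 0);"
     " ctx.lineTo(10, 10); ctx.stroke();", "benign"),
    ("var ctx = game.getContext('2d'); ctx.drawImage(sprite, x, y);"
     " ctx.clearRect(0, 0, w, h);", "benign"),
]

model = TokenNaiveBayes().fit([src for src, _ in TRAIN], [lab for _, lab in TRAIN])

if __name__ == "__main__":
    unseen = ("var cv = document.createElement('canvas'); var g = cv.getContext('2d');"
              " g.fillText('test', 0, 0); report(cv.toDataURL());")
    print(model.predict(unseen))
```

The sketch shows why source code can be a sturdier signal than pixels: a tracker can change the text or colors it draws at little cost, but it still has to render something and read the result back, and calls such as `fillText` followed by `toDataURL` leave traces in the program itself.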
484K
Websites scraped
3.51K
Training images
84K
Training programs
We achieved a 16.6% improvement over the state of the art in detecting canvas fingerprinting techniques.
ML-CB uses machine learning and was trained on a dataset we created by scraping roughly half a million websites with a custom Google Chrome extension that stores information related to the canvas. Our new classifier is highly accurate, robust, and resilient to modern-day anti-fingerprinting techniques.