Google’s FLoC and its impact on Privacy
For more than two decades, the third-party cookie underpinned a multi-billion
dollar advertising surveillance industry that followed netizens across the web,
profiling and retargeting them based on their online activity. Although the
technology worked in favour of marketers, it spiralled beyond users' control,
permeating their browsing, breaching their privacy, and building a broad consensus
that it should be retired for the better.
Citing its privacy-endangering nature, several browsers, including Firefox and
Safari, have begun phasing out third-party cookies by default. With the cookie's
departure go the established mechanics of personalised advertising, leaving a void
to fill. Chrome, whose ad ecosystem still depends on cookies, must find its footing
as the privacy landscape shifts beneath it. Hence the proposal of a viable
replacement that aims to be less intrusive, shared across large groups, and
user-centric.
In March 2021, almost a year after Safari let users block third-party cookies,
Google announced that it would end support for third-party trackers by 2022 and
replace them with the Privacy Sandbox's Federated Learning of Cohorts (FLoC for
short). The idea behind FLoC is to serve ads based on users' interests without
revealing their browsing history to advertisers. FLoC replaces third-party cookies
with a new "cohort" identifier, which groups users with similar interests.
Tracking mechanism of FLoC
Reportedly, FLoC will use the SimHash algorithm, originally created for Google's
web crawlers to detect nearly identical web pages. With FLoC, a user's browsing
history remains private. Instead of tracking individuals' browsing history the way
cookies do, FLoC categorises users with similar browsing behaviour into numbered
"cohorts".
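To make the idea concrete, here is a minimal SimHash sketch in Python. This is an illustration of the general technique only, not Chrome's actual FLoC code: the feature set (visited domains), hash function, and bit width are all assumptions made for the example.

```python
import hashlib

def simhash(features, bits=16):
    """Compute a SimHash: similar feature sets yield similar hashes.
    Illustrative sketch only -- not Chrome's actual FLoC implementation."""
    v = [0] * bits
    for f in features:
        # Hash each feature (here, a visited domain) to a bit pattern.
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    # Each bit of the result is the majority vote across all features.
    return sum(1 << i for i in range(bits) if v[i] > 0)

# Two browsers with largely overlapping histories land on nearby hashes;
# grouping browsers by hash yields "cohorts" of similar browsing behaviour.
brad = simhash(["hiking.example", "boots.example", "news.example"])
angelina = simhash(["hiking.example", "boots.example", "weather.example"])
hamming = bin(brad ^ angelina).count("1")  # small distance => similar interests
```

The key property is locality: unlike a cryptographic hash, small changes to the input change only a few output bits, so similar histories cluster together.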
Each cohort, or simply group, contains thousands of users. This method hides
individuals in the group and uses on-device processing to keep a person's web
history private in the browser. Since this happens locally, on users' devices, their
data isn't stored on a server--one of the privacy concerns linked with
third-party cookies.
According to the proposed model, each week an individual's browser reviews the
sites the individual has visited and assigns the browser to a cohort. Each cohort
holds visitors' interest and behaviour data for up to a week and is updated weekly
based on the prior week's data.
FLoC assigns an anonymised ID to an individual's accumulated browsing history
and adds the browser to a group of other browsers with similar behaviour, where
the overall patterns are accessible to advertisers. (Note: The websites visited,
along with their contents, influence the user 'clustering'.)
Let's understand how Google Chrome's algorithms assign users a common
"cohort" with an example, but first let's get acquainted with the different parties
involved in the process:
● The advertiser (a company that pays for advertising), let's say an online
shoe retailer: shoestore.example
● The publisher (a site that sells ad space), let's say a news site:
dailynews.example
● The adtech platform (one that provides software and tools to deliver
advertising): adnetwork.example
For this example, let's call two users Brad and Angelina, whose browsers belong
to the same cohort, 1234. (Note: The names are arbitrary. With FLoC, names and
individual identities are not revealed to advertisers, publishers, or adtech
platforms. Also, think of a cohort as a grouping of browsing activity, not a
collection of people.)
Let’s see the different layers of serving ads:
1. FLoC service: The FLoC service of the browser formulates a mathematical
model with thousands of “cohorts”, each representing thousands of web
browsers with similar browsing histories. Each cohort is issued an ID.
2. Browser: From the FLoC service, Brad's browser gets data describing the
FLoC model. Using the model's algorithm, Brad's browser works out which
cohort corresponds most closely to its own browsing history, which in this
case is 1234. (Note: Brad's browser doesn't share any data with the FLoC
service.) Similarly, Angelina's browser calculates its cohort ID and assigns
itself to 1234. (Note: Angelina's browsing history is different from Brad's,
yet close enough to belong to the same cohort.)
3. Advertiser: Brad, looking for hiking boots, visits shoestore.example. The site
fetches cohort 1234 from Brad’s browser. The site registers that someone
from cohort 1234 exhibited an interest in hiking boots. The site also registers
some additional interest in its product from the same cohort, as well as from
other cohorts, which it periodically aggregates and shares with adtech
platform, adnetwork.example.
4. Publisher: Angelina visits dailynews.example where the site asks Angelina’s
browser for its cohort. The site then makes a request to its adtech platform,
adnetwork.example, for an ad, including Angelina’s browser’s cohort, 1234.
5. Adtech platform: adnetwork.example selects an ad suitable for Angelina by
mixing the data--Angelina’s cohort (1234) provided by dailynews.example
and data related to cohorts and product interests provided by
shoestore.example--acquired from the publisher and the advertiser.
Adnetwork.example selects an ad for hiking boots for Angelina, and
dailynews.example displays the ad.
Impact on advertisers and publishers
Today, as people become increasingly privacy-conscious, switching to cohorts
looks less like future-proofing marketing strategy than a return to a go-to
strategy, for there is nothing new about cohorts. In fact, the very concept FLoC
is built around--clustering large groups of people with a shared interest in a
way that leaves privacy intact--has been a marketing principle for nearly
forever.
Compared with third-party cookies, cohorts limit advertisers and publishers to
insufficient, time-bounded, browser-level insight into their audience. Advertisers
can see only the cohort an individual belongs to, without any information about
the characteristics that link its members. As things stand with FLoC, advertisers
and publishers should forget about delivering the bespoke, individual-level
experiences that third-party trackers enabled.
Marketers did not expect that the business of capitalising on data--and building
billion-dollar companies off it--would soon pivot on the privacy hinge. But if
anything is certain about advertising's future, it is that cohorts will reach their
full potential. According to the Google Blog, simulation tests of the principles
defined in Chrome's FLoC proposal yielded at least 95% of the conversions per
dollar spent compared with cookie-based advertising.
https://blog.google/products/ads-commerce/2021-01-privacy-sandbox/
The conjunction of cohorts and probabilistic data--identifying users by matching
them with a known user who exhibits similar browsing behaviour--is a
well-established concept within many of the world's largest enterprises, but it
hadn't received mainstream attention until now. Probabilistic onboarding is all
about structuring cohorts and finding new customers, and this business strategy,
which lies at the heart of Google's FLoC, cannot be ignored in the quest for
personalisation.
With FLoC, Google wants advertisers and publishers to track user activity with
their own first-party cookies rather than depending on third-party data. The
marketer's answer to FLoC will be leveraging first-party data, which will no
longer be optional but will form the core of any successful marketing strategy
for creating better customer experiences and optimising marketing efforts.
“73% of consumers are willing to share more data if a company is transparent about
how and why it is used.”
Privacy analysis of FLoC
Numerous privacy issues with FLoC are attracting public attention well before
launch. Here we address a few:
Cohort IDs can be used for tracking
According to Firefox CTO Eric Rescorla, cohorts will likely consist of thousands of
users at most. Tracking companies can employ browser fingerprinting to narrow
down the list of potential users in a cohort to just a few very quickly. To do so,
trackers would only require “a relatively small amount of information” when
combined with a FLoC cohort.
This is possible in a number of ways:
Browser Fingerprinting
Even though users' local browsing data is not shared--only cohort information is
transmitted--that information, combined with other data exposed by the browser,
can be compiled into a unique fingerprint for each person.
Every user-specific detail--browser type, OS, language, country--helps
distinguish one user from another. If a cohort of about 10,000 users is split
across 5,000 fingerprint groups, the number of users in each (FLoC cohort,
fingerprint) pair shrinks to single digits, making it trivial to identify people
individually.
Though this is harder with larger cohorts, it does not free FLoC from individual
targeting.
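The back-of-the-envelope arithmetic behind this attack is simple: combining a cohort with even a coarse fingerprint shrinks the anonymity set multiplicatively. The numbers below are the illustrative figures from the text, assuming fingerprints spread users roughly evenly.

```python
# Anonymity-set arithmetic for the cohort + fingerprint attack (figures
# from the example above; even spread across buckets is an assumption).
cohort_size = 10_000          # users sharing one FLoC cohort ID
fingerprint_buckets = 5_000   # distinct fingerprints (browser, OS, language, ...)

# Each (cohort ID, fingerprint) pair identifies a much smaller group:
users_per_pair = cohort_size / fingerprint_buckets
assert users_per_pair == 2    # a single-digit anonymity set per pair
```

With only a couple of users per pair, one more observed attribute is often enough to single a person out entirely.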
Multiple visits
People's interests online aren't constant, and neither are their FLoC IDs, which
are recomputed every week. If a tracker succeeds in using other available
information to link a user's visits over time, it can distinguish individual users
by combining their FLoC IDs from week 1, week 2, and so on.
This poses a serious de-anonymisation risk, as FLoC restores cross-site tracking
even for users with anti-tracking mechanisms enabled.
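One way to see why the weekly rotation hurts rather than helps: each week's cohort ID constrains who the visitor could be, and intersecting those candidate sets narrows them quickly. The cohort memberships below are made up purely for illustration.

```python
# Illustrative sketch: a tracker that can link the same browser's visits
# across weeks intersects the candidate sets implied by each week's cohort.
# Membership sets are invented for the example.
week1_members = {"u1", "u2", "u3", "u4"}  # everyone sharing week 1's cohort ID
week2_members = {"u3", "u4", "u9"}        # everyone sharing week 2's cohort ID
week3_members = {"u4", "u7"}              # everyone sharing week 3's cohort ID

# Each additional week adds a constraint; the intersection shrinks fast.
candidates = week1_members & week2_members & week3_members
```

After three weeks, only one candidate remains in this toy example: the sequence of cohort IDs acts like a fingerprint of its own.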
The project’s Github page states, “Sites that know a person’s PII (e.g., when people
sign in using their email address) could record and reveal their cohorts. This means
that information about an individual’s interest may eventually become public.” In
other words, FLoC’s technology will share personal data with existing trackers
which already identify users.
https://github.com/WICG/floc
FLoC exposes far more information than necessary
A site interested in learning users' interests only needs to participate in tracking the
user across a large number of sites or work with some other big trackers.
Because FLoC IDs are common across all sites, they become a shared key against
which trackers can link data from external sources. A tracker with a large
first-party interest database could build a service that answers questions about
the interests of a given FLoC ID, like "Do people with this cohort ID like
pizza?" All a site then needs to do is call the FLoC API to fetch the cohort ID
and look it up in the service.
Also, this ID can be combined with fingerprinting data to learn a lot more about a
user. For example, “Do people who have this cohort ID, live in India and use Safari
have any affinity for a certain product?”
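Such an interest-lookup service is straightforward to sketch. The profile data below is entirely invented; the point is only that a shared, cross-site cohort ID makes a global lookup key out of otherwise siloed first-party data.

```python
# Hypothetical interest-lookup service run by a large first-party tracker:
# its existing profiles, keyed by cohort ID, answer questions for any site
# that forwards a visitor's cohort ID. All data below is made up.
profiles = {
    "1234": {"hiking", "pizza"},   # interests observed for cohort 1234
    "5678": {"gardening"},
}

def cohort_likes(cohort_id, interest):
    """Answer 'do people with this cohort ID like X?' from stored profiles."""
    return interest in profiles.get(cohort_id, set())

# A site only needs the visitor's cohort ID to query the service:
cohort_likes("1234", "pizza")   # True
```

Combining the lookup with fingerprint attributes (country, browser, and so on) refines the answer further, exactly as the example question in the text suggests.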
Safety of sensitive information
Google has proposed to suppress FLoC cohorts that it finds closely linked with
"sensitive" topics. In a whitepaper entitled "Measuring Sensitivity of Cohorts
Generated by the FLoC API", Google details its strategy for keeping sensitive
data safe.
If Google finds that users in a given cohort frequently visit a set of sites with
sensitive content, it will return an empty cohort ID for that cohort. In addition,
it will remove sites it deems sensitive from the FLoC computation.
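The suppression rule can be sketched as follows. The site list, threshold, and rate computation are all assumptions made for illustration; Google's whitepaper defines its own categories and statistics.

```python
# Sketch of the suppression rule described above (assumed logic, not
# Google's actual implementation): if a cohort's members visit sensitive
# sites at a high rate, the API returns an empty cohort ID instead.
SENSITIVE_SITES = {"clinic.example", "faith.example"}  # hypothetical list
RATE_THRESHOLD = 0.1                                   # hypothetical cutoff

def cohort_id_for(cohort_id, visits):
    """visits: sites visited by the cohort's members in the period."""
    sensitive_rate = sum(s in SENSITIVE_SITES for s in visits) / len(visits)
    return "" if sensitive_rate > RATE_THRESHOLD else cohort_id

cohort_id_for("1234", ["news.example", "clinic.example"])  # "" (suppressed)
cohort_id_for("5678", ["news.example"] * 20)               # "5678"
```

The difficulty, as the next paragraph notes, lies not in this mechanism but in deciding what counts as sensitive in the first place.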
However, complications with categorising sensitive information--people's
disagreement over what qualifies as sensitive to them, incomplete formulation of
sensitive categories, correlation of non-sensitive sites with sensitive ones--make
Google's defence mechanism hard to execute.
Although Google has proposed plenty of countermeasures to mitigate sensitive
data-related problems, including making FLoC opt-in for websites and suppressing
cohorts associated with sensitive topics, Firefox finds it not enough.
Addressing this issue, Rescorla wrote, “While these mitigations seem useful, they
seem to mostly be improvements at the margins, and don’t address the basic issues
described above, which we believe require further study by the community.”
Underscoring the importance of protecting sensitive data in the post-cookie era,
Marshall Vale, product manager of Google's Privacy Sandbox, writes: "Before a
cohort becomes eligible, Chrome analyses it to see if the cohort is visiting pages
with sensitive topics, such as medical websites or websites with religious content, at
a high rate. If so, Chrome ensures that the cohort isn't used, without learning which
sensitive topics users were interested in."
https://blog.google/products/chrome/privacy-sustainability-and-the-importance-of-and/
FLoC is getting booed, for obvious reasons
FLoC is only being tested in countries where the GDPR does not apply. The FLoC
trial in the European Union has been paused on grounds of GDPR non-compliance:
FLoC lacks a consent mechanism for users to opt out of having their interest and
behavioural data included for advertising.
According to Malwarebytes, millions of Chrome users were automatically enrolled
in FLoC's pilot without being informed. Despite Google's rhetoric about
safeguarding user privacy, it started testing FLoC without sending individualised
notifications to users. Chrome users have no opt-out; instead they must block all
third-party cookies to pull out of the trial.
https://blog.malwarebytes.com/cybercrime/privacy/2021/04/millions-of-chrome-users-quietly-added-to-googles-floc-pilot/
In an Electronic Frontier Foundation (EFF) post, "Google's FLoC Is a Terrible
Idea", author Bennett Cyphers argues that Google is adopting a false dichotomy
when it comes to privacy: "Instead of re-inventing the tracking wheel, we should
imagine a better world without the myriad problems of targeted ads." Users'
options, he argues, should not be truncated to "You either have old tracking or
new tracking".
https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
Privacy-focused players like DuckDuckGo and the Brave browser take issue with
all forms of tracking. Citing the fact that Google's tracking via FLoC is
non-optional, DuckDuckGo has spoken out against the new technology and is
bringing FLoC-blocking features to its search engine and its Chrome browser
extension. Brave says that FLoC promotes a false notion of what privacy is and
why privacy is important.
Conclusion
Growing privacy awareness does not benefit targeted advertisers or Google. The
hullabaloo around FLoC stems from the underlying fact that, in its current,
still-in-testing form, it is plagued by a number of privacy risks.
At first glance FLoC appears to be a win-win for advertisers, publishers and
internet users, but there is more to it than easy execution and Google's dream of
dominance in advertising--which we shall see once Google finally uncovers its
long-awaited advertising technique and the market responds to it.
While FLoC has been a matter of uncertainty for marketers recently, it’s time for
them to get serious about leveraging first-party data strategy, which is the future of
digital marketing.