Google’s FLoC and its impact on Privacy
For more than two decades, the third-party cookie underpinned a multi-billion
dollar advertising surveillance industry that followed netizens across the web,
profiling and retargeting them based on their online activity. Although the
technology worked in favour of marketers, it spiralled beyond users' control,
permeating their browsing, breaching their privacy, and building a broad consensus
that it should be retired for the better.
Citing its privacy-endangering nature, several browsers, including Firefox and
Safari, have begun phasing out third-party cookies by default. With the cookie's
departure go the established mechanics of personalised advertising, leaving a void
to fill. Chrome, whose ad ecosystem still depends on cookies, must find its footing
as the privacy landscape shifts beneath it. Hence the proposal of a viable
replacement that aims to be less intrusive, shared across large groups, and
user-centric.
In March 2021, almost a year after Safari let users block third-party cookies,
Google announced that it would end support for third-party trackers by 2022 and
replace them with the Privacy Sandbox's Federated Learning of Cohorts (FLoC for
short). The idea behind FLoC is to serve ads based on users' interests without
revealing their browsing history to advertisers. FLoC replaces third-party cookies
with a new "cohort" identifier, which groups users with similar interests.
Tracking mechanism of FLoC
Reportedly, FLoC will use the SimHash algorithm, originally created for Google's
web crawlers to detect nearly identical web pages. With FLoC, a user's browsing
history remains private. Instead of tracking individuals' browsing history the way
cookies do, FLoC categorises users with similar browsing behaviour into numbered
"cohorts".
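To make the idea concrete, here is a minimal SimHash sketch in Python. This is an illustration of the general technique only, not Chrome's actual FLoC code: the feature set (visited domains), hash function, and bit width are all assumptions made for the example.

```python
import hashlib

def simhash(features, bits=16):
    """Compute a SimHash: similar feature sets yield similar hashes.
    Illustrative sketch only -- not Chrome's actual FLoC implementation."""
    v = [0] * bits
    for f in features:
        # Hash each feature (here, a visited domain) to a bit pattern.
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    # Each bit of the result is the majority vote across all features.
    return sum(1 << i for i in range(bits) if v[i] > 0)

# Two browsers with largely overlapping histories land on nearby hashes;
# grouping browsers by hash yields "cohorts" of similar browsing behaviour.
brad = simhash(["hiking.example", "boots.example", "news.example"])
angelina = simhash(["hiking.example", "boots.example", "weather.example"])
hamming = bin(brad ^ angelina).count("1")  # small distance => similar interests
```

The key property is locality: unlike a cryptographic hash, small changes to the input change only a few output bits, so similar histories cluster together.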
Each cohort, or simply group, contains thousands of users. This method hides
individuals in the group and uses on-device processing to keep a person's web
history private in the browser. Since this happens locally, on users' devices, their
data isn't stored on a server--one of the privacy concerns linked with
third-party cookies.
According to the proposed model, each week an individual's browser reviews the
sites the individual has visited and assigns the browser to a cohort. Each cohort
holds visitors' interest and behaviour data for up to a week and is updated weekly
based on the prior week's data.
FLoC assigns an anonymised ID to an individual's accumulated browsing history
and adds the browser to a group of other browsers with similar behaviour, where
the overall patterns are accessible to advertisers. (Note: The websites visited,
along with their contents, influence the user 'clustering'.)
Let's understand how Google Chrome's algorithms assign users a common
"cohort" with an example, but first let's get acquainted with the different parties
involved in the process:
● The advertiser (a company that pays for advertising), let's say an online
shoe retailer: shoestore.example
● The publisher (a site that sells ad space), let's say a news site:
dailynews.example
● The adtech platform (one that provides software and tools to deliver
advertising): adnetwork.example
For this example, let's call two users Brad and Angelina, whose browsers belong
to the same cohort, 1234. (Note: The names are arbitrary. With FLoC, names and
individual identities are not revealed to advertisers, publishers, or adtech
platforms. Also, think of a cohort as a grouping of browsing activity, not a
collection of people.)
Let’s see the different layers of serving ads:
1. FLoC service: The FLoC service of the browser formulates a mathematical
model with thousands of “cohorts”, each representing thousands of web
browsers with similar browsing histories. Each cohort is issued an ID.
2. Browser: From the FLoC service, Brad's browser gets data describing the
FLoC model. Using the model's algorithm, Brad's browser works out which
cohort corresponds most closely to its own browsing history, which in this
case is 1234. (Note: Brad's browser doesn't share any data with the FLoC
service.) Similarly, Angelina's browser calculates its cohort ID and assigns
itself to 1234. (Note: Angelina's browsing history is different from Brad's,
yet close enough to belong to the same cohort.)
3. Advertiser: Brad, looking for hiking boots, visits shoestore.example. The site
fetches cohort 1234 from Brad’s browser. The site registers that someone
from cohort 1234 exhibited an interest in hiking boots. The site also registers
some additional interest in its product from the same cohort, as well as from
other cohorts, which it periodically aggregates and shares with adtech
platform, adnetwork.example.
4. Publisher: Angelina visits dailynews.example where the site asks Angelina’s
browser for its cohort. The site then makes a request to its adtech platform,
adnetwork.example, for an ad, including Angelina’s browser’s cohort, 1234.
5. Adtech platform: adnetwork.example selects an ad suitable for Angelina by
mixing the data--Angelina’s cohort (1234) provided by dailynews.example
and data related to cohorts and product interests provided by
shoestore.example--acquired from the publisher and the advertiser.
Adnetwork.example selects an ad for hiking boots for Angelina, and
dailynews.example displays the ad.
Impact on advertisers and publishers
Today, as people become increasingly privacy-conscious, switching to cohorts
looks less like future-proofing marketing strategy than a return to a go-to
strategy, for there is nothing new about cohorts. In fact, the very concept FLoC
is built around--clustering large groups of people with a shared interest in a
way that leaves privacy intact--has been a marketing principle for nearly
forever.
Compared with third-party cookies, cohorts limit advertisers and publishers to
insufficient, time-bounded, browser-level insight into their audience. Advertisers
can see only the cohort an individual belongs to, without any information about
the characteristics that link its members. As things stand with FLoC, advertisers
and publishers should forget about delivering the bespoke, individual-level
experiences that third-party trackers enabled.
Marketers did not expect that the business of capitalising on data--and building
billion-dollar companies off it--would soon pivot on the privacy hinge. But if
anything is certain about advertising's future, it is that cohorts will reach their
full potential. According to the Google Blog, simulation tests of the principles
defined in Chrome's FLoC proposal yielded at least 95% of the conversions per
dollar spent compared with cookie-based advertising.
https://blog.google/products/ads-commerce/2021-01-privacy-sandbox/
The conjunction of cohorts and probabilistic data--identifying users by matching
them with a known user who exhibits similar browsing behaviour--is a
well-established concept within many of the world's largest enterprises, but it
hadn't received mainstream attention until now. Probabilistic onboarding is all
about structuring cohorts and finding new customers, and this business strategy,
which lies at the heart of Google's FLoC, cannot be ignored in the quest for
personalisation.
With FLoC, Google wants advertisers and publishers to track user activity with
their own first-party cookies rather than depending on third-party data. The
marketer's answer to FLoC will be leveraging first-party data, which will no
longer be optional but will form the core of any successful marketing strategy
for creating better customer experiences and optimising marketing efforts.
“73% of consumers are willing to share more data if a company is transparent about
how and why it is used.”
Privacy analysis of FLoC
Numerous privacy issues with FLoC are attracting public attention well before
launch. Here we address a few:
Cohort IDs can be used for tracking
According to Firefox CTO Eric Rescorla, cohorts will likely consist of thousands of
users at most. Tracking companies can employ browser fingerprinting to narrow
down the list of potential users in a cohort to just a few very quickly. To do so,
trackers would only require “a relatively small amount of information” when
combined with a FLoC cohort.
This is possible in a number of ways:
Browser Fingerprinting
Even though users' local browsing data is not shared--only cohort information is
transmitted--that information, combined with other data exposed by the browser,
can be compiled into a unique fingerprint for each person.
Every user-specific detail--browser type, OS, language, country--helps
distinguish one user from another. If a cohort of about 10,000 users is split
across 5,000 fingerprint groups, the number of users in each (FLoC cohort,
fingerprint) pair shrinks to single digits, making it trivial to identify people
individually.
Though this is harder with larger cohorts, it does not free FLoC from individual
targeting.
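The back-of-the-envelope arithmetic behind this attack is simple: combining a cohort with even a coarse fingerprint shrinks the anonymity set multiplicatively. The numbers below are the illustrative figures from the text, assuming fingerprints spread users roughly evenly.

```python
# Anonymity-set arithmetic for the cohort + fingerprint attack (figures
# from the example above; even spread across buckets is an assumption).
cohort_size = 10_000          # users sharing one FLoC cohort ID
fingerprint_buckets = 5_000   # distinct fingerprints (browser, OS, language, ...)

# Each (cohort ID, fingerprint) pair identifies a much smaller group:
users_per_pair = cohort_size / fingerprint_buckets
assert users_per_pair == 2    # a single-digit anonymity set per pair
```

With only a couple of users per pair, one more observed attribute is often enough to single a person out entirely.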
Multiple visits
People's interests online aren't constant, and neither are their FLoC IDs, which
are recomputed every week. If a tracker succeeds in using other available
information to link a user's visits over time, it can distinguish individual users
by combining their FLoC IDs from week 1, week 2, and so on.
This poses a serious de-anonymisation risk, as FLoC restores cross-site tracking
even for users with anti-tracking mechanisms enabled.
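One way to see why the weekly rotation hurts rather than helps: each week's cohort ID constrains who the visitor could be, and intersecting those candidate sets narrows them quickly. The cohort memberships below are made up purely for illustration.

```python
# Illustrative sketch: a tracker that can link the same browser's visits
# across weeks intersects the candidate sets implied by each week's cohort.
# Membership sets are invented for the example.
week1_members = {"u1", "u2", "u3", "u4"}  # everyone sharing week 1's cohort ID
week2_members = {"u3", "u4", "u9"}        # everyone sharing week 2's cohort ID
week3_members = {"u4", "u7"}              # everyone sharing week 3's cohort ID

# Each additional week adds a constraint; the intersection shrinks fast.
candidates = week1_members & week2_members & week3_members
```

After three weeks, only one candidate remains in this toy example: the sequence of cohort IDs acts like a fingerprint of its own.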
The project’s Github page states, “Sites that know a person’s PII (e.g., when people
sign in using their email address) could record and reveal their cohorts. This means
that information about an individual’s interest may eventually become public.” In
other words, FLoC’s technology will share personal data with existing trackers
which already identify users.
https://github.com/WICG/floc
FLoC exposes far more information than necessary
A site interested in learning users' interests only needs to participate in tracking the
user across a large number of sites or work with some other big trackers.
Because FLoC IDs are common across all sites, they become a shared key against
which trackers can link data from external sources. A tracker with a large
first-party interest database could build a service that answers questions about
the interests of a given FLoC ID, like "Do people with this cohort ID like
pizza?" All a site then needs to do is call the FLoC API to fetch the cohort ID
and look it up in the service.
Also, this ID can be combined with fingerprinting data to learn a lot more about a
user. For example, “Do people who have this cohort ID, live in India and use Safari
have any affinity for a certain product?”
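Such an interest-lookup service is straightforward to sketch. The profile data below is entirely invented; the point is only that a shared, cross-site cohort ID makes a global lookup key out of otherwise siloed first-party data.

```python
# Hypothetical interest-lookup service run by a large first-party tracker:
# its existing profiles, keyed by cohort ID, answer questions for any site
# that forwards a visitor's cohort ID. All data below is made up.
profiles = {
    "1234": {"hiking", "pizza"},   # interests observed for cohort 1234
    "5678": {"gardening"},
}

def cohort_likes(cohort_id, interest):
    """Answer 'do people with this cohort ID like X?' from stored profiles."""
    return interest in profiles.get(cohort_id, set())

# A site only needs the visitor's cohort ID to query the service:
cohort_likes("1234", "pizza")   # True
```

Combining the lookup with fingerprint attributes (country, browser, and so on) refines the answer further, exactly as the example question in the text suggests.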
Safety of sensitive information
Google has proposed to suppress FLoC cohorts that it finds closely linked with
"sensitive" topics. In a whitepaper entitled "Measuring Sensitivity of Cohorts
Generated by the FLoC API", Google details its strategy for keeping sensitive
data safe.
If Google finds that users in a given cohort frequently visit a set of sites with
sensitive content, it will return an empty cohort ID for that cohort. In addition,
it will remove sites it deems sensitive from the FLoC computation.
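The suppression rule can be sketched as follows. The site list, threshold, and rate computation are all assumptions made for illustration; Google's whitepaper defines its own categories and statistics.

```python
# Sketch of the suppression rule described above (assumed logic, not
# Google's actual implementation): if a cohort's members visit sensitive
# sites at a high rate, the API returns an empty cohort ID instead.
SENSITIVE_SITES = {"clinic.example", "faith.example"}  # hypothetical list
RATE_THRESHOLD = 0.1                                   # hypothetical cutoff

def cohort_id_for(cohort_id, visits):
    """visits: sites visited by the cohort's members in the period."""
    sensitive_rate = sum(s in SENSITIVE_SITES for s in visits) / len(visits)
    return "" if sensitive_rate > RATE_THRESHOLD else cohort_id

cohort_id_for("1234", ["news.example", "clinic.example"])  # "" (suppressed)
cohort_id_for("5678", ["news.example"] * 20)               # "5678"
```

The difficulty, as the next paragraph notes, lies not in this mechanism but in deciding what counts as sensitive in the first place.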
However, complications with categorising sensitive information--people's
disagreement over what qualifies as sensitive to them, incomplete formulation of
sensitive categories, correlation of non-sensitive sites with sensitive ones--make
Google's defence mechanism hard to execute.
Although Google has proposed plenty of countermeasures to mitigate sensitive
data-related problems, including making FLoC opt-in for websites and suppressing
cohorts associated with sensitive topics, Firefox finds it not enough.
Addressing this issue, Rescorla wrote, “While these mitigations seem useful, they
seem to mostly be improvements at the margins, and don’t address the basic issues
described above, which we believe require further study by the community.”
Underscoring the importance of protecting sensitive data in the post-cookie era,
Marshall Vale, product manager of Google's Privacy Sandbox, writes: "Before a
cohort becomes eligible, Chrome analyses it to see if the cohort is visiting pages
with sensitive topics, such as medical websites or websites with religious content, at
a high rate. If so, Chrome ensures that the cohort isn't used, without learning which
sensitive topics users were interested in."
https://blog.google/products/chrome/privacy-sustainability-and-the-importance-of-and/
FLoC is getting booed, for obvious reasons
FLoC is only being tested in countries where the GDPR does not apply. The FLoC
trial in the European Union has been paused on grounds of GDPR non-compliance:
FLoC lacks a consent mechanism for users to opt out of having their interest and
behavioural data included for advertising.
According to Malwarebytes, millions of Chrome users were automatically enrolled
in FLoC's pilot without being informed. Despite Google's rhetoric about
safeguarding user privacy, it started testing FLoC without sending individualised
notifications to users. Chrome users have no opt-out; instead they must block all
third-party cookies to pull out of the trial.
https://blog.malwarebytes.com/cybercrime/privacy/2021/04/millions-of-chrome-users-quietly-added-to-googles-floc-pilot/
In an Electronic Frontier Foundation (EFF) post, "Google's FLoC Is a Terrible
Idea", author Bennett Cyphers argues that Google is adopting a false dichotomy
when it comes to privacy: "Instead of re-inventing the tracking wheel, we should
imagine a better world without the myriad problems of targeted ads." Users'
options, he argues, should not be truncated to "You either have old tracking or
new tracking".
https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
Privacy-focused players like DuckDuckGo and the Brave browser take issue with
all forms of tracking. Citing the fact that Google's tracking via FLoC is
non-optional, DuckDuckGo has spoken out against the new technology and is
bringing FLoC-blocking features to its search engine and its Chrome browser
extension. Brave says that FLoC promotes a false notion of what privacy is and
why privacy is important.
Conclusion
Growing privacy awareness does not benefit targeted advertisers or Google. The
hullabaloo around FLoC stems from the underlying fact that, in its current,
still-in-testing form, it is plagued by a number of privacy risks.
At first glance FLoC appears to be a win-win for advertisers, publishers and
internet users, but there is more to it than easy execution and Google's dream of
dominance in advertising--which we shall see once Google finally uncovers its
long-awaited advertising technique and the market responds to it.
While FLoC has been a matter of uncertainty for marketers recently, it’s time for
them to get serious about leveraging first-party data strategy, which is the future of
digital marketing.