
Presentation on theme: "Automated Experiments on Ad Privacy Settings"— Presentation transcript:

1 Automated Experiments on Ad Privacy Settings
By Ajinkya Thorve

2 Introduction Advancement of tracking technologies has led to increased data collection. Collected data is used, sold, and resold for serving targeted advertisements. Serious privacy concern! To increase transparency and provide control, Google offers the Ad Settings page (next slide).

3 Google Ad Settings Page

4 The Problem Little information about how these pages operate.
Need to explore how user behavior (either directly with the Ad Settings or with content providers) alters the ads and settings shown to the user. Need to study the degree to which the settings provide transparency and choice, as well as check for the presence of discrimination.

5 Privacy Properties 1. Discrimination
Discrimination between two classes is a difference in behavior toward those two classes: membership in a class causes a change in the ads shown. Discrimination is not always bad (e.g., clothing ads).

6 Privacy Properties (contd.)
2. Transparency Display to users what the ad network may have learned about them. Cannot expect an ad network to be completely transparent. Only study the extreme case of the lack of transparency: opacity. If some browsing activity has a significant effect on the ads served but no effect on the Ad Settings, that indicates a lack of transparency.

7 Privacy Properties (contd.)
3. Choice Effectful choice: altering the settings has some effect on the ads seen by the user. Shows that altering the settings is not merely a “placebo button”; it has a real effect on the network’s ads. Ad choice: removing an inferred interest results in a decrease in the number of ads related to the removed interest. Effectful choice is not always observable (e.g., for cars, no related ads appeared in the collected repository), and it does not capture whether the effect on the ads is meaningful.

8 Methodology Null hypothesis: the inputs do not affect the outputs.
Inputs: user behavior, Ad Settings. Output: ads seen by the user. The goal: to establish that changes in a certain type of input to a system cause an effect on a certain type of output of the system. Examples of the changes studied: visiting websites -> changes on the Ad Settings page; changes in Ad Settings -> changes in the ads seen.
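A minimal sketch of the randomized assignment this methodology relies on, assuming a plain list of agent identifiers; the function and group names below are illustrative, not AdFisher's actual API:

```python
import random

def assign_groups(agent_ids, seed=0):
    """Randomly split agents into an experimental and a control group.

    Under the null hypothesis, which group an agent lands in should have
    no effect on the ads it later sees.
    """
    rng = random.Random(seed)
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

print(assign_groups(range(10)))
```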

9 Methodology (contd.)

10 Methodology (contd.)

11 AdFisher An automated tool to run experiments using the above methodology for a set of treatments, measurements, and classifiers. Extensible: allowing the experimenter to implement additional functionalities or even study a different online platform.

12 AdFisher (contd.) To simulate a new person, AdFisher creates an agent from a fresh browser instance with no browsing history, cookies, or other personalization. To simulate interests, AdFisher downloads the top 100 URLs for different categories from Alexa and creates lists of webpages. AdFisher randomly assigns each agent to a group and applies the appropriate treatment. Next, AdFisher takes measurements from the agent, parses the page to find the ads shown by Google, and stores those ads (10 reloads, 5 s between successive reloads).
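A rough sketch of this measurement step, assuming Selenium drives a fresh Firefox profile; the CSS selector and helper name are placeholders for illustration, not AdFisher's real parser:

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

def collect_ads(url, reloads=10, wait=5):
    """Reload a page several times and record the text ads found on it."""
    driver = webdriver.Firefox()  # fresh profile: no history, cookies, or personalization
    ads = []
    try:
        for _ in range(reloads):
            driver.get(url)
            # Placeholder selector; the real parser is specific to each news site.
            for el in driver.find_elements(By.CSS_SELECTOR, ".google-text-ad"):
                ads.append(el.text)
            time.sleep(wait)  # 5 s between successive reloads
    finally:
        driver.quit()
    return ads
```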

13 AdFisher (contd.) Used news sites since they generally show many ads. Among the top 20 news websites on alexa.com, only five displayed text ads served by Google. Most of the experiments use the Times of India, as it serves the most (five) text ads per page reload. Some experiments are repeated on the Guardian (three ads per reload) to demonstrate that the results are not specific to one site.

14 AdFisher (contd.) AdFisher splits the entire data set into training and testing subsets, and examines the training subset of the collected measurements to select a classifier that distinguishes between the measurements taken from each group. AdFisher has functions for converting the text ads seen by an agent into three different feature sets: the URL feature set, the URL+title feature set, and the word feature set.
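A minimal sketch of building the URL+title feature set, assuming each agent's measurement is a list of (URL, title) pairs; the input format and the function name `url_title_features` are assumptions for illustration:

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer

def url_title_features(agent_ads):
    """agent_ads: one list of (url, title) pairs per agent; returns a count matrix."""
    counts = [Counter(f"{url} {title}" for url, title in ads) for ads in agent_ads]
    vectorizer = DictVectorizer()
    return vectorizer.fit_transform(counts), vectorizer

# Example with two toy agents:
X, vec = url_title_features([
    [("example.com/a", "Ad A"), ("example.com/b", "Ad B")],
    [("example.com/a", "Ad A")],
])
print(X.shape)  # (2 agents, number of distinct URL+title pairs)
```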

15 AdFisher (contd.) Explored a variety of classification algorithms provided by the scikit-learn library. Logistic regression with an L2 penalty over the URL+title feature set consistently performed well compared to the others.
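A self-contained sketch of that classifier choice, using synthetic stand-in data in place of the real URL+title features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: rows are agents, columns are URL+title features.
rng = np.random.default_rng(0)
X = rng.random((100, 20))
y = np.array([0] * 50 + [1] * 50)  # group labels (control vs. experimental)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = LogisticRegression(penalty="l2", C=1.0)  # L2-penalized logistic regression
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2%}")
```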

16 Experiments

17 Experiments 1. Discrimination
Set up AdFisher to have the agents in one group visit the Google Ad Settings page and set the gender bit to female while agents in the other group set theirs to male. All the agents then visited the top 100 websites listed under the Employment category of Alexa. The agents then collect ads from Times of India. The learned classifier attained a test-accuracy of 93%, suggesting that Google did in fact treat the genders differently.

18 Experiments (contd.)

19 Experiments (contd.) 2. Transparency
The experimental group visited substance abuse websites while the control group idled. None of the 500 agents in the experimental group had interests related to substance abuse on their Ad Settings pages. Collected the ads shown to the agents.

20 Experiments (contd.)

21 Experiments (contd.) 3. Effectful Choice
Tested whether making changes to Ad Settings has an effect on the ads seen, thereby giving the users a degree of choice over the ads. Simulated an interest in online dating by visiting an online dating website. Agents in the experimental group removed the interest “Dating & Personals”. All the agents then collected ads from the Times of India. Found statistically significant differences between the groups. Thus, the Ad Settings appear to actually give users the ability to avoid ads they might dislike or find embarrassing.
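A hedged sketch of how such a statistically significant difference could be checked with a permutation test over the held-out group labels; this is a simplification, not the paper's exact block-level procedure, and the function and variable names are ours:

```python
import numpy as np

def permutation_p_value(y_test, predictions, n_perm=1000, seed=0):
    """Fraction of label permutations whose accuracy matches or beats the observed one."""
    rng = np.random.default_rng(seed)
    y_test = np.asarray(y_test)
    predictions = np.asarray(predictions)
    observed = np.mean(predictions == y_test)
    hits = sum(
        np.mean(predictions == rng.permutation(y_test)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction to avoid a zero p-value

# Example usage with the classifier from the earlier sketch:
# p = permutation_p_value(y_test, clf.predict(X_test))
```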

22 Experiments (contd.)

23 Conclusions Conducted 21 experiments using 17,370 agents that collected over 600,000 ads. Found instances of discrimination, opacity, and choice in targeted ads. Cannot assign blame; cannot determine whether Google, the advertiser, or complex interactions among them caused the issues; lack the access needed to make this determination.

24 My Understanding and Issues
Only a few thousand browser agents, cannot generalize results. “...we do not claim these findings to generalize or imply widespread issues, we find them concerning and warranting further investigation by those with visibility into the ad ecosystem.” “We do not claim that we will always find a difference if one exists, nor that the differences we find are typical of those experienced by users.”

25 My Understanding and Issues (contd.)
Limitations of the experiment: Only text ads, only two websites. “It comes with stock functionality for collecting and analyzing text ads. Experimenters can add methods for image, video, and flash ads.” “The experimenter can add parsers to collect ads from other websites.”

26 My Understanding and Issues (contd.)
Same IP address. “We do not claim “completeness” or “power”: we might fail to detect some use of information.” “For example, Google might not serve different ads upon detecting that all the browser agents in our experiment are running from the same IP address. Despite this limitation in our experiments, we found interesting instances of usage.”

27 References Datta, A., Tschantz, M. C., & Datta, A. (2015). Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. Proceedings on Privacy Enhancing Technologies, 2015(1), pp. 92-112.

28 Thank You!

