Why data sharing needs to change
Paul H. Müller
Co-Founder & CTO
Nov 4, 2019
One of the most commonly used features of Adjust’s dashboard is setting up a partner module to share the raw data of installs, sessions and events.
More often than not, partners require you to send over all the data from your app, not just the traffic attributed to them. This means your organic data as well as the data attributed to other networks. And while Adjust doesn’t share the attribution information itself between networks, this sharing gives such partners an insight into your entire user base.
Let’s talk about why this is the status quo, and why that needs to change.
Between the Devil and the Deep Blue Sea
When you ask a network partner why you need to enable a module to share all your installs with them, instead of just the traffic that they drove, the #1 answer you’ll hear is exclusion targeting.
Exclusion targeting seems to be the singular necessity of an inventory seller, who wants to have as much data shared with them as possible.
So, let’s take a closer look at “exclusion targeting.”
The idea is pretty simple. By not showing ads to people that already use your app, you can dramatically increase the efficiency of user acquisition campaigns. As it turns out, people who love your app also love clicking your ads. Since many ad-servers optimize on click-through-rates over install-rates, you can end up spending a lot of money on showing ads to your already-engaged user base.
So, all an ad-serving partner needs to do is exclude those people from their targeting: hence, exclusion targeting.
To perform said exclusion, your network partner needs you to share the data. The de facto standard today is to fire a stream of real-time callbacks at their backend.
From that stream of individual events, partners generate a list of active devices that already use your app.
The problem is that a real-time feed of installs only works forward from the time the callbacks were enabled. Installs that occurred before that point are never transmitted. This means that for a typical app, the majority of active users will have already installed before the callbacks are activated.
How useful is exclusion targeting when it doesn’t know about 90% of your existing users?
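To make the coverage gap concrete, here is a minimal Python sketch of how a partner might derive an exclusion list from a callback stream. The install history, the device IDs, and the `build_exclusion_list` helper are all assumptions made up for illustration, not any partner's actual implementation.

```python
from datetime import datetime

# Hypothetical install history as (device_id, install_time). Illustrative data only.
installs = [
    ("device-a", datetime(2019, 1, 10)),
    ("device-b", datetime(2019, 3, 2)),
    ("device-c", datetime(2019, 6, 18)),
    ("device-d", datetime(2019, 10, 30)),
]


def build_exclusion_list(install_events, callbacks_enabled_at):
    """Return the device IDs a partner learns about when it only receives
    install callbacks fired at or after `callbacks_enabled_at`."""
    return {
        device_id
        for device_id, installed_at in install_events
        if installed_at >= callbacks_enabled_at
    }


enabled_at = datetime(2019, 10, 1)  # assumed date the partner module was switched on
exclusion_list = build_exclusion_list(installs, enabled_at)
all_users = {device_id for device_id, _ in installs}
coverage = len(exclusion_list) / len(all_users)

print(sorted(exclusion_list))        # ['device-d']
print(f"coverage: {coverage:.0%}")   # coverage: 25%
```

In this toy data set, three of the four existing users installed before the callbacks were enabled, so they would keep seeing ads.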
Now your network partner will say that there’s an easy fix for this. Simply send your session data along, and they’ll be able to see everyone that’s currently using the app. This, they say, will vastly improve coverage for their exclusion list.
The problem here is that you are giving away more than the basic fact that a user has, at some point, already installed your app. In actuality, you are giving away sensitive information about your most valuable users.
By seeing which devices open your app the most, it would be effortless to create a list of your most engaged users, something of much higher value than, say, a list of all users that used your app within the last 30 days.
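How little effort this takes can be shown in a few lines of Python. The sketch below assumes the partner receives one callback per app open, each carrying a device ID. The sample data and the `most_engaged` helper are hypothetical.

```python
from collections import Counter

# Hypothetical session callbacks as a partner would receive them:
# one device ID per app open. Illustrative data only.
session_callbacks = [
    "device-a", "device-b", "device-a",
    "device-c", "device-a", "device-b",
]


def most_engaged(sessions, top_n):
    """Rank devices by number of sessions, most active first."""
    counts = Counter(sessions)
    return [device_id for device_id, _ in counts.most_common(top_n)]


top_users = most_engaged(session_callbacks, top_n=2)
print(top_users)  # ['device-a', 'device-b']
```

Nothing beyond counting is required, which is exactly why a raw session feed reveals so much more than install status.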
So both currently-favored solutions have unacceptable downsides. Isn’t there another way?
In a perfect world
Before we face the harshness of reality, let’s discuss the ideal solution.
When sharing any audience with a partner, we would like to share the least possible amount of information and update changes to the segment, automatically, in real time. This way, a partner would not be able to utilize our list of users for any other purpose while always having an up-to-date audience.
Optimally, we could share a list in a way that no one can extract all devices from it, but only a yes/no if a certain device ID is on it.
For example, looking at exclusion targeting, we wouldn’t want to share any information about a user beyond the fact that they already have the app installed.
But how close can we get to the optimum?
In the real world
When we split our optimal solution into its different aspects we can see where it clashes with the real world.
Can we share a list without sharing a list?
To answer this question, we first need to understand how device IDs like (for example) the IDFA are sent through the advertising ecosystem. Whenever a device requests an ad impression, the ad serving party receives the IDFA and can decide which advertisements to show to the device. At the same time, nothing is stopping the ad server from also storing the IDFA together with some basic information like country, OS and device type. Even if the network has no ill intent, they still need to store the IDFA to remember which ads they have already served to this device, and how often.
Now imagine we shared a specially-encrypted list with the network, or installed a DMP-style server next to their ad server that only answers yes/no when given a specific IDFA. This way, they cannot ever read the full list, right?
In reality, it would only take them a few minutes to fire their entire catalog of known IDFAs against this black box to receive the full list or at least a very close version of it.
So any current approach to sharing a list with either hashed or encrypted device IDs, or via a DMP integration, could easily be circumvented by the fact that almost everyone knows all active IDFAs.
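The following Python sketch illustrates why the black box does not help. It models the yes/no service as a set of SHA-256 hashes and then queries it with every ID the ad server already knows. The `MembershipOracle` class, the ID format, and the sample data are assumptions chosen for illustration. Real device IDs and real DMP integrations look different, but the enumeration works the same way.

```python
import hashlib


def hash_id(device_id: str) -> str:
    """Hash a device ID so the stored list is not readable directly."""
    return hashlib.sha256(device_id.encode("utf-8")).hexdigest()


class MembershipOracle:
    """Hypothetical black box: stores only hashes, answers only yes/no."""

    def __init__(self, device_ids):
        self._hashed = {hash_id(d) for d in device_ids}

    def contains(self, device_id: str) -> bool:
        return hash_id(device_id) in self._hashed


# The publisher's private audience (illustrative IDs).
audience = {"idfa-001", "idfa-003", "idfa-007"}
oracle = MembershipOracle(audience)

# Every ID the ad server has already seen in ad requests (illustrative).
known_idfas = [f"idfa-{n:03d}" for n in range(10)]

# Enumeration: ask the oracle about each known ID and keep the hits.
recovered = {d for d in known_idfas if oracle.contains(d)}
print(recovered == audience)  # True
```

Hashing protects the list only against someone who does not already hold the candidate IDs. An ad server does hold them, so the full list falls out of a simple loop.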
There are some exciting new advances in cryptography that might make such a scenario possible in the future. Still, the main hurdle for that will be the operating system, which has to support the generation of specially-encrypted device IDs that cannot be used for any other purpose than intended by the app publisher.
Until then, we have to accept that we cannot keep the list of users itself away from the media partners we want to work with.
Despite this sobering fact, passing a raw list of device IDs is still a huge improvement over the status quo. Instead of sharing a possibly incomplete and thus inaccurate user list, or exposing the session data of your entire user base, you can choose to share the minimal viable list of devices they need to exclude from the campaign targeting you set up.
For example, if you chose to run a campaign targeted at iPhones in Japan, you only need to share a list of devices matching these criteria. Compared to the alternatives, this is undoubtedly an improvement.
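As a rough sketch of what such a minimal list could look like, the Python snippet below filters a set of known devices down to the campaign's targeting criteria. The `Device` record, its fields, the country codes, and the `minimal_exclusion_list` helper are hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical record of a device that already has the app installed."""
    device_id: str
    country: str
    device_type: str


# Illustrative data only.
devices = [
    Device("idfa-001", "JP", "iPhone"),
    Device("idfa-002", "JP", "iPad"),
    Device("idfa-003", "US", "iPhone"),
    Device("idfa-004", "JP", "iPhone"),
]


def minimal_exclusion_list(known_devices, country, device_type):
    """Return only the device IDs that match the campaign's targeting."""
    return [
        d.device_id
        for d in known_devices
        if d.country == country and d.device_type == device_type
    ]


shared = minimal_exclusion_list(devices, country="JP", device_type="iPhone")
print(shared)  # ['idfa-001', 'idfa-004']
```

The partner receives two bare device IDs and nothing else: no session history, and no devices outside the campaign's scope.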
So if sharing device ID lists is so much better, what do you need to do it?
How do we get there?
First of all, you need to be able to generate a list of devices that match a certain set of criteria. Optimally, this generator would automatically keep such an audience up to date without additional manual input.
Next up, you need to be able to share and update your list with the network partner of your choosing.
There are two general ways to share audiences with ad servers: either via a push API provided by the partner, or via raw lists shared through (for example) URLs that partners can use as a pull API, fetching updates at regular intervals.
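The difference between the two approaches can be sketched in Python. With push, the sender only needs to transmit what changed since the last update. With pull, the partner fetches the complete current list each time. The function names and the newline-separated format below are assumptions for illustration. Actual partner APIs define their own payloads.

```python
def audience_delta(previous, current):
    """Push style: compute which device IDs to add and which to remove
    on the partner's side since the last update."""
    added = current - previous
    removed = previous - current
    return added, removed


def render_pull_list(current):
    """Pull style: render the full audience as newline-separated text,
    the kind of raw list a partner could fetch from a URL."""
    return "\n".join(sorted(current))


# Two hypothetical snapshots of the same audience.
yesterday = {"idfa-001", "idfa-002", "idfa-003"}
today = {"idfa-002", "idfa-003", "idfa-004"}

added, removed = audience_delta(yesterday, today)
print(sorted(added))    # ['idfa-004']
print(sorted(removed))  # ['idfa-001']
print(render_pull_list(today))
```

Either way, the partner ends up with the same audience. The choice mostly determines who initiates the update and how quickly changes arrive.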
As it just so happens, Adjust’s Audience Builder can do precisely those things. You can create audiences matching any criteria and share them in real time with many of your current traffic partners.
With more and more clients asking us to help them reduce data oversharing, we have decided to expand the Audience Builder program vastly.
Adjust will soon add all currently available push APIs from our partners and work with those that rely on our pull API to fetch audiences.
While there are still many partners that need to upgrade their systems to consume audiences this way, we are positive that marketers can drive this change by demanding a better way to share their data.
Join us and challenge the status quo.