
Improving Quality of Mobile App Traffic Through Fraud Prevention

Jiao Chen
Data Scientist

The sophisticated mobile marketer is helpless without their data.

As a marketer, you might recognize this pattern: some segments in your dataset perform better than you expect them to. Maybe the in-app purchase revenue in your iTunes account aggregates doesn't match what your analytics stack reports, and you start to suspect that some of the biggest purchases are simply converting too well. Sometimes, when things look too good to be true, they're not.

At Adjust, we speak about sources of error in the data. An error is typically a conversion that is, by mistake or through the malice of some third party, wrongly registered with your analytics stack. The error can be that the conversion should have been registered to another segment, or that it never existed in the first place.

As we found in our recent study, common sources of error include:

- Fake in-app purchases, which can throw off revenue numbers in analytics stacks and present a false picture of ROI.
- Fraudulent datacenter traffic, which introduces simulated conversions into your dataset (see the filtering sketch after this list).
- Organic users poached through click-spam fraud, where fraudulent sub-publishers abuse attribution and analytics systems to poach attribution for organic conversions, effectively boosting the average performance of these paid channels.
- Poorly attributed traffic, resulting from overly simplistic attribution algorithms (that, e.g., rely exclusively on IP addresses or other simple indicators).
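To make one of these concrete, here is a minimal sketch, in Python, of the kind of datacenter filtering involved. The blocklist ranges and field names are hypothetical stand-ins; production filters rely on maintained lists of hosting-provider netblocks, and this is not a description of Adjust's actual implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical example ranges; real filters use maintained lists of
# datacenter/hosting-provider netblocks (e.g. derived from ASN data).
DATACENTER_RANGES = [
    ip_network("203.0.113.0/24"),   # documentation range, standing in for a hosting provider
    ip_network("198.51.100.0/24"),
]

def is_datacenter_ip(ip: str) -> bool:
    """Return True if the IP falls inside a known datacenter netblock."""
    addr = ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)

def filter_conversions(conversions):
    """Split conversions into clean traffic and suspected datacenter traffic."""
    clean, suspect = [], []
    for conv in conversions:
        (suspect if is_datacenter_ip(conv["ip"]) else clean).append(conv)
    return clean, suspect

# Example: one residential-looking IP, one from a flagged range.
clean, suspect = filter_conversions([
    {"id": 1, "ip": "93.184.216.34"},
    {"id": 2, "ip": "203.0.113.77"},
])
print(len(clean), len(suspect))  # -> 1 1
```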

The common pattern across all of these schemes is that conversions are tracked, but either never reflected true earnings or should have been attributed to a different segment. Sources of error can be introduced accidentally, but they are also frequently the result of mobile user acquisition fraud or end-user cheating.

When these errors manifest in a dataset, your conclusions become weaker. Channels and segments that look like they are performing well with high ROI can often hide cheaters (who inherently look like whales) or fraudsters. This typically shows up as discrepancies between the sums reported by detailed analytics platforms and aggregate "truths", like the final iTunes payout, and it affects even channels that are totally clean. You might very well be aware that a certain channel or platform is suspicious, but it's difficult to know to what extent, and where specifically the problem is localized. When we don't have all the information, it's a lot more difficult to make confident, data-driven decisions.

For example, when a fraudulent sub-publisher pollutes a performance campaign, they might succeed to some degree in claiming conversions that were actually organic. Since organic users typically perform better than acquired users, this boosts the average engagement metrics for the campaign they're part of. The differences might be on the order of a few percent, but that is enough to skew the relative performance of, say, one publisher vertical compared to another, resulting in you and your network partners, quite reasonably, allocating more budget to that vertical. The same can be true for specific creatives or particular copy. These errors have nothing to do with how real users respond to your content.
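To make the skew concrete, here is a back-of-the-envelope calculation with invented figures: a small share of poached organics, converting at a higher rate, is enough to lift a campaign's measured conversion rate by several percent.

```python
# Hypothetical figures: 10,000 attributed installs, of which 5% are
# organics poached via click spam. Organics convert at twice the rate
# of genuinely acquired users.
paid_installs = 9_500
poached_organics = 500
paid_cvr = 0.04          # true conversion rate of acquired users
organic_cvr = 0.08       # poached organics convert better, inflating the blend

true_rate = paid_cvr
measured_rate = (paid_installs * paid_cvr + poached_organics * organic_cvr) / (
    paid_installs + poached_organics
)
print(f"true CVR: {true_rate:.4f}, measured CVR: {measured_rate:.4f}")
# -> true CVR: 0.0400, measured CVR: 0.0420 (a 5% relative lift)
```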

Similarly, certain campaigns may be tapping into audiences that are more likely than others to cheat and use rootkits to fake purchases. When you look at the ARPU of these audiences, those prone to cheating will look much more valuable than others. Again, this results in reallocating investment to chase after more of the users who cheat. In one study, we found that as much as 30 percent of all purchases made on iOS are inconsistent with the data registered with analytics platforms. So almost a third of the purchases that you are basing your decisions on aren't actually contributing the ARPU you might expect.
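The effect on ARPU is easy to quantify. Here is a worked example with hypothetical revenue figures (only the 30 percent share comes from the study; the example also assumes, for simplicity, that fraudulent purchases are proportional in value to real ones):

```python
# Hypothetical: an analytics stack reports $50,000 in IAP revenue from
# 25,000 users, but 30% of purchase events fail store-side verification,
# consistent with the share reported in the study.
reported_revenue = 50_000
users = 25_000
unverified_share = 0.30

reported_arpu = reported_revenue / users
verified_arpu = reported_revenue * (1 - unverified_share) / users
print(f"reported ARPU: ${reported_arpu:.2f}, verified ARPU: ${verified_arpu:.2f}")
# -> reported ARPU: $2.00, verified ARPU: $1.40
```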

How do you know if you’re affected?

This typically starts with a discrepancy that you can't explain, or with inconsistent patterns in the behavior of certain segments. Perhaps you've invested in particular campaigns, or particular experiments within a campaign, that didn't deliver the ROI that previous data indicated. In particular, behavioral discrepancies within your funnel between traffic in general and specific segments can indicate that something is off with the segment: high initial retention rates that never trigger a signup, for instance, or groups that appear to buy high-value IAPs at a higher rate than lower-value options. All of these inconsistencies are indicators that sources of error might be polluting your data.
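As an illustration of how such a check might look in practice, here is a minimal sketch, with invented numbers and hypothetical field names, that flags segments whose signup behavior deviates sharply from the overall baseline:

```python
# Minimal sketch: flag segments whose retention-to-signup ratio deviates
# sharply from the overall baseline, one of the behavioral discrepancies
# described above. All figures are hypothetical.
segments = {
    "overall":   {"day1_retained": 30_000, "signups": 12_000},
    "network_a": {"day1_retained": 2_400,  "signups": 950},
    "network_b": {"day1_retained": 3_100,  "signups": 310},  # retains, but never signs up
}

baseline = segments["overall"]["signups"] / segments["overall"]["day1_retained"]

for name, s in segments.items():
    if name == "overall":
        continue
    ratio = s["signups"] / s["day1_retained"]
    # Flag anything converting at less than half the baseline rate.
    if ratio < 0.5 * baseline:
        print(f"{name}: signup rate {ratio:.2f} vs baseline {baseline:.2f} (investigate)")
```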

What if we were totally certain, though, that all of our KPIs were derived from a clean dataset, and what if we knew exactly where every discrepancy came from? This is the ambition behind tools that help you clean your dataset. For gaming apps, this frequently begins with purchase verification. Knowing that each and every one of your in-app purchases has been synchronously cleared with the app store's verification servers means the ARPU of a channel will be directly reflected in your payout at the end of the month. If you're running boosts or incent campaigns, knowing that users can't game the system by resetting and reinstalling gives you a predictable correlation between the traffic that you drive and the results on the App Store or for your k-factor. It takes a little getting used to, but when your datasets are clean, you start to intuitively understand the trends and patterns that drive your decisions, and you can trust your gut.
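For context, here is a minimal sketch of what synchronous purchase verification on iOS builds on: server-side validation against Apple's verifyReceipt endpoint (since superseded by the App Store Server API). This is illustrative only, not a description of Adjust's purchase verification product.

```python
import base64
import json
import urllib.request

# Apple's legacy receipt validation endpoints. Apple recommends always
# trying production first and falling back to sandbox on status 21007.
PRODUCTION_URL = "https://buy.itunes.apple.com/verifyReceipt"
SANDBOX_URL = "https://sandbox.itunes.apple.com/verifyReceipt"

def verify_receipt(receipt_bytes: bytes, shared_secret: str) -> bool:
    """Return True only if Apple confirms the receipt is genuine."""
    payload = json.dumps({
        "receipt-data": base64.b64encode(receipt_bytes).decode(),
        "password": shared_secret,  # your App Store shared secret
    }).encode()

    def post(url):
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    result = post(PRODUCTION_URL)
    if result.get("status") == 21007:  # sandbox receipt sent to production
        result = post(SANDBOX_URL)
    return result.get("status") == 0   # status 0 means the receipt is valid

# Only revenue from receipts that clear verification is counted toward
# ARPU; everything else is excluded before it can skew your numbers.
```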

The most important aspect to consider when you're cleaning your dataset is that the changes need to be reflected in every part of your analytics stack. Marketers usually have not one analytics platform but multiple tools that all require data about the app, whether that's ad networks or demand-side platforms that need to know whom to retarget, or vertical-specific analytics that handle a particular type of analysis specific to your app. This quickly turns into spaghetti and SDK overload. If you centralize data collection and aggregation into a single platform, you can put checks and filters in place on the central platform and immediately have the results reflected in the rest of your stack. This is the best-practice implementation for attribution platforms: when most of your stack feeds from the data collected by an attribution provider, you gain all the advantages of the individual trees without losing sight of the forest.
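To illustrate the architecture, here is a minimal sketch of that single choke point: one ingest function applies fraud checks, and only clean events fan out to downstream partners. The checks and the partner forwarding are hypothetical placeholders:

```python
# Minimal sketch of the centralization idea: one ingest point applies
# fraud filters, and only clean events fan out to downstream partners.
from typing import Callable

FraudCheck = Callable[[dict], bool]

def reject_datacenter_ip(event: dict) -> bool:
    return event.get("ip_is_datacenter", False)

def reject_unverified_purchase(event: dict) -> bool:
    return event.get("type") == "purchase" and not event.get("receipt_verified", False)

FRAUD_CHECKS: list[FraudCheck] = [reject_datacenter_ip, reject_unverified_purchase]

def forward_to_partners(event: dict) -> None:
    # In practice: HTTP callbacks to ad networks, DSPs, and analytics tools.
    print(f"forwarding {event['type']} event to all partners")

def ingest(event: dict) -> None:
    """Single choke point: filter once, and every downstream tool sees clean data."""
    if any(check(event) for check in FRAUD_CHECKS):
        print(f"dropped suspicious {event['type']} event")
        return
    forward_to_partners(event)

ingest({"type": "purchase", "receipt_verified": True})    # forwarded
ingest({"type": "purchase", "receipt_verified": False})   # dropped
```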

This is a brief introduction to the wider study that we released, which hopefully gives you some further insight into the types of errors that can sneak into your dataset. If you're curious, the full study goes into much more detail.


