Countering synthetic identity fraud with behavioral data

One of the more insidious risks plaguing financial services companies in recent years is synthetic identity fraud. Unlike more easily detected fraud typologies such as stolen-card purchases, synthetic identity fraud blends real and fabricated PII over long time horizons, with the goal of extracting value from banking or lending accounts.

The nature of this fraud makes it challenging to detect: its costs are often realized months or even years after the synthetic identity is created.

In this post, we’ll detail how synthetic identity fraud works, then outline how companies can use risk models built on gathered behavioral data to neutralize synthetic ID fraud at the root, well before attackers succeed in defrauding financial systems.

Synthetic ID fraud: an iceberg problem

In traditional identity fraud, attackers use only the legitimate PII of a victim to open accounts in the victim’s name. Purchased on the dark web or stolen through social engineering scams, names, dates of birth, addresses, and social security numbers are used to apply for credit, open bank accounts or gain access to loans. 

In these scams, the attacker mimics the victim’s identity and behavior as closely as possible at registration to stay undetected and to take advantage of the victim’s financial history and standing.

Synthetic identity scams differ in that the attacker uses real PII acquired through similar means, but blends it with completely fabricated or “synthetic” information during the account registration process. Today, companies have no process to verify an authentic link between a US Social Security number and a date of birth, or between an SSN and an individual’s name.

Exploiting this gap, an attacker can create an identity with a legitimate SSN but a fabricated name and date of birth, then apply for a credit card or loan. At registration, this consumer profile may look virtually identical, from a risk standpoint, to a new entrant into the credit market, even though the attacker created the “Frankenstein identity” out of thin air.

This makes synthetic identity fraud difficult to detect, allowing attackers to build financial histories and credit over periods of months or even years. As credit limits are raised, the account eventually matures to the point of being “busted out”: the attacker drains it with a final cash withdrawal or loan draw, never to use the account again.

Synthetic IDs are also used in clusters (often created from the same set of PII) to transfer illicitly gained funds and run complicated scams like micro-transaction fraud, and are even sold on their own. In 2013, the DOJ charged 18 individuals over the largest synthetic ID ring to date, accounting for $200M in losses across 7,000 synthetic IDs and 25,000 credit cards.

More recently, the Federal Reserve estimated in 2019 that 85-95% of applicants identified as synthetic identities are not flagged by traditional fraud models, and that synthetic ID fraud was the fastest-growing type of financial crime in the US.

It isn’t difficult to imagine how the scale and sophistication of synthetic ID fraud have grown since those 2013 numbers, nor why businesses are predicted to see growing losses from this fraud.

Applying first-party behavior data for detection

Synthetic accounts are ticking time bombs for financial organizations, and cannot effectively be detected via typical means like global chargeback profiling, device identification, or geolocation tracking.

Despite this, the use of first-party behavioral data in risk modeling is an effective method against what might otherwise seem like an invisible typology of fraud.

Financial services companies that gather fine-grained behavioral data from the start of the user lifecycle minimize the assumptions they must make about a user from a risk perspective. They are also well positioned to build synthetic identity detection heuristics from those data types that address their own cases, however unique.

Here are examples of how heuristics can be informed through measuring user behaviors and interactions from the start:

Application fluency: measures how familiar a given user is with a particular application, process, or form fill. Navigation patterns on registration forms across device types (mobile, desktop), input submission by keypress vs. mouse click, and input focus on certain fields can indicate that a user is far more familiar with an application they should be seeing for the first time than the average user is.

Data familiarity: looks at how “familiar” a user is with the data they input (drawing from memory vs. reading or copy-pasting). Any input, such as an email address, Social Security number, or name, can be measured for timing across keypresses. A user who knows these inputs from memory will show relatively even inter-keystroke timings, whereas an attacker reading from another source will pause, delete, copy and paste, or re-enter the PII.

Expert behavior: similar to application fluency but broader, this measures overall computer fluency compared to the average user. Keyboard shortcut usage, special keys, application toggling, use of automation, low average time between key inputs, etc. can all be collected, measured, and incorporated into models.

Age analysis: companies using behavioral data can draw patterns of application usage across age ranges. Variables here, including mobile device orientation, swipe patterns, and character input time averages, can all be measured to check the likelihood that the “date of birth” entry was synthetic or legitimate, based on population averages.
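To make the data-familiarity heuristic above concrete, here is a minimal sketch of an inter-keystroke timing score. The function names and the sample timestamps are hypothetical illustrations, not a production feature set; the idea is simply that even timing suggests typing from memory, while erratic timing suggests reading or copying from another source:

```python
from statistics import mean, stdev

def inter_key_intervals(timestamps_ms):
    """Deltas between consecutive keypress timestamps (milliseconds)."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]

def familiarity_score(timestamps_ms):
    """Coefficient of variation of inter-keystroke intervals.

    Low values suggest even, from-memory typing; high values suggest
    pauses consistent with reading or copying PII from another source.
    Returns None when there are too few keystrokes to measure.
    """
    intervals = inter_key_intervals(timestamps_ms)
    if len(intervals) < 2:
        return None
    mu = mean(intervals)
    return stdev(intervals) / mu if mu > 0 else None

# Hypothetical keypress timestamps for a 9-digit SSN entry (ms).
from_memory = [0, 140, 290, 430, 580, 720, 870, 1010, 1160]
reading_off = [0, 150, 300, 1900, 2050, 2200, 4100, 4250, 4400]

# The steady typist scores far lower (more even) than the reader.
assert familiarity_score(from_memory) < familiarity_score(reading_off)
```

In practice a real model would combine many such features per field and compare each user against population baselines rather than a fixed cutoff, but the single-field score shows the shape of the signal.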

More advantages than just detection

Not only does first-party behavioral data gathering provide a proper data foundation to combat synthetic ID fraud, it also allows risk teams to segment their users into classes that inform risk policy decisions and feature access.

For example, if a given user showed low data familiarity when inputting a Social Security number during registration, a security prompt could be triggered when that user applies for a loan or initiates a transfer above a certain threshold, asking them to input their Social Security number once again.

This would serve a dual risk purpose: it creates an additional data point to help measure the variance in data familiarity since the initial input, profiling for synthetic ID risk, while also authenticating the user as the account owner (helping prevent a different fraud typology, account takeover). The options for high-impact, low-friction risk policies are limitless when behavioral data acquisition forms the foundation of the risk policy.
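One way that second data point could feed a policy is a simple drift check between the familiarity measure captured at registration and the one captured at the re-prompt. This is a toy sketch under assumed inputs; the function name and the 0.5 threshold are illustrative, not a reference implementation:

```python
def risk_action(registration_cv, reprompt_cv, drift_threshold=0.5):
    """Toy step-up policy comparing inter-keystroke variability
    (coefficient of variation) at registration vs. at a later
    security re-prompt for the same PII field.

    Large relative drift suggests a different person (or a script)
    is entering the PII this time, so the policy escalates rather
    than silently approving. The threshold here is hypothetical.
    """
    drift = abs(reprompt_cv - registration_cv) / max(registration_cv, 1e-6)
    return "escalate" if drift > drift_threshold else "approve"

# Even typing both times -> low drift -> approve.
print(risk_action(0.10, 0.12))  # approve
# Much choppier typing at the re-prompt -> escalate for review.
print(risk_action(0.10, 0.90))  # escalate
```

The "escalate" branch could map to any step-up control the risk team prefers: an additional authentication factor, a velocity limit, or a manual review queue.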

The financial losses from synthetic identity fraud are unfortunately lagging indicators of an iceberg problem: seeing synthetic ID accounts busted out means a model was implemented much too late, or incorrectly from the start. As the industry grapples with an expected blowout of incidents, behavioral data gathering combined with proper modeling may be the only effective way to curb further damage to businesses and consumers alike.