Fraud Invaders - Christopher Hillman

Started 04-29-2015 by
Modified 04-29-2015 by


Insurance Industry


Christopher Hillman

art_of_analytics, fraud_invaders, graph_analytics, text_analytics

About the Insights

What appear to be bug-like aliens invading an unsuspecting planet are, in fact, swarms of potentially fraudulent insurance claims. Fraudsters often leave tiny data traces in claim details and call-center notes, such as shared addresses, phone numbers, email addresses, bank accounts, registration details, and doctors or lawyers.
‘Fraud Invaders’ shows the connections between good claims and claims already identified as fraudulent. Each node represents an individual insurance claim. The large dots (nodes) are claims that have been investigated and found to be fraudulent. The smaller nodes are claims still considered good, which could turn out to be either fraudulent or genuine. The lines (edges) between nodes show connections between claims: two claims are connected when they share details such as phone numbers, addresses, bank account details, email addresses, or registration details. The redder and thicker the line, the stronger the connection (for example, when two claims share more than one email address, address, or phone number).
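The edge weighting described above (more shared details, thicker line) can be sketched in a few lines. This is a minimal illustration, not the Aster implementation: it assumes each claim has already been reduced to a set of hypothetical `(identifier_type, normalized_value)` pairs, and it counts one unit of edge weight per identifier two claims have in common.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical example data: identifiers extracted for each claim,
# stored as (identifier_type, normalized_value) pairs.
claims = {
    "C1": {("phone", "5550100"), ("email", "a@example.com")},
    "C2": {("phone", "5550100"), ("email", "a@example.com"), ("address", "1 high st")},
    "C3": {("address", "1 high st")},
    "C4": {("bank", "12345678")},
}

def build_edges(claims):
    """Return {(claim_a, claim_b): number_of_shared_identifiers}."""
    # Inverted index: identifier -> set of claims that mention it.
    index = defaultdict(set)
    for claim_id, identifiers in claims.items():
        for ident in identifiers:
            index[ident].add(claim_id)
    # Each identifier shared by a pair of claims adds 1 to that pair's edge weight.
    weights = defaultdict(int)
    for claim_ids in index.values():
        for a, b in combinations(sorted(claim_ids), 2):
            weights[(a, b)] += 1
    return dict(weights)

edges = build_edges(claims)
print(edges)
```

Here C1 and C2 share both a phone number and an email address, so their edge has weight 2 (a thicker, redder line), C2 and C3 share only an address (weight 1), and C4 shares nothing and stays unconnected. The inverted index avoids comparing every pair of claims directly, which matters at the data volumes mentioned in the article.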
From the analysis we can easily pick out clusters of potentially fraudulent claims, such as the bug-like alien invader shape at the seven o’clock position on the circle, where good claims (small nodes) have many common connection points to fraudulent claims (large nodes). We can quickly isolate non-investigated claims that are highly connected to confirmed fraudulent claims.
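One simple way to isolate non-investigated claims that are highly connected to confirmed fraud is to score each good claim by the total edge weight linking it to known fraud claims and rank the results. This is an illustrative heuristic chosen for the sketch, not necessarily how the original analysis scored claims; the edge weights and claim IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: weighted edges between claims (weight = number of
# shared identifiers) and the claims already confirmed as fraudulent.
edges = {("C1", "C2"): 2, ("C2", "C3"): 1, ("C3", "C5"): 3, ("C4", "C5"): 1}
confirmed_fraud = {"C2", "C5"}

def fraud_link_scores(edges, confirmed_fraud):
    """Sum, for each not-yet-investigated claim, its edge weight to confirmed fraud claims."""
    scores = defaultdict(int)
    for (a, b), weight in edges.items():
        # Only edges between one good claim and one fraud claim contribute.
        if a in confirmed_fraud and b not in confirmed_fraud:
            scores[b] += weight
        elif b in confirmed_fraud and a not in confirmed_fraud:
            scores[a] += weight
    # Most strongly connected claims first; ties broken by claim ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ranked = fraud_link_scores(edges, confirmed_fraud)
print(ranked)  # [('C3', 4), ('C1', 2), ('C4', 1)]
```

C3 ranks first because it links to two different fraud claims (weights 1 and 3). Edges between two good claims, or between two fraud claims, are ignored because they say nothing about a good claim's exposure to known fraud.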
Now that we can see the claims that are still considered good but have common links to known fraud claims, we can take action. The final step in the analysis is to produce a list of these claims and their connection points to prior fraud claims, and send it to the fraud department for investigation.

This list had a very high success rate.
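A referral list of that kind could look like the following sketch, which pairs each good claim with the fraud claims it is linked to and spells out the shared identifiers so investigators can see why it was flagged. The claim data, column names, and CSV output format are hypothetical choices for illustration.

```python
import csv
import sys

# Hypothetical inputs: identifiers per claim and the confirmed fraud claims.
claims = {
    "C1": {("phone", "5550100"), ("email", "a@example.com")},
    "C2": {("phone", "5550100"), ("email", "a@example.com"), ("address", "1 high st")},
    "C3": {("address", "1 high st")},
    "C4": {("bank", "12345678")},
}
confirmed_fraud = {"C2"}

def referral_list(claims, confirmed_fraud):
    """Return (good_claim, fraud_claim, shared_identifiers) rows for every linked pair."""
    rows = []
    for claim_id in sorted(claims):
        if claim_id in confirmed_fraud:
            continue  # already investigated
        for fraud_id in sorted(confirmed_fraud):
            shared = claims[claim_id] & claims.get(fraud_id, set())
            if shared:
                rows.append((claim_id, fraud_id, sorted(shared)))
    return rows

# Write the list as CSV so it can be handed to the fraud department.
writer = csv.writer(sys.stdout)
writer.writerow(["claim_id", "linked_fraud_claim", "shared_identifiers"])
for claim_id, fraud_id, shared in referral_list(claims, confirmed_fraud):
    writer.writerow([claim_id, fraud_id, "; ".join(f"{t}={v}" for t, v in shared)])
```

Listing the connection points alongside each claim, rather than just a score, gives investigators a concrete starting point for each case.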

About the Analytics

This visualization was created using Teradata Aster® Analytics. We used detailed claims data (typically hundreds of gigabytes to terabytes) along with text from the call-center agents who handle the claims. The data was loaded into the Teradata Aster Database for analysis.
Policy numbers allow us to link the call-center agents' text data to the claim data. Because the connecting details (phone numbers, addresses, and so on) mostly exist as free-form text, common or repeated connections are hard to find. We obtained most of the detailed connection data by text mining the claim forms and call-center notes with native Aster Analytics text-mining functions, such as Named Entity Recognition. The output was used to identify repeated values and to create the underlying node-edge table. We then used Aster Analytics and the ForceAtlas2 layout algorithm to visualize this table as a graph.
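The extraction step can be illustrated with a much simpler stand-in. The sketch below uses regular expressions in place of Aster's Named Entity Recognition (it is not that function and is far less robust), links hypothetical call-center notes to claims through hypothetical policy numbers, normalizes the extracted values, and keeps only values that appear on more than one claim, since those are the repeats that become graph edges.

```python
import re
from collections import defaultdict

# Hypothetical call-center notes keyed by policy number, plus a policy -> claim map.
notes = {
    "POL-001": "Caller gave phone 555-0100 and email A@Example.com for payout.",
    "POL-002": "Customer asked to use email a@example.com; phone (555) 0100.",
    "POL-003": "No contact details recorded.",
}
policy_to_claim = {"POL-001": "C1", "POL-002": "C2", "POL-003": "C3"}

# Toy patterns standing in for entity recognition (illustrative only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s-]?\d{4}")

def extract_entities(text):
    """Return normalized (entity_type, value) pairs found in free text."""
    entities = set()
    for match in EMAIL_RE.findall(text):
        entities.add(("email", match.lower()))  # emails are case-insensitive
    for match in PHONE_RE.findall(text):
        entities.add(("phone", re.sub(r"\D", "", match)))  # keep digits only
    return entities

# Link notes to claims via the policy number: (claim_id, type, value) rows.
rows = sorted(
    (policy_to_claim[policy], etype, value)
    for policy, text in notes.items()
    for etype, value in extract_entities(text)
)

# Values seen on more than one claim are the repeats that become edges.
claims_by_value = defaultdict(set)
for claim_id, etype, value in rows:
    claims_by_value[(etype, value)].add(claim_id)
repeated = {k: sorted(v) for k, v in claims_by_value.items() if len(v) > 1}
print(repeated)
```

Normalization is what makes the repetition visible: "555-0100" and "(555) 0100", or "A@Example.com" and "a@example.com", would not match as raw text.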

About the Benefits

Identifying hidden fraud by tracing the tiny data connections between known fraud cases and cases that are still considered good is a technique used across multiple industries. It is particularly effective for insurance fraud, where the data about a claim is distributed across the original policy application, call-center records, investigative agents' notes, claim applications, and so on, frequently as free-form text, which makes these connections very hard to uncover without analytics.
This analysis helps optimize the claims-handling process by flagging suspicious claims at first notice of loss (FNOL), so potentially fraudulent claims can be passed to investigation agents and isolated before they are paid. The payback is very high: stopping fraudulent claims from being paid directly improves the bottom line. In addition, the analysis can make claims handling more efficient by lowering costs, reducing the time taken to pay good claims, and ultimately helping to increase customer satisfaction.

Version history
Revision: 1 of 1
Last update: 04-29-2015 01:34 PM