
Credit card fraud detection with synthetic data and AutoML

Hui Fang Yeo

Part 1: Investigating fraudulent transactions in real-time with Atoti

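Credit card fraud figures were boosted by the COVID-19 pandemic, making it more vital than ever to detect credit card fraud quickly and effectively. We split this article into two parts: part 1 (this one) investigates fraudulent transactions in real time with Atoti, and part 2 explains how we implemented the solution.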

Before we go further, let’s take a look at the rationale behind the need for credit card fraud detection in a real-world situation.

Increasing use of credit cards leads to a greater number of fraudulent transactions

With the frequent lockdowns over the past two years, more people turned to online shopping, and companies accelerated the digitization of their businesses as a workaround. E-commerce sales spiked in 2020 and are predicted to continue growing.

eMarketer’s 2021 sales and growth forecast

The rise of e-commerce also created opportunities for fraudsters of various kinds: chargeback fraud, synthetic identity fraud and, of course, credit card fraud.

Online transactions mean more credit card transactions: even PayPal can be linked to a credit card. Moreover, contactless payment at retail outlets is now highly encouraged. We can pay with a credit card’s “tap and go” feature, or with payment services such as Alipay, Apple Pay, Google Pay, GrabPay and ShopeePay, which are usually linked to the user’s credit card.

Credit card fraud usually involves unauthorized transactions. Stolen credit cards, accounts compromised through database security lapses or phishing that leads to account takeover, and application fraud are just some of the means fraudsters use to commit the crime.

Zero Liability Policy to safeguard consumers against credit card fraud

Did you know that major credit card networks such as American Express, Discover, Mastercard and Visa have a zero liability policy covering unauthorized transactions? If fraud is discovered or reported promptly, the cardholder will not be held responsible for the charges. It’s good to check the fine print on the conditions though.

Fraudulent charges must be reported within 60 days of receiving the billing statement with the suspicious charge. However, financial institutions also have their own artificial intelligence that monitors our accounts for suspicious transactions. You have probably received an email or a mobile alert asking if a recent transaction is familiar, or a call from bank staff about an overseas purchase.

 

Alert messages from bank to consumer for large sum spending.

But this isn’t foolproof. Ultimately, someone still has to bear the cost of the fraud, whether the consumers, the merchants or the issuers. Let’s look at how we can make use of machine learning to detect potential fraud.

Real-time transactions monitoring and fraud investigation

Since we do not have actual transaction data, we use synthetic data with fraud patterns spread across different consumer profiles. In our case, the dataset has a fraud indicator that identifies fraudulent transactions. In real life, however, fraudulent transactions do not come with such labels.

So how do we identify them?

There are two parts to this:

  1. Using PyCaret, an AutoML library, to create an ML model that detects fraud
  2. Using Atoti, a Python BI analytics tool, to translate the predictions into business metrics

All incoming credit card transactions go through the trained machine learning (ML) model for fraud detection in batches (to replicate incoming transactions for a real-time demonstration). We then upload the transactions, along with their predictions and scores from the ML model, into Atoti. You can read more about this in part two of this article.
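To give a feel for what “in batches” means here, below is a minimal sketch in plain Python. It is not the code behind the demo: `iter_batches`, `score_batch` and `toy_score` are hypothetical names, the 0.5 threshold is an assumption, and `toy_score` merely stands in for the trained model.

```python
from typing import Callable, Iterator


def iter_batches(transactions: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `batch_size` transactions."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(transactions), batch_size):
        yield transactions[start:start + batch_size]


def score_batch(batch: list, score_fn: Callable[[dict], float],
                threshold: float = 0.5) -> list:
    """Return copies of the transactions with a score and a predicted label."""
    scored = []
    for txn in batch:
        score = score_fn(txn)
        scored.append({**txn, "score": score,
                       "predicted_fraud": score >= threshold})
    return scored


def toy_score(txn: dict) -> float:
    # Hypothetical stand-in for the trained model: larger amounts score higher.
    return min(txn["amount"] / 1000.0, 1.0)


# Made-up transactions, replayed two at a time.
amounts = [20.0, 750.0, 90.0, 1200.0, 45.0]
transactions = [{"id": i, "amount": a} for i, a in enumerate(amounts)]
scored_batches = [score_batch(batch, toy_score)
                  for batch in iter_batches(transactions, batch_size=2)]
```

Each scored batch would then be loaded into the BI tool, so the metrics refresh as batches arrive.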

Getting real-time fraud statistics

Tada! See the incoming transactions get translated into the various metrics in real time!

Real-time computation of statistics from the incoming transactions.

We have a global view of the number of fraudulent transactions predicted by machine learning. Statistically speaking, the F1 score is fairly consistent at around 0.9 and the recall is around 86%, which is acceptable. These statistics are in line with the values from the tuned model. Refer to part 2 of the article to see the statistics of the trained models.
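As a reminder of how these metrics relate to each other, here is a small sketch that derives precision, recall and F1 from actual and predicted labels. The function names and the labels are made up for illustration; they are not the figures from our dashboard.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels only: 1 = fraud, 0 = not fraud.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
tp, fp, fn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Recall matters most here because a missed fraud (a false negative) is usually costlier than a false alarm.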

From above, we see that the amount involved in fraudulent transactions is higher for women above 55 years old. Similarly, the youngest and oldest populations are more susceptible to fraud. Perhaps we could start a fraud prevention campaign for these populations.

Most fraudulent transactions occur on weekends and at night. 
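A breakdown like this needs a weekend flag and a night flag on each transaction. Below is one way to derive them from a timestamp. The function name `time_flags` is hypothetical, and treating 22:00 to 06:00 as “night” is our assumption for illustration, not a definition taken from the dataset.

```python
from datetime import datetime

NIGHT_START_HOUR = 22  # assumed start of "night"
NIGHT_END_HOUR = 6     # assumed end of "night" (exclusive)


def time_flags(timestamp: datetime) -> tuple:
    """Return (is_weekend, is_night) for a transaction timestamp."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    is_weekend = timestamp.weekday() >= 5
    is_night = (timestamp.hour >= NIGHT_START_HOUR
                or timestamp.hour < NIGHT_END_HOUR)
    return is_weekend, is_night


saturday_night = time_flags(datetime(2021, 8, 7, 23, 30))
wednesday_afternoon = time_flags(datetime(2021, 8, 4, 14, 0))
```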

Investigating from the customer’s perspective…

In Atoti, we are able to design a dashboard that reflects all the incoming transactions. By applying filters in the dashboard, we are able to investigate higher risk transactions.

Dashboard designed to look at fraudulent transactions by consumers.

From the above dashboard, we can see that this particular customer has zero suspicious transactions in July but has 15 such transactions detected by machine learning in August. The transaction amounts involved are also on the high side.

It seems that the fraudulent transactions start occurring in August and the transaction amounts are trending upward.

Investigate trends in suspicious transactions for a given consumer.

If we include the “presumed” non-fraudulent transactions, there are more transactions in August than in July.

Toggle with filters to see the number of transactions across different months.

Zooming in on the largest transacted amount, we can see that it was spent on travel. It is plausible that the increase in spending is due to an upcoming trip.

Investigate spending patterns on suspicious transactions of a given consumer.

In any case, we can still call the customer to verify if these transactions are legitimate. After all, there’s nothing wrong with being on the safe side.

Since we have the actual fraud label in our dataset, we can also drill down on the fraudulent transaction to see if it’s a true fraud.

Atoti allows drill-down on both rows and columns for further investigation from both angles.

Ways to prioritize transactions

1 – Color coding transactions to give a high-level view of the urgency

Using color coding, red cells show transactions with a higher probability of being true fraud. Orange cells, in our case, show transactions with a monetary value greater than $500, and yellow cells show transactions between $100 and $500. At a glance, we get an idea of the number of transactions that have to be addressed.

Color coding brings attention to transactions with a higher probability of fraud occurrence and larger transaction amounts.
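The rules themselves are set up in the dashboard, but the logic is simple enough to sketch in plain Python. The function names are hypothetical, the 0.8 cut-off for red is an assumption (we did not state the exact probability threshold), and we treat the $100 and $500 boundaries as inclusive for yellow.

```python
from typing import Optional


def score_color(score: float, red_threshold: float = 0.8) -> Optional[str]:
    """Red when the fraud probability is high; the threshold is an assumption."""
    return "red" if score >= red_threshold else None


def amount_color(amount: float) -> Optional[str]:
    """Orange above $500, yellow from $100 to $500, otherwise no highlight."""
    if amount > 500:
        return "orange"
    if amount >= 100:
        return "yellow"
    return None


# Made-up (score, amount) pairs mapped to their cell colors.
examples = [(0.95, 620.0), (0.30, 250.0), (0.10, 40.0)]
colors = [(score_color(s), amount_color(a)) for s, a in examples]
```

Keeping the score and amount rules separate mirrors the dashboard, where each column is highlighted on its own.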

Likewise, we highlighted the distance between the customer and the merchant in pink if it’s further than 85 km. Rare transactions that occur further away are more suspicious, as customers are unlikely to go out of their way to make a purchase. This may not hold in the current pandemic situation, though, where most people make online purchases from overseas portals.

In any case, we used the mean distance as the gauge here. You are free to decide the appropriate distance at which to flag warnings.
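If your data only carries coordinates for the customer and the merchant, the distance has to be computed first. The sketch below uses the haversine (great-circle) formula, which is our assumption for illustration rather than a description of how the dataset’s distance was derived; `haversine_km` and `is_far` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_far(customer: tuple, merchant: tuple, threshold_km: float = 85.0) -> bool:
    """True when the merchant is further than the threshold from the customer."""
    return haversine_km(*customer, *merchant) > threshold_km


# One degree of longitude at the equator is a little over 111 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The 85 km default matches the threshold above; pass a different `threshold_km` to try other cut-offs.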

2 – Sorting by monetary value to prioritize transactions

By sorting the transaction amount in descending order, we bring the transactions with the highest monetary value to the top. We can then pick the top transaction whose score is highlighted in red for investigation.

Sort transactions on a selected feature for prioritized investigations
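Outside the dashboard, the same prioritization can be sketched in a few lines. The field names, the sample records and the 0.8 cut-off for a “red” score are all assumptions made for illustration.

```python
def sort_by_amount(transactions: list) -> list:
    """Return the transactions ordered by amount, largest first."""
    return sorted(transactions, key=lambda txn: txn["amount"], reverse=True)


def first_red(transactions: list, red_threshold: float = 0.8):
    """Return the largest transaction whose score is high, or None."""
    for txn in sort_by_amount(transactions):
        if txn["score"] >= red_threshold:
            return txn
    return None


# Made-up records: id, amount in dollars and model score.
transactions = [
    {"id": "t1", "amount": 120.0, "score": 0.95},
    {"id": "t2", "amount": 980.0, "score": 0.40},
    {"id": "t3", "amount": 640.0, "score": 0.91},
    {"id": "t4", "amount": 35.0, "score": 0.10},
]
ordered_ids = [txn["id"] for txn in sort_by_amount(transactions)]
to_investigate = first_red(transactions)
```

Note that the largest transaction overall (`t2`) is skipped because its score is low, so `t3` is picked first.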

Investigating from the merchant’s perspective

Let’s switch the page to the “Suspicious transactions (Merchants)” tab.

Dashboard designed to look at fraudulent transactions by merchants.

Well, it’s not really necessary for the merchants’ dashboard to display transactions in real time. But just so you know, Atoti is able to reflect the data almost instantly upon loading.

Viewing fraudulent transactions in a dashboard designed from merchants’ perspective

We can zoom into a specific merchant chain to see if certain outlets have higher fraud counts than others.

Also, we can see if a certain consumer has multiple fraudulent transactions with the same merchant. If we have refund information in the dataset, we could even see if the same consumer has repeated refunds, which may suggest chargeback fraud.

Drilling down on customer’s transaction history for further investigation
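The “same consumer, same merchant” check boils down to counting flagged transactions per customer and merchant pair. Here is a minimal sketch with made-up records; the field names and the `repeat_fraud_pairs` helper are hypothetical.

```python
from collections import Counter


def repeat_fraud_pairs(transactions: list, min_count: int = 2) -> dict:
    """Count flagged transactions per (customer, merchant) pair and keep
    the pairs that reach `min_count`."""
    counts = Counter(
        (txn["customer"], txn["merchant"])
        for txn in transactions
        if txn["predicted_fraud"]
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up records for illustration only.
transactions = [
    {"customer": "c1", "merchant": "m1", "predicted_fraud": True},
    {"customer": "c1", "merchant": "m1", "predicted_fraud": True},
    {"customer": "c1", "merchant": "m2", "predicted_fraud": True},
    {"customer": "c2", "merchant": "m1", "predicted_fraud": False},
    {"customer": "c2", "merchant": "m1", "predicted_fraud": True},
]
repeats = repeat_fraud_pairs(transactions)
```

With a refund flag in the data, the same counting could be applied to refunds to look for possible chargeback fraud.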

 

If the business management team is interested in seeing how these outlets are spread geographically, we have enough data to provide this view upfront (without having to get developers involved!):

Build visualizations as and when required, as long as the information is available in the cube.

To be continued…

Hopefully, this article has given you some inspiration on the type of analytics we can have with credit card datasets, along with machine learning to detect fraud.

Read on to learn more about how we implement this solution in part 2 of the article.
