Data Science - Dealing with False Positives and Negatives in Machine Learning ...


Machine Learning is used in Predictive Analytics to find out whether something in the data is true or not, or to classify the input into one of several categories (A, B, C, ...). Some examples:

  • Is this transaction activity indicative of fraud?
  • Is this the image of a most-wanted person in my database?
  • Does this data stream indicate that the person wearing the health wearable is in trouble?
  • Is this website behavior indicative of an imminent checkout step?
  • Does this stream of events suggest that the user is going to close the account?

Simple as it might sound, everyone knows that no single algorithm or combination of algorithms is 100% accurate all the time. As data scientists, we are often tasked with providing predictions with the highest possible accuracy, but there are factors in the data and in the algorithms that force us to make trade-offs. This blog post discusses those trade-offs while keeping in mind the cost of the analytic approach.


Basic Definitions:

There are four possible outcomes when the quality of a prediction is evaluated against the data.


  • True Positive
  • True Negative
  • False Positive
  • False Negative


True Positive means the algorithm(s) correctly identified a fraudulent transaction as fraudulent. True Negative means the algorithm correctly identified a benign transaction as harmless. Ideally speaking, everyone wants only True Positives and True Negatives :). So what are False Positives and Negatives?

What we don't want, but what occurs quite often:

False Positive, or "Waste of Time" / "Mistaken Identity" -> The algorithm wrongly marked a benign transaction as fraudulent.

False Negative, or "Missed Opportunity" / "Slipped Through the Cracks" -> A fraudulent transaction was wrongly marked as benign.
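To make the four outcomes concrete, here is a minimal Python sketch (not from the original post; the fraud/benign labels and sample pairs are made up) that buckets each actual vs. predicted pair into one of the four outcomes:

```python
# Hypothetical example: classify each (actual, predicted) pair as one of the
# four outcomes, treating "fraud" as the positive class.
pairs = [
    ("fraud",  "fraud"),    # actual, predicted
    ("benign", "benign"),
    ("benign", "fraud"),
    ("fraud",  "benign"),
]

def outcome(actual, predicted, positive="fraud"):
    if predicted == positive:
        return "True Positive" if actual == positive else "False Positive"
    return "False Negative" if actual == positive else "True Negative"

for actual, predicted in pairs:
    print(f"actual={actual:6s} predicted={predicted:6s} -> {outcome(actual, predicted)}")
```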

False Positives => Wasted Time & Resources:

In fighting crime or cyber fraud, false positives are usually tolerated and their cost written off in exchange for a wider net that catches 'every single occurrence of a problem', because missing even one can be very costly or deadly. Catching 100 transactions while knowing that 5 are truly bad and the other 95 are benign *might be* considered acceptable in this situation. The 95 turn out to be false positives, but if those 5 are all the fraud there is, catching them makes up for the effort spent chasing the 95 phantoms.

We can also argue that churn prediction falls into this category: it is OK to err on the side of caution.

False Negatives => Missed Opportunities

However, for conversion opportunities such as up-sell offers based on website browsing behavior or targeted campaigns, the scenario is slightly different. The goal is to achieve high accuracy on the prospects we flag as 'potential conversions', while tolerating the rest falling through the cracks. For example, if we identify 30 potential future customers out of 100 prospects and are 100% accurate on those 30, it would be considered a huge win. The 70 false negatives, or missed opportunities, *could be* easily traded off against the effort spent on the good 30!
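Putting rough numbers on the two scenarios above (the counts come straight from the examples; the ratios are just an illustration of the trade-off):

```python
# Fraud scenario: 100 flagged, 5 truly fraudulent, and those 5 are all the
# fraud there is, so nothing slipped through.
tp, fp, fn = 5, 95, 0
print("fraud : flags that were right =", tp / (tp + fp))   # 0.05 -> 95 phantoms chased
print("fraud : fraud actually caught =", tp / (tp + fn))   # 1.00 -> nothing missed

# Up-sell scenario: 30 flagged out of 100 prospects, all 30 convert,
# and the other 70 are missed opportunities.
tp, fp, fn = 30, 0, 70
print("upsell: flags that were right =", tp / (tp + fp))   # 1.00 -> no wasted effort
print("upsell: prospects caught      =", tp / (tp + fn))   # 0.30 -> 70 missed
```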

Tradeoffs between False Positives and False Negatives

It is hard to eliminate false positives and negatives in every situation. However, data scientists can choose and tune the algorithm(s), turning the dial toward either more false positives or more false negatives, to strike a balance that fits the cost constraints of each use case.

If false positives can be tolerated, there is going to be a lot of fish in the net (also known as high recall), including *most* of the fish that you care about, but we have to spend time throwing back the other fish that we don't want ...

If false negatives can be tolerated, you can use a specialized net that catches *ONLY* the type of fish that you want (high precision), with not much to throw back. But a lot of the fish that you want will stay in the pond as well ...

In general, where there is high precision, recall will be low, and vice versa. We can move the dial to get the right mix of precision and recall and find an acceptable balance. In the end, every single model produces a mix of True Positives, True Negatives, False Positives and False Negatives. While we continue to increase the True Positives and True Negatives through iterations, algorithm choices, parameters, etc., the False Positives and False Negatives have to be reconciled ...
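As an illustration of the "dial", here is a small Python sketch (with made-up scores and labels, not from the post) that sweeps a probability threshold over scored transactions and shows precision falling as recall rises:

```python
import numpy as np

# Made-up data: 1 = fraud, 0 = benign, with noisy scores so that fraud
# tends to score higher but the two classes overlap.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = labels * 0.3 + rng.random(1000) * 0.7

for threshold in (0.2, 0.5, 0.8):
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```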

Precision/Recall Formulas & Measures:

Quantitatively speaking, precision, recall and the F-measure are calculated as follows:

Precision = True Positives/(True Positives + False Positives)

Recall = True Positives/(True Positives + False Negatives)

F-Measure = 2* (Precision*Recall)/(Precision + Recall)
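Written out in code, the formulas above look like this (a plain sketch, not any particular library's implementation):

```python
def precision(tp, fp):
    # Of everything we flagged as positive, how much was actually positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything that was actually positive, how much did we flag?
    return tp / (tp + fn)

def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
p, r = precision(80, 20), recall(80, 40)
print(p, r, f_measure(p, r))   # 0.8, 0.667, 0.727 (approximately)
```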

We can also drop the predicted vs. actual counts into a table called a Confusion Matrix (the confusion arising from false positives and negatives) to see how well we are doing in the training phase.
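As a toy illustration, here is a confusion matrix built from a handful of made-up labels, using scikit-learn purely as a stand-in for whichever tool you have at hand:

```python
from sklearn.metrics import confusion_matrix

# Made-up actual vs. predicted labels from a training run.
actual    = ["fraud", "benign", "benign", "fraud", "benign", "benign"]
predicted = ["fraud", "benign", "fraud",  "benign", "benign", "benign"]

# Rows = actual, columns = predicted, in the order given by `labels`.
cm = confusion_matrix(actual, predicted, labels=["fraud", "benign"])
print(cm)
# [[1 1]    1 true positive,  1 false negative
#  [1 3]]   1 false positive, 3 true negatives
```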

Any way to calculate all of this in one shot on Big Data?

In an analytical/data science platform like Teradata Aster, you can run the Confusion Matrix SQL/MR function on a billion+ rows of prediction output from an algorithm like SVM, Naive Bayes, Random Forest, etc. You can do this during the training phase and check the numbers. Tweak the algorithm, run the prediction, run the Confusion Matrix, check the precision/recall - rinse and repeat until you get acceptable precision and recall. In the training phase we already know the ground truth, so precision/recall is calculated from actual vs. predicted values to see how close they are.
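The same tweak-predict-evaluate loop can be sketched in Python, here with scikit-learn standing in for the Aster SQL/MR workflow (the actual SQL/MR invocation is not reproduced in this post, and the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for transaction records.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_trees in (10, 50, 200):                  # "tweak the algorithm"
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)                # training phase: ground truth is known
    pred = model.predict(X_test)
    print(f"trees={n_trees}")
    print(confusion_matrix(y_test, pred))
    print("precision:", precision_score(y_test, pred),
          "recall:", recall_score(y_test, pred))
```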

The Aster Confusion Matrix SQL/MR function can also compute precision/recall for multi-category predictions, so we can see how the trade-offs play out across many categories in one pass and fix data problems if necessary.
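For the multi-category case, per-category precision/recall can be computed in one pass as well, again sketched here with scikit-learn rather than the Aster function itself:

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up three-category predictions.
actual    = ["A", "B", "C", "A", "B", "C", "A", "C"]
predicted = ["A", "B", "A", "A", "C", "C", "B", "C"]

p, r, f, support = precision_recall_fscore_support(
    actual, predicted, labels=["A", "B", "C"], zero_division=0)
for category, prec, rec in zip(["A", "B", "C"], p, r):
    print(f"{category}: precision={prec:.2f}, recall={rec:.2f}")
```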

You can watch John Thuma's video in the Aster Learning Series on how to use the Confusion Matrix function.

1 Comment

Great stuff... It's worth adding that in practice in business the precision/recall "dial" is set by evaluating the cost of false positives vs. false negatives.  This depends on the specific use case and what is being classified!