Precision and Recall
What is it
Precision and recall are performance metrics that help engineers evaluate the quality of a model's predictions in different ways, by choosing trade-offs between types of error (false positives vs false negatives).
Precision and recall are used in pattern recognition, information retrieval and machine learning tasks such as object detection and classification.
Why it matters
In many problems with large samples, such as rail infrastructure monitoring, tests may be relatively accurate while nevertheless yielding false positives in sufficient numbers to pose serious practical challenges. This requires a method of refining and recalibrating test results so that engineers can choose ways of systematically excluding or including data points, depending on priorities.
How it works
Recall asks, “of all the truly positive cases, how many did the system successfully find?” It resembles its colloquial synonym, ‘remember’: in this case, recalling (remembering) a group of things.
Take an example from sport: try to *recall* all of Bukayo Saka’s goals last season. You think you recall 25 Saka goals. It turns out that Saka only scored 20; five of the goals you ‘remember’ were scored in a previous season. (These are ‘false positives’). But you have successfully ‘recalled’ 20 goals out of 20 – a ‘recall’ of 100%.
Precision, in contrast, asks: “of the things predicted positive, how many were actually positive?” You identified 25 goals, five of them incorrectly – 20 correct out of 25 predicted – so your ‘precision’ was 80%.
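Both metrics reduce to simple ratios over counts of true positives (TP), false positives (FP) and false negatives (FN). A minimal sketch in Python, plugging in the Saka figures above (the function names are our own, for illustration):

```python
def precision(tp, fp):
    """Fraction of predicted positives that were truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of truly positive cases that were found: TP / (TP + FN)."""
    return tp / (tp + fn)

# Saka example: 25 goals 'remembered', of which 20 are real (TP),
# 5 belong to a previous season (FP), and no real goals were missed (FN = 0).
tp, fp, fn = 20, 5, 0

print(precision(tp, fp))  # 0.8 -> 80% precision
print(recall(tp, fn))     # 1.0 -> 100% recall
```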
In evaluating results, you can prioritise either precision or recall. On the one hand, you may wish to find every data point in a class. Let’s take an example from railways: cracked rails. These can have serious consequences, so it’s important not to miss any. In this case, the best performance metric to apply is recall, even if that means accepting a higher proportion of false positives in your dataset.
On the other, there are cases when you prefer to be absolutely sure that when you identify a data point, you do so correctly. For example, the AIVR Platform can identify trackside graffiti by analysing video using machine learning. In this case, while it’s useful to have an accurate record of every case of trackside graffiti, it’s not critical to identify all of them. In this case, precision is the most suitable performance metric.
In sum, precision and recall are tools that allow us to evaluate data according to different priorities: on the one hand, to find all possible track faults; and on the other, to precisely identify faults, to avoid wasting time on false alarms.
Each metric has its advantages; the two measures can be combined in a flexible tool called the precision-recall curve.
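A precision-recall curve is built by sweeping a decision threshold over a model's confidence scores and computing both metrics at each setting: lowering the threshold catches more true cases (recall rises) but admits more false alarms (precision tends to fall). A hedged sketch, with invented scores and labels purely for illustration:

```python
def pr_curve(scores, labels):
    """For each candidate threshold (each distinct score, highest first),
    count TP/FP/FN among items scored at or above it and return
    a list of (precision, recall) points."""
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        points.append((tp / (tp + fp), tp / (tp + fn)))
    return points

# Hypothetical detector scores and ground-truth labels (1 = real fault).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1,   1,   0,   1,   0,   1]

for p, r in pr_curve(scores, labels):
    print(f"precision={p:.2f}  recall={r:.2f}")
```

Reading the printed points from the highest threshold to the lowest shows the trade-off in action: recall climbs from 0.25 to 1.0 while precision drifts down from 1.0, and an engineer picks the operating point that matches their priorities.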