When we talk about multifactor authentication, we usually talk about three possible factors (knowledge, possession, inherence), sometimes adding a fourth (location). Colloquially we refer to the first three as something you know (such as a password), something you have (a smart card, a token, or even a phone), and something you are (a fingerprint, facial recognition, or behavioural factors such as gait or typing rhythm). The fourth is arguably a combination of the other three (knowing the location, possessing physical access to the location, and being within the location), but it is irrelevant when we’re focusing purely on biometrics.
When we look at biometrics (inherence) as an authentication method, a common mistake is to limit our measures of quality to false positives (how many impostors a system will let through) or, less commonly, false negatives (how many legitimate users it will refuse access to). In most cases this is an oversimplification. A more useful approach is to borrow the measures designed for evaluating medical tests, apply them to biometric systems, and, more importantly, ask vendors to provide the data we need to calculate them.
Predicted, True and False, Positive and Negative
When we’re looking at measures of quality to assess whether a biometric system is suitable for a particular purpose, we need to examine several different kinds of result. One key idea is that of predicted results: results we have confirmed outside the system’s test (i.e. we know someone’s identity because we’ve confirmed them as genuine or an impostor through some other means). Standard statistical terminology calls these the actual or ground-truth conditions and uses “predicted” for the system’s own output, so it is worth keeping the distinction in mind when reading other sources. Here a positive is the system accepting someone as genuine, and a negative is the system rejecting them. When we talk about true or false positives or negatives, we’re comparing the system’s results against the predicted results: true positives and negatives agree with them, while false positives and negatives disagree.
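To make the four outcomes concrete, here is a minimal Python sketch that labels each authentication attempt by comparing the system’s decision with the independently confirmed identity. The function names and the trial data are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def outcome(is_genuine: bool, accepted: bool) -> str:
    """Label one attempt as TP, FP, TN or FN.

    is_genuine: the independently confirmed identity (the "predicted" result).
    accepted:   the biometric system's decision.
    """
    if accepted:
        # A positive: the system verified the person.
        return "TP" if is_genuine else "FP"
    # A negative: the system rejected the person.
    return "FN" if is_genuine else "TN"


def confusion_counts(trials):
    """Tally outcomes for a list of (is_genuine, accepted) pairs."""
    counts = Counter(outcome(g, a) for g, a in trials)
    # Include every cell, even when its count is zero.
    return {k: counts.get(k, 0) for k in ("TP", "FP", "TN", "FN")}


# Hypothetical trial data, for illustration only.
trials = [
    (True, True),    # genuine user accepted  -> TP
    (True, False),   # genuine user rejected  -> FN
    (False, False),  # impostor rejected      -> TN
    (False, True),   # impostor accepted      -> FP
    (True, True),    # genuine user accepted  -> TP
]
print(confusion_counts(trials))  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```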
Accuracy
Accuracy, the proportion of all results that are correct (true positives plus true negatives over all attempts), is the most intuitive and most commonly quoted measure, but it is also the least useful unless you know something about the dataset behind it. If accuracy is measured on a dataset with very unequal numbers of predicted positives and negatives, it is at best misleading. Accuracy is a useful general measure, but it treats false positives and false negatives as having the same cost, so when you’re evaluating a biometric system for a specific authentication or authorisation purpose it tells you little about whether the system is the best fit.
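A quick worked example shows how an unequal base misleads. This minimal Python sketch uses made-up counts (990 genuine attempts, 10 impostor attempts) and compares a hypothetical system that accepts everyone with one that never admits an impostor but rejects ten genuine users.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all attempts the system got right: (TP + TN) / total."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical test set: 990 genuine attempts, 10 impostor attempts.
genuine, impostors = 990, 10

# A "system" that accepts everyone: every genuine user is a true positive,
# and every impostor is a false positive.
accept_all = accuracy(tp=genuine, fp=impostors, tn=0, fn=0)

# A system that rejects every impostor but also turns away 10 genuine users.
strict = accuracy(tp=genuine - 10, fp=0, tn=impostors, fn=10)

print(f"accept everyone: {accept_all:.1%}")  # accept everyone: 99.0%
print(f"strict system:   {strict:.1%}")      # strict system:   99.0%
```

Both score 99% accuracy, yet the first lets every impostor in while the second lets none in; accuracy alone cannot tell them apart.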
If we are talking about access to a nuclear power plant, then a false positive has a much, much higher cost than a false negative. If we are talking about walking through the lobby of a co-working space, then a false negative may frustrate our customers and drive them away, while other security measures protect the sensitive areas, so a false positive has little to no consequence.
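One way to make these differing costs explicit is to weight each type of error. The sketch below is a minimal Python illustration, assuming two hypothetical systems with identical accuracy and made-up cost weights for each scenario; real weights would have to come from the organisation’s own risk assessment.

```python
def total_error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of a system's errors, given a cost for each error type."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical error counts for two systems with identical accuracy.
systems = {
    "lenient": {"fp": 10, "fn": 0},  # lets impostors in, never blocks staff
    "strict": {"fp": 0, "fn": 10},   # never admits impostors, sometimes blocks staff
}

# Illustrative cost weights only (assumptions, not real figures).
scenarios = {
    "nuclear plant": {"cost_fp": 1000.0, "cost_fn": 1.0},
    "co-working lobby": {"cost_fp": 1.0, "cost_fn": 10.0},
}


def best_system(weights: dict) -> str:
    """Name of the system with the lowest total error cost under these weights."""
    costs = {name: total_error_cost(**errs, **weights) for name, errs in systems.items()}
    return min(costs, key=costs.get)


for place, weights in scenarios.items():
    print(f"{place}: prefer {best_system(weights)}")
```

With these weights the strict system wins for the nuclear plant and the lenient one for the lobby, even though accuracy rates them as equal.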
Precision
Precision is the number of true positives (those that agree with the predicted positives) divided by all positive results, true and false. A precision of 1 tells you that the system never mistakenly verifies an entity as authorised when they are not. Precision matters most when we want a control that prioritises denying access, erring on the side of caution against letting in unauthorised users. Controls that score highly on precision, such as retina scanning, are suited to highly sensitive areas where false positives are the most significant risk.
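As a formula, precision is TP / (TP + FP). The minimal Python sketch below computes it from hypothetical counts; returning 0.0 when nothing was accepted is a convention assumed here, and other tools may instead raise an error or return NaN.

```python
def precision(tp: int, fp: int) -> float:
    """Of everyone the system accepted, the share who were genuinely authorised."""
    accepted = tp + fp
    if accepted == 0:
        # Nobody was accepted, so precision is undefined; 0.0 by convention here.
        return 0.0
    return tp / accepted


# Hypothetical counts: 980 genuine users accepted, 2 impostors accepted.
print(f"{precision(tp=980, fp=2):.4f}")
# No impostor was ever accepted, so precision is exactly 1.
print(precision(tp=500, fp=0))  # 1.0
```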
Recall (or Sensitivity)
Recall and sensitivity are two names for the same measure: the ratio of true positives to the predicted positives (true positives plus false negatives). It is one of the most important measures for the majority of biometric systems. If a security system is built with layered controls, as any well-designed defence-in-depth system should be, the key point about recall is that its complement (1 − recall) is the proportion of legitimate users who are wrongly inconvenienced by the system. From a user-experience perspective, a recall of 1 tells us that no legitimate user is unnecessarily inconvenienced, while a low recall should warn us that users will quickly become frustrated and find ways to bypass the security controls rather than put up with the inconvenience.
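Recall is TP / (TP + FN), and its complement is what biometrics literature usually calls the false rejection rate. A minimal Python sketch with hypothetical counts:

```python
def recall(tp: int, fn: int) -> float:
    """Of all genuine users, the share the system accepted: TP / (TP + FN)."""
    genuine = tp + fn
    if genuine == 0:
        # No genuine attempts, so recall is undefined; 0.0 by convention here.
        return 0.0
    return tp / genuine


# Hypothetical counts: 950 genuine users accepted, 50 wrongly rejected.
r = recall(tp=950, fn=50)
print(f"recall: {r:.2f}")                    # recall: 0.95
print(f"users inconvenienced: {1 - r:.0%}")  # users inconvenienced: 5%
```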
F1 Score
The F1 score combines precision and recall into a single figure: it is their harmonic mean, 2 × precision × recall / (precision + recall). Because a harmonic mean is pulled towards the smaller of the two values, a system that pushes one measure up at the expense of the other will have a low F1 score, so a high F1 score is a good indicator when we want a reasonable balance between false negatives and false positives.
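To see why imbalance is penalised, this minimal Python sketch compares two hypothetical systems whose precision and recall have the same simple average (0.75) but very different balance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical systems: both have an arithmetic mean of 0.75.
balanced = f1_score(precision=0.75, recall=0.75)
lopsided = f1_score(precision=0.99, recall=0.51)

print(f"balanced: {balanced:.3f}")  # balanced: 0.750
print(f"lopsided: {lopsided:.3f}")  # lopsided: 0.673
```

The lopsided system looks strong on precision alone, but its F1 score exposes the weak recall.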
The art of compromise
Ultimately, while biometric systems vary in quality, there will always be a trade-off between false positives and false negatives. Biometrics are imperfect because of the vagaries of human biology, especially as we move into behavioural biometrics or other traits that are harder for machines to recognise. When considering biometric security controls it is vital to understand where your priorities lie, and therefore which measure matters most.
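In many biometric matchers this trade-off is controlled by a decision threshold on a similarity score; treat that as an assumption about any particular product. The minimal Python sketch below uses made-up scores to show false positives falling and false negatives rising as the threshold is raised.

```python
# Hypothetical match scores (higher = more similar to the enrolled template).
genuine_scores = [0.91, 0.85, 0.78, 0.72, 0.66, 0.95, 0.88, 0.59]
impostor_scores = [0.12, 0.35, 0.48, 0.61, 0.27, 0.70, 0.05, 0.41]


def errors_at(threshold: float):
    """Return (false positives, false negatives) when accepting scores >= threshold."""
    fp = sum(s >= threshold for s in impostor_scores)  # impostors let in
    fn = sum(s < threshold for s in genuine_scores)    # genuine users turned away
    return fp, fn


for t in (0.5, 0.6, 0.7, 0.8):
    fp, fn = errors_at(t)
    print(f"threshold {t:.1f}: {fp} false positives, {fn} false negatives")
```

On this data the counts move from 2 false positives and 0 false negatives at 0.5 to 0 false positives and 4 false negatives at 0.8: no setting removes both kinds of error, so the threshold has to follow your priorities.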
Even more importantly, if a vendor is unable to share data on these measures, that suggests they haven’t tested their system effectively, and they should have some questions to answer.