System Evaluation

Table of Contents

1. Third Lecture
2. Problems that can affect Biometric Systems
  2.1. Verification

1. Third Lecture

Given that there are different recognition strategies, we need to develop different evaluation strategies. Standard ML performance measurements are not enough to evaluate a biometric system: in Computer Vision, for example, it is common to use only Accuracy or Precision, focusing on the right answers; in Biometric Systems, on the other hand, we focus on the errors.

2. Problems that can affect Biometric Systems

Biometrics cannot solve all kinds of authentication problems, especially in uncontrolled, unattended settings and with non-cooperative users. One of the first sources of error is wide intra-class variation: despite all attempts to extract the least variant features, there are conditions that change the appearance. An example of this problem is the variation of pose, illumination and expression in facial recognition systems. These problems are also related to the flexibility of the system, so we try to balance the rate of false acceptance against flexibility.

In conclusion, we have to balance the error rates, knowing that the system will never be perfect.

P.I.E. extended to A.P.I.E. (Age, Pose, Illumination, Expression) is the acronym used to summarize the problems that may arise in facial recognition.

Another problem is represented by small inter-class variations, which may cause false recognition, for example between twins, or between parents and children, and similar cases. Noisy or distorted acquisitions can represent another problem, in case of dirty sensors, wrong acquisition or dry skin (for fingerprint recognition), or non-uniform lighting (for facial recognition or hand geometry).

There are some cases in which the universality property does not hold, like the fingerprints of manual workers or the iris of a blind person. Consider that 4% of the population has poor quality fingerprints. Like any other security and software system, biometric systems can undergo an attack; consider also that the architecture of a biometric system can be distributed. The first attack point is the sensor, where we can have presentation attacks, replay attacks or injection attacks. It is also possible to override feature extraction, inject a synthesized feature vector, and so on: there are many attack points in a biometric system. As biometric researchers we will try to fight the presentation attacks.

Anti-spoofing is becoming harder because of deepfakes, which are a dangerous technology and in the USA are illegal during election time, but of course that is not enough. A sufficiently good model can produce fakes that fool a single-modality system, so it is necessary to implement multi-modal authentication systems to make spoofing attacks much more difficult to conduct.
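
As an illustration of why multi-modality raises the bar for spoofing, here is a minimal score-level fusion sketch. Weighted-sum fusion is just one common strategy; all function names, weights and scores below are hypothetical, not the course's implementation:

```python
def fuse_scores(face_score, fingerprint_score, w_face=0.5):
    """Weighted-sum score-level fusion of two modalities.

    Both scores are assumed to be already normalized to [0, 1];
    the weight is a tunable parameter, 0.5 here only for illustration.
    """
    return w_face * face_score + (1.0 - w_face) * fingerprint_score

# A convincing face spoof alone does not carry the fused score past
# a reasonably strict threshold, because the fingerprint score stays low:
print(fuse_scores(face_score=0.92, fingerprint_score=0.15))  # 0.535
```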

2.1. Verification

In verification, different errors are possible. The kind of performance metrics we use depends on the application, so we will use different methods to compute the performance of verification systems from those used for identification.

In verification we are in the setting in which a subject claims an identity (in an explicit or implicit way). We say that a subject is accepted if the similarity achieved by matching against the gallery templates corresponding to the claimed identity is greater than or equal to the acceptance threshold. Different measures may range over different intervals, so a normalization phase is needed: it is important to map every measure to a common interval.
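
As a minimal sketch of the normalization step, assuming min-max normalization onto \([0, 1]\) as one common choice (the function name and defaults are illustrative, not the course's implementation):

```python
import numpy as np

def min_max_normalize(scores, lo=None, hi=None):
    """Map raw matching scores onto the common interval [0, 1].

    lo/hi default to the observed minimum/maximum; in a deployed
    system they would be estimated on a separate training set.
    """
    scores = np.asarray(scores, dtype=float)
    lo = scores.min() if lo is None else lo
    hi = scores.max() if hi is None else hi
    return (scores - lo) / (hi - lo)

# Two matchers whose scores live on very different scales become comparable:
print(min_max_normalize([0.2, 3.5, 7.9]))   # [0.    0.428...    1.]
print(min_max_normalize([120, 450, 900]))   # [0.    0.423...    1.]
```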

The concept of threshold embeds the flexibility of the system: it is very rare to have a zero-distance match, so the threshold says that the distance must be below a certain value to accept the identity claim (and the rule is inverted in case of a similarity measure). One of the goals of system evaluation is to compute and choose the optimal threshold.
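
A minimal sketch of the acceptance rule, showing how the comparison flips between similarity and distance measures (function names are illustrative):

```python
def accept_by_similarity(score, threshold):
    # With a similarity measure, accept when the score is high enough.
    return score >= threshold

def accept_by_distance(dist, threshold):
    # With a distance measure the rule is inverted:
    # accept when the probe is close enough to the gallery template.
    return dist <= threshold
```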

2.1.0.1. Claimed identity is true, but the system rejects

The comparison did not meet the threshold despite the genuine claim. This is called a Type I Error, False Non-Match or False Rejection. The False Non-Match Rate measures only the error made by the classifier; a False Rejection can also be caused by other elements, like a failure to enroll that depends on the sensor.

2.1.0.2. An impostor subject is accepted.

This kind of error is called Type II Error, and it can be measured via the False Acceptance Rate.

Between a plain accept or reject there can also be a middle approach that considers a human in the loop or a repetition of the verification procedure.

2.1.1. Performance Evaluation

Evaluation is done by comparing rates: the absolute numbers of errors are not sufficient, because the proportions of genuine and impostor users are unknown. For this reason we use the rates, listed below (a computation sketch follows the list):

  • False Acceptance Rate is the most critical for security aims, because it means that an unauthorized person can access confidential data. It is defined as the percentage of recognition operations with an impostor claim that are wrongly accepted. The reference number is not the total number of operations but the number of cases in which the system should have rejected (the impostor attempts).
  • False Rejection Rate is defined as the percentage of recognition operations with a genuine user that are wrongly rejected.
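
A minimal sketch of the two rates, making the denominators explicit. The score lists and the similarity convention (accept when score ≥ threshold) are assumptions made for the example:

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR and FRR at a given acceptance threshold.

    Scores are similarities: a claim is accepted when score >= threshold.
    Note the denominators: FAR is a fraction of impostor attempts only,
    FRR a fraction of genuine attempts only.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = np.mean(impostor >= threshold)  # wrongly accepted impostors
    frr = np.mean(genuine < threshold)    # wrongly rejected genuine users
    return far, frr
```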

Of course the system evaluation also has to be performed on-site, but performing a good evaluation beforehand is fundamental. In practice, the most common measures for evaluating verification are:

  • FAR, False Acceptance Rate
  • FRR, False Rejection Rate
  • EER, Equal Error Rate: the point at which FAR and FRR are equal
  • DET, Detection Error Trade-off
  • ROC, Receiver Operating Characteristic

FAR and FRR are computed with a moving threshold, constructing two curves with opposite behaviour (as the threshold moves, FAR and FRR vary in opposite directions, so it is not possible to optimize both of them). The Equal Error Rate corresponds to the crossing point of the two curves, i.e. the value of the threshold at which FAR and FRR are equal. A sketch of this computation follows.
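
A brute-force sketch of the moving-threshold computation and of the EER as the closest approach of the two curves; the score conventions are the same assumptions as above:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores, n_steps=1000):
    """Sweep the acceptance threshold and approximate the EER.

    Scores are similarities (accept when score >= threshold). The EER
    is the point where the FAR and FRR curves cross, found here as the
    threshold minimizing |FAR - FRR|.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    lo = min(genuine.min(), impostor.min())
    hi = max(genuine.max(), impostor.max())
    thresholds = np.linspace(lo, hi, n_steps)
    fars = np.array([np.mean(impostor >= t) for t in thresholds])
    frrs = np.array([np.mean(genuine < t) for t in thresholds])
    i = np.argmin(np.abs(fars - frrs))   # crossing point of the two curves
    return (fars[i] + frrs[i]) / 2.0, thresholds[i]
```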

When carrying out system evaluation we have ground truth. Assume that we have a ground truth function `id(template)` that returns the true identity associated with a gallery or probe template; \(i\) is the identity claimed by a probe \(p_j\). `topMatch(\(p_j\), \(identity\))` is the function that returns the best match between a probe template and the gallery template(s) associated with the claimed identity in the gallery. `s(\(t1\), \(t2\))` is the similarity measure that returns the similarity between two templates (they can be probe or gallery templates).
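
A minimal sketch of these primitives, assuming templates are feature vectors stored in a dict keyed by their true identity (which plays the role of `id`), and using cosine similarity purely as an example of a similarity measure; none of this is prescribed by the course:

```python
import numpy as np

def s(t1, t2):
    """Similarity measure between two templates (cosine similarity,
    chosen here only as a concrete example)."""
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    return float(t1 @ t2 / (np.linalg.norm(t1) * np.linalg.norm(t2)))

# gallery: identity -> list of gallery templates for that identity.
def topMatch(p_j, identity, gallery):
    """Best similarity between probe p_j and the gallery template(s)
    of the claimed identity."""
    return max(s(p_j, g) for g in gallery[identity])

def verify(p_j, claimed_identity, gallery, threshold):
    # Accept the claim iff the best match reaches the threshold.
    return topMatch(p_j, claimed_identity, gallery) >= threshold
```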


Author: Andrea Ercolino

Created: 2022-10-13 Thu 17:28