
Comment by YeGoblynQueenne

10 days ago

Thanks for the link. Quoting from it:

* The National Physical Laboratory (NPL) tested the algorithm South Wales Police and the Metropolitan Police Service have been using for LFR.

* At the settings police use, the NPL found that for LFR there were no statistically significant differences in performance based on age, gender or ethnicity.

* There was an 89% chance of identifying someone on the specific watchlist of people wanted by the police, and at worst a 1 in 6,000 chance of incorrectly identifying someone on a watchlist with 10,000 images (known as a false alert). In practice, the false alert rate has been far better than this.
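
Unpacking those quoted figures a bit (my own back-of-envelope arithmetic, not numbers from the report): a 1-in-6,000 false alert chance against a 10,000-image watchlist implies a per-comparison false match rate of roughly 1/(6,000 × 10,000), and the absolute number of false alerts then scales with how many faces the vans scan.

```python
# Back-of-envelope arithmetic for the quoted figures (my numbers, not the report's).
watchlist_size = 10_000          # images on the watchlist, as quoted
false_alert_rate = 1 / 6_000     # chance a scanned face triggers a false alert

# A false alert fires if the probe matches *any* watchlist entry, so the
# per-comparison false match rate (FMR) is roughly the alert rate spread
# over the whole watchlist.
per_comparison_fmr = false_alert_rate / watchlist_size
print(f"per-comparison FMR ~ {per_comparison_fmr:.2e}")        # ~ 1.67e-08

# Expected false alerts if a deployment scans, say, 100,000 faces:
faces_scanned = 100_000
expected_false_alerts = faces_scanned * false_alert_rate
print(f"expected false alerts ~ {expected_false_alerts:.1f}")  # ~ 16.7
```

Which is why "at worst 1 in 6,000" sounds small per face but adds up quickly at the scale of a busy public deployment.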

It looks like this is the report by the NPL:

science.police.uk/site/assets/files/3396/frt-equitability-study_mar2023.pdf

It's a big report and it'd take some time to go through, but laboratory testing of a system deployed in the wild is not going to give accurate results, meaning the claimed "89%" is likely to be significantly worse in practice. In any case there are obvious limitations to the testing, e.g. (from the report):

Large demographically balanced datasets: The testing of low error rates in a statistically significant manner requires large datasets. To achieve the required scale, the evaluation uses a supplementary reference image dataset of 178,000 face images (Filler dataset). This is an order of magnitude larger than the typical watchlist size of an operational Live Facial Recognition deployment. To avoid introducing a demographic bias due to reference dataset composition, a demographically balanced reference dataset was used, with equal numbers in each demographic category. For assessment of equitability under operational settings, the results from the large dataset are appropriately scaled to the size and composition of watchlist or reference image database of the operational deployment.

I'd say "uh-oh" to that. Unbalanced classes are a perennial source of error in evaluations, and "equal numbers in each demographic category" is an obvious source of unrealistic bias: an operational watchlist will almost never be demographically balanced, so the scaled results depend on assumptions about its composition.
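
To see why the balanced filler dataset matters, here's a toy sketch (all numbers invented, nothing from the report): if per-comparison false match rates differ across demographic groups, the aggregate rate depends heavily on the watchlist's composition, so a 50/50 lab dataset can wash out a disparity that a skewed operational watchlist would amplify.

```python
# Toy illustration (all numbers invented): how reference-set composition
# changes the aggregate false match rate when per-group FMRs differ.

def aggregate_fmr(group_fmrs, group_shares):
    """Overall per-comparison false match rate, weighted by the share of
    watchlist images belonging to each demographic group."""
    return sum(f * s for f, s in zip(group_fmrs, group_shares))

# Hypothetical per-comparison FMRs for two demographic groups;
# group B is 5x more error-prone (invented for illustration).
fmrs = [1e-8, 5e-8]

balanced = aggregate_fmr(fmrs, [0.5, 0.5])  # lab-style balanced dataset
skewed   = aggregate_fmr(fmrs, [0.1, 0.9])  # operational list skewed toward group B

print(f"balanced dataset FMR: {balanced:.2e}")   # 3.00e-08
print(f"skewed watchlist FMR: {skewed:.2e}")     # 4.60e-08
```

Same algorithm, same per-group error rates; only the composition changed, and the aggregate rate moved by ~50%. That's the gap the report's "appropriately scaled" step has to bridge.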

Anyway, I don't have the time to go through it with a fine-toothed comb, but just the fact that they report a 100% False Positive Rate for "operator initiated facial recognition" is another big, hot, red flag.

Also, from the UK gov link above:

* The 10 LFR vans rolled out in August 2025 are using the same algorithm that was tested by the NPL.

There's a bit of ambiguity there. The police are using "the same algorithm" tested by the NPL, but are they using the same settings? The report's conclusions depend on specific settings (e.g. a "face match" threshold of 0.6 for LFR), and there's nothing to say the police stick to those. Lots of room for manoeuvring left there, I'd say.
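
Why the threshold matters so much, as a toy sketch (similarity scores invented, not the NPL's data): nudging the match threshold down buys more true matches at the cost of more false alerts, so the tested error rates only hold at the tested setting.

```python
# Toy illustration (scores invented): lowering the match threshold trades
# missed matches for more false alerts.
impostor_scores = [0.42, 0.55, 0.58, 0.61, 0.63]  # non-matching face pairs
genuine_scores  = [0.59, 0.66, 0.72, 0.81, 0.90]  # matching face pairs

def rates(threshold):
    """False match rate and true match rate at a given threshold."""
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    tpr = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    return fmr, tpr

for t in (0.6, 0.55):
    fmr, tpr = rates(t)
    print(f"threshold {t}: false match rate {fmr:.0%}, true match rate {tpr:.0%}")
# threshold 0.6:  false match rate 40%, true match rate 80%
# threshold 0.55: false match rate 80%, true match rate 100%
```

So a report validating behaviour at 0.6 says nothing about a deployment quietly run at 0.55.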

>> The company behind Oosto claims 99%.

We can easily dismiss this just by looking at the two digits preceding the "%".