AI Police Reports: Year in Review

1 month ago (eff.org)

What worries me is that _a lot of people seem to see LLMs as smarter than themselves_ and anthropomorphize them into a sort of human-exact intelligence. The worst-case scenario of Utah's law is that when the disclaimer is added that the report is generated by AI, enough jurists begin to associate that with "likely more correct than not".

  • Reading how AI is being approached in China, the focus is more on achieving day-to-day utility, without eviscerating youth employment.

    In contrast, the SV focus on AI has been about Skynet / the singularity, with a hype cycle to match.

    This is supported by the lack of clarity on actual benefits, or of clear data on GenAI use. Mostly I see it as great for prototyping (going from 0 to 1) and for use cases where the operator is highly trained and capable of verifying the output.

    Outside of that, you seem to be in the land of voodoo, where you are dealing with something that eerily mimics human speech, but you don't have any reliable way of finding out it's just BS-ing you.

    • I’m not seeing the dichotomy as much as you do.

      Are they not going to build a “skynet” in China? Second, building skynet doesn’t imply eviscerating youth employment.

      On the other hand, automation of menial tasks does eviscerate all kinds of employment, not only youth employment.

  • One problem here is "smarter" is an ambiguous word. I have no problem believing the average LLM has more knowledge than my brain; if that's what "smarter" means, then I'm happy to believe I'm stupid. But I sure doubt that an LLM can deduce or infer things, or understand its own doubts and gaps in knowledge or understanding, better than a human like me.

    • Yeah my thought is that you wouldn't trust a brain surgeon who has read every paper on brain surgery ever written but who has never touched a scalpel.

      Similarly, the claim is that ~90% of communication is nonverbal, so I'm not sure I would trust a negotiator who has seen all of written human communication but never held a conversation.

  • > a lot of people seem to see LLMs as smarter than themselves

    Well, in many cases they might be right..

    • As far as I can tell from poking people on HN about what "AGI" means, there might be a general belief that the median human is not intelligent. Given that the current batch of models apparently isn't AGI I'm struggling to see a clean test of what AGI might be that a human can pass.

      27 replies →

    • > ChatGPT (o3): Scored 136 on the Mensa Norway test in April 2025

      So yes, most people are right in that assumption, at least by the metric of how we generally measure intelligence.

      8 replies →

  • Maybe it's just my circle, but anecdotally most of the non-CS folks I know have developed a strong anti-AI bias. In a very outspoken way.

    If anything, I think they'd consider AI's involvement as a strike against the prosecution if they were on a jury.

    • A core problem with humans, or perhaps it's not even a problem, just something that takes a long time to recognize, is that they complain and hate on something that they continue to spend money on.

      Not like food or clothing, but stuff like DLC content, streaming services, and LLMs.

      2 replies →

    • Why do people in your circle not like AI? I have a similar experience with friends and family not liking AI, but usually it’s due to water and energy reasons, not because of an issue with the model reasoning.

      4 replies →

  • > a lot of people seem to see LLMs as smarter than themselves

    I think the anthropomorphizing part is what messes with people. Is the autocomplete in my IDE smarter than I am? What about the search box on Google? What about a hammer or a drill?

    Yet I will admit that while I often hear people complain about how AI-written code is worse than what developers produce, that just doesn't match my own experience. It's frankly better (with enough guidance and context, say 95% of tokens in and 5% out, across multiple models working on the same project to occasionally validate and improve or fix the output, alongside adequate tooling) than what a lot of the people I know could produce, or frankly do produce, in practice.

    That's a lot of conditions, but I think the same applies to the chat format: there's a difference between people accepting unvalidated drivel as fact, and someone using web search, parsing documents, surfacing additional information found over the course of the conversation, and bringing in external data to exploit the LLM's ability to churn through a lot of it, sometimes better than human reading comprehension would.

    • I think you're spot on here. It's the same idea as scammers and con artists; people can be convinced of things they might rationally reject if the language is persuasive enough. This isn't some new exploit in human behavior or an epidemic of people who are less intelligent than before; we've just never had to deal with the supply of plausible-sounding, coherent human language being almost literally unlimited before. If we're lucky, people will manage to adapt and update their mental models to be less trusting of things they can't verify (like how most of us hopefully don't need to worry that our older relatives will transfer their bank account contents to benevolent foreign royalty with the expectation of being rewarded handsomely). It's hard to feel especially confident in this, though, given how much more open-ended the potential deceptions are (without even getting into the question of "intent" from the models or their creators).

      2 replies →

  • AI is smarter than everyone already. Seriously, the breadth of knowledge the AI possesses has no human counterpart.

    • Just this weekend it (Gemini) produced two detailed sets of instructions on how to connect different devices over Bluetooth, including a video (that I didn’t watch), even though the devices did not support making the connection in that direction. No reasonable human reading the manuals involved would think those solutions feasible. Not impressed, again.

    • It's pretty similar to looking something up with a search engine, mashing together some top results + hallucinating a bit, isn't it? The psychological effects of the chat-like interface + the lower friction of posting in said chat again vs reading 6 tabs and redoing your search seem to be the big killer feature. The main "new" info is often incorrect info.

      If you could get the full page text of every URL on the first page of DDG results and dump it into vim/emacs, where you can move/search around quickly, that would probably be about as good, and without the hallucinations; a rough sketch of the idea follows at the end of this comment. (I'm guessing someone is gonna compare this to the old Dropbox post, but whatever.)

      It has no human counterpart in the same sense that humans still go to the library (or a search engine) when they don't know something, and we don't have the contents of all the books (or articles/websites) stored in our head.
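
      A rough sketch of the idea above, purely illustrative: it assumes the Python requests and beautifulsoup4 packages and DuckDuckGo's HTML endpoint, and the selector and result handling are assumptions that may need adjusting.

        # Dump the text of first-page search results into one file you can
        # grep/search in an editor. Endpoint and CSS selector are assumptions;
        # note that result hrefs may be DDG redirect links that need decoding.
        import requests
        from bs4 import BeautifulSoup

        def dump_first_page(query: str, out_path: str = "results.txt") -> None:
            resp = requests.get(
                "https://html.duckduckgo.com/html/",
                params={"q": query},
                headers={"User-Agent": "Mozilla/5.0"},
                timeout=10,
            )
            soup = BeautifulSoup(resp.text, "html.parser")
            urls = [a.get("href") for a in soup.select("a.result__a") if a.get("href")][:10]
            with open(out_path, "w", encoding="utf-8") as f:
                for url in urls:
                    try:
                        page = requests.get(url, timeout=10)
                        text = BeautifulSoup(page.text, "html.parser").get_text(" ", strip=True)
                    except requests.RequestException:
                        continue  # skip pages that fail to load
                    f.write(f"==== {url} ====\n{text}\n\n")

        dump_first_page("axon draft one ai police reports")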

      20 replies →

    • > the breadth of knowledge

      knowledge != intelligence

      If knowledge == intelligence, then Google and Wikipedia are "smarter" than you, and the AGI problem has been solved for several decades.

      1 reply →

    • Even if we were going to accept the premise that total knowledge is equivalent to intelligence (which is silly, as sibling comments have pointed out), shouldn't accuracy also come into play? AI also says a lot more obviously wrong things than the average person, so how do you weight that against the purported knowledge? You could answer yes or no randomly to any arbitrary question about whether something is true and approximate a 50% accuracy rate with an evenly distributed pool of questions, but that's obviously not proof that you know everything. I don't think the choice of where to draw the line on "how often can you be wrong and have it still matter" is as easy as you're implying, or that everyone will necessarily agree on where it lies (even if we all agree that 50% correctness is obviously way too low).
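
      To make the coin-flip point concrete, here is a toy simulation (my own illustration, not from the article): random yes/no answers over an evenly split pool of true/false questions converge on ~50% accuracy while encoding zero knowledge.

        import random

        # Answer 100,000 true/false questions by coin flip; with an evenly
        # distributed pool, accuracy converges to ~0.5 and proves nothing.
        truths = [random.choice([True, False]) for _ in range(100_000)]
        guesses = [random.choice([True, False]) for _ in truths]
        accuracy = sum(g == t for g, t in zip(guesses, truths)) / len(truths)
        print(f"coin-flip accuracy: {accuracy:.3f}")  # ~0.500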

    • AI has more knowledge than everyone already; I wouldn't say smarter, though. It's like wisdom vs intelligence in D&D (and/or life): wisdom is knowing things, intelligence is how quickly you can learn and create new things.

      2 replies →

    • It's like saying Google Search is smarter than everyone because the amount of information it indexes has no human counterpart. Such a silly take...

I think whether any text is written with the help of AI is not the main issue. The real issue is that for texts like police reports a human still has to take full responsibility for the contents. If we preserve this understanding, then the question of which texts are generated by AI becomes moot.

  • Sadly, the justice system is a place where responsibility does not happen. It is not a system where you make one mistake and go to prison. Instead, everyone but the victims of the system is protected and colluded with. The more you punish the victims, the better you make out.

  • I agree. A programmer has to take responsibility for the generated code they push, and so do police officers for the reports they file. Using a keyboard does not absolve you of typos; it's your responsibility to proofread and correct. This is no different, just a lot more advanced.

    Of course, the problem is also that police often operate without any real oversight and cover up more misconduct than workers in an under-rug-sweeping factory. But that's another issue.

    • > But that's another issue.

      ...is it?

      It seems to me that the growth of professional police as an institution which bears increased responsibility for public safety, along with an ever-growing set of tools that can be used to defer responsibility (see: it's not murder if it's done with a stun gun, regardless of how predictable these deaths are), are actually precisely the same issue.

      Let's stop allowing the state to hide behind tooling, and all be approximately equally responsible for public safety.

  • Yes. Allowing officers to blame AI creates a major accountability gap. Per the logic of, e.g., the EU AI Act, if a human "edits" a draft, they must be held responsible and do not need to disclose the use of AI.

    To ensure safety, those offerings must use premarket red-teaming to eliminate biases in summarization. However, ethical safety also requires post-market monitoring, which is impossible if logs aren't preserved. Rather than focusing on individual cases, I think we must demand systemic oversight in general and access for independent research (not focusing only on a specific technology).
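
    As a minimal sketch of what preserved logs could look like (hypothetical; not how Draft One or any vendor actually works, and the field names are illustrative), keeping the verbatim AI draft next to the signed report would make post-market review a simple diff:

      import difflib
      import hashlib
      from dataclasses import dataclass

      @dataclass(frozen=True)
      class ReportRecord:
          incident_id: str
          ai_draft: str      # verbatim model output, never edited
          final_report: str  # the text the officer actually signs
          officer_id: str
          model_version: str

          def draft_digest(self) -> str:
              # Hash the AI draft so later tampering with the log is detectable.
              return hashlib.sha256(self.ai_draft.encode("utf-8")).hexdigest()

          def edits(self) -> str:
              # Exactly which lines the officer changed, for auditors and researchers.
              return "\n".join(difflib.unified_diff(
                  self.ai_draft.splitlines(), self.final_report.splitlines(),
                  fromfile="ai_draft", tofile="final_report", lineterm=""))

      record = ReportRecord(
          incident_id="2025-000123",
          ai_draft="Subject fled on foot.\nOfficer gave verbal commands.",
          final_report="Subject walked away.\nOfficer gave verbal commands.",
          officer_id="badge-4521",
          model_version="draft-model-v1",
      )
      print(record.edits())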

    • It should be treated kind of the same as writing a report after a glass of wine. Probably no one really cares but "sorry that doesn't count because I was intoxicated when I wrote that bit" isn't going to fly.

  • > for texts like police reports

    If what you mean is, "texts upon which the singular violence of the state is legitimately imposed", then a simple solution (and I believe, on sufficiently long time scales, the happily inevitable one) is to abolish police.

    I can't fathom, in an age where we have ubiquitous cameras as eyewitnesses and instant communications to declare emergencies and request aid from nearby humans, that we need an exclusive entity whose job it is to advance safety in our communities. It's so, so, so much more trouble than it's worth.

  • I don’t understand the urgency to replace human work with AI. Why is every organization so eager to skip the AI-as-an-assistant step? Here there are already massive productivity gains in using the AI to create the draft of the report; it makes little economic sense to have it produce the final version compared to the risk. Maybe it’s just plain laziness? Same with developers: why is every organization trying to leapfrog from humans writing all the code to humans not even reading the generated code?

    • Not everyone is in an urgent hurry to replace people with bots; that's a hyperbolic construct.

      But to try to answer some of what I think you're trying to ask about: The bot can be useful. It can be better at writing a coherent collection of paragraphs or subroutines than Alice or Bill might be, and it costs a lot less to employ than either of them do.

      Meanwhile: The bot never complains to HR because someone looked at them sideways. The bot [almost!] never calls in sick; the bot can work nearly 24/7. The bot never slips and falls in the parking lot. The bot never promises to be on-duty while they vacation out-of-state with a VPN or uses a mouse-jiggler to screw up the metrics while they sleep off last night's bender.

      The bot mostly just follows instructions.

      There's lots of things the bot doesn't get right. Like, the stuff it produces may be full of hallucinations and false conclusions that need to be reviewed, corrected, and outright excised.

      But there's lots of Bills and Alices in the world who are even worse, and the bot is a lot easier and cheaper to deal with than they are.

      That said: When it comes to legal matters that put a real person's life and freedom in jeopardy, then there should be no bot involved.

      If a person in a position of power (such as a police officer) can't write a meaningful and coherent report on their own, then I might suggest that producing written reports shouldn't be part of their job at all. There's probably something else they're good at that they can do instead (the world needs ditchdiggers, too).

      Neither the presence nor absence of a bot can save the rest of us from the impact of their illiteracy.

      1 reply →

    • Because the biggest cost at a lot of orgs is staff. At your typical software shop it's comical: the salary costs tower over all the others like LeBron James gazing down at ants. The moment you go from productivity gains to staff reduction you start making real money. Any amount of money for a machine that can fully replace a human process.

> That means that if an officer is caught lying on the stand – as shown by a contradiction between their courtroom testimony and their earlier police report – they could point to the contradictory parts of their report and say, “the AI wrote that.”

Normally, if a witness (e.g. a police officer) were found to be recounting something written by a third party, it would be considered hearsay and struck from the record (on objection).

It would be an interesting legal experiment to have an officer using this system swear to which portions they wrote themselves, and attempt to have all the rest of the testimony disallowed as hearsay.

  • I’d suspect the other direction:

    Police unions get LLMs classified as some kind of cognitive aid, so it becomes discrimination to ban them in school or the workplace.

    • "Losing access to LLMs hurts minorities the hardest, with job performance suffering compared to their cis white male peers..."

      If they use this angle, it's a shoo-in

    • That is an aspect I had not considered in my assumption that AI/robots will eventually go through the same or a similar social-justice process as all the other causes, i.e., women’s suffrage, racial equality, gay rights, etc., because it will ultimately serve the ruling class that has risen to dominance through social-justice causes, arguably far more than any of the prior causes célèbres did.

      It’s going to be interesting to see the state propaganda against the bigots and evil “bioists” (or whatever term the wordsmithing apparatchiks devise) who want to bar full equality in society for AI/robots who, after all, look just like you and me and also just want equal rights to love each other; and who are you to oppose others, since we are all just individuals?

      Shoot the messenger all you want, but it’s coming.

      2 replies →

I recommend taking a look at this video to get an idea of the thought process (or lack thereof) law enforcement might display when provided with a number of "AI" tools; even if this one example is closer to traditional face recognition than to LLMs, the behavior seems the same. Spoiler: complete submission and deference, in this specific case to a system that was not even their own.

https://www.youtube.com/watch?v=B9M4F_U1eEw

  • I read that "submission and deference" at the casino as conflict avoidance: the arresting officer says to his peers at the station that he "kind of believes" the suspect. He also states at some point that he can't cite (and, I infer, then release) the suspect because he is not certain who he is, and therefore has to arrest him as a "John Doe" so that his identity can be established. The fact (?) that the suspect now has a police record over this possible farce won't be settled until the facts are determined in a court of law.

    This video demonstrates that when it comes down to it the blunt end of law enforcement is oftentimes a shit show of "seems to work for me" and that goes for facial recognition, shot spotter, contraband dogs, drug & DNA tests, you name it.

> important first step in reigning in AI police reports.

That should be 'reining in'. "Reign" is, ironically, what monarchs do.

I find this article strange in its logic. If the use of AI-generated content is problematic as a matter of principle, I can understand the conflict. Then no AI should be used to "transcribe and interpret a video" at all, period. But if the concern is accuracy in the AI "transcript" and not the support from AI as such, isn't it a good thing that the AI-generated text is deleted after the officer has processed the text and finalized their report?

That said, I believe it is important to acknowledge the fact that human memory, experience, and interpretation of "what really happened" are flawed; isn't that why the body cameras are in use in the first place? If everyone believed police officers were already able to recall the absolute truth of everything that happens in these situations, why bother with the cameras?

Personally, I do not think it is a good idea to use AI to write full police reports based on body camera recordings. But as a support, in the same way the video recordings themselves are available, why not? If, in the future, AI can write accurate "body cam"-based reports, I would not have any problem with it as long as the video is still available to be checked. A full report should, in my opinion, always contain additional contextual information from the police involved and from witnesses, to add what the camera recordings do not necessarily reflect or contain.

  • My worry is that, at scale, AI from one vendor can introduce biases. We won't know what those biases are, but whatever they are, the same bias affects all reports.

    • That is something to worry about, agreed. So the quality and reliability of AI is what we should focus on. In addition, we should be able to keep track of (and keep records of) how the AI has been used and how it builds its narrative and conclusions.

  • The EFF's angle is that the police can use an LLM's initial report maliciously to 1) let incriminating inaccuracies generated by the LLM stand or 2) fabricate incriminating inaccuracies. Afterwards, because the LLM generated the initial report, the officer would have plausible deniability to say they themselves didn't intentionally lie, they were just negligent in editing the initial report. So it's about accountability washing.

  • > That said, I believe it is important to acknowledge the fact that human memory, experience, and interpretation of "what really happened" are flawed; isn't that why the body cameras are in use in the first place? If everyone believed police officers were already able to recall the absolute truth of everything that happens in these situations, why bother with the cameras?

    Police tend to not tell the truth, on purpose.

> In July of this year, EFF published a two-part report on how Axon designed Draft One to defy transparency. Police upload their body-worn camera’s audio into the system, the system generates a report that the officer is expected to edit, and then the officer exports the report. But when they do that, Draft One erases the initial draft, and with it any evidence of what portions of the report were written by AI and what portions were written by an officer. That means that if an officer is caught lying on the stand – as shown by a contradiction between their courtroom testimony and their earlier police report – they could point to the contradictory parts of their report and say, “the AI wrote that.” Draft One is designed to make it hard to disprove that.

> Axon’s senior principal product manager for generative AI is asked (at the 49:47 mark) whether or not it’s possible to see after-the-fact which parts of the report were suggested by the AI and which were edited by the officer. His response (bold and definition of RMS added):

“So we don’t store the original draft and that’s by design and that’s really because the last thing we want to do is create more disclosure headaches for our customers and our attorney’s offices.”

Policing and Hallucinations. Can’t wait to see this replicated globally.

  • Does the officer not take full ownership of the report once they edit it? If they got an intern to write a report and then they signed off on it, they’d be responsible, right?

This does sound problematic, but if a police officer's report contradicts the body-worn camera or other evidence, it already undermines their credibility, whether they blame AI or not. My impression is that police don't usually face repercussions for inaccuracies or outright lying in court.

> That means that if an officer is caught lying on the stand – as shown by a contradiction between their courtroom testimony and their earlier police report

The bigger issue, which the article doesn't cover, is that police officers may not carefully review the AI-generated report, and then, when appearing in court months or years later, will testify to whatever is in the report, accurate or not. So the issue is that the officer doesn't contradict inaccuracies in the report.

  • > My impression is that police don't usually face repercussions for inaccuracies or outright lying in court.

    That's because it's a very difficult thing to prove. Bad memories and even completely false memories are real things.

Upvoted because I think it's an important topic, but this take causes me to question the motive for the article... which ironically is my big concern with using LLMs to write stuff generally (the unconscious censoring / proctoring of voice and viewpoint):

  That means that if an officer is caught lying on the stand – as shown by a
  contradiction between their courtroom testimony and their earlier police
  report – they could point to the contradictory parts of their report and say,
  “the AI wrote that.”

IANAL, but if they signed off on it then presumably they own it. Same as if it was Microsoft Dog, an intern, whatever. If they said "the AI shat it," then I'd ask "what parts did you find unacceptable and edit?" and expect we'd get to the juicy stuff: hallucinations, or "I don't recall." Did they write this, or are they testifying to the veracity of hearsay?

From what I've seen reports written by / for lawyers / jurists / judges already "pull" to a voice and viewpoint; I'll leave it there.

> But when they do that, Draft One erases the initial draft, and with it any evidence of what portions of the report were written by AI and what portions were written by an officer. That means that if an officer is caught lying on the stand – as shown by a contradiction between their courtroom testimony and their earlier police report – they could point to the contradictory parts of their report and say, “the AI wrote that."

This seems solvable by passing a law that makes the officer legally responsible for the report as if he had written it. He doesn't get to use this excuse in the courtroom, and it gets stricken from the record if he tries. That honestly seems like a better solution than storing the original AI-generated version, because showing jurors the original can reinforce the view that the AI wrote it, even if the officer reviewed it and decided it was correct at the time.

    • Yeah, this seems like an obvious solution, which Axon ought to be on board with since it protects them.

    When juniors use the excuse “oh, Claude wrote that” in a PR, I tell them that if the PR has their name on it, they wrote it, and their PRs are part of their performance review. This is no different.

The solution is to remove the temptation to hit a "looks good" button. A non-shitty AI tool could return a bullet list of events and time stamps:

- officer said "Freeze!" at 3:12:34am
- "my arm!" said person #2 at 3:12:48am

Then the officer must at a minimum reformat the events into text and add their own details. Again, AI is a tool, and it is immensely useful if we treat it like a tool and not a panacea.
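
A minimal sketch of that kind of tool, under assumed data (the names and structure here are illustrative, not any vendor's API): the AI side emits only timestamped bullets, and the narrative report stays entirely human-written.

  from dataclasses import dataclass

  @dataclass
  class Event:
      timestamp: str   # body-cam clock time
      speaker: str
      utterance: str

  def to_bullets(events: list[Event]) -> str:
      # A skeleton of facts only; no generated prose to rubber-stamp.
      return "\n".join(
          f'- {e.speaker} said "{e.utterance}" at {e.timestamp}' for e in events
      )

  print(to_bullets([
      Event("3:12:34am", "officer", "Freeze!"),
      Event("3:12:48am", "person #2", "my arm!"),
  ]))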

The experiments with AI agents sending emails to grown-ups are good, I think; AIs are already doing much more dangerous stuff, like these AI police reports. I don't think making a fuss over every agent-sent email is going to slow down other AI incursions into our society. The police report writer is a non-human, partially autonomous participant, like a K9 officer. It's wishful thinking that AIs aren't going to be set loose doing jobs. The cat is out of the bag.

  • I'm curious about this claim. What about agents sending emails to each other is good?

    • Turing-test type stuff: having agents try to participate in the overall social community by sending a cold email. The reason I think it's good is that, with how AI is increasingly used, a lot of different sorts of research are needed. A citizen could have their police report written by an AI without them opting in, just like Rob Pike could get a cold email.

      1 reply →

To me it’s a question of whether they are on average better. It’s not like human-based input is perfect either.

  • Do mistakes by the LLM carry the same weight as mistakes by the officer? Can the case be thrown out for an LLM mistake?

    Does the officer face repercussions for introducing false information via this tool?