Comment by ArkhamMirror

2 days ago

I got tired of expensive SaaS tools that want my sensitive documents in their cloud, so I built ArkhamMirror to do forensic document analysis 100% locally. It's free and open source.

What makes this different:

Air-gapped: Zero cloud dependencies. Uses local LLMs via LM Studio (Qwen, etc.)

ACH Methodology: Implements the CIA's "Analysis of Competing Hypotheses" technique, which forces you to look for evidence that disproves your theories instead of evidence that confirms them

Corpus Integration: Import evidence directly from your documents with source links

Sensitivity Analysis: Shows which evidence is critical, i.e., if a given piece of it turns out to be wrong, would your conclusion change?
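For readers new to ACH, here is a minimal Python sketch of the two ideas above: scoring a hypothesis-versus-evidence matrix and running a simple sensitivity check. This is not ArkhamMirror's code. The C/N/I cell ratings follow Richards Heuer's published ACH approach of ranking hypotheses by how much evidence argues against them. The per-item weights, the "drop one item and see if the leader changes" sensitivity test, and the H1/E1-style placeholder data are all illustrative assumptions.

```python
from dataclasses import dataclass

# Cell ratings for one evidence item against one hypothesis.
CONSISTENT, NEUTRAL, INCONSISTENT = "C", "N", "I"


@dataclass
class Evidence:
    name: str
    weight: float            # assumed credibility/relevance weight (illustrative)
    ratings: dict[str, str]  # hypothesis -> "C", "N", or "I"


def inconsistency_scores(hypotheses: list[str], evidence: list[Evidence]) -> dict[str, float]:
    """ACH ranks hypotheses by how much evidence argues AGAINST them."""
    scores = {h: 0.0 for h in hypotheses}
    for item in evidence:
        for h in hypotheses:
            # Unrated cells are treated as neutral (an assumption).
            if item.ratings.get(h, NEUTRAL) == INCONSISTENT:
                scores[h] += item.weight
    return scores


def best_hypothesis(hypotheses: list[str], evidence: list[Evidence]) -> str:
    """Least-inconsistent hypothesis; ties go to the first one listed."""
    scores = inconsistency_scores(hypotheses, evidence)
    return min(hypotheses, key=lambda h: scores[h])


def critical_evidence(hypotheses: list[str], evidence: list[Evidence]) -> list[str]:
    """Sensitivity check: which single items, if dropped, change the leader?"""
    baseline = best_hypothesis(hypotheses, evidence)
    critical = []
    for i, item in enumerate(evidence):
        remaining = evidence[:i] + evidence[i + 1:]
        if best_hypothesis(hypotheses, remaining) != baseline:
            critical.append(item.name)
    return critical


# Placeholder matrix: three hypotheses, three evidence items.
HYPOTHESES = ["H1", "H2", "H3"]
EVIDENCE = [
    Evidence("E1", 1.0, {"H1": CONSISTENT, "H2": INCONSISTENT, "H3": CONSISTENT}),
    Evidence("E2", 1.0, {"H1": INCONSISTENT, "H2": CONSISTENT, "H3": NEUTRAL}),
    Evidence("E3", 2.0, {"H1": NEUTRAL, "H2": INCONSISTENT, "H3": INCONSISTENT}),
]

print(inconsistency_scores(HYPOTHESES, EVIDENCE))  # {'H1': 1.0, 'H2': 3.0, 'H3': 2.0}
print(best_hypothesis(HYPOTHESES, EVIDENCE))       # H1
print(critical_evidence(HYPOTHESES, EVIDENCE))     # ['E3']
```

In this toy matrix H1 leads, but only because E3 counts against its closest rival, H3. If E3 turned out to be wrong, H3 would take over. That is the "if it's wrong, would your conclusion change?" question, answered mechanically.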

The ACH feature just dropped with an 8-step guided workflow, AI assistance at every stage, and PDF/Markdown/JSON export with AI disclosure flags. It's better than what any given three-letter agency uses.

Tech stack: Python/Reflex (React frontend), PostgreSQL, Qdrant (vectors), Redis (job queue), PaddleOCR, spaCy NER, BGE-M3 embeddings.

All MIT licensed. Happy to answer questions about the methodology or implementation! Intelligence for anyone.

Links: Repo https://github.com/mantisfury/ArkhamMirror

ACH guide with screenshots at https://github.com/mantisfury/ArkhamMirror/blob/reflex-dev/d...

Ironically, is there a way to try this out in the cloud for people who want this tool but aren't hyper-worried about security?

It looks cool.

  • Thanks, glad to hear it!

    Short answer: no, not right now.

    That said, instead of going through locally hosted Docker and local LLMs, you could reroute it wherever you like; I just don't have a cloud option set up at this time.

    I'm focused on developing the local, private application myself, but nothing is stopping someone from hooking it up to stronger cloud-based services if they want.

    The good news is that my plans for this include making it more modular, so people have better options for what it does and how powerful it is.

I've been poking at something like this for a while now, with almost exactly the same intention. I'm gonna try to contribute instead; this rules.

  • I'm very glad to hear that!

    Let me tell you this: this version of the toolkit is pretty monolithic, and Reflex is kind of a pain for me to work with. I'll keep polishing this version from here, but I hesitate to add more features to it since it already has like 35 pages of features.

    I'm about to release another version of the tool that's focused on modularity, so anyone can mix and match the features they want instead of having to take all or nothing. ACH will be the first add-on, followed by the rest of the features.

What field are you in? It sounds interesting that one would need such a tool.

  • It's not just for people doing interesting things. It just helps people answer questions about stuff. The stuff can be interesting or boring or dangerous or silly. The last question I tested the ACH tool on was "Did William Shakespeare really author all of the works he was credited for?" - You can use this stuff to research whatever you want. That's the point of it - it's no one's business what you are interested in getting to the bottom of.

    • From a business perspective, I've needed similar methodologies to evaluate potentially fraudulent transactions and relationships between parties, though with nothing like air-gap requirements and with heavy reliance on web search.

      What are the competing hypotheses, other than fraud, when a person makes a massive luxury purchase, but with red-flag-adjacent inconsistencies in other information provided? If we need to identify whether there's a weird or competitive ownership relationship behind a potential opportunity, how do we determine if an initial hypothesis about relationships is correct?

      If ArkhamMirror has an online mode with web search as a tool call, I'd be curious to try it out to automate some of these ACH-adjacent workflows.

  • The repo description says it's for journalism, but I build similar rigs for research on companies that have entered bankruptcy proceedings.

    There's commonly a lot of information, it might as well be unstructured, and I need answers quickly because my clients aren't going to pay me to go about it slowly.

    • It's mainly useful for journalism, yes. Audit and compliance uses were also a consideration. For now it's a unified tool, but I'm working on turning its base into a frame and adding individual shards for specialized applications.

Excellent! Thank you for releasing this.

Notice the "Knowledge Graph" feature that lets you "Visualize hidden connections between People, Orgs, and Places" just like the cork board meme.

This is the essence of what good "conspiracy theorists" do. Whenever investigative journalists uncover a conspiracy among the elite, they are talked down to and dismissed as "conspiracy theorists". But that is what good conspiracy theorists are: investigative journalists.

  • For sure: "conspiracy theorists" are just another group of people trying to find the truth, spot patterns in the world, and connect the dots. The cork board feel was very much intentional in some of the visualizations, especially the "lie web" visualization, which uses "red yarn" visuals to connect detected contradictions across different entities and documents.

    If I had the skills, I would totally map that onto a cork board.