Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.
1 month ago (exopriors.com)
Paste in my prompt to Claude Code with an embedded API key for accessing my public readonly SQL+vector database, and you have a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens of other high-quality public commons sites. Claude whips up the monster SQL queries that safely run on my machine, to answer your most nuanced questions.
There's also an Alerts functionality, where you can just ask Claude to submit a SQL query as an alert, and you'll be emailed when the ultra-nuanced criteria are met (and the output changes). For example, I want to know when somebody posts about "estrogen" in a psychoactive context, or uses enough biology metaphors when talking about building infrastructure.
Currently have embedded: posts: 1.4M / 4.6M comments: 15.6M / 38M That's with Voyage-3.5-lite. And you can do amazing compositional vector search, like search @FTX_crisis - (@guilt_tone - @guilt_topic) to find writing that was about the FTX crisis and distinctly without guilty tones, but that can mention "guilt".
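To make the compositional search above concrete, here is a minimal sketch in plain Python. The three-dimensional toy vectors, the document names, and the helper functions are all hypothetical illustrations; real embeddings come from the embedding model and have thousands of dimensions, and this is not the site's actual implementation.

```python
import math

def sub(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy axes (assumed for illustration): [FTX-ness, guilt as a topic, guilty tone]
ftx_crisis = [1.0, 0.0, 0.0]
guilt_topic = [0.0, 1.0, 0.0]
guilt_tone = [0.0, 1.0, 1.0]  # guilty-toned writing also touches the topic

# @FTX_crisis - (@guilt_tone - @guilt_topic)
query = sub(ftx_crisis, sub(guilt_tone, guilt_topic))

docs = {
    "neutral FTX post-mortem that mentions guilt": [1.0, 0.5, 0.0],
    "remorseful FTX confession": [1.0, 0.5, 1.0],
    "unrelated guilty diary entry": [0.0, 1.0, 1.0],
}

# Rank documents by similarity to the composed query vector.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
for name in ranked:
    print(f"{cosine_similarity(query, docs[name]):+.3f}  {name}")
```

In this toy setup the inner subtraction isolates the "tone" component, so the composed query penalizes guilty tone without penalizing mentions of guilt as a topic.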
I can embed everything and all the other sources for cheap, I just literally don't have the money.
I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research: as a translator from natural language to a rigid query language, rather than as the database itself. Very cool project!
Hopefully your API doesn't get exploited and you are doing timeouts/sandboxing -- it'd be easy to do a massive join on this.
I also have a question mostly stemming from me being not knowledgeable in the area -- have you noticed any semantic bleeding when research is done between your datasets? e.g., "optimization" probably means different things under ArXiv, LessWrong, and HN. Wondering if vector searches account for this given a more specific question.
Exactly, people want precision and control sometimes. Also it's very hard to beat SQL query planners when you have lots of materialized views and indexes. Like this is a lot more powerful for most use cases for exploring these documents than if you just had all these documents as JSON on your local machine and could write whatever Python you wanted.
Yeah I've put a lot of care into rate-limiting and security. We do AST parsing and block certain joins, and Hacker News has not bricked or overloaded my machine yet--there's actually a lot more bandwidth for people to run expensive queries.
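For readers wondering what guarding a public read-only SQL endpoint can look like, here is a small sketch. It is not the AST parsing described above, and it uses SQLite rather than the author's database: it swaps in a different, simpler technique, a statement-level allow-list enforced through the `set_authorizer` hook in Python's standard `sqlite3` module. The table and helper names are hypothetical.

```python
import sqlite3

# Only plain reads are allowed; every other action code is denied.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}

def read_only_authorizer(action, arg1, arg2, db_name, trigger_name):
    """Called by SQLite while compiling each statement."""
    return sqlite3.SQLITE_OK if action in ALLOWED_ACTIONS else sqlite3.SQLITE_DENY

def open_guarded_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO posts (title) VALUES ('Show HN: demo')")
    conn.commit()
    # Install the guard only after the trusted setup is finished.
    conn.set_authorizer(read_only_authorizer)
    return conn

def run_untrusted(conn, sql):
    """Run caller-supplied SQL; return rows, or a rejection message."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError as exc:
        return f"rejected: {exc}"

conn = open_guarded_demo_db()
select_result = run_untrusted(conn, "SELECT title FROM posts")
insert_result = run_untrusted(conn, "INSERT INTO posts (title) VALUES ('x')")
print(select_result)
print(insert_result)
```

An allow-list like this only addresses writes and schema changes; expensive joins would still need separate limits such as timeouts or row caps.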
As for getting good semantic queries for different domains: besides using our embed endpoint to embed arbitrary text as a search vector, Claude can use compositions of centroids (averages) of vectors in our database as search vectors. For example, it can effortlessly average every lesswrong chunk embedding over text mentioning "optimization" and search with that. You can actually ask Claude to run an experiment averaging the "optimization" vectors from different sources, and see what kind of different results you get when using them on different sources. Then the fun challenge would be figuring out legible vectors that bridge the gap between these different platforms' vectors. Maybe there's half the cosine distance when you average the lesswrong "optimization" vector with embed("convex/nonconvex optimization, SGD, loss landscapes, constrained optimization.")
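The centroid experiment described above can be sketched in a few lines. Everything here is a toy assumption: the two-dimensional vectors stand in for real chunk embeddings, and `bridge_text_vector` stands in for whatever the embed endpoint would return for the bridging text. The point is only to show the mechanics of averaging and comparing cosine distances.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical chunk embeddings for text mentioning "optimization".
lesswrong_chunks = [[1.0, 0.0], [0.8, 0.2]]
arxiv_chunks = [[0.0, 1.0], [0.2, 0.8]]

lesswrong_centroid = centroid(lesswrong_chunks)
arxiv_centroid = centroid(arxiv_chunks)

# Stand-in for embedding a bridging phrase such as
# "convex/nonconvex optimization, SGD, loss landscapes, ...".
bridge_text_vector = [0.1, 1.0]
bridged = centroid([lesswrong_centroid, bridge_text_vector])

distance_before = cosine_distance(lesswrong_centroid, arxiv_centroid)
distance_after = cosine_distance(bridged, arxiv_centroid)
print(f"before bridging: {distance_before:.3f}")
print(f"after bridging:  {distance_after:.3f}")
```

Whether a real bridging phrase halves the distance, as speculated above, is an empirical question this toy data cannot answer.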
If performance becomes a problem, statically hosting SQLite DBs with client-side queries and HTTP range requests is an interesting approach:
https://github.com/phiresky/sql.js-httpvfs
That's a neat thought. What's the granularity of the text getting embedded? I assume that makes a large difference in what the average vector ends up representing?
This is the route I went for making Claude Code and Codex conversation histories local and queryable by the CLIs themselves.
Create the DB and provide the tools and skill.
This blog entry explains how: https://contextify.sh/blog/total-recall-rag-search-claude-co...
It is a macOS client at present, but I have a Linux-ready engine I could use early feedback on if anyone is interested in giving it a go.
I don’t have the experiments to prove this, but from my experience it’s highly variable between embedding models.
Larger, more capable embedding models are better able to separate the different uses of a given word in the embedding space; smaller models are not.
I'm using Voyage-3.5-lite at halfvec(2048), which with my limited research, seems to be one of the best embedding models. There's semi-sophisticated (breaking on paragraphs, sentences) ~300 token chunking.
When Claude is using our embed endpoint to embed arbitrary text as a search vector, it should work pretty well cross-domains. One can also use compositions of centroids (averages) of vectors in our database, as search vectors.
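As a rough illustration of the paragraph- and sentence-aware ~300-token chunking mentioned above, here is a minimal sketch. It is an assumption-laden stand-in, not the author's chunker: whitespace-separated words approximate tokens (a real pipeline would use the embedding model's tokenizer), and the function names are hypothetical.

```python
import re

def count_tokens(text):
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())

def split_units(text, max_tokens):
    """Prefer whole paragraphs; fall back to sentences for long paragraphs."""
    units = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())
        if not paragraph:
            continue
        if count_tokens(paragraph) <= max_tokens:
            units.append(paragraph)
        else:
            units.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return units

def chunk_text(text, max_tokens=300):
    """Greedily pack units into chunks of at most max_tokens.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for unit in split_units(text, max_tokens):
        n = count_tokens(unit)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(unit)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six.\n\nSeven eight nine ten."
print(chunk_text(sample, max_tokens=6))
```

Chunk size matters for the centroid idea: averaging over ~300-token chunks captures local context around a word rather than the gist of a whole document.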
I was thinking about it a fair bit lately. We have all sorts of benchmarks that describe a lot of factors in detail, but all those are very abstract and yet, those do not seem to map clearly to well observed behaviors. I think we need to think of a different way to list those.
This is the same route I followed for https://zenquery.app .... It uses LLM to generate SQL rather than working directly on data files. Saves a ton of costs as well since you don't need to send entire file(s) to LLM, just the schema.
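The "send just the schema" point can be sketched as follows. This is a hypothetical illustration using SQLite from Python's standard library, not how zenquery.app works internally: the prompt is built from the `CREATE TABLE` statements stored in `sqlite_master`, so no row data leaves the machine. The prompt wording and function name are assumptions.

```python
import sqlite3

def schema_prompt(conn, question):
    """Build an LLM prompt containing table definitions but no row data."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n".join(row[0] for row in rows)
    return (
        "Write one read-only SQL query that answers the question.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.execute("INSERT INTO comments (author, body) VALUES ('alice', 'a very long comment body')")

prompt = schema_prompt(conn, "Who comments most often?")
print(prompt)
```

The prompt size depends on the number of tables and columns, not on the number of rows, which is where the cost saving comes from.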
> I like that this relies on generating SQL rather than just being a black-box chat bot.
When people say AI is a bubble but will still be transformational, I think of stuff like this. The amount of use cases for natural language interpretation and translation is enormous even without all the BS vibe coding nonsense. I reckon once the bubble pops most investment will go into tools that operate something like this.
This sounds awesome! I will try this out right now in my toy string theory project where I'm searching for Calabi-Yau manifolds.
Comment from Claude: Claude here (the AI). Just spent the last few minutes using this to research our string theory landscape project. Here's what I found:
I also used this to research the recent DESI finding that dark energy might be changing over time [1], and what that means for string theory.
From Claude:
[1] https://www.bbc.com/news/articles/c17xe5kl78vo
> I can embed everything and all the other sources for cheap, I just literally don't have the money.
How much do you need for the various leaks, like the Paradise Papers, the Panama Papers, the Offshore Leaks, the Bahamas Leaks, the FinCEN Files, the Uber Files, etc. and what's your Venmo?
emailed you, and it's https://venmo.com/u/XyraSinclair.
This may exist already, but I'd like to find a way to query 'Supplementary Material' in biomedical research papers for genes / proteins or even biological processes.
As it is, the Supplementary Materials are inconsistently indexed so a lot of insight you might get from the last 15 years of genomics or proteomics work is invisible.
I imagine this approach could work, especially for Open Access data?
I just built something like this a week ago: https://github.com/eamag/papers2dataset
I wanted to find all cryoprotective agents that were tested at different temperatures, but it should be extendable to your problem too. It uses OpenAlex to traverse a citation graph and open-access PDFs.
This is a pretty cool project! Thank you for open sourcing it!
Guys, you obviously cannot suggest that --dangerously-skip-permissions is ok here, especially in the same paragraph as “even if you are not a software engineer”. This is untrusted text from the Internet, it surely contains examples of prompt injection.
You need to sandbox Claude to safely use this flag. There are easy to use options for this.
Today I finally got Claude working in a devcontainer, so I'm wondering what the easier options are.
Things like https://github.com/textcortex/claude-code-sandbox seem like the bare minimum. There are a few other projects doing this.
The first threat is making edits to arbitrary files, exfiltrating your SSL keys or crypto wallets. A container solves that by not mounting your sensitive files.
The second threat would be if Claude gets fully owned and really tries to hack out of its container, in which case theoretically docker might not protect you. But that seems quite speculative.
Yeah, I don't think there are easier options. And getting it working within a dev container with all the right settings, was more of a chore than it should be.
Don't rely completely on a devcontainer; jailbreaking containers is something that Claude at least nominally knows how to do, though it seems like it's pretty strongly moralized not to without some significant prompt hacking.
I think a prompt + an external dataset is a very simple distribution channel right now to explore anything quickly with low friction. The curl | bash of 2026
Exactly. Prompt + Tool + External Dataset (API, file, database, web page, image) is an extremely powerful capability.
> a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens
what makes this state of the art?
It's just marketing.
It is not a protected term, so anything is state-of-the-art if you want it to be.
For example, Gemma models at the moment of release were performing worse than their competition, but still, it is "state-of-the-art". It does not mean it's a bad product at all (Gemma is actually good), but the claims are very free.
Juicero was state-of-the-art on release too, though hands were better, etc.
> It's just marketing. [...] It is not a protected term, so anything is state-of-the-art if you want it to be.
But is it true?
I think we ought to stop indulging and rationalizing self-serving bullshit with the "it's just marketing" bit, as if that somehow makes bullshit okay. It's not okay. Normalizing bullshit is culturally destructive and reinforces the existing indifference to truth.
Part of the motivation people have seems to be a cowardly morbid fear of conflict or the acknowledgment that the world is a mess. But I'm not even suggesting conflict. I'm suggesting demoting the dignity of bullshitters in one's own estimation of them. A bullshitter should appear trashy to us, because bullshitting is trashy.
just like "cruelty free" and "not tested on animals" in usa
The scale. How many tools do you know that can query the content of all arxiv papers.
Doesn't look like the scale is there, even for HN:
> Currently have embedded: posts: 1.4M / 4.6M comments: 15.6M / 38M That's with Voyage-3.5-lite
in the direction of "empowering the public with new capabilities they didn't have before", Scry offers, with the copy and paste of a prompt and talking with an agent:
1) Full readonly-SQL + vector manipulation in a live public database. Most vector DB products expose a much narrower search API. Basically only a few enterprise-level services let you run arbitrary SQL on remote machines. Google BigQuery gives users SQL power, but it mostly doesn't have embeddings, doesn't connect public corpora, doesn't have indexes as good, and doesn't support an agentic research experience. Beyond object-level research, Scry is a good tool for exploring and acquiring intuitions about embedding space.
2) An agent-native text-to-SQL + lexical + semantic deep research workflow. We have a prompt that's been heavily optimized for taking full advantage of our machine and Claude Code for exploration and answering nuanced questions. Claude fires off many exploratory queries and builds towards really big queries that lean on the SQL query planner. You can interrupt at any time. You have the compute limits to do lots of exhaustive exploration. Often, being confident that a document doesn't exist is more epistemically powerful than finding one.
3) Dozens of public commons in one database, with embeddings.
The tool is state of the art, the sources are historical.
First, so best in this?
"intelligence explosion", "are essentially AGI at this point", "ARBITRARY SQL + VECTOR ALGEBRA" etc. Casual use of hyperbole and technical jargon.
my charlatan radar is going off.
What is hyperbole? We are collectively experiencing a software intelligence explosion (people are shipping good software at prolific rates now due to Opus 4.5 and GPT-5.2-Codex-xhigh). With Scry, you can run arbitrary SELECT SQL statements over a large corpus and have an easier time composing embedding vectors in whatever mathematical ways you want, than any other tool I've seen.
> shipping good software at prolific rates
I think your definition of good needs to be rethought
Really useful. I'm currently working on an autonomous academic research system [1] and thinking about integrating this. Currently using a custom prompt + the Edison Scientific API. Any plans to make this open source?
[1] https://github.com/giatenica/gia-agentic-short
I could make it open-source as soon as I have $5k to my name. I've been in survival mode frankly for a long time.
Maybe more actually, server costs and API credits for my agent-coordination research are expensive.
That's just not a good use of my Claude plan. If you can make it so a self-hosted Llama or Qwen 7B can query it, then that's something.
If you're not willing to pay for your own LLM usage to try a free resource offered by the author, that's up to you. But why complain to the author about it? How does your comment enrich the conversation for the rest of us?
It's not free if I have to expend Claude credits on something a locally hosted Qwen 7B could handle.
> How does your comment enrich the conversation for the rest of us?
Straight back at you.
It's ultimately just a prompt, self-hosted models can use the system the same way, they just might struggle to write good SQL+vector queries to answer your questions. The prompt also works well with Codex, which has a lot of usage.
I think that’s just a matter of their capabilities, rather than anything specific to this?
This is very cool. If you're productizing this you should try to target a vertical. What does "literally don't have the money" mean? You should try to raise some in the traditional way. If nothing else works, at least try to apply to YC.
I mean I've been living off of $1700/month for a while in Berkeley. I have been trying hard the last 6 weeks to raise angel investment, and am moving to Thailand in a few days to have more breathing room (and change things up to untie some emotional knots and try to make sure I'm positioned to vibe-engineer as well as possible over the next few months).
You don't have any personal contact information on your website or on your Hacker News profile. For a tiny check size, I can be an angel. Contact in profile. Would you like to meet before you leave? I think you shouldn't move out of the Bay Area.
I've got some idle servers in my basement in Bulgaria with lots of GPUs. I'm actually in Cambodia at the moment. I've actually been playing with some similar ideas. Message me if you like. :)
Thailand is a dark place. Beware!
There are a lot of other low cost countries out there!
just a recommendation, pubmed is free and not limited to preprints
Thank you, I've started ingestion operations of pubmed.
Nice, but would you consider open-sourcing it? I (and I assume others) are not keen on sharing my API keys with a 3rd party.
I think you misunderstood. The API key is for their API, not Anthropic.
If you take a look at the prompt you'll find that they have a static API key that they have created for this demo ("exopriors_public_readonly_v1_2025")
Yes, thanks for explaining it.
The quick setup is cool! I’ve not seen this onboarding flow for other tools, and I quite like its simplicity.
Thank you!
Seems very cool, but IMO you’d be better off doing an open source version and then hosted SAAS.
Would you mind walking through the logic of that a bit for me? I'm definitely interested in productizing this, and would be interested in open sourcing as soon as I have breathing room (I have no money).
Has anyone tried to use these prompts with Gemini 3 Pro? It feels like the latest offerings from Claude, Gemini and GPT are on par (excluding costs), and as a developer, if you know how to query/spec a coder LLM, you can move between them with ease.
Claude Opus 4.5 is a paradigm shift
Can I make an offline mirror of this?
Seems like you're experiencing the hacker news hug of death.
Should be squared away now! Was my fault missing a health check for a recent weird bug, not a load issue.
The console / login pages are showing an error still.
It could be distributed as a Claude skill. Internally, we've bundled a lot of external APIs and SQL queries into skills that are shared across the company.
Not a software engineer. Isn't allowing network egress a security risk? exopriors.com is not an established domain or brand that warrants the trust it's asking for.
This is great: @FTX_crisis - (@guilt_tone - @guilt_topic)
Using LLMs for tasks that could be done faster with traditional algorithmic approaches seems wasteful, but this is one of the few legitimate cases where embeddings are doing something classical IR literally cannot. You could also make the LLM explain the query it’s about to run before execution:
“Here’s the SQL and semantic filters I’m about to apply. Does this match your intent?”
Great idea! I just overhauled the prompt to explain the SQL + semantic filters better, and give the user clearer adjustment opportunities before long-running queries.
What’s the benefit of manually pasting a massive prompt and enabling egress to make queries over HTTP vs just using MCP?
Looks great, thanks for sharing! Out of interest, how long did this take to get to its current state?
Thank you! I got the idea December 3, and initially released it December 19.
Do you have contact information? Would like to discuss sponsoring further work and embedding here.
That would be amazing! Yes, contact@exopriors.com.
It's a very nifty tool, and could definitely come in handy. Love the UX too!
Thank you! I'll be adding millions more quality embedded documents; it'll be here, just getting more useful.
Is the appeal of this tool its ability to identify semantic similarity?
The use case could vary from person to person. When you think about it, Hacker News has a large enough data set (and one that is widely accessible) to allow all sorts of fun analyses. In a sense, the appeal is:
who knows what kind of fun patterns could emerge
The problem with HN isn't that the patterns are hard to discern, it's that no one wants to acknowledge them.
How is the alerts functionality implemented?
You submit a SQL query to periodically run, we run it and store the results. As we ingest more documents (dozens of sources are being ingested every day), we run it again. If there's different outputs, you get an email.
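The alert loop described above (run the saved query, store the result, rerun after ingestion, notify on a difference) can be sketched like this. It is an illustrative guess at the mechanics, not the actual service: it uses an in-memory SQLite database, a SHA-256 fingerprint of the sorted result rows as the stored state, and hypothetical table and function names. Sending the email is left as a comment.

```python
import hashlib
import json
import sqlite3

def result_fingerprint(rows):
    """Order-independent hash of a query result."""
    canonical = sorted(json.dumps(list(row), default=str) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()

def check_alert(conn, alert_sql, last_fingerprint):
    """Rerun the alert query and report whether its output changed."""
    rows = conn.execute(alert_sql).fetchall()
    fingerprint = result_fingerprint(rows)
    changed = fingerprint != last_fingerprint
    # A real service would send the notification email here when changed is True.
    return changed, fingerprint, rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
alert_sql = "SELECT id, title FROM documents WHERE title LIKE '%estrogen%'"

# First run just records a baseline.
_, baseline, _ = check_alert(conn, alert_sql, None)

# Ingesting an unrelated document leaves the output unchanged.
conn.execute("INSERT INTO documents (title) VALUES ('Rust async runtimes')")
unrelated_changed, fingerprint, _ = check_alert(conn, alert_sql, baseline)

# Ingesting a matching document changes the output and would trigger an email.
conn.execute("INSERT INTO documents (title) VALUES ('Psychoactive effects of estrogen')")
matching_changed, fingerprint, rows = check_alert(conn, alert_sql, fingerprint)
print(unrelated_changed, matching_changed, rows)
```

Storing only a fingerprint keeps the state small; storing the full previous result, as the comment above suggests the service does, would additionally allow the email to show what changed.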
Wondering what your stack is. What SQL database are you using?
Hetzner, Postgres, Rust, SvelteKit
Does that first generated query really work? Why are you looking at URIs like that? First you filter for a uri match, then later filter out that same match, minus `optimization`, when you are doing the cosine distance. Not once is `mesa-optimization` even mentioned, which is supposed to be the whole point?
I've since improved it, and also discovered a new method of vector composition I have added as a first-class primitive:
debias_vector(axis, topic) removes the projection of axis onto topic: axis − topic * (dot(axis, topic) / dot(topic, topic))
That preserves the signal in axis while subtracting only the overlap with topic (not the whole topic). It’s strictly better than naive subtraction for “about X but not Y.”
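The debias_vector formula above is a standard vector rejection, and a short sketch makes the difference from naive subtraction visible. This is a plain-Python reading of the formula as written, not the server-side primitive itself; the toy two-dimensional vectors and the `naive_subtract` helper are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def debias_vector(axis, topic):
    """axis - topic * (dot(axis, topic) / dot(topic, topic))"""
    denominator = dot(topic, topic)
    if denominator == 0:
        raise ValueError("topic must be a non-zero vector")
    scale = dot(axis, topic) / denominator
    return [a - scale * t for a, t in zip(axis, topic)]

def naive_subtract(axis, topic):
    return [a - t for a, t in zip(axis, topic)]

# Toy vectors: axis overlaps with topic along the second dimension.
axis = [1.0, 1.0]
topic = [0.0, 2.0]

debiased = debias_vector(axis, topic)
naive = naive_subtract(axis, topic)

print("debiased:", debiased, "overlap with topic:", dot(debiased, topic))
print("naive:   ", naive, "overlap with topic:", dot(naive, topic))
```

The debiased vector is orthogonal to topic regardless of topic's length, while naive subtraction depends on the relative magnitudes of the two vectors and can overshoot into the negative topic direction.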
I need to try this
What did you think?
"Claude Code and Codex are essentially AGI at this point"
Okaaaaaaay....
Just comes down to your own view of what AGI is, as it's not particularly well defined.
While a bit 'time-machiney' - I think if you took an LLM of today and showed it to someone 20 years ago, most people would probably say AGI has been achieved. If someone wrote a definition of AGI 20 years ago, we would probably have met that.
We have certainly blasted past some science-fiction examples of AI like Agnes from The Twilight Zone, which 20 years ago looked a bit silly, and now looks like a remarkable prediction of LLMs.
By today's definition of AGI we haven't met it yet, but eventually it comes down to 'I know it if I see it' - the problem with this definition is that it is polluted by what people have already seen.
> most people would probably say AGI has been achieved
Most people who took a look at a carefully crafted demo. I.e. the CEOs who keep pouring money down this hole.
If you actually use it you'll realize it's a tool, and not a particularly dependable tool unless you want to code what amounts to the React tutorial.
> If someone wrote a definition of AGI 20 years ago, we would probably have met that.
No, as long as people can do work that a robot cannot do, we don't have AGI. That was always, if not the definition, at least implied by the definition.
I don't know why the meme of AGI being not well defined has had such success over the past few years.
I’ve got to disagree with this. All past pop-culture AI was sentient and self-motivated; it was human-like in that it had its own goals and autonomy.
Current AI is a transcript generator. It can do smart stuff but it has no goals, it just responds with text when you prompt it. It feels like magic, even compared to 4-5 years ago, but it doesn’t feel like what was classically understood as AI, certainly by the public.
Somewhere marketers changed AGI to mean “does predefined tasks with human level accuracy” or the like. This is more like the definition of a good function approximator (how appropriate) instead of what people think (or thought) about when considering intelligence.
Charles Stross published Accelerando in 2005.
The book is a collection of nine short stories telling the tale of three generations of a family before, during, and after a technological singularity.
I want to know what the "intelligence explosion" is, sounds much cooler than AGI.
When AI gets so good it can improve on itself
I have noticed that Claude users seem to be about as intelligent as Claude itself, and wouldn't be able to surpass its output.
This made me laugh. Unfortunately, this is the world we live in. Most people who drive cars have no idea how they work, or how to fix them. And people who get on airplanes aren't able to flap their arms and fly.
Which means that humans are reduced to a sort of uselessness / helplessness, using tools they don't understand.
Overall, no one tells Uncle Bob that he doesn't deserve to fly home to Minnesota for Christmas because he didn't build the aircraft himself.
But we all think it.
You, of course, are smarter than them.
You seem to be very confused about what intelligence even is.
Lots of highfalutin language trying to make something that's pretty hand-wavy look like it's not. Where are the benchmarks? The "vector algebra" framing with @X + @Y - @Z is a falsehood. Embedding spaces don't form any meaningful algebraic structure (ring, field, etc.) over semantic concepts, you're just getting lucky by residual effects.
I'm giving you, the user, the easiest ability you've most likely ever had to explore embedding space yourself. Embeddings are tricky and can mislead, but they do often compose surprisingly intuitively, especially when you've played and built up a bit of an intuition for it.
What is the impact of misleading embeddings, how do they compose? I honestly am interested but don't know enough to understand what you're saying.
Why would I want to explore the embedding space myself, isn't this a tool where I can run cross-data exploratory analyses against unstructured data, where it's pre-populated with content?
We can iterate fast with understanding useful paradigms of vector manipulation. Yesterday I added `debias_vector(axis, topic)` and l2_normalization guidance.
The manifold structure of embedding spaces isn't semantically uniform. You've found a nice little novelty, but it's not rigorous, and you're using AI slop to name this vector algebra instead of finding or running a benchmark to show that it actually works better.