Show HN: Hallucinopedia

16 hours ago (halupedia.com)

I really like this first sentence: The Nights Templar were a monastic order active during the 9th century, primarily based in the Soot Valley.

This is fantastic. I couldn't find any obvious way to search for a new page, but you can simply bang out any arbitrary URL slug and the new article will be hallucinated fresh, e.g.:

https://halupedia.com/shortest-cave-in-the-world

https://halupedia.com/echolocation-ability-in-spiders

This is really cool, I just wish people wouldn't deface the website by submitting hateful speech as titles.

  • The 'all articles' section really is a dive into what happens when you allow unfiltered posting. It's a shame that it isn't clear how many individuals are creating these hateful and otherwise inappropriate titles: is it just 1 or 2 people, or has this been posted to 4chan or somewhere and there is a concerted effort to disrupt the site?

    Shame there isn't a way to flag pages for removal. I was going to point my kids at this site, and it could be a great learning tool for schools, but not currently something I'd share.

    • Interesting idea with flagging. We are considering two options: (1) an article can be generated only if an existing article already links to it; (2) a flagging mechanism, now that you brought it up.

      Let me know what you think!

It’s been defaced. It’s already got sex crimes and antisemitism all over the place.

  • The mistake they made was allowing visitors to trigger the generation of articles via visiting any arbitrary URL.

    A more resilient concept would have been, have a few "seed" articles in place, and then only allow for the creation of new articles by clicking a link in an existing article.
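
A minimal sketch of the link-gated idea above, not the site's actual code. The `[[slug]]` link syntax, the `LinkGatedWiki` class, and `fake_generate` (a stand-in for the LLM call) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Optional, Set

# Assumed link syntax for this sketch: [[some-slug]]
LINK_PATTERN = re.compile(r"\[\[([a-z0-9-]+)\]\]")


class LinkGatedWiki:
    """Only generates an article if an existing article already links to it."""

    def __init__(self, seeds: Dict[str, str]) -> None:
        self.articles: Dict[str, str] = dict(seeds)
        self.allowed: Set[str] = set()
        for body in seeds.values():
            self.allowed.update(LINK_PATTERN.findall(body))

    def request(self, slug: str, generate: Callable[[str], str]) -> Optional[str]:
        if slug in self.articles:
            return self.articles[slug]      # already generated
        if slug not in self.allowed:
            return None                     # nothing links here: treat as a 404
        body = generate(slug)
        self.articles[slug] = body
        # Links in the new article become valid targets in turn.
        self.allowed.update(LINK_PATTERN.findall(body))
        return body


def fake_generate(slug: str) -> str:
    # Hypothetical stand-in for the real LLM call.
    return f"Entry on {slug}. See also [[{slug}-history]]."


wiki = LinkGatedWiki({"home": "See also [[great-pigeon-census]]."})
rejected = wiki.request("some-arbitrary-slug", fake_generate)
created = wiki.request("great-pigeon-census", fake_generate)
followed = wiki.request("great-pigeon-census-history", fake_generate)
```

With this gate, a typed-in URL that no article links to returns nothing, while every link inside a generated article still works.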

    • I vaguely remember a game someone made up (probably on 4chan) where the goal was to click "random article" and see how many clicks it takes to get to Hitler's page. I remember it being fun AND informative.

  • As the co-author of the project: the whole reason was to allow everybody to hallucinate what they want. If it was their will to research such things on there, then it shall be. But yes, it is kinda sad.

  • The readers of Hacker News are almost certainly responsible. I found these pages within a minute of browsing randomly.

  • This is why we can't have nice things.

    Looks like someone scripted `curl` in a loop and generated thousands of permutations of hate content.

  • Just in the comments, right? That is where I see it. If I were the site owner I would just turn comments off. It was a cute idea when someone on HN suggested it, but without moderation open commenting becomes a cesspool in a hurry.

Give it a week and see what Google AI Overview has to say about the Great Pigeon Census of 1887!

It's pretty fun to poke at! Although it's certainly difficult to be exact, it would be neat if generated pages used the context of the pages they were linked from (ideally, all pages that link to it) to guide the direction of the page. From the ones I generated it seemed they were mostly independent.
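
A rough sketch of what passing referrer context could look like, assuming the generator is driven by a text prompt. The `build_prompt` helper, its wording, and the `max_chars` limit are hypothetical and not taken from the site.

```python
from typing import List


def build_prompt(slug: str, referrer_excerpts: List[str], max_chars: int = 500) -> str:
    """Build a generation prompt that includes text from pages linking to `slug`."""
    title = slug.replace("-", " ")
    lines = [f"Write an encyclopedia entry titled '{title}'."]
    if referrer_excerpts:
        lines.append("Stay consistent with these passages from entries that link here:")
        for excerpt in referrer_excerpts:
            # Truncate each excerpt so the prompt stays bounded.
            lines.append(f"- {excerpt[:max_chars]}")
    return "\n".join(lines)


prompt = build_prompt(
    "glorbonia",
    ["Glorbonian culinary arts come from the subterranean nation of Glorbonia."],
)
```

Truncating each excerpt keeps the prompt small even when many pages link to the same slug.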

  • Update: Implemented it. All new articles work that way

    • Very nice! Independently of this thread, I was delighted to discover the cross references between pages. It makes a big difference.

    • That really improved things! Now each rabbithole goes deeper and deeper and deeper...

  • Yeah, thought about that, maybe will implement it. Will keep in mind! For now, SSR to feed LLMs is the priority.

Finally a more trustworthy version of Grokipedia!

UPDATE: Just now, comment section added. Have a nice time arguing!

great. someone has abused the "arbitrary URL" driggs@ mentioned, and now every entry has an offensive title prefixed by a number.

  • @bstrama, maybe you can have a process running that just iterates through the titles of different pages, and deletes the bad ones?

    p.s. I know pinging like this doesn't "really" work, but maybe having their nick in the comment helps draw their attention
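
A minimal sketch of such a cleanup pass, assuming articles live in a dict keyed by title and that a blocklist of terms is available. `purge_flagged` and the placeholder term `badword` are hypothetical.

```python
from typing import Dict, Iterable, List


def find_flagged(titles: Iterable[str], blocked_terms: Iterable[str]) -> List[str]:
    """Return titles containing any blocked term (case-insensitive substring match)."""
    blocked = [term.lower() for term in blocked_terms]
    return [t for t in titles if any(term in t.lower() for term in blocked)]


def purge_flagged(articles: Dict[str, str], blocked_terms: Iterable[str]) -> List[str]:
    """Delete flagged articles in place and return the removed titles."""
    flagged = find_flagged(list(articles), blocked_terms)
    for title in flagged:
        del articles[title]
    return flagged


articles = {
    "great-pigeon-census": "...",
    "12-badword-example": "...",
}
removed = purge_flagged(articles, ["badword"])
```

Substring matching is crude: it misses misspellings and can flag innocent titles, so removed titles would probably still need a human look.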

Ironically, this seems much faster (for pages already, erm, "researched") than the real one! How?

  • It generates each article only once, so once an article is generated it never perishes. The logic is: if the article exists, show it; if not, generate it and save it.
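
The logic described above can be sketched roughly as follows. `ArticleStore` is a hypothetical name, the in-memory dict stands in for whatever persistent storage the site uses, and `fake_generate` stands in for the LLM call.

```python
from typing import Callable, Dict, List


class ArticleStore:
    """Generate-once cache: an article is created on first request, then reused."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate             # hypothetical stand-in for the LLM call
        self._articles: Dict[str, str] = {}   # stand-in for persistent storage

    def get(self, slug: str) -> str:
        if slug not in self._articles:
            # Not seen before: generate and save.
            self._articles[slug] = self._generate(slug)
        # Seen before (or just saved): show the stored copy.
        return self._articles[slug]


calls: List[str] = []


def fake_generate(slug: str) -> str:
    calls.append(slug)
    return f"Article about {slug.replace('-', ' ')}"


store = ArticleStore(fake_generate)
first = store.get("great-pigeon-census")
second = store.get("great-pigeon-census")
```

The second `get` never reaches the generator, which is why repeat visits cost no tokens.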

    • I get that, but how does it serve the generated and cached ones seemingly faster than Wikipedia? (My guess is that single-page applications, which this one seems to be, just need fewer round trips between navigations or something?)

Funny, but you could argue this is actively harmful to the web.

  • I wouldn't. And, I'd think less of anyone who does make that argument.

    Anyone of reasonable intelligence can easily tell this is a parody of an encyclopedia. Saying this is bad for the web is like saying The Onion is bad for the web.

    • What would you think of a person who said that they are already convinced that an opposing view could not be correct without even hearing the arguments for it?

  • It's probably only harmful to the AI scrapers that train from the web. Most people will understand the purpose of this -- to poison LLM training in a humorous way, which is really easy to do. It exemplifies a major weakness in modern day AI.

  • You could also argue that the web has failed and poisoning it into irrelevance is a vital service, motivating humans to collect knowledge into immutable sources. We'll call them 'libraries.'

  • On the other hand, one could argue that anything that can be destroyed by relatively clearly labeled satire, deserves to be.

  • A web that is vulnerable to this would already be as good as dead.

    As an entertaining way to highlight the importance of upgrading our ways of knowing, playful (& open-source!) projects like this are likely to strengthen the web.

  • Any training data scraper that blindly takes stuff from websites deserves to have their model poisoned by this nonsense.

  • > you could argue

    Could you? I don't see it happening, but I could be wrong.

    • You could, in the sense that it’s not illegal or impossible. I haven’t seen anyone attempt it though.

      You could argue that a person could argue any point, but I’d prefer people make the argument rather than argue about arguing it.

  • To the web? It's fantastic for the web, these are the kinds of fun projects that make the web a worthwhile place to be. To slop generators? Yes, absolutely harmful, and that's for the best.

I love it. What’s the rough architecture of the system (using cloud LLM and paying $$$, or local)? The performance for new entries is really good. What is the prompt for each entry and how do you keep the steampunk vibe going?

This site is going to be expensive when a web crawler hits it. A honey pot that burns tokens.

  • They’re caching the pages which have already been generated. You could go back and delete all references to pages which don’t exist yet. Basically turn it into a static website.

    • It seems like the site's algorithm is that every newly generated page includes multiple links to not-yet-existing pages. So it doesn't matter that existing pages are cached: all the "leaf node" pages link to multiple uncached new pages.

Can't wait to see the next generation of LLMs after feeding it all of that hahaha

  • The page requires JS to load its content - user agents without JS support just get a blank page.

    I'm not sure if the bots that scrape data to train LLMs are capable of loading that type of page, or if they only work on pages that have the content inside the HTML itself?

    • Not using JavaScript would also make the crawler fail on sites built with Squarespace and Wix.

      The age where the web was usable at all without JavaScript is long gone. No scraper would get much scraping done without JavaScript these days.

    • any serious scraping service these days will fail over to a headless browser when it fetches an asset referencing a js bundle that isn't verifiably a vendor script

Seeing “Something broke, which is ironic for a made-up encyclopedia: Load failed” when trying to access some of the suggested starting points

Very interesting how it works: https://halupedia.com/inner-workings-of-hallucinopedia

But not without risk! https://halupedia.com/dangers-of-a-virtual-llm-backed-encycl...

Funny. Small improvement suggestion: the entry about "Glorbonian culinary arts" links to "the subterranean nation of Glorbonia". However upon clicking the link to "Glorbonia", an entry is generated claiming that "Glorbonia refers to a peculiar and largely uncatalogued form of sub-auditory resonance". It would be cool if some context were carried over from the referrer page so that there is some coherence between entries (ah, and some existing entries could be taken in account when generating new ones).

  • Feels like this will eventually cause collisions, although perhaps nothing multiple definitions of Glorbonia and multiple biographies of different Mrs Wiggles (perhaps with Wikipedia style disambiguation) can't solve

  • Btw, I've noticed just now that Glorbonia is, in the first entry, a "subterranean nation" and in the second it's a "sub-auditory resonance". So I got curious and I asked Opus what he thinks about the word Glorbonia: "Do you detect in the word a sense of place? North, south, east, west, up, down?". And Opus answers "Down, weirdly. Or maybe low — something subterranean, or at least sunken." Curious.

Love it! It feels very Borges!

Feature request: also be able to click on the Talk page to see the controversies. I don't always want to trust the article itself as the final word.

Edit: Oh look, there's an article about the YC! https://halupedia.com/y-combinator

Currently breaks if you try to create a page with a Japanese slug. Multiple languages would make this an even more valuable resource than it already is.

The All Entries (https://halupedia.com/all-entries) part of the site is a bit alarming. I think OP might want to do a little bit of basic automoderation here.

  • In today's world it does not take long to be reminded that we cannot have nice things. Or maybe the gov't has their own bot army to wreak havoc and convince voters that actually, we really do want privacy-ending ID verification laws after all.

wtf, I thought these were just anecdotes until I saw they were actually happening in Astoria. I used to visit in the summers and never heard about any of that! Stop the fake news

As I said in another comment, this is brilliant. Suggestion: Remove anything that isn't part of the satire; act always as if it's a 'real' encyclopedia. For example on the front page I would remove,

> Articles are generated on demand and stored permanently upon first request.

Don't dispel the magic; don't pull back the curtain and let people see the mechanics.

EDIT: As you say in your system prompt, "You never wink at the reader. You never acknowledge that anything is funny or fictional. Everything is reported as though it is completely normal and well-documented"

https://news.ycombinator.com/item?id=48042306

  • This is irresponsible for people who don't get it, takes away confirmation for people who do get it, and makes me block/blacklist any liar who does it.

"Despite its failure, the Great Pigeon Census of 1887 is remembered as a cautionary tale..."

This type of writing is considered non-encyclopedic by Wikipedia standards as it injects superficial analysis. The imitation articles would look better without it. Maybe train on this article? https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing