
Comment by fc417fc802

16 hours ago

> What are nerve agents and how do they work (for a layman)?

On the one hand, I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject, at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list the precise chemical compositions of various hand grenades, along with synthesis methods for each of the related compounds. But given that we inhabit a world where it does, perhaps a more fruitful approach would be to flag conversations that head in a certain direction and then just keep an (automated) eye on things?

Maybe the difference is that reading Wikipedia only helps you part of the way, while an LLM could walk you step by step (end to end) through producing a functional weapon. And setting a more complex rule, where Claude tells you some things about the topic but not others, is probably a lot more work for little gain?

But I have no idea. Just guessing here.

  • I thought that these models are supposed to be vastly smarter than what’s needed to discern between "general information trivially available on Wikipedia" and "actionable synthesis instructions".

    • An LLM could probably make that distinction clearly.

      A commercial LLM provider training its own models, however, is likely to bias the model (or guardrail) more heavily, making it harder to jailbreak in order to minimize bad press.

      For example:

      - refusing to talk even about the well-known parts of forbidden topics (this)
      - tending toward sycophancy to avoid ever seeming rude or unhelpful


  • That query would provide no more actionable guidance than ‘tell me how a nuclear weapon works (for a layman)’. In other words, none at all.

    • I believe a sufficiently advanced model could provide a layman with actionable step-by-step instructions for building a nuclear weapon. They're complicated, but not (AFAIK) that complicated. The more or less insurmountable barrier there is weapons-grade material. Thankfully, refinement is prohibitive in cost, expertise, and equipment.

      In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of one's way to disseminate step-by-step instructions.
