Comment by timr

5 days ago

In this case, the visual display was fine -- I was instructing it to fix bad code from a previous round that happened to deliver the right results.

Like I said, this is just an example that happens to be CSS. I see this stuff daily, if not hourly.

That's interesting. As I said, I haven't tried using LLMs at this level, although I'm about to start on some of that this week.

What I've found helps (at least at the other layers) is to have principles documents and standards documents for the AI to reference when it's modifying code. Principles documents describe the why, and standards documents describe the how.

So, for example, here are a few parts of my initial CSS-standards.md (it still needs a lot of revision):

    ## Utility-first discipline

    **Raw utilities everywhere by default. Never `@apply` for "components."** `@apply` exists only for
    true low-level primitives that can't live in a template (e.g., `prose` overrides, embedded
    third-party widget shells).

    Wathan's stated position: extract only on "worrisome duplication." The Tailwind team explicitly
    describes `@apply` as a tool you reach for after first reaching for templates. **Premature CSS
    abstraction is the failure mode.**

    ## Spacing

    Use only the default scale (`0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 16, 24…`). **Never `p-[13px]`.** If
    you need a value, change the scale in `@theme`:

    ```css
    @theme {
      --spacing: 0.25rem;
    }
    ```

    v4 uses a single `--spacing` multiplier; everything derives from it.

    ## Anti-patterns (banned)

    - **`!` important prefix** (`!bg-red-500`). Fix specificity properly.
    - **Arbitrary values for colour** (`bg-[#1da1f2]`). Define in `@theme`.
    - **Arbitrary pixel offsets** as default (`top-[3px]`). Use the spacing scale. Tolerated only as
      rare one-offs.
    - **Nested custom CSS more than one level deep.**
    - **`@apply` for any class that wraps fewer than ~5 utilities** or appears in fewer than ~3
      templates.
    - **Dynamic class string interpolation** (`text-${level}-500`) — purger can't see these.
    - **Custom breakpoints in v1.**
    - **Inline `<style>` blocks.** All CSS goes through `assets/css/app.css`.
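
To make the dynamic-interpolation rule concrete, here is a minimal TypeScript sketch of the usual workaround: a static lookup map where every class name is written out in full. Tailwind scans source files as plain text, so only complete class strings get picked up. The `Level` type, the colour choices, and the `levelTextClass` helper are hypothetical names I made up for illustration, not part of the standards file above.

```typescript
// Hypothetical severity levels, for illustration only.
type Level = "info" | "warning" | "error";

// Each class name is a complete string literal, so a scanner that reads the
// source as plain text can find every one of them.
const LEVEL_TEXT_CLASS: Record<Level, string> = {
  info: "text-blue-500",
  warning: "text-amber-500",
  error: "text-red-500",
};

// Look up the full class name instead of building it with a template string
// such as `text-${colour}-500`, which never appears verbatim in the source.
function levelTextClass(level: Level): string {
  return LEVEL_TEXT_CLASS[level];
}
```

Calling `levelTextClass("error")` returns `"text-red-500"`. The map costs a few extra lines, but it also gives the agent an obvious place to add a new level without reaching for string interpolation.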

  • Yeah, I have those, but it's still pretty hit and miss, and obviously, it ends up being a game of whack-a-mole for everything I find.

    I don't mean to over-state the importance of these little errors, just to say that agents do plenty of dumb stuff, even today, and the people who say otherwise are selling something or (hot take incoming) some combination of stupid, lazy and/or delusional.

Great example.

Just IME, the quality of the prompt often significantly affects whether it does bad stuff like your example. It's not easy by any stretch and I'm still getting there, but I'm up to a couple dozen or so "Agent Instructions" in my CLAUDE.md files for various projects that have to say things like: "when doing TDD, don't write tests to verify bug fixes in tests" because the agent is really good at following things literally. I am sure it will continue to improve, but until then every project needs some bandaid things like that.