Comment by dewey
1 day ago
Building your AI agent "toolkit" is becoming the equivalent of the perfect "productivity" setup where you spend your time reading blog posts, watching YouTube videos telling you how to be productive and creating habits and rituals...only to be overtaken by a person with a simple paper list of tasks that they work through.
Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Lots of money being made by luring people into this trap.
The reality is that if you actually know what you want, and can communicate it well (where the productivity app can be helpful), then you can do a lot with AI.
My experience is that most people don't actually know what they want. Or they don't understand what goes into what they want. Asking for a plan is a shortcut to gaining that understanding.
This is why the grill me skill went viral - https://github.com/mattpocock/skills/blob/main/grill-me/SKIL...
I asked Claude whether these elaborate words like "walk down the design tree" actually mean anything to the LLM and make a difference. The answer confirmed my gut feeling: You can just tell me to "be critical" and get mostly the same results. Matt did incredible work teaching people TS, but this feels more like trying to create FOMO to sell snake oil and AI courses.
4 replies →
Problem is they don’t know how to express themselves and many people, especially those interested in tech, don’t want to learn.
I can’t tell you how many times I have a CS student in my office for advising and they tell me they only want to take technical courses, because anything reading or writing or psychology or history based is “soft”, unrelated to their major, and a waste of their time.
I’ve spent years telling them critical reading and expressive writing skills are very important to being a functioning adult, but they insist what they need to know can only be found in the Engineering college.
Much of my time at work is reading through quickly typed messages from my boss and understanding exactly what questions I need to ask in order to make it easy for him to answer clearly.
Engineers who lack soft skills cannot be effective in team environments.
2 replies →
Or, as I like to put it: I need to activate my personal transformers on my inner embedding space to figure out what it is I really want. And still, quite often, I think in terms of the programming language I'm used to and the libraries I'm familiar with.
So, to really create something new that I care about, LLMs don't help much.
They are still useful for plenty of other tasks.
Bikeshedding seems to have shifted from code to LLMs which is a step further.
We used to have the very difficult task of producing working scalable maintainable code describing complex systems which do what we need them to do.
Now on top of it we have the difficult task of producing this code using constantly mutating complex nondeterministic systems.
We are the circus bear riding a bicycle on a high wire now being asked to also spin plates and juggle chainsaws.
Maybe singularity means that time sunk into managing LLMs is equal to time needed to manually code similar output in assembly or punch cards.
it's not though if you're working in a massive codebase or on a distributed system that has many interconnected parts.
skills that teach the agent how to pipe data, build requests, trace them through a system and datasources, then update code based on those results are a step function improvement in development.
ai has fundamentally changed how productive i am working on a 10m line codebase, and i'd guess less than 5% of that is due to code gen thats intended to go to prod. Nearly all of it is the ability to rapidly build tools and toolchains to test and verify what i'm doing.
But... plain Claude does that. At least for my codebase, which is nowhere close to your 10m line. But we do processing on lots of data (~100TB) and Claude definitely builds one-off tools and scripts to analyze it, which works pretty great in my experience.
What sort of skills are you referring to?
I think people are looking at skills the wrong way. It's not like they give the model some kind of superpowers it couldn't have otherwise. Ideally you'll have Claude write the skills anyway. A skill is just a shortcut so you don't have to keep rewriting the same prompt and/or have Claude figure out how to do the same thing repeatedly. You can save a lot of time, tokens, and manual guidance with well-thought-out skills. Some people use them to "larp" different job roles, and I don't think that's a productive use of skills unless the prompts are truly exceptional.
4 replies →
If you build up and save some of those scripts, skills help Claude remember how and when to use them.
Skills are crazy useful to tell Claude how to debug your particular project, especially when you have a library of useful scripts for doing so.
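For anyone who hasn't looked: a skill is just a markdown file with a bit of front-matter. A minimal sketch of a debugging skill (the names and scripts here are made up for illustration):

```markdown
---
name: debug-ingest
description: Use when investigating failures in the ingest pipeline. Points to the project's debugging scripts and explains how to read their output.
---

# Debugging the ingest pipeline

1. Run `scripts/trace_request.sh <request-id>` to follow a request through the system.
2. Run `scripts/check_lag.py` to rule out consumer lag before blaming the code.
3. Logs live under `logs/ingest/`; grep for the request id, not the timestamp.
```

The `description` in the front-matter is what tells Claude when to pull the skill in.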
[dead]
Even the most complex distributed systems can be understood with the context windows we have. Short of 1M+ loc, and even then you could use documentation to get a more succinct view of the whole thing.
This really doesn’t pan out in practice if you work a lot with these models
And we also know why: effective context depends on input and task complexity. Our best guess right now is that frontier models billed as 1M-token needle-in-a-haystack performers often have an effective context length between 100k and 200k.
1 reply →
I consider it more like people installing oh-my-zsh or whatever that brings a TON of features they'll never use, just because some cool tech influencer said it's cool to use it.
The proper way to do this is find a personal pain point, figure out how to fix it, fix it, and then continue.
That's how I built my own system, zero skills, just a git submodule with shared guides how to do stuff the way _I_ like it. I can just refer any agent to read that directory and they'll usually get it on the first go.
All I want is for my agent to save me time, and to become a _compounding_ multiplier for my output. As a PM, I mostly want to use it for demos and prototypes and ideation. And I need it to work with my fractured attention span and saturated meeting schedule, so compounding is critical.
I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.
How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/unstable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.
Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?
It's too early. People are trying all of the above. I use all of the above, specifically:
- A well-structured folder of markdown files that I constantly garden. Every sub-folder has a README. Every file has metadata in front-matter. I point new sessions at the entry point to this documentation, constantly run agents that clean up dead references and update out-of-date information, and build scripts that deterministically find broken links. It's an ongoing battle.
- A "continuation prompt" skill, that prompts the agent to collect all relevant context for another agent to continue
- Judicious usage of "memory"
- Structured systems made out of skills like GSD (Get Shit Done)
- Systems of "quality gate" hooks and test harnesses
For all of these, I have the agent set them up and manage them, but I've yet to find a context-management system that just works. I don't think we understand the "physics" of context management yet.
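The "scripts that deterministically find broken links" bit from the first bullet can be surprisingly little code. A sketch in Python (the `docs/` root and the link pattern are assumptions; adjust to your layout):

```python
import re
from pathlib import Path

DOCS = Path("docs")  # root of the markdown knowledge base (assumed layout)
LINK = re.compile(r"\[[^\]]*\]\(([^)#?]+)")  # relative markdown link targets

def broken_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs for relative links that point nowhere."""
    bad = []
    for md in root.rglob("*.md"):
        for target in LINK.findall(md.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://", "mailto:")):
                continue  # only check local references
            if not (md.parent / target).exists():
                bad.append((md, target))
    return bad

if __name__ == "__main__":
    for md, target in broken_links(DOCS):
        print(f"{md}: broken link -> {target}")
```

Run it from a git hook or CI so the garden stays clean without burning agent tokens on it.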
On your first point, one unexpected side effect I’ve noticed is that in an effort to offload my thinking to an agent, I often end up just doing the thinking myself. It’s a surprisingly effective antidote to writer’s block… a similar effect to journaling, and a good reason why people feel weird about sharing their prompts.
The best thing you can do is help build and maintain high quality docs.
Great docs help you, your agents, your team and your customers.
If you’re confused and the agent can’t figure it out reliably, how can anyone?
Easier said than done of course. And harder now than ever if the products are rapidly changing from agentic coding too.
One of my only universal AGENTS.md rules is:
> Write the pull request title and description as customer facing release notes.
I’ve been thinking about this a lot. It’s obviously the ideal state of things. The challenge is that we’ve got existing docs frameworks and teams and inertia and unreleased features… and I don’t have time to wait for that when I’m trying to get something done today. Not to mention the trade off of writing in public vs. private.
One quick win I’ve thought could bridge this is updating our docs site to respond to `Accept: text/markdown` requests with the markdown version of the docs.
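The negotiation half of that is tiny. A framework-agnostic sketch (it deliberately ignores `q`-values; wire it into whatever serves your docs and return the `.md` or `.html` file accordingly):

```python
def pick_representation(accept_header: str) -> str:
    """Minimal content negotiation: serve markdown when the client asks
    for it, HTML otherwise. Ignores q-values for brevity."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    return "markdown" if "text/markdown" in accepted else "html"
```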
Sounds like you need OpenClaw's assistance.
Let me give you a counterexample. I'm working on a product for the national market, and I need to do all financial tasks (invoicing, submitting to the national fiscal database, etc.) through a local accounting firm. So I integrate their API in the backend; this is a 100% custom API developed by this small European firm, with a few dozen RESTful endpoints supporting various accounting operations, and I need to use it programmatically to maintain sync for legal compliance. No LLM has ever heard of it. It has a few hundred KB of HTML documentation that Claude can ingest perfectly fine and generate a curl command from, but I don't want to blow my token use and context on every interaction.
So I naturally felt the need to (tell Claude to) build an MCP for this accounting API, and now I ask it to do accounting tasks and it just does them. It's really ducking sweet.
Another thing I did: after a particularly grueling accounting month close-out, I told Claude to extract the general tasks we accomplished and build a skill that does them at the end of each month. Now it's like having a junior accountant at my disposal: it just DOES the things a professional would charge me thousands for.
So both custom project MCPs and skills are super useful in my experience.
That's what you should be doing. Start from plain Claude, then add on to it for your specific use cases where needed. Skills are fantastic if used this way. The problem is people adding hundreds or thousands of skills that they download and will never use, which just bloat the entire system and drown out the useful ones.
Sure, it's basic use and nothing to flex about - was just responding specifically to the line that plan-review-implement is all you need.
Though, you get such a huge bang from customizing your config that I can easily see how you could go down that slippery slope.
this is exactly how i use it too. i have a few custom MCP servers running on a mac mini homelab, one for permission management, one for infra gateway stuff. the key thing i learned is keeping CLAUDE.md updated with what each MCP server actually does and what inputs it expects. otherwise claude code will either not use the tool when it should, or call it with wrong params and waste a bunch of back and forth. once you document it properly it really does feel like having a team member who just knows how your stack works. the accounting use case is a great example because nobody else's generic tooling would ever cover that.
Your use is maybe more vanilla than you think. I think you are just getting shit done. Which is good.
Claude and an mcp and skill is plain to me. Writing your own agent connecting to LLMs to try to be better than Claude code, using Ralph loops and so on is the rabbit hole.
What exactly does it do that a professional would charge you thousands for?
(I'm genuinely asking)
The basic problem is that the reporting and accounting rules are double plus bureaucratic and you need to have on hand multiple registers that show the financial situation at any time, submit them to the tax authority etc.
To give you a small taste: you need to issue an electronic invoice for each unique customer and submit it on the fly to the tax authority, but these need to be correlated monthly with the money in your business bank account. The paid invoices don't just go into your bank account; they are disbursed from time to time by the payment processor, on random dates that don't line up with the accounting month, so at the end of the month you have to correlate precisely which invoices are paid and which are not. But wait, the card processor won't just send you the money in a lump sum: it deducts from each payment some fee determined by its internal formula, then, at the end of each month, adds up all those deducted fees (even for payments that have not yet been disbursed to you) and issues another invoice to you, which you need to account for in your books as being partially paid each month (from the fees deducted from payments already disbursed). You also have other payment channels, each with their own fees, etc. So I need to balance this whole overlapping-intervals mess, with all sorts of edge cases, chargebacks, and manual interventions I refuse to think about again.
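For the curious, the core of that payout-vs-invoice correlation can be sketched in a few lines (the field names and the greedy matching are my invention for illustration, not the actual books):

```python
def reconcile_payout(payout_cents: int,
                     items: list[dict]) -> tuple[list[str], int]:
    """Match one processor disbursement against open invoice line items.
    Each item: {"invoice": id, "gross": cents, "fee": cents}.
    Returns (invoice ids covered, unexplained remainder in cents);
    a nonzero remainder flags a chargeback or manual-intervention case."""
    covered, remaining = [], payout_cents
    for item in items:
        net = item["gross"] - item["fee"]  # processor deducts its fee per payment
        if net <= remaining:
            covered.append(item["invoice"])
            remaining -= net
    return covered, remaining
```

In reality you also have to net the processor's end-of-month fee invoice against the per-payment deductions, which is where the overlapping intervals come in.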
This is one example, but there are also issues with wages and their taxation, random tax law changes in the middle of the month etc. The accountant can of course solve all this for you, but once you go a few hundred invoices per month (if you sell relatively cheap services) you are considered a "medium" business, so instead of paying for basic accounting services less than 100€ per month (have the certified accountant look over your books and sign them, as required by law), you will need more expensive packages which definitely add up to thousands in a few months.
Go be an entrepreneur, they said.
Agree. For what it’s worth, in interviews Cherny (Claude Code creator) and Steinberger (OpenClaw creator) say they keep things simple and use none of the workflow frameworks. The latter even said he doesn’t even use plan mode, but I find that very useful: exiting plan mode starts clean with compressed context.
They backed out the “clear context and execute plan” thing recently. It’s a bummer, I thought it was great.
Maybe they figured it wasn't needed with 1M context?
1 reply →
This resonates with me. Sometimes I build up artifacts within the context of a task, but these almost always get thrown away. There are primarily three reasons I prefer a vanilla setup.
1. I have many, sometimes contradictory workflows: exploration, prototyping, bug fixing/debugging, feature work, PR management, etc. When I'm prototyping, I want reward hacking; I don't care about tests or lints. It's the exact opposite when I manage PRs.
2. I see hard-to-explain, hard-to-quantify problems with over-configuration. The quality goes down, it loses track faster, it gets caught in loops. This is totally anecdotal, but I've seen it across a number of projects. My hypothesis is that it's related to attention: since these instructions get added to the system prompt, they pull the distribution by constantly being attended to.
3. The models keep getting better. Similar to 2, sometimes model gains are canceled out by previously necessary instructions. I hear the Anthropic folks clear their CLAUDE.md every 30 days or so to alleviate this.
[flagged]
> Plain Claude, ask it to write a plan, review plan, then tell it to execute still works the best in my experience.
Working on an unspecified codebase of unknown size using unconfigured tooling with unstated goals found that less configuration worked better than more.
Emacs init file bikeshedding comes to mind…
but now you can build your AI agent toolkit to work on your init file for you
My init.el file went from some 300 lines to under 50 with Claude's assistance. Some of that had to do with updating Emacs, but I really only use Emacs for Org mode so that contribution was minimal.
I've put some stuff in my global Claude.md to avoid things like...
* Claude trying to install packages into my Python system interpreter - (always use uv and venvs)
* Claude pushing to main - (don't push to main ever)
* When creating a PR, completely ignoring how to contribute (always read CONTRIBUTING.md when creating a PR)
* Yellow ANSI text in console output - (Color choices must be visible on both dark and light backgrounds)
Because I got sick of repeating myself about the basics.
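Paraphrased as an actual CLAUDE.md fragment (the wording is mine, the rules are the ones above):

```markdown
# Global rules

- Python: never install into the system interpreter; always use `uv` and a venv.
- Git: never push to `main`; work on a branch and open a PR.
- PRs: always read `CONTRIBUTING.md` before opening one and follow it.
- Console output: no yellow ANSI text; color choices must be visible on both
  dark and light backgrounds.
```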
Mine has "always run `task build` before claiming a task as done"
For all of my projects, `task build` runs linters and tests, and builds the project with as little output as possible on the happy path.
This catches a bunch of "it's a pre-existing issue" stuff from Claude. Sometimes I ask it to run build first, then start implementing just so that it can prove to itself that no, it wasn't a pre-existing issue, you broke something.
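For reference, with go-task that's a Taskfile along these lines (the Go tools are just example commands; substitute your own linters and test runner):

```yaml
# Taskfile.yml: the one entry point the agent must run before claiming "done"
version: "3"
tasks:
  build:
    cmds:
      - golangci-lint run --quiet   # lint, minimal output
      - go test ./...               # tests
      - go build ./...              # build
```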
at work i've spent some time setting up our claude.md files and curated the .claude directory with relevant tools such as linear, figma, sentry, LSP, browser testing. sensible stuff anyone using these tools would want, it all works pretty well.
my only machine-specific config is overriding haiku usage with sonnet in claude code. i outline what i want in linear, have claude synthesize into a plan and we iterate until we're both happy, then i let it rip. works great.
then one of my juniors goes and loads up things like "superpowers" and all sorts of stuff that's started littering his PRs. i'm just not convinced this ricing of agents materially improves anything.
This is what I do; frankly I can't be arsed to take the time to write all these commands and skills and whatnot. I did use /init to get Claude to create a CLAUDE.md file, and I occasionally -- very occasionally -- go through it and correct anything that's no longer valid due to code changes (and then ask Claude to do the same).
But beyond that, I just ask it for what I want, and that's it. I'm not convinced that putting more time into building the "toolbox" will actually give me significant returns on that time.
I do think that some of this (commands, skills, breaking up CLAUDE.md into separate rules files) can be useful, but it's highly context-dependent, and I think YAGNI applies here: don't front-load this work. Only set those up if you run into specific problems or situations where you think doing this work will make Claude work better.
Understandable - I find skills for odd duck things and a simple set of rules you routinely prune work the best for me. Went from crappy code in niche projects to it nailing things first prompt almost every time now.
I heavily advocate for rawdogging AI agents.
All the fancy frameworks are vibe coded, so why would they do better than something you build yourself?
At most, add the Playwright MCP so the agent can see the rendered output.
This. At work I have described this phenomenon as the equivalent of tinkering with the margins and fonts in your word processor instead of just writing your paper.
I've had the same thought recently and this definitely is a thing that you can do - but there are also cases where you get dramatically better results if you put some more effort into your setup.
e.g. spend time creating a skill about how to query production logs
if you work on platforms, frameworks, tools that are public knowledge, then yeah. If there’s nothing unique to your project or how to write code in it, build it, deploy it, operate it, yeah.
But for some projects there will be things Claude doesn’t know about, or things that you repeatedly want done a specific way and don’t want to type it in every prompt.
Reminds me of the days spent optimizing Jira
example https://news.ycombinator.com/item?id=47501214
[dead]