Comment by ripped_britches
4 months ago
Please comment under this thread if you have actually tried this and can compare it to another tool like Cursor, Codex, raw Claude, etc.
I’m really not interested in hearing from people who haven’t actually used it.
FWIW, I finished an eval of Claude Code against various tasks that Amplifier works well on:
The agent demonstrated strong architectural and organizational capabilities but suffered from critical implementation gaps across all three analyzed tasks. The primary pattern observed is a "scaffold without substance" failure mode, where the agent produces well-structured, well-documented code frameworks that either don't work at all or produce placeholder outputs instead of real functionality. Of the three tasks analyzed, two failed due to placeholder/mock implementations (Cross-Repo Improvement Tool, Email Drafting Tool), and one failed due to insufficient verification of factual claims (GDPVAL Extraction). The common thread is a lack of validation and testing before delivery, combined with a tendency to prioritize architecture over functional implementation.
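For anyone who hasn't hit this failure mode, here's a hypothetical sketch of what "scaffold without substance" looks like in practice (the names and logic are invented for illustration, not taken from the eval): the structure, types, and docs are all in place, but the core logic is a placeholder.

```python
# Hypothetical illustration of the "scaffold without substance" failure mode.
# Everything *around* the logic looks professional; the logic itself is a stub.

from dataclasses import dataclass


@dataclass
class EmailDraft:
    subject: str
    body: str


def draft_email(context: str, recipient: str) -> EmailDraft:
    """Draft an email tailored to the recipient based on the given context."""
    # TODO: real drafting logic would go here.
    # Instead a placeholder is returned -- the code runs, imports cleanly,
    # and passes a superficial smoke test, but produces no real output.
    return EmailDraft(
        subject="[DRAFT] Follow-up",
        body=f"Dear {recipient},\n\n<placeholder body>\n",
    )


if __name__ == "__main__":
    print(draft_email("quarterly report feedback", "Alex"))
```

The point is that nothing here errors out, which is exactly why validation and testing before delivery matter: only a check against the actual task output catches it.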
I've tried it. It works better than raw Claude. We're working on benchmarks now, but it's a moving target: Amplifier (an experimental project) is evolving rapidly.