
Comment by spmurrayzzz

1 month ago

This has also been an interesting social experiment in that we get to see what work people think is actually impressive vs trivial.

Folks who have spent years effectively snapping together other people’s APIs like LEGOs (and being well-compensated for it) are understandably blown away by the current state of AI. Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.

The gap in reactions says more about the nature of the work than it does about the tools themselves.

>Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.

One datum for you: I recently asked Claude to make a jerk-limited and jerk-derivative-limited motion planner and to use the existing trapezoidal planner as a reference for fuzz-testing various moves (to ensure the total number of pulses sent was correct), and it totally worked. It took only a few rounds of guidance to get it to where I wanted to commit it.

  • I hope my comment above wasn't read to mean "LLMs are only good at web dev." I only meant that capability varies in magnitude across domains.

    I often do experiments where I will clone one of our private repos, revert a commit, trash the .git path, and then see if any of the models/agents can re-apply the commit after N iterations. I record the pass@k score and compare between model generations over time.

    In one of those recent experiments, I saw gpt-oss-120b add API support to swap tx and rx IQ for digital spectral inversion at higher frequencies on our wireless devices. This is for a proprietary IC running a Quantenna radio, the SDK of which is very likely not in-distribution. It was moderately impressive to me in part because just writing the IQ swap registers had a negative effect on performance, but the model found that swapping the order of the IQ imbalance coefficients fixed the performance degradation.

    I wouldn't say this was the same level of "impressive" as what the hype demands, but I remain an enthusiastic user of AI tooling due to somewhat regular moments like that. Especially when it involves open weight models of a low-to-moderate param count. My original point though is that those moments are far more common in web dev than they are elsewhere currently.

    EDIT: Forgot to add that the model also did some work that the original commit did not. It removed code paths that were clobbering the rx IQ swap register and instead changed it to explicitly initialize during baseband init so it would come up correct on boot.
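For anyone unfamiliar with why an IQ swap amounts to spectral inversion, here is a minimal sketch of the signal-level math only. It says nothing about the proprietary SDK or registers discussed above; every name in it (`swap_iq`, `peak_bin`, `N`, `TONE_BIN`) is made up for illustration. Swapping I and Q turns a sample x = I + jQ into Q + jI = j * conj(x), and conjugation mirrors the spectrum, so a tone at bin +k shows up at bin -k.

```python
import cmath

def dft(samples):
    """Naive O(n^2) discrete Fourier transform, fine for a tiny demo."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * i / n) for i, s in enumerate(samples))
        for k in range(n)
    ]

def swap_iq(samples):
    """Exchange the in-phase (real) and quadrature (imaginary) parts."""
    return [complex(s.imag, s.real) for s in samples]

def peak_bin(samples):
    """Index of the strongest DFT bin."""
    mags = [abs(v) for v in dft(samples)]
    return max(range(len(mags)), key=mags.__getitem__)

# Hypothetical test signal: one complex tone at a positive frequency bin.
N = 64
TONE_BIN = 5
tone = [cmath.exp(2j * cmath.pi * TONE_BIN * i / N) for i in range(N)]

original_peak = peak_bin(tone)          # tone sits at +TONE_BIN
swapped_peak = peak_bin(swap_iq(tone))  # mirrored to -TONE_BIN, i.e. N - TONE_BIN

if __name__ == "__main__":
    print(original_peak, swapped_peak)
```

The swapped tone lands at the mirrored bin, which is the inversion effect. Anything about imbalance coefficients or register ordering is hardware-specific and not modeled here.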

    • Ah yes, the magic is more developed for commonly documented cases than for niche stuff, 100%. Sorry, I misinterpreted your post to mean that they are not useful for embedded, rather than less capable for embedded. Also, your stuff is much deeper than anything I am doing (motion planning is pretty well discussed in the online literature).

This is not true. You can see people who are much older and built a lot of the "internet scale" equally excited about it, e.g., FreeBSD OG developers, Steve himself (who wrote Gas Town), etc.

In fact, I would say I've seen more "OG coders" (in their >50s) excited than mid-generation ones.

  • I think you're shadow-boxing with a point I never made. I never said experienced devs are not or cannot be excited about current AI capabilities.

    Lots of experienced devs who work in more difficult domains are excited about AI. In fact, I am one of them (see one of my responses in this thread about gpt-oss being able to work on proprietary RF firmware in my company [1]).

    But that in no way suggests that there isn't a gap in what impresses or surprises engineers across any set of domains. Antirez is probably one of the better, more reasoned examples of this.

    [1] https://news.ycombinator.com/item?id=46682604

I think this says a lot about yourself and where your prejudices and preferences lie.

  • Preferences I think I get, but prejudices?

    The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."

    My day to day work involves: full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we also use agents in production for various workflows and we also train/finetune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc). I am basing my original commentary in this thread on all of that cumulative experience.

    It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.

    This tracks with all the other information we have pertaining to benchmark saturation: the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.

    I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump, another class is not.

    I don't see how that can be perceived as prejudice. It may just be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).
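For readers unfamiliar with the pass@k scores mentioned in this thread: the comments don't say how they are computed, so the following is only a sketch of one common approach, assuming the widely used unbiased estimator popularized by code-generation benchmarks. Given n recorded attempts of which c passed, it estimates the chance that at least one of k attempts drawn without replacement passes. The function name and the example numbers are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts with c passes.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    all k attempts drawn without replacement are failures.
    """
    if n <= 0 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n >= 1, 0 <= c <= n, and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k draws include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

if __name__ == "__main__":
    # Hypothetical run: 10 attempts at re-applying a commit, 3 succeeded.
    for k in (1, 5, 10):
        print(k, pass_at_k(10, 3, k))
```

Averaging this per-task value over every task in a domain gives a per-domain score that can be compared across model generations.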