Comment by tomcam
2 days ago
What bothers me more than any of this particular discussion is that we seem to have been incapable of determining programmer productivity in a meaningful way ever since my debut as a programmer 40 years ago.
I’m confused as to why anyone would think this would be possible to determine.
Like can we determine the productivity of doctors, lawyers, journalists, or pastry chefs?
What job out there is so simple that we can meaningfully measure all the positive and negative effects of the worker, as well as account for different conditions between workers?
I could probably get behind the idea that you could measure productivity for professional poker players (given a long enough evaluation period). Hard to think of much else.
People in charge love to measure productivity and, just as harmfully, performance. The main insight people running large organisations (big business and governments) have into how they are doing is metrics, so they will use what measures they can have regardless of how meaningful they are.
The British government (probably not any worse than anyone else, just what I am most familiar with) does measure the productivity of the NHS: https://www.england.nhs.uk/long-read/nhs-productivity/ (including doctors, obviously).
They also try to measure the performance of teachers and schools, and introduced performance league tables and special exams (SATS: exams sat by school children in the state system at various ages, nothing like the American exams with the same name) to do this more pervasively. They made it better by creating multi-academy trusts, which add a layer of management running multiple schools, so even more people want even more metrics.
The same for police, and pretty much everything else.
Won't stop MBAs from trying though.
Yet paradoxically, the user knows instinctively. I know exactly when I'll get my next medical checkup, and when the test results will arrive. I know if a software app improves my work, and what it will cost to get a paid license.
The hard thing is occupations where the quantity of effort is unrelated to the result due to the vast number of confounding factors.
There's an interesting recent study showing how poorly developers measure their own productivity:
https://secondthoughts.ai/p/ai-coding-slowdown
HN discussion: https://news.ycombinator.com/item?id=44526912
We can determine the productivity of factory workers, and that is still(!) how we are seen by some managers.
And to be fair, some CRUD work is repetitive enough that it should be possible to get a fair measure of at least the difference in speed between developers.
But that building simple crud services with rest interfaces takes as much time as it does is a failure of the tools we use.
Duly upvoted! I tend to agree. Yet the shibboleth of productivity haunts us still.
> Like can we determine the productivity of doctors, lawyers, journalists, or pastry chefs?
Yes, yes we can.
Programmers really need to stop this cope about us being such special snowflakes that we can't be assessed and that our managers just need to take that we're worth keeping around on good faith.
News to me. How do you determine the productivity of a doctor? Patients seen? Patients cured? (for real, where did you get that data?) Number of medicines prescribed? Procedures performed? Does a triple bypass surgery count the same as a pap smear? Hours worked? Amount of help they provided to colleagues? Easy to come up with another 100 metrics that might be worth looking at. How are they all weighted?
Like I get that in SWE (like all other fields), managers have to make judgement calls and try to evaluate which reports contribute the most, but the GP post seemed surprised that this wasn't a solved problem by now, which just seems incomprehensible to me.
> Yes, yes we can.
Of course we can. But can we do it in a meaningful way, such that the metric itself doesn't become subject to optimization?
"When a measure becomes a target, it ceases to be a good measure"
> Yes, yes we can.
Could you make an effort to explain how, or at the very least link to some reasoning? Otherwise your comment is basically the equivalent of “nuh-uh”, which doesn’t meaningfully contribute to the discussion.
> Programmers really need to stop this cope about us being such special snowflakes
Which is not at all what is happening in your parent comment. On the contrary, they’re putting developers on even footing with other professions.
"Software engineers can be qualitatively assessed for the purposes of pay and promotion" and "software engineers can have their productivity measured and quantified" are two very different things.
So what methods do you use?
Part of that may be what we consider “product” to be.
My entire life, I have written “ship” software. It’s been pretty easy to say what my “product” is.
But I have also worked at a fairly small scale, in very small teams (often, only me). I was paid to manage a team, but it was a fairly small team, with highly measurable output. Personally, I have been writing software as free, open-source stuff, and it was easy to measure.
Some time ago, someone posted a story about how most software engineers have hardly ever actually shipped anything. I can’t even imagine that. I would find that incredibly depressing.
It would also make productivity pretty hard to measure. If I spent six months, working on something that never made it out of the crèche, would that mean all my work was for nothing?
Also, really experienced engineers write a lot less code (that does a lot more). They may spend four hours, writing a highly efficient 20-line method, while a less-experienced engineer might write a passable 100-line method in a couple of hours. The experienced engineer’s work might be “one and done,” never needing revision, while the less-experienced engineer’s work is a slow bug farm (loaded with million-dollar security vulnerability tech debt), which means that the productivity is actually deferred, for the more experienced engineer. Their manager may like the less-experienced engineer's work, because they make a lot more noise, doing it, are "faster," and give MOAR LINES. The "down-the-road" tech debt is of no concern to the manager.
I worked for a company that held the engineer accountable, even if the issue appeared two years after shipping. It encouraged engineers to do their homework, and each team had a dedicated testing section, to ensure that they didn't ship bugs.
When I ask ChatGPT (for example) for a code solution, I find that it’s usually quite “naive” (pretty prolix). I usually end up rewriting it. That doesn’t mean that’s a bad thing, though. It gives me a useful “starting point,” and can save me several hours of experimenting.
> When I ask ChatGPT (for example) for a code solution, I find that it’s usually quite “naive” (pretty prolix). I usually end up rewriting it. That doesn’t mean that’s a bad thing, though. It gives me a useful “starting point,” and can save me several hours of experimenting.
The usual counter-point is that if you (commonly) write code by experimenting, you are doing it wrong. Better think the problem through, and then write decent code (that you finally turn into great code). If the code that you start with is as "naive" as you describe, in my experience it is nearly always better to throw it away (you can't make gold out of shit) and completely start over, i.e. think the problem through and then write decent code.
BTW: I guess I should be a bit more forthcoming about the way that I work. I know that we are always looking to ding others for not working the way that we do, but I find my way works for me, quite well. I won't tell other people that they are wrong, unless they are working for me. I am constantly learning new techniques and approaches, by staying open to, and observing, how others do things. I learn from the examples set forth by others; even ones that do things in a way that I may initially disapprove of.
"Experimenting" is a vital part of my process. I call it "Evolutionary Design,"[0] and it involves a lot of iteration. I have found that it's vital to UI[1], because I can almost never predict how UI will act, when actually presented to the user. The same goes for a lot of communication workflows. I have to "run it up the flagpole, and see who salutes." I almost always find that my theorized approach has issues, and I need to make changes. The old "Measure twice; cut once" approach to software development has caused me great trouble, over the years, and I have found that I need to adjust to new tools, and new contexts.
For example, right now, I am revamping one of my UI widgets[2]. It started as a minor tweak for iOS26, but I realized that it's a bit "long in the tooth," and that I can make it more robust, simple, and usable. I have been running the test harness all morning, seeing issues, and going back to the code, and tweaking.
[0] https://littlegreenviper.com/evolutionary-design-specificati...
[1] https://littlegreenviper.com/the-road-most-traveled-by/
[2] https://github.com/RiftValleySoftware/RVS_Checkbox
That’s often what I do. It saves me the “blind alleys.”
I find they often cause more trouble than they are worth, because they are completely wrong, and need to be “unlearned.”
But nevertheless, productivity objectively exists. Some people/teams are more productive than others.
I suppose it would be simpler to compare productivity for people working on standard, "normalized" tasks, but often every other task a programmer is assigned is something different to the previous one, and different developers get different tasks.
It's difficult to measure productivity based on real-world work, but we can create an artificial experiment: give N programmers the same M "normal", everyday tasks and observe whether those using AI tools complete them more quickly.
This is somewhat similar to athletic competitions — artificial in nature, yet widely accepted as a way to compare runners’ performance.
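A minimal sketch of how the experiment described above could be scored, assuming every programmer is timed on the same M tasks and is randomly placed in either an AI-assisted group or a control group. The function names and the data layout are hypothetical, and any timings fed in would be illustrative rather than real study results.

```python
import random
from statistics import mean


def assign_groups(programmers, seed=0):
    """Randomly split programmers into (ai_group, control_group)."""
    shuffled = list(programmers)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_time_per_task(times, group):
    """Average completion time per task for one group.

    `times` maps programmer -> list of hours, one entry per task,
    with the same task order for everyone (hypothetical layout).
    """
    n_tasks = len(next(iter(times.values())))
    return [mean(times[p][t] for p in group) for t in range(n_tasks)]


def speedup_per_task(times, ai_group, control_group):
    """Control time divided by AI-assisted time, per task (>1 means AI was faster)."""
    ai = mean_time_per_task(times, ai_group)
    control = mean_time_per_task(times, control_group)
    return [c / a for a, c in zip(ai, control)]
```

This only compares raw completion times. It says nothing about defects or later maintenance cost, which would have to be measured separately.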
We can determine productivity for the purpose of studies like this. Give a bunch of developers the exact same task and measure how quickly they can produce a defect-free solution. Unfortunately, this study didn’t do that – the developers chose their own tasks.
Is there any AI that can create a defect-free solution to non-trivial programming problems without supervision? This has never worked in any of my tests, so I suspect the answer is currently No.
Team members always know who is productive and who isn’t, but generally don’t snitch to the management because it will be used against them or cause conflicts with colleagues. This team-level productivity doesn’t necessarily translate into something positive for a company.
Management is forced to rely on various metrics which are gamed or inaccurate.
what about the $ you make? isn't that an indicator? you've probably made more than me, so you are more successful while both of us might be doing the same thing.
Productivity has zero to do with salary. Case in point: FOSS.
Some of the most productive devs don't get paid by the big corps who make use of their open source projects, hence the constant urging of corps and people to sponsor projects they make money via.
I don't think there's much of a correlation there.
A better measure would be how much work has been made unnecessary.
The asymptote approached by software engineering is GDP=$0 because all problems are solved by maximally efficient automation. Never gonna happen, but progress on that path is a decent proxy for efficiency.
Too often the job is about introducing problems that are good for some company's bottom line, but that's the opposite of efficiency.
Salary is an indirect and partially useful metric, but one could argue that your ability to self-promote matters more, at least in the USA. I worked at Microsoft and saw that some of the people who made fat stacks of cash just happened to be in the right place at the right time, or promoted things that looked good, but were not good for the company itself.
I made great money running my own businesses, but the vast majority of the programming was by people I hired. I’m a decent talent, but that gave me the ability to hire better ones than me.
what about the $ you generate? im a software developer consultant. we charge by the hour. up front, time and materials, and/or support hours. not too many leaps of logic to see there is a downside to completing a task too quickly or too well
i have to bill my clients and have documented around 3 weeks of development time saved by using LLMs to port other client systems to our system since December. on one hand this means we should probably update our cost estimates, but im not management so for the time ive decided to use the saved time to overdeliver on quality
eventually clients might get wise and not want to overdeliver on quality and we would charge less according to time saved by LLMs. despite a measured increase in "productivity" i would be generating less $ because my overall billable hour % decreases
hopefully overdelivering now reduces tech debt to reduce overhead and introduces new features which can increase our client pipeline to offset the eventual shift in what we charge our clients. thats about all the agency i can find in this situation
It is from a certain point of view. For example at a national level productivity is measured in GDP per hour worked. Even this is problematic - it means you increase productivity by reducing working hours or making low paid workers unemployed.
On the other hand it makes no sense from some points of view. For example, if you get a pay rise that does not mean you are more productive.
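To make the GDP-per-hour point concrete, here is a small worked example. The two workers and their figures are made up purely for illustration. It shows the average rising when the lower-paid worker drops out, even though total output falls.

```python
def gdp_per_hour(workers):
    """workers: list of (output_value, hours_worked) pairs."""
    total_output = sum(output for output, _ in workers)
    total_hours = sum(hours for _, hours in workers)
    return total_output / total_hours


# Hypothetical two-person economy: one high-paid and one low-paid worker,
# each working 2,000 hours a year.
economy = [(100_000, 2_000), (20_000, 2_000)]

before = gdp_per_hour(economy)      # 120,000 / 4,000 hours
after = gdp_per_hour(economy[:1])   # low-paid worker now unemployed

print(before, after)
```

Measured "productivity" goes from 30 to 50 per hour while the economy as a whole produces less.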
Yeah it only works at a very high level, but from there it's a pretty good measure. Like it's basically "what are the values of the inputs vs. the outputs", which is dead simple. At any lower level there are lots of confounding variables you have to contend with.
In a vacuum I don’t believe pay alone is a very good indicator. What might be a better one is if someone has a history across their career of delivering working products to spec, doing this across companies and with increasing responsibility. This of course can only be determined after the fact.
Probably not. I took a new job at significantly reduced pay because it makes me feel better and reduces my stress. The fact that I can allow myself to work for less seems to me like a sign that I'm more successful.
People who do charity work, or work for non-profits or public benefit corporations, typically have vastly lower wages than those who work in e.g. high-frequency trading or other capital-adjacent industries. Are you comfortable declaring that the former are always vastly less productive than the latter?
Changing jobs typically brings a higher salary than your previous job. Are you saying that I'm significantly more productive right after changing jobs than right before?
I recently moved from being employed by a company to do software development, to running my own software development company and doing consulting work for others. I can now put in significantly fewer hours, doing the same kind of work (sometimes even on the same projects that I worked on before), and make more money. Am I now significantly more productive? I don't feel more productive, I just learned to charge more for my time.
IMO, your suggestion falls on its own ridiculousness.
Is a DB2 admin more productive than a Java dev at the same seniority?
What about countries? In my country, Poland, $25k would be an amazing salary for a senior, while in the USA fresh grads can earn $80k. Are they more productive?
... at the same time, given same seniority, job and location - I'd be willing to say it wouldn't be a bad heuristic.
It doesn't undermine your point, but if you mean gross yearly wage $25k is not an amazing salary for senior software developers in Poland (I guess it depends where in Poland).
I find Poles to be among the very best, but it may be a bias in the sample. Most of the ones I know were motivated enough to move to the States for awesome jobs.
Another metric could be time. Do people work fewer hours?
Actually, we can’t quantify most of the things we would like to optimize.