Comment by oooyay
2 years ago
Out of curiosity, is there a sizable number of developers that just copy and paste untrusted code from StackOverflow into their applications?
The conjecture that people just copy from StackOverflow is obviously popular but I always thought this was just conjecture and humor until I saw someone do it. Don't get me wrong, I use StackOverflow to give me a head start on solving a problem in an area I'm not as familiar with yet, but I've never just straight copied code from there. I don't do that because rarely does the snippet do exactly and only exactly what I need. It requires me to look at the APIs and form my own solution from the explained approach. StackOverflow has pointed me in the direction of some niche APIs that are useful to me, especially in Python.
I once worked with a developer who wouldn’t let anything come between him seeing an answer and copying it into his code. He wasn’t even reading the question to make sure it was the same problem he was having, let alone the answer. He would literally go Google => follow the first link to Stack Overflow he saw => copy and paste the first code block he saw. Sometimes it wasn’t even the right language. People had to physically take the input away from him if they were pairing with him because there was nothing anybody could say to stop him, and if you tried to tell him it wasn’t right then he’d just be pasting the second code snippet on the page before you could get another word out. He was freakishly quick at it.
Now he was an extreme case, but yes, there are a lot of developers out there with the mindset of “I need code; Stack Overflow has code; problem solved!” that don’t put any thought at all into whether it’s an appropriate solution.
In a hiring round nearly two decades ago, we realised something was off with the answers to the usual pre-phone-interview screening questions. The questions were simple, and we asked people to spend only about 20 minutes on them. We knew people would "cheat", but the questions were only there to lighten our load a little bit, so it was ok if they let through some bad candidates.
But for whatever reason, in one hiring round the vast majority had cut and pasted answers from search results verbatim (we dealt with a new recruiter, and I frankly suspected this new recruiter was telling them this was ok despite the instructions we'd given).
These were not subtle. But the very worst was one who did what the developer you described did: he'd found a forum post about a problem pretty close to the question and had cut and pasted the code from the first answer he found.
He'd not even bothered to read a few comments further down in the replies where the answer in question was totally savaged by other commenters explaining why it was entirely wrong.
This was someone who was employed as a senior developer somewhere else, and it was clear in retrospect looking at his CV that he probably kept "fleeing the scene of the crime" on a regular basis before it was discovered he was a total fraud. We regularly got those people, but none that delivered such obviously messed up answers.
For every developer like this, you're probably right there will be a lot more that are less extreme about it, and more able to make things work well enough that they're not discovered.
It is hard for some people to grasp the sheer amount of fraud in this industry. A while back I worked with two guys, one with a Master's and the other with a PhD. One day they came to me asking for help, because the program they'd written (in Python) wouldn't run. It was supposed to analyze some text, and spit out whatever the result of the analysis was.
The problem? They were passing the input text as hardcoded plaintext, i.e. it wasn't even a string with quotes or anything -- just `foo(here is my raw, non-string input, no quotes necessary lol)`, and they could not conceive of what the issue might be.
This is like grading calculus exams. Student gives the memorized answer which most resembles (in his mind) the question asked.
If you're paying a developer by the hour, and want your app released in the app store using as few hours as possible, then this approach can be the most cost efficient one.
Sure, it isn't good practice. Sure, it probably isn't what NASA should be doing. But if you're literally building yet another uber-like app, you probably shouldn't be spending too long thinking about details.
> this approach can be the most cost efficient one.
No it can’t. Quick and dirty? Sure. Take on some tech debt to get to market quicker. Blindly copying and pasting? You’re never going to build functional software that way. This guy was committing code with syntax errors that he’d obviously never even run. How are you going to get to market quickly that way?
The comment you're responding to said the guy was copying the wrong language at times. Code that won't even compile isn't making it into the app store.
Yeah, those details like whether or not it works really don't matter. NASA is overrated.
That's not software development. That's wild guessing.
I've seen "wild guessing" quite a bit when people don't actually understand the problem they're solving. Mostly students, but it happens in professional contexts as well.
I'm not sure why, maybe people are missing knowledge that would allow them to understand, so they just try random things in the hope that it works? It surprises me every time it happens.
To some that is the same. Try and modify until it sort of works.
Just out of curiosity… what was his salary and how long did it take to fire him? Did they fire the HR manager as well?
No idea. I left before he did.
this is basically how GitHub copilot works
Worse too, because some of the copy/pasters at least remember to copy-paste the StackOverflow URL, too. GitHub Copilot doesn't even give you that.
> People had to physically take the input away from him if they were pairing with him because there was nothing anybody could say to stop him, and if you tried to tell him it wasn’t right then he’d just be pasting the second code snippet on the page before you could get another word out. He was freakishly quick at it.
Sounds like this guy understands concurrency. :)
Just wait til that guy discovers ChatGPT.
I won’t be surprised if that guy is ChatGPT’s main audience.
Personally I can’t see how it would be faster to ask ChatGPT for an answer then carefully scrutinize the output to make sure I understand what it’s doing. Code is often easier to write than read - especially when it’s not your code.
In hindsight the solution is obvious, just run the code without reading it then try to fix it if it doesn’t produce acceptable results.
ChatGPT could help this dev if they understood the problems they are trying to solve. That is such a fundamental flaw in this. They will be on a PIP and out of a job in any respectable workplace. That would be a mercy.
Yes, and it happens more for things that feel out of scope for the part of the program that I'm interested in. After all, we import library code from random strangers into our programs all the time for the parts we consider "plumbing" and beneath notice. If I wanted to dig in and understand something, I would be more likely to write my own. But if I want this part over here to "just work" so I can get on with the project, it's compiler-error-driven development.
Same, and even more so if it's something that feels like it should be in the library code in the first place.
My most copy-pasted code is projecting a point onto a line segment. I end up needing it all the time, it's never in whatever standard library for vector math I'm using, and it's faster to find on SO than to find and translate the code out of whatever my last project that needed it is. Way faster than re-deriving it.
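For reference, the projection described above can be sketched in a few lines. This is a minimal 2D version, not taken from any particular Stack Overflow answer; the function name `project_point_onto_segment` and the tuple-based `Point` type are made up for illustration.

```python
from typing import Tuple

# Hypothetical 2D point type for this sketch: (x, y).
Point = Tuple[float, float]


def project_point_onto_segment(p: Point, a: Point, b: Point) -> Point:
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: a and b coincide, so the closest point is a.
        return a
    # Parameter t of the projection onto the infinite line through a and b.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    # Clamp to [0, 1] so the result stays on the segment.
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)
```

The clamp is the part that is easy to forget: without it you get the projection onto the infinite line, not the segment.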
Your vector math library is probably already code imported from random strangers, likely even imported by random strangers, so adding one more function from a random stranger feels entirely appropriate.
I hardly ever just copy and paste for the exact reason the author talks about. Instead, I try to make sense of the solution, and if I have to, I'll hand-copy it down line-by-line to make sure I properly understand and refactor from there. I also rename variables, since often times there are so many foos and bars and bazes that it's completely unreadable by a human.
Also if I come across the problem a second time, I'll have better luck remembering what I did (as opposed to blindly copying).
Yes, people do that. After looking at a huge amount of incorrect TLS-related code and configuration on SO, I’m now pretty sure that most systems run without validating certificates properly.
This was more true when libraries and tooling defaulted to not checking.
Somewhere in my history is a recent HN (or maybe Reddit) post where somebody insists curl has been 100% compatible from day one, and like, no, originally curl ignored certificates; today you need to specify that explicitly if it's what you want.
I think (but don't take my word for it) that Requests (the Python library) was the same. Initially it didn't check, then years back the authors were told that if you don't check you get what you didn't pay for (ie nothing) and they changed the defaults.
Python itself is trickier because it was really hard to convince Python people that DNS names, the names we actually care about in certificates, aren't Unicode. I mean, they can be (IDNs), but not in a way that's useful to a machine. If your job is "Present this DNS name to a user" then sure, here's a bunch of tricky and maybe flawed code to best efforts turn the bytes into human Unicode text, but your cert checking code isn't a human, it wants bytes and we deliberately designed the DNS records and the certificate bytes to be identical, so you're just doing a byte-for-byte comparison.
The Python people really wanted to convert everything messily to Unicode, which is - at best if you do it perfectly - slower with the same results and at worst a security hole for no reason.
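A rough sketch of the "compare in the ASCII domain" idea, assuming Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The helper names `to_a_label` and `name_matches` are hypothetical, wildcard handling is left out, and both sides are lowercased because ASCII DNS names are case-insensitive, which is a small departure from a strict byte-for-byte compare.

```python
def to_a_label(hostname: str) -> bytes:
    """Convert a possibly-Unicode hostname to its ASCII (A-label) byte form."""
    # The "idna" codec turns non-ASCII labels into "xn--..." labels.
    return hostname.encode("idna").lower()


def name_matches(cert_dns_name: bytes, hostname: str) -> bool:
    """Compare a certificate's dNSName bytes with the hostname we dialed."""
    # The certificate name is never decoded to Unicode; only the
    # user-supplied hostname is converted, and only towards ASCII.
    return cert_dns_name.lower() == to_a_label(hostname)
```

The point is the direction of the conversion: the human-facing name goes to bytes once, and the check itself never touches Unicode.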
OpenSSL is at least partly to blame for terrible TLS APIs. OpenSSL is what I call a "stamp collector" library. It wants to collect all the obscure corner cases, because some of its authors are interested. Did the Belgian government standardise a 54-bit cipher called "Bingle Bongle" in 1997? Cool, let's add that to our library. Does anybody use it? No. Should anybody use it? No. But it exists so we added it. A huge waste of everybody's time.
The other reason people don't validate is that it was easier to turn it off and get their work done, which is a big problem that should be addressed systemically rather than by individually telling people "No".
So I'd guess that today out of a thousand pieces of software that ought to do TLS, maybe 750 of them don't validate certificates correctly, and maybe 400 of those deliberately don't do it correctly because the author knew it would fail and had other priorities.
Apache used to not reject SNI hostname headers ending in a dot, in contravention of RFC 6066. Firefox notoriously didn't strip the trailing dot before sending the header. Some versions of curl (or the underlying libraries?) did, some didn't. I filed a bug at bz.apache.org about it.
requests pulls in certifi (Firefox's trust store, repackaged) via urllib3, so it probably uses those root certs by default, not the system store.
To be fair that might be partly the fault of TLS libraries. There should be a single sane function that does the least surprising thing and then lower level APIs for everything else. Currently you need a checklist of things that must be checked before trusting a connection.
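As one data point, Python's standard `ssl` module does have something close to that single sane function: `ssl.create_default_context()`. The sketch below contrasts it with the "just turn it off" pattern that tends to get pasted around; the wrapper names `make_verifying_context` and `make_unverified_context` are made up for illustration.

```python
import ssl


def make_verifying_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate verification and
    # hostname checking, and loads the default trusted CA certificates.
    return ssl.create_default_context()


def make_unverified_context() -> ssl.SSLContext:
    # Anti-pattern, shown only for contrast: this is the commonly pasted
    # way to make certificate errors "go away".
    ctx = ssl.create_default_context()
    # check_hostname has to be disabled first; setting CERT_NONE while
    # it is still enabled raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

If a snippet you are about to paste looks like the second function, the connection it creates will accept any certificate.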
Oh boy, where to begin. You obviously haven't had the pleasure of working in a codebase written by Adderall-fueled 23-year-olds.
What about Adderall-fueled 35 year olds?
What about Red Bull-fueled 43 year olds?
I think the section “A Study on Attribution” and associated paper might be as good an answer as you’ll get to that
Well. You (collective you) start by copying and pasting a code snippet first, and then modifying it as needed. Does that count? If no modifications are needed, then it stays.
That's what I do. I almost always rename things to match the coding style of the codebase I'm working on, though.
Plenty of developers paste arbitrary bash commands posted on sites like GitHub without thinking because they look "legit", I suppose. I see it similarly as you do: StackOverflow (and Copilot) can be helpful to start but it's.
Had an exchange like this some time ago:
Me: Hey, I'm reviewing your PR. Looks pretty fine to me. Except for this function which looks like it was copy-pasted from SO: I literally found the same function in an answer on SO (it was written in pure JS while we were using TS in our project).
Dev: Yes, everyone copies from SO.
Me: Well, in that case I hope you always copy the right thing. Because this code might run but it is not good enough (e.g. the variable names are inexpressive, it creates DOM elements without removing them after they are not needed anymore).
There really is, but people do give it a cursory read. See also: https://en.wikipedia.org/wiki/Underhanded_C_Contest
Yes. I was told by a reliable source that at one point they tried to log all the copy and paste events and it brought their systems to their knees.
I wouldn't do it in most professional settings due to licensing...
But for personal projects where I just want to get something running, then yes, I would copy paste and barely even read the code.
I don't really care about bugs like this either - I'm happy to make something that works 99% of the time, and only fix that last 1% if it turns out to be an issue.
> I wouldn't do it in most professional settings due to licensing...
Underrated comment. I think most tech companies' General Counsel would have a heart attack if they were aware of StackOverflow copy-pasting by their developers. I highly doubt some rando-engineer who pastes bubblesort code into their company's code base gave even a passing thought to what license the SO code was under, what license his own company's code was under, and whether they were compatible.
The big (FAANG) tech companies I've worked at all have written policies about copying and pasting code from external sources (TLDR: Don't), but I've seen even medium-sized (~1000+) companies with zero guidance for their developers.
In the server side JavaScript world absolutely, it seems like it's standard practice, people are injecting entire dependencies without even remotely looking at the code. Bringing in an entire library for a single function that could be accomplished in a couple lines and usually is posted below the fold.
...you would not believe...
not long ago I worked on a team who actively chose libraries and frameworks based on the likelihood they felt their questions would be answered on StackOverflow.
Yes.
This is why PHP got such a bad reputation. A lot of new developers were copying and pasting quick example code from Stack Overflow, or code from other new developers who only kind of knew what they were doing.
> This is why PHP got such a bad reputation.
I don't think that's the only reason, lol.
What? SO launched in 2008 and PHP had a bad reputation prior to that.
The point stands, it just wasn't SO they were getting the bad information from prior to 2008.
You're right, prior to that it was random forums.
Less and less every day. Now they are using ChatGPT.
When I had to use Python, I felt like copy-pasting anything was out of scope due to indentation errors.
Millions.
Wait til you find out about ChatGPT