We use a mix of static analysis and AI. Flagged packages are escalated to a human review team. If we catch a malicious package, we notify our users, block installation and report them to the upstream package registries. Suspected malicious packages that have not yet been reviewed by a human are blocked for our users, but we don't try to get them removed until after they have been triaged by a human.
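The escalation policy described above can be sketched as a simple verdict-to-action mapping. This is purely illustrative (the names and structure are invented, not Socket's actual code): confirmed-malicious packages trigger all actions, while suspected-but-untriaged packages are blocked for users but not yet reported upstream.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Verdict(Enum):
    CLEAN = auto()
    SUSPECTED = auto()   # flagged by scanner, awaiting human triage
    CONFIRMED = auto()   # reviewed by a human and confirmed malicious

@dataclass
class Actions:
    block_install: bool
    notify_users: bool
    report_upstream: bool

def policy(verdict: Verdict) -> Actions:
    """Map a scan verdict to actions, mirroring the workflow described above."""
    if verdict is Verdict.CONFIRMED:
        return Actions(block_install=True, notify_users=True, report_upstream=True)
    if verdict is Verdict.SUSPECTED:
        # Blocked for our users, but not reported for removal until triaged.
        return Actions(block_install=True, notify_users=False, report_upstream=False)
    return Actions(block_install=False, notify_users=False, report_upstream=False)
```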
In this incident, we detected the packages quickly, reported them, and they were taken down shortly after. Given how high-profile the attack was, we also published an analysis soon after, as did others in the ecosystem.
We try to be transparent about how Socket works. We've published the details of our systems in several papers, and I've also given a few talks on how our malware scanner works at various conferences:
* https://arxiv.org/html/2403.12196v2
* https://www.youtube.com/watch?v=cxJPiMwoIyY
So, from what I understand from your paper, you're using ChatGPT with careful prompts?
You rely on LLMs riddled with hallucinations for malware detection?
I'm not exactly pro-AI, but even I can see that their system clearly worked well in this case. If you tune the model to favour false positives and add a quick human review step, I can imagine response times being cut from days to hours (and your customers getting their updates that much faster).
He literally said "Flagged packages are escalated to a human review team." in the second sentence. Wtf is the problem here?
> We use a mix of static analysis and AI. Flagged packages are escalated to a human review team.
“Chat, I have reading comprehension problems. How do I fix it?”
"LLM bad"
Very insightful.
AI based code review with escalation to a human
I'm curious :)
Does the AI detect the obfuscation?
It's actually pretty easy to detect that something is obfuscated, but much harder to prove that the obfuscated code is actually harmful. That's why we still have a team of humans review flagged packages before we try to get them taken down; otherwise you would end up with far too many false positives.
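One reason detecting obfuscation is the easy part: encoded or minified payloads tend to look statistically random. A toy heuristic (not Socket's actual detector; the length and entropy thresholds here are arbitrary) flags long, high-entropy string literals for human review:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character; random or base64-encoded data scores high."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_obfuscated(literal: str, min_len: int = 40, threshold: float = 4.5) -> bool:
    """Flag long, high-entropy string literals as possible encoded payloads."""
    return len(literal) >= min_len and shannon_entropy(literal) > threshold
```

Note this only says "suspicious", not "harmful": a legitimate embedded asset scores just as high as an encoded payload, which is exactly why a human still has to look.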
I think that would be static analysis. After processing the source code normally (looking for network and system calls), you decode base64, concatenate all string literals, and process again, repeating until decoding makes no further change.
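The decode-until-fixpoint idea above can be sketched in a few lines. This is a deliberately naive illustration (the call list and regexes are placeholders, and real tools would parse the AST rather than regex the source): each pass replaces decodable base64 string literals with their plaintext and rescans, stopping when a pass changes nothing.

```python
import base64
import re

# Placeholder list of risky calls; a real scanner would track many more.
SUSPICIOUS = re.compile(r"\b(eval|exec|socket|subprocess|urllib)\b")
# Quoted runs of base64 alphabet characters, with optional padding.
B64_LITERAL = re.compile(r"['\"]([A-Za-z0-9+/]{16,}={0,2})['\"]")

def decode_pass(source: str) -> str:
    """Replace decodable base64 string literals with their plaintext."""
    def try_decode(m: re.Match) -> str:
        try:
            decoded = base64.b64decode(m.group(1), validate=True).decode("utf-8")
            return repr(decoded)
        except Exception:
            return m.group(0)  # not valid base64/UTF-8; leave it alone
    return B64_LITERAL.sub(try_decode, source)

def scan(source: str) -> set[str]:
    """Scan each decoded layer for risky calls until a fixpoint is reached."""
    hits: set[str] = set()
    while True:
        hits.update(SUSPICIOUS.findall(source))
        decoded = decode_pass(source)
        if decoded == source:  # no more layers to peel off
            return hits
        source = decoded
```

Running it on code that hides `import socket; exec(payload)` inside a base64 literal surfaces both `socket` and `exec`, even though neither appears in the original text.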
Probably. It’s trivial to plug some obfuscated code into an LLM and ask it what it does.