Comment by krasikra
10 hours ago
Fine-tuned Qwen models run surprisingly well on NVIDIA Jetson hardware. We've deployed several 7B variants for edge AI tasks where latency matters more than raw accuracy – think industrial inspection, or retail analytics where you can't rely on cloud connectivity. The key is that LoRA fine-tuning keeps the model small enough to fit in unified memory while still hitting production-grade inference speeds. The biggest surprise was power efficiency: a Jetson Orin can run continuous inference at under 15W, while a cloud round-trip burns far more energy at scale.
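For context, a back-of-the-envelope sketch of the "fits in unified memory" and power claims. The 7B parameter count and 15W figure come from the comment above; the quantization levels and the ~300W cloud-GPU wattage are illustrative assumptions, not measurements:

```python
# Rough arithmetic only: weight storage ignores KV cache and activations.
PARAMS = 7e9  # 7B parameters

def model_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone at a given quantization."""
    return params * bits_per_weight / 8

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = model_bytes(PARAMS, bits) / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")

# Energy for one hour of continuous inference, edge vs. cloud.
# 15 W is from the comment; 300 W for a datacenter GPU is an assumption.
jetson_wh = 15 * 1.0
cloud_wh = 300 * 1.0
print(f"Jetson: {jetson_wh:.0f} Wh/h vs cloud GPU: {cloud_wh:.0f} Wh/h "
      f"(~{cloud_wh / jetson_wh:.0f}x)")
```

At int4 the weights land around 3–4 GiB, which is comfortably inside even a smaller Orin's unified memory; fp16 is already tight on the low-end modules.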
Very interesting. Could you give examples of industrial tasks where lower accuracy is acceptable?
Naive question, but could neural networks handle these use cases?
NTA, but almost certainly. The advantage is that Qwen3.5 is extremely generic already, so adapting it to a specific task is far easier than training a NN from scratch. It's probably akin to how OCR is now just something I use Qwen for even though I have access to dedicated OCR tools: Qwen is good enough, and it's already in my VRAM. Modern VLMs are pretty good at answering basic questions about an image by default, and I'm guessing fine-tuning takes them from "pretty good" to "good enough to use in production".
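To make the "OCR with the model already in VRAM" point concrete, here's a minimal sketch of the request shape, assuming a server that exposes the OpenAI-compatible chat API (vLLM does); the model name is illustrative, and the payload is built but not sent:

```python
import base64
import json

def vlm_ocr_payload(image_bytes: bytes,
                    model: str = "Qwen/Qwen2.5-VL-7B-Instruct") -> dict:
    """Build an OpenAI-style chat payload asking a VLM to transcribe an image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text in this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "temperature": 0.0,  # keep OCR-style output deterministic-ish
    }

# Fake bytes purely for illustration; POST this to /v1/chat/completions.
payload = vlm_ocr_payload(b"\x89PNG fake image bytes")
print(json.dumps(payload)[:60])
```

The same endpoint serves both the generic "what's in this image?" questions and the fine-tuned production task, which is exactly why it displaces the dedicated OCR tools.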
> NVIDIA Jetson hardware ... 15W
7B on 15W could be any of the Orin (TOPS): Nano (40), NX (100), AGX (275)
Curious if you've experimented with a larger model on the Thor (2070)
> where latency matters more than raw accuracy – think industrial inspection
Huh? Why would industrial inspection, in particular, benefit from lower latency in exchange for accuracy? Sounds a bit backwards, but maybe I'm missing something obvious.
At a very high level, think fruit sorting[0], where the conveyor belt never stops rolling and you need to respond quickly, all the way through to monitoring for things like defects in silicon wafers and root-causing them. Some of these issues aren't problematic on their own, but you can aggregate data over time to see whether a particular machine, material, or process within a factory is degrading. The degradation might not span the entire factory; it can be isolated to a particular batch of material or a particular subsection. This is not a hypothetical example: it's an active use case.
[0] https://www.youtube.com/watch?v=vxff_CnvPek
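A rough sense of why latency dominates in the conveyor case: the inference budget per item is fixed by belt speed and item density, regardless of how accurate you'd like to be. All numbers below are made up for illustration:

```python
# If inference takes longer than an item spends in the camera's field of
# view, the item passes by unsorted -- that's the hard latency constraint.
belt_speed_m_s = 0.5    # belt speed (assumed)
camera_fov_m = 0.20     # length of belt the camera sees (assumed)
items_per_meter = 10    # item density on the belt (assumed)

window_ms = camera_fov_m / belt_speed_m_s * 1000        # time in view
items_in_view = camera_fov_m * items_per_meter
per_item_ms = window_ms / items_in_view                 # inference budget

print(f"time in view: {window_ms:.0f} ms")
print(f"budget per item: {per_item_ms:.0f} ms")
```

With these assumed numbers you get a couple hundred milliseconds per item; a model that's 2% more accurate but blows that budget simply misses items entirely, which is worse than an occasional misclassification.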
But that's not something you'd use an LLM for. There have been computer vision systems sorting bad peas for more than a decade[0], of course there are plenty of use cases for very fast inspection systems. But when would you use an LLM for anything like that?
[0] https://www.youtube.com/watch?v=eLDxXPziztw
4 replies →
But why would I want the results to be faster but less reliable, versus slower and more reliable? This feels like exactly the sort of thing where you'd favor accuracy over speed; otherwise you're just degrading the quality control.
7 replies →