The term "machine vision" is mainly used in highly controlled, narrow industrial applications, think factory assembly lines, steel inspection, monitoring for cracks in materials, shape or size classification of items, etc. The task is usually very well defined, and the same thing needs to be repeated under essentially the same conditions over and over again with high reliability.
But many other things exist outside the "glue some GPT4o vision api stuff together for a mobile app to pitch to VCs" space. Like inspecting and servicing airplanes (Airbus has vision engineers who make tools for internal use, you don't have datasets of a billion images for that). There are also things like 3D motion capture of animals, such as mice or even insects like flies, which requires very precise calibration and proper optical setups. Or estimating the meat yield of pigs and cows on farms from multi-view images combined with weight measurements. There are medical things, like cell counting, 3D reconstruction of facial geometry for plastic surgery, dentistry applications, and a million other things other than chatting with ChatGPT about images or classifying cats vs dogs or drawing bounding boxes of people in a smartphone video.
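For anyone wondering what "very precise calibration" concretely involves, here is a minimal single-camera checkerboard calibration sketch with OpenCV. This is only an illustration of the first step; a real multi-view mocap rig would add stereo or bundle calibration on top. The board dimensions, square size, and image folder are made-up placeholders, not values from any particular setup.

    # Minimal sketch: single-camera checkerboard calibration with OpenCV.
    # board_size, square_mm and the image folder are illustrative assumptions.
    import glob
    import numpy as np
    import cv2

    board_size = (9, 6)   # inner corners per row/column of the checkerboard
    square_mm = 25.0      # physical square size; sets the metric scale

    # 3D coordinates of the board corners in the board's own frame (z = 0 plane)
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_mm

    obj_points, img_points = [], []
    for path in glob.glob("calib_images/*.png"):   # assumed folder of board views
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if not found:
            continue
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

    # Intrinsics + distortion coefficients; rms is the reprojection error in pixels
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("reprojection error (px):", rms)

The reprojection error is the number people actually sweat over: for precise 3D reconstruction of small animals it has to stay well under a pixel, which is where the optics and the physical rig matter as much as the code.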
Thank you for your thoughtful comment! I completely agree.
It’s great to see someone emphasize the importance of mastering the fundamentals—like calibration, optics, and lighting—rather than just chasing trendy topics like LLM or deep learning. Your examples are a great reminder of the depth and diversity in machine vision.
Thanks for the LLM response. Not sure if you meant to be clever here.
Your disdain for LLMs is unfounded.
I use LLMs daily for coding. They are great. They are not a replacement for reading a book like the one linked here, or for understanding image formation, lenses, etc. Many people seem to imagine that all this stuff is now obsolete and all you need to do is wire up some standard APIs, ask an LLM to glue the JSON, and that's all there is to being a computer vision engineer nowadays. Maybe even pros will self-denigratingly say that, but after a bit of chatting it will be obvious they have plenty of background knowledge beyond prompting vision language models.
So it's not disdain; I'm simply trying to broaden the horizon for those who only know about computer vision from OpenAI announcements, tech news, and FOMO social media influencers.
Here are two examples where the right camera, optics, and lighting make a huge difference:
Semiconductor Wafer Inspection: Detecting tiny defects like scratches or edge chips requires high-resolution cameras, precision optics, and specific lighting (e.g., darkfield) to highlight defects on reflective surfaces. Poor choices here can easily miss critical flaws.
Food Packaging Quality Control: Ensuring labels, seals, and packaging are error-free relies on the right camera and lighting. For instance, polarized lighting reduces glare on shiny surfaces, helping detect issues that might otherwise go unnoticed.
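To make the point about lighting concrete: once darkfield illumination turns scratches and chips into bright blobs on a dark background, the software side can stay almost trivially simple. A rough sketch of that classical pipeline with OpenCV; the filename, threshold, and minimum blob area are placeholders, not tuned production values.

    # Sketch of a classical darkfield defect check: under darkfield lighting,
    # defects scatter light and appear as bright blobs on a dark background.
    # Thresholds and the minimum blob area here are illustrative only.
    import cv2

    img = cv2.imread("wafer_darkfield.png", cv2.IMREAD_GRAYSCALE)  # assumed input
    blur = cv2.GaussianBlur(img, (5, 5), 0)

    # Anything well above the dark background is a candidate defect
    _, mask = cv2.threshold(blur, 40, 255, cv2.THRESH_BINARY)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    defects = [c for c in contours if cv2.contourArea(c) > 20.0]  # drop single-pixel noise

    print(f"{len(defects)} candidate defects found")

Get the illumination wrong, though, and no amount of thresholding or deep learning will reliably recover the defects that were never imaged in the first place.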
Any serious production inspection.