If you enjoyed this post, you may also like the 2024 book Foundations of Computer Vision: https://news.ycombinator.com/item?id=44281506
I don't have any background in computer vision, but I enjoyed how the introductory chapter gets right into it, illustrating how to build a limited but working vision system.
Thanks for the reference. Looks very from-the-ground-up and comprehensive.
About 15-20 years ago, when I was still in uni, we had a computer vision lab. The main guy there had been working on the subject for years and dealt with businesses that used his stuff for quality control.
Without fail, step one of computer vision was to bring the image down to grayscale and/or filter for specific colours so you ended up with a 1-bit representation.
My "algorithm" for a robot that was to follow a line drawn on the floor boiled down to "filter out the colour green, then look at the bottom rows of the image and find the black pixels. If they're to the left, adjust to the left, if to the right adjust to the right". Roughly. I'm sure it could be done a lot more cleverly but I was pretty proud of it AND the whole tool suite was custom made, from editing environment to programming language. Expensive cameras and robot, too.
It may come as a surprise to some that a lot of industrial computer vision is done in grayscale. In a lot of industrial CV tasks, the only things that matter are cost, speed, and dynamic range. Every approach we have for making color images compromises on at least one of those three characteristics.
I think this kind of thing might have real, practical use cases in industry if it's fast enough.
Ah, I think you work in the same industry as me, machine vision. I completely agree with you; most applications use grayscale images unless it's a color-based application.
Which vision library are you using? I’m using Halcon by MVTec.
I used to work in industrial automation; I was mostly making the process control equipment that your stuff would plug into, PLCs and whatnot. We had a close relationship with Cognex, but I don't remember the exact details of their software stack.
Also resolution & uniformity
Color makes major compromises physically as well: the output makes it seem like the red, green, and blue channels are sampled from the same physical location, but the actual sensor buckets are offset from each other.
Appreciate the old school non-AI approach.
Classical machine vision and pattern recognition are absolutely AI. Or at least they were AI before they became too mature to be called that. As they say, any AI problem that gets solved stops being AI and becomes just normal algorithmics.
Classical computer vision is no more AI than quicksort or BFS is. What they say is that ML is AI that works. Classic computer vision (CV) is hand-rolled algorithms like Eigenfaces for face detection or Mixture of Gaussians for background subtraction. There's no magic black-box model in classic CV, no training on data, no generated pile of "if"s that no one understands. Just linear algebra written and implemented by hand.
Not AI, not even ML.
But have a look at the "Thresholding" section. It appears to me that AI would be much better at this operation.
It really depends on the application. If the illumination is consistent, such as in many machine vision tasks, traditional thresholding is often the better choice. It’s straightforward, debuggable, and produces consistent, predictable results. On the other hand, in more complex and unpredictable scenes with variable lighting, textures, or object sizes, AI-based thresholding can perform better.
That said, I still prefer traditional thresholding in controlled environments because the algorithm is understandable and transparent.
Debugging issues in AI systems can be challenging due to their "black box" nature. If the AI fails, you might need to analyze the model, adjust training data, or retrain, a process that is neither simple nor guaranteed to succeed. Traditional methods, however, allow for more direct tuning and certainty in their behavior. For consistent, explainable results in controlled settings, they are often the better option.
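For context, the kind of classical global thresholding being discussed is only a handful of lines. Here is a minimal sketch of Otsu's method on an 8-bit grayscale image; it is illustrative only, not the article's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Classical global thresholding (Otsu's method) on an 8-bit grayscale
 * image of n pixels. Returns a threshold t; pixels > t are foreground.
 * Minimal sketch for illustration, not the article's code. */
static int otsu_threshold(const uint8_t *pixels, size_t n)
{
    double hist[256] = {0};
    for (size_t i = 0; i < n; i++) hist[pixels[i]]++;
    for (int i = 0; i < 256; i++) hist[i] /= (double)n;   /* normalise */

    double sum_all = 0.0;
    for (int i = 0; i < 256; i++) sum_all += i * hist[i];

    double w0 = 0.0, sum0 = 0.0, best_var = 0.0;
    int best_t = 0;
    for (int t = 0; t < 256; t++) {
        w0   += hist[t];                 /* weight of the "dark" class */
        sum0 += t * hist[t];
        if (w0 <= 0.0 || w0 >= 1.0) continue;        /* one class is empty */
        double mu0 = sum0 / w0;                       /* dark-class mean */
        double mu1 = (sum_all - sum0) / (1.0 - w0);   /* bright-class mean */
        double between = w0 * (1.0 - w0) * (mu0 - mu1) * (mu0 - mu1);
        if (between > best_var) { best_var = between; best_t = t; }
    }
    return best_t;
}
```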
It can benefit from more complex algorithms, but I would stay away from "AI" as much as possible unless there is indeed a need for it. You can analyse your data and make some dynamic thresholds, you can make some small ML models, even some tiny DL models, and I would try the options in that order. Some cases do need more complex techniques, but more often than not, you can solve most of your problems by preprocessing your data. I've seen too many solutions where a tiny algorithm could do exactly what a junior implemented with a giant model that takes forever to run.
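As a concrete example of the "dynamic thresholds" option, here is a minimal sketch of a local-mean (adaptive) threshold, where each pixel is compared against its neighbourhood instead of a single global value; the window radius and offset are illustrative parameters, not values from the article:

```c
#include <stdint.h>

/* Adaptive ("dynamic") thresholding: each pixel is compared against the
 * mean of a small window around it, minus an offset. Window radius and
 * offset are illustrative parameters. Naive O(w*h*window^2) sketch. */
static void adaptive_threshold(const uint8_t *src, uint8_t *dst,
                               int w, int h, int radius, int offset)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            long sum = 0, count = 0;
            for (int dy = -radius; dy <= radius; dy++) {
                for (int dx = -radius; dx <= radius; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                    sum += src[yy * w + xx];
                    count++;
                }
            }
            int mean = (int)(sum / count);
            /* Bright relative to its local neighbourhood -> foreground. */
            dst[y * w + x] = (src[y * w + x] > mean - offset) ? 255 : 0;
        }
    }
}
```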
There are also many other classical thresholding algos. Don't worry about it :)
It indeed would be much better. There’s a reason the old CV methods aren’t used much anymore.
If you want to do anything even moderately complex, deep learning is the only game in town.
sure, if you don't mind it hallucinating different numbers into your image
This is a really solid intro to computer vision, bravo!
I was working on an image editor in the browser: https://victorribeiro.com/customFilter
Right now the neat feature it has is the ability to run custom filters of varied window size on images, and to use custom formulas to blend several images (a sketch of what such a filter boils down to appears after this comment).
I don't have a tutorial at hand on how to use it, but I have a YouTube video where I show some of its features
https://youtube.com/playlist?list=PL3pnEx5_eGm9rVr1_u1Hm_LK6...
At some point I would like to add more features as described in your article: feature detection, image stitching...
Here's the source code if anyone's interested: https://github.com/victorqribeiro/customFilter
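For anyone wondering what a "custom filter of varied window size" amounts to underneath, here is a minimal convolution sketch over an 8-bit grayscale image. It is not the editor's actual code, and the clamp-at-the-border behaviour is an assumption:

```c
#include <stdint.h>

/* Generic NxN convolution over an 8-bit grayscale image: the kind of
 * variable-window custom filter described above. Minimal sketch only. */
static void convolve(const uint8_t *src, uint8_t *dst, int w, int h,
                     const float *kernel, int ksize /* odd: 3, 5, 7... */)
{
    int r = ksize / 2;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int ky = -r; ky <= r; ky++) {
                for (int kx = -r; kx <= r; kx++) {
                    /* Clamp coordinates at the image border. */
                    int yy = y + ky; if (yy < 0) yy = 0; if (yy >= h) yy = h - 1;
                    int xx = x + kx; if (xx < 0) xx = 0; if (xx >= w) xx = w - 1;
                    acc += src[yy * w + xx] * kernel[(ky + r) * ksize + (kx + r)];
                }
            }
            if (acc < 0.0f)   acc = 0.0f;      /* clip to the 8-bit range */
            if (acc > 255.0f) acc = 255.0f;
            dst[y * w + x] = (uint8_t)acc;
        }
    }
}
```

Passing a 3x3 kernel of all 1/9 gives a box blur; a bigger kernel simply widens the window.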
I vaguely remember XnView having these matrix-based custom filters.
The blob-finding algorithm makes me think of the Advent of Code problems - I wouldn't have thought to do a two-pass approach, but now that I see it set out in front of me it's obviously a great idea. Seems like this technique could quite easily be generalised to work with a range of problems.
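For readers who haven't seen it, the two-pass idea referred to here is classic connected-component labelling: the first pass hands out provisional labels and records which ones touch, the second pass rewrites each pixel with its merged label. Below is a simplified 4-connectivity sketch, not the article's exact code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Two-pass connected-component ("blob") labelling, 4-connectivity.
 * `img` is a binary image (0 = background, nonzero = foreground);
 * blob labels (1..count) are written to `labels`. Simplified sketch. */
static int find_root(int *parent, int i)
{
    while (parent[i] != i) i = parent[i] = parent[parent[i]]; /* path halving */
    return i;
}

static int label_blobs(const uint8_t *img, int *labels, int w, int h)
{
    int *parent = malloc(sizeof(int) * ((size_t)w * h + 1));
    int next = 1;
    parent[0] = 0;

    /* Pass 1: assign provisional labels, record equivalences. */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (!img[i]) { labels[i] = 0; continue; }
            int left = (x > 0) ? labels[i - 1] : 0;
            int up   = (y > 0) ? labels[i - w] : 0;
            if (!left && !up) {
                labels[i] = next;              /* start a new provisional blob */
                parent[next] = next;
                next++;
            } else if (left && up) {
                int a = find_root(parent, left), b = find_root(parent, up);
                labels[i] = a < b ? a : b;
                parent[a > b ? a : b] = a < b ? a : b;  /* merge the two classes */
            } else {
                labels[i] = left ? left : up;
            }
        }
    }

    /* Pass 2: replace provisional labels with compact, merged labels. */
    int count = 0;
    int *compact = calloc((size_t)next, sizeof(int));
    for (int i = 0; i < w * h; i++) {
        if (!labels[i]) continue;
        int root = find_root(parent, labels[i]);
        if (!compact[root]) compact[root] = ++count;
        labels[i] = compact[root];
    }
    free(parent);
    free(compact);
    return count;
}
```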
Really enjoyed this article, thanks for sharing!
I had recently learned about using image pyramids[1] in conjunction with template matching algorithms like SAD to do simple and efficient object recognition; it was quite fun.
1: https://en.wikipedia.org/wiki/Pyramid_%28image_processing%29
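For context, SAD (sum of absolute differences) matching at a single pyramid level is just an exhaustive scan like the sketch below; the pyramid part is running it on a coarse, downscaled level first and then repeating the search in a small window around that result at each finer level. The function name and interface are illustrative:

```c
#include <stdint.h>
#include <limits.h>

/* Sum-of-absolute-differences template matching at one pyramid level.
 * Scans every position of `tmpl` (tw x th) over `img` (w x h) and
 * returns the position with the lowest SAD score. Illustrative sketch. */
static void sad_match(const uint8_t *img, int w, int h,
                      const uint8_t *tmpl, int tw, int th,
                      int *best_x, int *best_y)
{
    long best = LONG_MAX;
    *best_x = 0;
    *best_y = 0;
    for (int y = 0; y + th <= h; y++) {
        for (int x = 0; x + tw <= w; x++) {
            long sad = 0;
            /* Early exit once this position is already worse than the best. */
            for (int ty = 0; ty < th && sad < best; ty++) {
                for (int tx = 0; tx < tw; tx++) {
                    int d = img[(y + ty) * w + (x + tx)] - tmpl[ty * tw + tx];
                    sad += d < 0 ? -d : d;
                }
            }
            if (sad < best) { best = sad; *best_x = x; *best_y = y; }
        }
    }
}
```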
Image pyramids are a brilliant method. The technique is hiding in many of the FCNN image segmentation models I've read about.
A truly clever image processing method.
I’m not a “C” person but I’ve really enjoyed reading this, it’s quite approachable and well written. Thank you for writing it.
Related: https://news.ycombinator.com/item?id=45816673
Referencing "By the power of Grayskull!"
As an aside, "For the honor of grayscale" would work just as well here.
IIIII HHHAAAAAVE THE POWERRRRR
This was a fantastic post. I've never really thought much about image processing, and this was a great introduction.
For those who don't know, the author is a very prolific dev:
https://github.com/zserge?tab=repositories&q=&type=&language...
This title is excellent.
From a 70s kid to an 80s kid, well done!
Ditto. I’ve upvoted this based solely on the amazing title. Best toyline ever.
I too applaud this terrible (amazing) pun.
Didn't recognize George Smiley in those photos. Which makes sense, given he's an espiocrat.
Quality He-Man reference.