Comment by wongarsu
7 months ago
A big part is that training an image-detection model is incredibly easy today. YOLO is a great network with reasonably intuitive tooling. Anyone with a set of images can start labeling them, copy-paste a couple of lines into a Jupyter notebook, and make a decent YOLO finetune.
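For illustration, a finetune with the ultralytics tooling really is only a handful of lines; the checkpoint name, data.yaml, and hyperparameters below are placeholders, and this assumes you already have a dataset labeled in YOLO format:

    # minimal sketch: finetune a pretrained YOLO checkpoint on your own labels
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")                            # small pretrained checkpoint
    model.train(data="data.yaml", epochs=50, imgsz=640)   # data.yaml points at your images/ and labels/
    results = model.predict("some_aerial_tile.jpg")       # run the finetuned detector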
The difficulty is in the training data, both acquiring it and labeling it. That's why the WALDO readme refers so much to its semi-synthetic data, and also why this commercial project is happy to give out the models but doesn't publish its data pipeline.
If you have about 100 satellite images each of T-80s with and without Zs, plus a couple of other satellite images of other tanks and of landscapes without any tanks, you can train a T-80-detecting model in a couple of hours. Then you spend a couple of days in a rabbit hole figuring out that, because only the images with tanks in your training set had smoke clouds, the model now thinks smoke clouds mean tanks, and you end up building larger and larger datasets with tanks and non-tanks from all angles.
Yep exactly. Commercial satellite imagery isn’t great for detecting anything the size of a tank (yet), but drones will get you there.
The real difficulty in detecting anything like this is the training data; IMO that’s the interesting part of the problem. The actual AI models are democratized and getting better every day, but few people focus on the training data. I have my own pipelines for generating data that I have been improving for over 3 years, which I have used for commercial clients to build AI systems for specific use cases. WALDO is a way of dogfooding those data-synth methods.
I am constantly toying with the idea of opening up the data-synth stuff as a service company or as an API, but my feeling is there is not quite enough demand yet to warrant that. This is changing very quickly though, and now might be the right time…
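To make "data synth" concrete for anyone unfamiliar: the simplest semi-synthetic approach is compositing labeled object crops onto real background imagery and emitting detection labels automatically. The toy sketch below is only an illustration of that general idea (folder names and parameters are made up), not the WALDO pipeline, which isn't published:

    # toy semi-synthetic generator: paste object crops onto background tiles
    # and write YOLO-format labels; purely illustrative, not a real pipeline
    import random
    from pathlib import Path
    from PIL import Image

    backgrounds = sorted(Path("backgrounds").glob("*.jpg"))   # plain landscape tiles
    crops = sorted(Path("object_crops").glob("*.png"))        # cut-out objects, RGBA, class 0
    out = Path("synth")
    (out / "images").mkdir(parents=True, exist_ok=True)
    (out / "labels").mkdir(parents=True, exist_ok=True)

    for i, bg_path in enumerate(backgrounds):
        bg = Image.open(bg_path).convert("RGB")
        lines = []
        for _ in range(random.randint(0, 3)):                 # 0-3 objects per tile, so pure negatives exist too
            obj = Image.open(random.choice(crops)).convert("RGBA")
            if obj.width >= bg.width or obj.height >= bg.height:
                continue
            x = random.randint(0, bg.width - obj.width)
            y = random.randint(0, bg.height - obj.height)
            bg.paste(obj, (x, y), obj)                        # alpha-composite onto the background
            # YOLO label line: class x_center y_center width height, normalized to [0, 1]
            lines.append(f"0 {(x + obj.width / 2) / bg.width:.6f} {(y + obj.height / 2) / bg.height:.6f} "
                         f"{obj.width / bg.width:.6f} {obj.height / bg.height:.6f}")
        bg.save(out / "images" / f"synth_{i:05d}.jpg")
        (out / "labels" / f"synth_{i:05d}.txt").write_text("\n".join(lines))

The hard part isn't a loop like this, it's making composites that don't teach the model shortcuts (lighting, scale, compression artifacts, the smoke-cloud problem mentioned above), which is where the years of pipeline work go.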
It's easy, but note that the author has spent over 60K USD in training costs.
> I'm planning to set up a way for people to get the .pt files and the ONNX models with unlimited outputs for people who support further development of the project on Ko-Fi (https://ko-fi.com/stephansturges), the goal being to offset some of the cost of training these networks (over 60K USD spent on AWS to date! )
Not everyone has that kind of spare cash lying around.
Commercial satellite images? With somewhere between 30cm and 100cm resolution? Looking for the letter 'Z' painted on a sidewall of the vehicle?
Rough.
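(Back of the envelope: at 30 cm per pixel, a roughly 3.5 m wide tank hull is only about 12 pixels across, and a painted marking a meter or so tall is maybe 3 pixels. There's nothing there to read a 'Z' from.)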
Medium-altitude aerial drone imagery would do it, though - just a matter of building something so cheap & plentiful that it's not worthwhile to shoot down.
Who knows, maybe we've given Ukraine the keys to the castle and they're getting a steady stream of 10cm imagery from the NRO.
That’s not really going to work. You likely won’t be able to identify markings on a tank. Your biggest challenge will be false positives for things that are not tanks.