WALDO: Whereabouts Ascertainment for Low-Lying Detectable Objects

1 year ago (github.com)

> 'arm/mil' --> this class detects certain types of armored vehicles (very unreliable for now, don't use it yet)

Living near a bunch of military bases, this is what I really need. My suburban defense system keeps mistaking USPS trucks for APCs.

I haven’t received any mail for months.

Sidenote: what are the export restrictions?

  • That class never really worked and has been removed from the new version of WALDO, FYI. It’s not a military thing and shouldn’t be used as such.

  • AI is going to supercharge off-grid antigov nut libertarians?

    • Not just them. I predict it will extend to all classes of folks who use popular schemas for naming taken from self-isolating social forums.

  • > My suburban defense system keeps mistaking USPS trucks for APCs.

    Is... this an American thing? What the heck is a suburban defense system?

    • USPS - United States Postal Service (the governmental mail delivery organization). Not to be confused with:

      - UPS, United Parcel Service (a for-profit; USPS is notoriously not-for-profit).

      - UPS, Uninterruptible Power Supply

      APC - Armored Personnel Carrier -- a military vehicle for transporting troops. Not to be confused with:

      - APC, formerly American Power Conversion Company, now an initials-only brand of some conglomerate, which coincidentally makes UPS units (the electronic device, not the delivery drivers).

I wonder if these achievements are related to the war in Ukraine. Do scientists suddenly receive more funding, or does it just happen?

Is there a non-public version with a very reliable arm/mil class? Is there a version which can reliably distinguish T-80s with and without a Z?

  • A big part is that training an image detection model is incredibly easy today. YOLO is a great network with reasonably intuitive tooling. Anyone with a set of images can start labeling them, copy-paste a couple of lines into a Jupyter notebook and get a decent YOLO fine-tune.

    The difficulty is in the training data, both acquiring it and labeling it. That's why WALDO's README talks so much about its semi-synthetic data, and also why this commercial project is happy to give out the models but doesn't publish its data pipeline.

    If you have about 100 satellite images each of T-80s with and without Zs, plus a couple of satellite images of other tanks and of landscapes without any tanks, you can train a T-80-detecting model in a couple of hours. Then you spend a couple of days in a rabbit hole figuring out that, because only the images with tanks in your training set had smoke clouds, the model now thinks smoke clouds are linked to tanks, and you end up building larger and larger datasets with tanks and non-tanks from all angles.
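The smoke-cloud trap above is a dataset confounder, and it can often be caught before training with a cheap audit: tag each image with incidental attributes, then compare how often the target appears with and without each attribute. A minimal sketch, assuming hypothetical per-image tags (nothing like this is described in the thread); a large gap between the two rates suggests the model could shortcut on the attribute instead of learning the object.

```python
def confounder_report(samples):
    """For each tag, return (P(target | tag present), P(target | tag absent)).

    samples: list of (has_target: bool, tags: set[str]) pairs, one per image.
    The tags are hypothetical annotations such as "smoke" or "snow".
    """
    all_tags = set().union(*(tags for _, tags in samples))
    report = {}
    for tag in sorted(all_tags):
        with_tag = [pos for pos, tags in samples if tag in tags]
        without = [pos for pos, tags in samples if tag not in tags]
        # Fraction of images containing the target, split by tag presence.
        p_with = sum(with_tag) / len(with_tag) if with_tag else 0.0
        p_without = sum(without) / len(without) if without else 0.0
        report[tag] = (p_with, p_without)
    return report
```

With a toy set where every tank image has smoke and no non-tank image does, the "smoke" entry comes out as (1.0, 0.0): a perfect shortcut the model will happily learn.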

    • Yep exactly. Commercial satellite imagery isn’t great for detecting anything the size of a tank (yet), but drones will get you there. The real difficulty for detecting anything like this is the training data; IMO that’s the interesting part of the problem. The actual AI models are democratized and getting better every day, but few people focus on the training data. I have my own pipelines for generating data that I have been improving for more than 3 years, which I have used for commercial clients to make AI systems for specific use cases. WALDO is a way of dogfooding those data synth methods. I am constantly toying with the idea of opening up the data synth stuff as a service company or as an API, but my feeling is there is not quite enough demand yet to warrant that. This is changing very quickly though, and now might be the right time…
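The author doesn't describe their data-synthesis pipeline. One common and much simpler technique in the same spirit is copy-paste augmentation: paste an object cutout onto a background at a random position and emit the matching label automatically, so every synthetic image comes pre-labeled. A minimal sketch on grayscale images stored as nested lists, all names hypothetical; real pipelines would add blending, scaling, rotation and lighting changes and would use an image library.

```python
import random


def paste_object(background, chip, rng):
    """Paste `chip` onto a copy of `background` at a random position.

    Images are lists of rows of grayscale ints. Returns the new image and a
    YOLO-style (x_center, y_center, width, height) box normalized to [0, 1].
    """
    bg_h, bg_w = len(background), len(background[0])
    ch_h, ch_w = len(chip), len(chip[0])
    # randint is inclusive, so these bounds keep the chip fully inside.
    top = rng.randint(0, bg_h - ch_h)
    left = rng.randint(0, bg_w - ch_w)
    image = [row[:] for row in background]  # leave the original untouched
    for dy in range(ch_h):
        for dx in range(ch_w):
            image[top + dy][left + dx] = chip[dy][dx]
    box = (
        (left + ch_w / 2) / bg_w,
        (top + ch_h / 2) / bg_h,
        ch_w / bg_w,
        ch_h / bg_h,
    )
    return image, box
```

Passing a seeded `random.Random` keeps the generated dataset reproducible, which matters when you later need to trace a model failure back to the synthetic samples that caused it.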

    • It's easy, but note that the author spent 60k in training costs.

      > I'm planning to set up a way for people to get the .pt files and the ONNX models with unlimited outputs for people who support further development of the project on Ko-Fi (https://ko-fi.com/stephansturges), the goal being to offset some of the cost of training these networks (over 60K USD spent on AWS to date! )

      Not everyone has that kind of spare cash around.

    • Commercial satellite images? With somewhere between 30cm and 100cm resolution? Looking for the letter 'Z' painted on a sidewall of the vehicle?

      Rough.

      Medium-altitude aerial drone imagery would do it, though; it's just a matter of building something so cheap and plentiful that it's not worth shooting down.

      Who knows, maybe we've given Ukraine the keys to the castle and they're getting a steady stream of 10cm imagery from the NRO.
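The resolution argument is easy to quantify. For a nadir-looking camera, ground sample distance (GSD) is approximately `sensor_width × altitude / (focal_length × image_width)`, and dividing an object's size by the GSD gives roughly how many pixels it spans. A minimal sketch; the ~7 m hull, ~1 m painted letter and the drone camera parameters are illustrative assumptions, not measured values.

```python
def gsd_m_per_px(sensor_width_mm, focal_length_mm, altitude_m, image_width_px):
    """Approximate ground sample distance (metres per pixel), nadir view."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def pixels_on_target(object_size_m, gsd):
    """Approximate number of pixels an object spans along one dimension."""
    return object_size_m / gsd


# Commercial satellite resolutions mentioned above (30 cm to 100 cm per pixel).
for gsd in (0.3, 0.5, 1.0):
    tank = pixels_on_target(7.0, gsd)     # assumed ~7 m hull length
    marking = pixels_on_target(1.0, gsd)  # assumed ~1 m painted letter
    print(f"GSD {gsd} m: tank ~{tank:.0f} px, marking ~{marking:.1f} px")

# Hypothetical drone camera: 13.2 mm sensor, 8.8 mm lens, 5472 px wide, 100 m up.
drone_gsd = gsd_m_per_px(13.2, 8.8, 100, 5472)
print(f"drone GSD ~{drone_gsd * 100:.1f} cm/px")
```

At satellite resolutions a letter-sized marking covers only a few pixels at best, while the assumed drone setup lands around 3 cm per pixel, which is why the medium-altitude drone suggestion holds up.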

    • That’s not really going to work. You likely won’t be able to identify markings on a tank. Your biggest challenge will be false positives for things that are not tanks.

  • I've seen Ukrainians experimenting with YOLO and it was terrible. Every second bush or tree was flipping between person/tree/rock/nothing. It looks like the model was trained on clean urban-environment videos.

    • > Looks like the model was trained on clean urban environment videos.

      But they should now have tons of video data with their landscape and their threats to fine-tune on. It's going to need plenty of labor but they also happen to have tons of maimed soldiers who can't fight anymore.

    • Generalisation is 90% of the problem, yep. This model is trained with a lot of augmentations and quite diverse data, but it still really needs fine-tuning for specific use cases to get great accuracy, unless you don’t care about false positives.
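Part of the frame-to-frame flicker described above can be damped downstream of the detector without retraining: if a tracker already gives each object a persistent id, a majority vote over its last few per-frame labels suppresses single-frame flips. A minimal sketch, assuming such track ids exist (the class name and window size are hypothetical); it hides noise but does not fix a model that is systematically wrong.

```python
from collections import Counter, deque


class LabelSmoother:
    """Majority vote over the last `window` per-frame labels of each track."""

    def __init__(self, window=5):
        self.window = window
        self.history = {}  # track_id -> deque of recent labels

    def update(self, track_id, label):
        hist = self.history.setdefault(track_id, deque(maxlen=self.window))
        hist.append(label)
        # most_common orders equal counts by first appearance, so ties go
        # to the label seen earliest within the current window.
        return Counter(hist).most_common(1)[0][0]
```

For a bush whose raw labels run tree, tree, person, tree, rock, tree, the smoothed output stays "tree" on every frame.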

Hey, thanks for posting. New release is coming tomorrow on HF BTW. AMA

  • Cool project, any specific reason you went with YOLOv7?

    I know you aren't going to release the dataset, but I'd be interested in any info you are willing to share on the augmentations you used, how you generated the synthetic imagery, and what sort of lift you got out of it.

    • Some of YOLOv7's design choices make more sense to me, in terms of the default augmentations and the structure of the very large versions of the network. I find I can push it to marginally better recall. It’s slower than Ultralytics’ YOLOv8, but if you want to do stuff like offline processing of satellite imagery, or getting 1 fps occupancy readings for a parking lot, that kind of performance really doesn’t matter.

I worked for a place where we needed to know precisely where in space a large object was, relative to a large area we had full control over. I wonder if this could be used in reverse by, say, dropping QR codes on the ground, using the algorithm to track their relative positions and doing the reverse operation from there.

  • you don’t need AI for that, classic computer vision will work fine ;)

    • Of course, but it was a pain to figure out how to set that up. It would be much easier, I would guess, to run a single pass of YOLO and get the parameters out, no?
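For reference, the classic approach hinted at above: if at least four markers (QR codes, painted fiducials) sit at known ground coordinates on a flat area, a planar homography maps any image pixel to ground coordinates. A minimal sketch of the direct linear transform with h33 fixed to 1 and a small Gaussian-elimination solver; it assumes the markers have already been located in the image (e.g. by a QR decoder) and that the ground is planar. A real setup would use more points with least squares plus lens-distortion correction, or a library routine such as OpenCV's `findHomography`.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(pixel_pts, ground_pts):
    """Fit a 3x3 homography (h33 = 1) from 4 pixel -> ground correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(pixel_pts, ground_pts):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), linearized; same for v.
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def pixel_to_ground(h, x, y):
    """Map one pixel coordinate to ground coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

Once `fit_homography` has run on the four marker detections, `pixel_to_ground` converts any other detection in the same image (such as the large object) into site coordinates.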

What would be legitimate civilian uses for this technology apart from [0]? After the 10k drone swarm the other day and the pager attacks, all I can think of is slaughterbots, which is genuinely freaking me out.

[0] - https://xkcd.com/2128/

  • Buy satellite images of Walmart parking lots and run this model to count the cars. Repeat this every week; buy Walmart stock when the number goes up and short Walmart when the number goes down.

    Buy satellite images of container ports, count the number of containers, predict performance of economy based on containers and invest accordingly.

    Presidential candidate has an open-air rally and you want to figure out how many people are attending? Buy a satellite image scheduled for that exact hour and let WALDO count the people.

    Financing a number of large construction projects but don't trust the progress reports? Buy regularly scheduled satellite images and let WALDO count the number of trucks and construction vehicles.

    Want to invest in the construction business? Guess what: buy satellite images, count trucks and construction vehicles, and make investment decisions based on that.
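Every workflow in this comment boils down to counting detections of one class per image and tracking the change over time. A minimal sketch, assuming the detector output has already been parsed into (class_name, confidence) pairs per capture period; the class name and the 0.5 threshold are hypothetical. In practice the counts would also need normalizing for cloud cover, capture time and partial coverage before a trend means anything.

```python
def count_class(detections, class_name, min_conf=0.5):
    """Count detections of one class at or above a confidence threshold."""
    return sum(1 for cls, conf in detections if cls == class_name and conf >= min_conf)


def weekly_change(detections_by_week, class_name, min_conf=0.5):
    """Return (week, count, fractional change vs. previous week) tuples.

    The change is None for the first week or when the previous count was zero.
    """
    rows, prev = [], None
    for week in sorted(detections_by_week):
        n = count_class(detections_by_week[week], class_name, min_conf)
        change = None if not prev else (n - prev) / prev
        rows.append((week, n, change))
        prev = n
    return rows
```

Sorting ISO-week keys such as "2024-W01" as strings keeps them in chronological order within a year, which is why plain `sorted` is enough here.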

    • There are multiple commercial AIS data feed providers covering pretty much every sizable cargo ship in the world (except those operating in some weird grey-market economy, like the Russian sanctions-evading tankers). Those feeds are already used to correlate aerial and SAR imagery with the vessels' self-reported AIS positions.
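That correlation amounts to matching each detected vessel's position against AIS reports from around the capture time. Detections with no AIS report nearby are the interesting ones, since they point to ships that are not broadcasting. A minimal sketch using great-circle distance (treating the Earth as a sphere) and a plain nearest-neighbour check; the 500 m threshold is an assumption, and the code presumes detections are already georeferenced to latitude/longitude and AIS reports are already filtered to the capture window. It does not enforce one-to-one matching.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def unmatched_detections(detections, ais_reports, max_dist_m=500):
    """Return detections with no AIS report within `max_dist_m` metres.

    detections: list of (lat, lon); ais_reports: list of (mmsi, lat, lon).
    """
    dark = []
    for lat, lon in detections:
        nearest = min(
            (haversine_m(lat, lon, a_lat, a_lon) for _, a_lat, a_lon in ais_reports),
            default=math.inf,
        )
        if nearest > max_dist_m:
            dark.append((lat, lon))
    return dark
```

A real pipeline would also interpolate each AIS track to the exact image timestamp, since a ship moving at 15 knots covers roughly 460 m per minute.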

  • > The basic model shared here, which is the only one published as FOSS at the moment, is capable of detecting these classes of items in overhead images ranging in altitude from about 30 feet to satellite imagery with a resolution of 50cm per pixel or better.

    It's not just for drones, it's for any overhead imaging.

    This can be used for all kinds of things like search and rescue, traffic monitoring, watching for wildfires, disaster response, monitoring parking lots as an economic indicator, etc.

    • It is being used a lot for disaster response and traffic monitoring, yep. Also, surprisingly to me, by trading firms for analytics on the utilisation of all kinds of infrastructure. One of the bigger motivations for making it free was to see where people use the base model, to understand whether there is a market for a fine-tuned / more specialized version.

  • Managing the development of farms and construction, and monitoring activity levels and the state of things, is pretty huge.

  • Depending on how high it can reliably work from, combine it with UK CCTV surveillance so that you can track individuals with fewer cameras, as long as you can collate the tracks with cameras that confirm their position at various points in time.

    Fly a handful of drones over the area of a fleeing suspect to track their whereabouts and look for suspicious behaviors (e.g. someone running, making constant turns in a city, doubling back often, or cutting through alleys).

    Hell, fly a few drones over the city to monitor the population's foot traffic and determine possible points of interest for new developments. Where are people walking to? How do they tend to get there? Can we optimize traffic for them - or more realistically - around them?

    Could be used for other forms of crowd analysis too such as how to best disperse a riot and separate a crowd.

    Sorry I guess I'm about as pessimistic as you are about it. Use in S&R like throwup238 suggested seems like a good non-militaristic fit for it.

    Oh and also this which was posted on HN not too long ago: https://dropofahat.zone/

So it gives us, for all the objects in view, a unique id, location, location history, various alerts.

What else? Any description of each thing?

If an object leaves the view and re-enters, does it get the same id?

  • It's just YOLO-esque output: class id, bbox coordinates, confidence. You'll have to implement some sort of tracking algorithm to get your other traits.
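A minimal sketch of the simplest such tracker: greedy IoU matching between consecutive frames, which turns per-frame boxes into persistent ids. The 0.3 threshold is an assumption, and class ids are ignored for brevity. It also answers the re-entry question in the negative: an object that leaves the view gets a new id when it comes back. Keeping ids across gaps needs appearance-based re-identification on top, as in trackers such as DeepSORT.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy frame-to-frame IoU matching that assigns persistent track ids."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track_id -> box from the previous frame
        self._ids = count(1)

    def update(self, boxes):
        """Match this frame's boxes to existing tracks; return [(track_id, box)]."""
        # Consider the highest-overlap (track, box) pairs first.
        pairs = sorted(
            ((iou(tb, b), tid, i) for tid, tb in self.tracks.items() for i, b in enumerate(boxes)),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        new_tracks, result = {}, []
        for i, box in enumerate(boxes):
            tid = assigned.get(i)
            if tid is None:
                tid = next(self._ids)  # unmatched box starts a new track
            new_tracks[tid] = box
            result.append((tid, box))
        self.tracks = new_tracks  # tracks missing from this frame are dropped
        return result
```

Feed it the bbox list from each frame in order. Location history is then just the sequence of boxes recorded per track id, and alerts can be rules evaluated on those histories.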