Comment by frontierkodiak

4 days ago

oh man that's awesome. I have been working for quite some time on big taxonomy/classification models for field research, especially for my old research area (pollination stuff). the #1 capability that I want to build is an audio input modality. it would just be so useful in the field, not only for low-resource (audio-only) field sensors, but also as a supplemental modality for measuring activity outside the FoV of an image sensor.

but as you mention, labeled data is the bottleneck. eventually I'll be able to skirt around this by just capturing more video data myself and learning sound features from the video component, but I have a hard time imagining how I can get the global coverage that I have in visual datasets. I would give anything to trade half of my labeled image data for labeled audio data!

Hi Caleb, thanks for the kind words and enthusiasm! You're absolutely right: audio provides the crucial omnidirectional coverage that can supplement fixed field-of-view sensors like cameras. We actually collect images too and have explored fusion approaches, though they definitely come with their own set of challenges, as you can imagine.

On the labeled audio data front: our Arctic dataset (EDANSA, linked in my original post) is open source. We've actually updated it with more samples since the initial release, and getting the new version out is on my to-do list.

Polli.ai looks fantastic! It's genuinely exciting to see more people tackling the ecological monitoring challenge with hardware/software solutions. While I know the startup path in this space can be tough financially, the work is incredibly important for understanding and protecting biodiversity. Keep up the great work!