blog post with the background story why this was created: https://asahilinux.org/2025/03/progress-report-6-14/#is-this...
Thank you. I was about to ask exactly that.
Over 20 years ago I had a Toshiba Tablet PC convertible that had a beam forming array of microphones and it came with software that let you point where you wanted to record from.
The use case was for lectures, you could tell the laptop to just record from behind it, pointing the beam in the direction of the professor.
Amazing idea and something I haven't seen since.
In the golden age of mini camcorders, some Sony Handycams had "zoom" microphones which used beamforming to limit the gathered sound to roughly the area your sensor sees.
Another great idea.
Oh. They still make similar stuff: https://electronics.sony.com/imaging/imaging-accessories/all...
I feel like my iPhone does it. But not sure. Sound definitely changes when you zoom while recording
10 replies →
It's used widely in fancy videoconferencing setups.
The mic array for the room figures out who's talking and isolates the audio from them.
(Videoconferencing in large rooms has long picked the loudest microphone to use at any time, to avoid mixing in noise from other mics, but then the beamforming makes it that much better.)
I'm wondering if that's why those kinds of setups offer good audio if only one person speaks and there are clear pauses between people, but as soon as you have a quick back-and-forth or two people talking at once, the audio turns into complete mush.
I wonder how that worked. Assuming the microphones were on the screen plane rather than the body, it wouldn't be able to tell the difference between "straight in front" and "straight behind".
Straight in front is likely to be unobstructed while straight behind is likely to be obstructed by computer hardware. Therefore, straight in front is likely to have crisp sound while straight behind is likely to be muffled and/or distorted by reflections.
1 reply →
As someone who made the mistake of putting a webcam cover over the tiny little microphone hole above the screen (it picks up very little besides impact noises now): it wouldn't be hard to have a mic hole facing in each direction to solve that problem.
1 reply →
They're in somewhat random locations, not symmetric and parallel as one might expect.
Sennheiser has a model that is mounted on the ceiling. Haven't seen it live.
https://www.sennheiser.com/en-us/catalog/products/meeting-an...
1 reply →
The attenuation provided by the tablet case/shell is quite significant. I bet they had some extra foam or something to make it even stronger. So the "right behind" signal would be heard only if "right in front" isn't already drowning it out.
Idea I've had for years but never got around to testing due to lack of compute:
Use a microphone array and LIDAR for ground truth, and train a diffusion model to "imagine" what the world looks like conditioned on some signal transformations of the microphone data only.
Could be used by autonomous vehicles to "see" pedestrians through bushes, detect oncoming emergency vehicles early, hear bicyclists before they are visible, and lots of other good things.
This already exists, it's the domain of inverse problems. Inverse problems consider a forward problem (in this case wave propagation) depending on some physical parameters or domain geometry, and deduce the parameters or geometry from observations.
Conceptually, it's quite simple: you need to derive a gradient of the output error with respect to the sought information, and then use that gradient to minimize the error (the "loss function" or "objective", depending on field terminology), like you do in neural networks.
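In symbols, a minimal sketch of that setup (assuming a forward model F mapping the sought parameters θ to predicted observations, measured data y, and a step size η):

```latex
\min_{\theta}\ L(\theta) = \lVert F(\theta) - y \rVert^2,
\qquad
\theta_{k+1} = \theta_k - \eta\, \nabla_{\theta} L(\theta_k)
```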
In many cases, the solution is not unique, unfortunately. The choice of emitter and receiver locations is crucial in the case you're interested in.
There's a lot of literature on this topic already; try "acoustic inverse problem" on Google Scholar.
So basically a kind of passive echolocation?
I like it. I think you'd need to be in known motion around the area to build up a picture -- I don't think it would work with a microphone just sitting in place.
3 replies →
On recent devices with an on-device NPU, this could be combined with RF imaging of nearby activity and structure via WiFi 7 Sensing (Doppler radar).
From the Samsung S10 forward, this has been a feature when recording video in zoom mode. I was always really curious how they did it.
My (never finished) master's thesis was about something similar - taking advantage of the fact that (almost) all smartphones have at least two microphones I wanted to locate and separate a speaker in 3D.
A few takeaways:
- The sampling rate is slightly off between devices, approximately ±1 sample per second. Not a lot, but you need to take it into account (see the resampling sketch after this list).
- Spectral characteristics in consumer microphones are all over the place: two phones of the same model, right out of the box, will have not only measurable, but also audible differences.
- Sound bounces off of everything, particularly concrete walls.
- A car is the closest thing to an anechoic chamber you can readily access.
- The Fourier transform of a Gaussian is a Gaussian, which is very helpful when you need to estimate the frequency of a harmonic signal (like speech) with a wavelength shorter than half your window, but just barely.
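To make the first takeaway concrete, here is a minimal sketch (in Rust, since that's what this project uses) of compensating a known clock-rate mismatch by fractional resampling with linear interpolation; the ratio is whatever drift you estimate between devices, and production code would track it continuously:

```rust
/// Resample `input` by `ratio` = output_rate / input_rate using linear
/// interpolation. For a device running ~1 sample/s fast at 48 kHz,
/// `ratio` would be roughly 48_000.0 / 48_001.0.
fn resample_linear(input: &[f32], ratio: f64) -> Vec<f32> {
    if input.len() < 2 {
        return input.to_vec();
    }
    let out_len = ((input.len() - 1) as f64 * ratio).floor() as usize + 1;
    let mut out = Vec::with_capacity(out_len);
    for n in 0..out_len {
        let pos = n as f64 / ratio; // fractional position in the input
        let i = pos.floor() as usize;
        let frac = (pos - i as f64) as f32;
        let a = input[i];
        let b = input[(i + 1).min(input.len() - 1)];
        out.push(a + (b - a) * frac); // interpolate between neighbours
    }
    out
}
```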
> - A car is the closest thing to an anechoic chamber you can readily access.
I recall a youtuber solving the anechoic chamber problem by finding a big empty field - nothing to reflect off of except the ground - and maybe putting some foam below the experiment.
It doesn't kill environmental noise, of course, but it apparently did a very good job of killing reflections from his own instruments.
In my case wind noise disturbed the signal too much. Normally there's additional processing which deals with it, but I was working with (next to) raw data.
Surely a carpeted closet full of clothes is better than a car
I didn't have such a place at the time, but I found one and results weren't as good as in a car.
Sound deadening generally requires mass to work for lower frequencies and the seats absorbed them all nicely. I got some reflections from - I assume - the windows, but they were manageable in comparison. Didn't even produce much of a standing wave when I tried.
Most people's closets aren't a room.
> The Fourier transform of a Gaussian is a Gaussian, which is very helpful when you need to estimate the frequency of a harmonic signal (like speech) with a wavelength shorter than half your window, but just barely.
I get the Gaussian link, but can you explain your point in more detail?
The log of a Gaussian is a parabola, which makes finding where exactly a peak in the spectrum lies a question of solving a quadratic equation.
My plan was to detect the frequency of the speaker by counting (with weights) which distance between peaks is the most common. I wanted to avoid calculating the power cepstrum as I felt that I was running out of computing power on the devices already[0] - a mistaken belief in the long run, but I was too proud of my little algorithm and how stable it was to let go.
[0] Sending raw sample data to a more powerful machine was out of the question as I wanted to remain within the bandwidth offered by Bluetooth at the time due to power consumption considerations.
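For the curious, here is what the quadratic trick looks like as code - a sketch of the idea under the Gaussian-window assumption, not the thesis code:

```rust
/// Refine an FFT peak at bin `k` (1 <= k <= len - 2) by fitting a parabola
/// through the log-magnitudes of the bin and its neighbours. With a Gaussian
/// window, the log-spectrum of a sinusoid is locally an exact parabola, so
/// the vertex is the true peak. Returns a fractional offset in (-0.5, 0.5);
/// the refined frequency is (k + offset) * sample_rate / n_fft.
fn parabolic_peak_offset(log_mag: &[f32], k: usize) -> f32 {
    let (a, b, c) = (log_mag[k - 1], log_mag[k], log_mag[k + 1]);
    // Vertex of the parabola through (-1, a), (0, b), (1, c).
    0.5 * (a - c) / (a - 2.0 * b + c)
}
```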
1 reply →
Wow, this really puts into perspective how much work has to be put into even the most insignificant details of getting Linux to run on (Apple Silicon) Macs. I say "insignificant" with all due respect because, well, the built-in microphone sees very little use (except if you have forgotten your headset).
Or, to quote the progress report (https://asahilinux.org/2025/03/progress-report-6-14/#is-this...): "This is Apple though. Nothing is ever simple."
The built-in microphone is actually excellent; I often use it even when I have my AirPods Pro in, because the sound quality is so much better.
If you've got headphones with a wraparound microphone on its own arm then it could be better, but everyday headphones are limited by the position of the microphone
Yeah, no matter how good the microphone on a headset actually is, it uses an ancient codec; until we get Bluetooth 5.3 everywhere with the LC3 codec, we won't actually have good mic input from headphones and headsets. I predict that this is all going to change this year and next. But the full stack has to support it, from headphones to Bluetooth chips to the OS.
16 replies →
Everyday headphones are limited by the fact that people often use Bluetooth, and Bluetooth audio is just terrible tech that hasn't improved much in the last 10 years, and still can't do more than a 16 kHz sample rate when doing both input and output at the same time.
I think this isn't a problem if you're using Apple headphones with Apple devices, but anything else falls back to crappy BT quality, usually with some kind of terrible ANC to boot.
For me, crappy audio setups and apps trying to do too much audio processing are the primary cause of "Zoom fatigue". I've done a lot of calls over apps that transmit raw, high-quality audio with no processing whatsoever, and the experience is just so much better.
1 reply →
Plenty of good headsets do beamforming with their microphones as well; it just depends on what you're running. MacBook mics are well above average, though, so I agree in most cases they'll be better unless you're picky about your headset mic quality.
12 replies →
I hope you do not take notes or brush dust off the macbook whilst in a video call.
5 replies →
Wow not my experience at all.
The MBP mic is generally preferable to most headset boom mics in my experience with good noise reduction. You also get the benefit of not picking up extraneous mouth noises (gum chewing, coffee slurping, whatever)
I feel like 99% of people I conference with use regular headphones + MBP mic
Main problem with that setup is not being able to hear your own voice in the headphones (feedback, or whatever that's called), which can be annoying sometimes when using NC headphones.
> feedback, or whatever that's called
Monitoring.
There are umpteen ways to do that, and I find headsets themselves do it the most poorly of all (if they have the feature at all).
> The MBP mic is generally preferable to most headset boom mics
Another benefit is not paying the '90s GSM hands-free BT profile codec penalty (at the cost of A2DP having slightly higher latency).
1 reply →
Actually my complaint relates to open-office designs: the MacBook mic picks up louder people from across the room. So if I do use headphones and the MBP mic, other people will hear random noise blurbs from anywhere in the office.
1 reply →
I don't think I recall having a meeting with anyone using plain headphones with the laptop mic instead of a headset of some kind. Wired headphones without a mic are somewhat unusual nowadays to begin with, outside audiophile circles.
AirPods of various versions are common, as are many other buds. Enterprise headsets like those from EPOS (the old Sennheiser Communications) and Jabra (with or without boom) and speakerphones are common in corporate settings; casual headsets (e.g., Sony, Bose) and wired gaming headsets are common at home.
Well it is simple if you use the whole package as delivered (although Apple has been straying off the road it paved for quite a while now).
The point is, everything they make is vertically integrated. They want to deliver a feature (like Airdrop or Continuity), they will cut across the stack to get it done. If you go the DIY route (which is what Asahi is effectively all about), you get to also DIY the missing software pieces.
The upside is that the entire ecosystem gets to benefit from that work (see e.g. the new DSP in PipeWire). PC hardware is generally crap, and so is Apple's if you omit these extra bits. But "the whole package" sets the bar quite a bit higher. I want to see the FOSS ecosystem meet that bar.
The three-mic array is also found in Intel-based Retina MacBooks, so this might also be useful for proper audio support on that older hardware. (Some early Retina MacBook Pros have a two-mic array only, but most have the full three-mic array.)
Because most mics are still using Bluetooth 5.0, I use the microphone on my Mac even when I'm wearing a headset. Otherwise, it puts me into a weird codec mode of ancient history where I get downgraded to a low bit rate, and even the audio input to my ears sounds horrible. So I always use the Mac microphone when possible.
It's more annoying on Linux where you have to manually switch... at least most apps on Windows/Mac will automagically put my headset in the correct mode.
I always set my microphone to the MacBook's even when wearing headphones, because the quality is incredibly good even in noisy environments. In Zoom I also turn on "Original sound for musicians" if in a quiet location. So much more natural sound.
I always prefer a headset too, but I did find it striking how good the audio quality of the built-in mic was compared to the headset when I tried it once.
I exclusively use the built-in microphone for work meetings. I don't even have any other work-issued microphone unless we count my phone.
You can get surprisingly good results from cheap laptop hardware (as well as fancier hardware like an MBP) using software DSP techniques. One of the things I'm pleased about is that quite a bit of Asahi's audio work is just as applicable to generic laptops as it is to Macs.
I already use the Bankstown bass harmonics synthesis plugin developed for Asahi and a convolution EQ on a cheap HP laptop, with startlingly impressive results, using the Pipewire plugin chain autoload feature also developed for Asahi.
I suspect there are quite a few use cases for this beamformer outside of the Asahi ecosystem as well.
Regarding the SIMD optimizations, the authors may want to look into faer [1]. I haven't had a great experience with its underlying library pulp, as I'm trying to do things that go beyond its linear algebra roots, but if the goal is primarily to accelerate linear algebra operations, I think it will go well.
I've got a blog post and associated podcast on Rust SIMD in the pipeline; we'll touch on this.
[1]: https://docs.rs/faer/latest/faer/
Github repo https://github.com/chadmed/triforce
Thanks! We've changed the URL above to that from https://crates.io/crates/triforce-lv2.
> the microphone array found in the following Apple Silicon laptops:
> MacBook Pro 13" (M1/M2)
> MacBook Air 13" (M1/M2)
> MacBook Pro 14" (M1 Pro/Max, M2 Pro/Max)
> MacBook Pro 16" (M1 Pro/Max, M2 Pro/Max)
> MacBook Air 15" (M2)
Does it mean M2/M3 don't have a similar array of microphones, or rather that they haven't been tested?
I'm also curious whether this is only supported on Linux or on macOS as well - not sure if Apple provides a dedicated stream for each mic?
It's made just for Asahi Linux. MacOS does some very similar beamforming math behind the scenes, so it just presents you with a single unified mic.
They list M2 devices. M3 is just not supported by Asahi Linux, so not being listed is orthogonal to whether M3 has mics like this.
MacOS has its own software deep within the system for handling this; it's only exposed as a normal microphone to application software.
Asahi Linux doesn't yet support M3 and M4 processors.
There is a more general discussion in the latest Asahi Linux progress report.
> Unfortunately, PDM mics are very omnidirectional and very sensitive. We cannot get by without some kind of beamforming.
https://asahilinux.org/2025/03/progress-report-6-14/
Also, it turned out that some previous work done for the speaker output was reused here for mic input.
> Thanks to the groundwork laid in PipeWire and WirePlumber for speaker support, wiring up a DSP chain including Triforce for the microphones was really simple. We just had to update the config files, and let WirePlumber figure out the rest!
Could the author of this package comment on this statement? I'd be really interested in their opinion of their speaker implementation.
What's overly complicated there? The hardware? The software?
As a MBP user and hobbyist audio guy I've been really impressed with the implementation of those speakers, particularly on the larger MBP models.
But I'm just a hobbyist and don't have any knowledge of them other than the driver arrangement (tweeter + dual opposed woofers). It certainly seems like they're pulling the same tricks used by "good" bluetooth speaker designers in order to wring acceptable perf and bass extension from teeny tiny speakers (adaptive EQ etc)
Getting reasonable speaker support in Asahi Linux was a big deal. Part of the problem is that limiting the power usage to prevent overheating requires sophisticated DSP. Without that, you get very limited volume output within safe limits.
Probably the best overview to find out more is here: https://github.com/AsahiLinux/asahi-audio
wow I'm surprised overheating is the bottleneck, I would've assumed clipping would damage the drivers before that
3 replies →
> Much like with the speakers, Apple are trying way too hard to be fancy here
It is just a reference that Apple Laptop speakers have been waaay above anything the competition uses - and this has been true for multiple generations. I had an MBP from 2014, and multiple friends were astonished by the sound when we watched a movie on the go. Same with the M4 MBP - sound quality from the speakers is at a level that you probably don't actually need.
I feel like this must be some kind of a language barrier thing - the dev’s name appears to be Spanish, so English may not be their native language. And I think that most native English speakers - as demonstrated by multiple comments asking about it in this thread - would interpret “trying too hard to be fancy” as implying “because you can get similar high-quality results without using such sophisticated techniques”; but it seems like you’re saying (and this makes sense) they meant “because getting such high-quality results is overkill for a consumer laptop”.
Language is fascinating - I can convince myself with enough effort that the latter is just as valid as the former, given the literal meaning of the words, but my linguistic intuition is screaming at me that it’s wrong. How does someone ever learn that? How would a textbook ever explain it?
1 reply →
> It is just a reference that Apple Laptop speakers have been waaay above anything the competition uses
More like the opposite. The MacBook speakers are absolutely rubbish, just like all laptop speakers (there's only so much you can do when constrained to a laptop body). The reason why MacBooks sound good is entirely god-tier signal processing which manages to extract extraordinary performance out of some decidedly very ordinary speakers.
https://github.com/AsahiLinux/asahi-audio#why-this-is-necess...
3 replies →
In my experience MBP 2015 sound is pretty thin and high frequencies are prone to clipping at even a moderate volume – soprano vocal parts suffer from this quite a bit. Of course for most uses that’s not a big problem and I’m sure the sound is still much better than that of many other laptops though. But the M series MBP speakers are a crazy improvement.
My guess (without value judgement) is he was referring to the fact that they don't really work without such software
How's hardware supposed to work without software?
6 replies →
Seems like a common pattern lately that Apple's hardware people continue to be top notch while the software group is slacking.
2 replies →
I have a feeling that this package is for folks that want to run Linux distros on the laptops, and have access to the same capabilities as native MacOS.
I'm confused too. These days, "spatial audio" on speakers (different from on headphones) and beamforming mics is starting to feel standard, at least on premium hardware.
Dumb, noisy, cramped, unbalanced audio just doesn't cut it anymore.
If you think fake 5.1ch sounds better - not just "better for enjoying action movies" - you've never had exposure to a >$99 pair of bookshelf speakers with a non-USB-powered class D amp. Change my mind.
2 replies →
For the software to perform beamforming it must be provided the discrete microphone inputs, as opposed to being provided some sort of pre-mixed feed. As such, why is Apple "trying way too hard to be fancy here" if you can just use one of those mics? Or is the alternative that they do the "beamforming" in hardware regardless of the OS?
> if you can just use one of those mics?
They're extremely omnidirectional and very sensitive. With a single mic and no beamforming you get basically all of the sounds from every part of the room, including and especially horribly loud sounds from, e.g., the keyboard and mouse.
Apple selected their microphones based on the constraints their system had (a beamformed array) rather than the "usual" laptop microphone, which is physically not very sensitive and highly directional towards the front of the laptop; in turn, those microphones are not particularly useful without beamforming.
Other laptops with beamformed arrays simply don't expose the raw mics to userland, by doing the beamforming in firmware, but this of course comes with its own set of issues.
> Other laptops with beamformed arrays simply don't expose the raw mics to userland
Not always true: back in the Windows XP days (!!!) some laptops would expose the array to software and let the user configure where the mics record from.
It is unfortunate that user control has been polished out of modern systems in exchange for "it just kind of works".
Avoiding an extra coprocessor and/or avoiding a patent dispute, like they did with the speakers (which differ from an H-K patent by not having a discrete chip implementing it).
> This is an attempt at a beamformer armed only with first year undergrad level engineering maths and some vague idea of the principles gleaned from various webpages and PDFs
Not certain if OP is saying they are currently an undergrad, but impressive if so
It would be great if this were implemented in a way that other manufacturers could easily adopt, so they could start building mic arrays that would be immediately useful.
I would be surprised if Apple didn't have patents on their mic array, meaning that another manufacturer would ideally prefer their setup to be different and incompatible, to reduce the chance of accidental patent infringement.
I'd search to see, but reading patents is an info-hazard which increases your chance of infringing, so I've quit reading them entirely.
Maybe they're doing something new, but beamforming microphone arrays can be found in just about any brand of laptop if you go high end enough.
I do think most such devices will present themselves as less capable than they actually are (i.e., just a stereo input) for maximum OS compatibility, but the technique isn't Apple-exclusive as far as I know.
1 reply →
Maybe they can still install the array, and we can simply "apt-get install illegal-package".
But all joking aside, there is a tremendous amount of literature on the mathematics of beamforming. I'd be surprised if any of it is patented in a way that isn't circumventable.
Golly isn't IP great
There is a customer who has deployed beamforming microphones for decades. They do however have a somewhat different goal and medium.
Yes, I'm sure they have some patents, because that's what big companies do/have to do. But the basic idea has been around for a long time, not just in audio but also in the microwave/RF domain. So I'm sure there's plenty of prior art.
OK, noob here - what can I use this thing for? A better desktop-only voice app?
Is there a reason Apple hasn't exposed a higher-level API for this, given that the hardware (mic array) looks like it's already sufficient in Macs?
This is how Apple addresses audio hardware; they do something similar for the speakers. Instead of trying to make speakers that have the desired frequency response or microphones that produce the desired signal, they let the analog hardware do whatever it does.
Then, in software, they use digital signal processing. For the speakers they modify what gets sent to the hardware so that the actual output matches the desired frequency response, and for the microphones they do this work to extract the desired signal.
If Linux addressed the speakers as is, you would get unpleasant sound, and if it read the microphones as is, it would get a lot of noise. That is why Asahi had to add digital signal processing to the audio input and output, to get the "correct" audio.
It does mean the processing is specific to the analogue audio hardware in each of the different Mac models.
The processing could be done in additional hardware, but why bother when you have a very good CPU that can do the work.
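As a toy sketch of the speaker-side correction (a plain direct-form FIR convolution; the correction filter itself is assumed to come from per-model calibration data, and real implementations would use FFT-based convolution for speed):

```rust
/// Pre-distort the signal sent to the speaker by convolving it with a
/// correction FIR filter, so the acoustic output lands closer to the
/// target frequency response. `correction` is a hypothetical per-model
/// calibration impulse response.
fn apply_correction(signal: &[f32], correction: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; signal.len() + correction.len() - 1];
    for (i, &x) in signal.iter().enumerate() {
        for (j, &h) in correction.iter().enumerate() {
            out[i + j] += x * h; // direct-form convolution
        }
    }
    out
}
```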
> For the speakers they modify what gets sent to the hardware so that the actual output matches the desired frequency response
As I understand it, this is not a magic pill: it probably won't help to pull out frequencies which are suppressed by 30-40 dB (a 40 dB dip would need a 100x amplitude boost, which would eat all the headroom), and I assume that if the frequency response graph is too wavy (lots of narrow peaks and dips), it won't help either.
Also, you need to have calibration files to use this method, right?
4 replies →
Apple did it as a software function so it's not in hardware, hence this implementation for people wanting to run (presumably) Asahi Linux.
Your question was nonspecific, so I'm guessing a bit at what you're asking, since some of it is already answered in the docs... but conceptually it's similar to how GPS positioning works, just in the other direction (information flows from the source point, the person speaking in this case, to the mic array) and with audio waves instead of RF waves. Each mic gets a slightly different view of the incoming audio, and using the timing between them, you can use the waveform that one mic records to figure out what's too early or too late to be audio from directly in front of the laptop, and then delete that audio, leaving just the audio from the person directly in front of the laptop.
e.g.:
A ------ MIC1 --- B --- MIC2 ------ C
Any sound coming from A will be picked up by MIC1 well before MIC2, and the reverse for sounds coming from C. If you delete that audio from the incoming waveform, you have beamforming, and thus much better audio noise filtering.
And as it says in the link, Apple decided to implement this in software, not hardware, so you'd need to reimplement it if you're not using macOS.
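For illustration, here is the simplest fixed delay-and-sum version of that idea, sketched in Rust with whole-sample delays; Triforce itself is adaptive and real beamformers use fractional delays, so treat this only as the textbook baseline:

```rust
/// Fixed delay-and-sum beamformer. `delays[i]` is the number of samples by
/// which mic i must be delayed so that sound from the look direction lines
/// up across channels; aligned (on-beam) sound reinforces, while off-beam
/// sound stays misaligned and averages toward zero.
fn delay_and_sum(mics: &[Vec<f32>], delays: &[usize]) -> Vec<f32> {
    let len = mics.iter().map(|m| m.len()).min().unwrap_or(0);
    let mut out = vec![0.0f32; len];
    for (mic, &d) in mics.iter().zip(delays) {
        for n in d..len {
            out[n] += mic[n - d]; // apply this channel's steering delay
        }
    }
    for s in &mut out {
        *s /= mics.len() as f32; // average across channels
    }
    out
}
```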
Looks like it's the ability to use the MacBook mic when not using macOS.
It's a component of Asahi Linux. It's integrated and enabled by default if you have the right laptop.
Interesting - since there's a microphone array like this, you should be able to emit a signal and record it in a way that measures local geometry, like hands.
I can't speak for this implementation, but on macOS the beamforming is amazing. When used in a noisy office or cafe environment it eliminates background noise to such an extent that I can always tell whether a colleague is using it or their worse headphone mic.
I was sitting at a Starbucks next to a VERY noisy street, on a Google Meet call on an M1 Air with USB-C AirPods (the cheap $19 ones), and I asked the person on the other end if they could hear me at all. To my surprise, they couldn't hear any noise, just my voice. No idea which part of the whole setup achieved this, but I feel like stuff like AI has some applications that can blow you away - not putting the damn thing in everything!
That could definitely be Google Meet. I think it does some pretty fancy AI background noise reduction.
5 replies →
Correction here: it wasn't Apple's AirPods, it was a Bose QuietComfort over-ear, IIRC. That's why I could hear the other person. But I think they could hear me because of maybe both Meet and the good mic array.
It's funny how the author has the chutzpah to insult Apple while also failing to replicate what they have done.
You misread. It's more like grudging admiration that Apple took the proprietary software beamforming route. It's a remark on its technical function, but acknowledges that because of the closed implementation, the microphones just cannot be used outside of the macOS ecosystem without additional effort (like in this repo).
Which, as I'm sure you agree, is unfortunate and at least deserving of some (minor) reprisal.
I'm really enjoying this trend of minimal dependencies, but I'm not taking off my tinfoil hat yet.
This is one of the cooler features of Apple Vision Pro, it does such good beamforming for the wearer's mouth that someone could be screaming next to you or blasting music, and other parties on Zoom or FaceTime will not hear them.
I wonder if there's a way to do this in reverse for people who use the speakerphone or play a video in a restaurant.
Is this akin to a phased array RF antenna (like the Starlink dish) but for audio?
Yes, except the output is something that has to sound "subjectively good" after all the DSP, vs. RF beamforming, where you have a very easy metric (dropped packets) that you can use to optimize the beamforming direction.
Thanks!
How do you download and install this? There are no instructions.
> Much like with the speakers, Apple are trying way too hard to be fancy here, and implement an adaptive beamformer in userspace to try and isolate the desired signal from background noise.
Might be fancy, but it does make for surprisingly good audio from a laptop.
Indeed. I can't help but think that anyone thinking Apple is trying too hard to be fancy on something like "audio quality from microphone in a laptop" doesn't quite grasp what Apple's about.
There are many advantages to vertical integration as regards end-user experience.
Honestly, with the speakers it was mainly a patent-avoidance thing (a patent on essentially the same thing but done with dedicated hardware; doing it with software on the "application processor" bypassed the patent claims).
A lot of similar stuff is done in firmware on x86 laptops, to the point that both AMD and Intel now share a considerable portion of the stack, with both using Xtensa cores for DSP and Sound Open Firmware as the SDK. When I use the built-in microphone array on my laptop, it's passed through the DSPs "transparently" to the end user.
But technically you can load your own firmware there.
Usually you can't load your own SoF firmware: on most hardware it has to be signed by Intel, with exceptions like Chromebooks, where you sign it with a "community" key that is publicly available. There was talk of a way for device owners to add keys, but that isn't implemented yet.
https://github.com/thesofproject/sof/issues/5814
"Time Domain Fixed Beamformer (TDFB)" -- https://thesofproject.github.io/latest/algos/tdfb/time_domai... might be relevant here.
If it was just patent avoidance, why aren't there any non-Apple laptops matching their sound quality? Both the microphones and the speakers are some of the best audio I've ever encountered.
4 replies →
Is it the same story with the Apple touchpad? Is the fancy palm rejection implemented completely in software?
1 reply →
It's not just good, I found it to be way better than a standalone shotgun mic connected via USB. I researched this for WFH and found a lot of people saying you were going to spend hundreds to replicate the quality in a more "professional" mic setup. Super impressive.
Does it record a fixed point, or does it do something fancy like using the camera to attempt tracking the user's movement? Just curious, and I don't have access to a modern Mac. The article seems to imply that it's focusing on a fixed point.
1 reply →
As someone looking to replicate it from a pro mic setup, what do people recommend?
I've been trying to record audio in my noisy server room, but only DeepFilterNet is able to deal with the fan noise in the background.
4 replies →
I think the title should say "for Asahi Linux", else it's misleading.
Of course Apple has this implemented.
Incomplete or 'not 100% obvious' is not really 'misleading'. Titles don't say everything about a story or we wouldn't need stories.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Does it not also run on macOS? It could be useful if you want to tweak usage of the mic array yourself, rather than relying on proprietary magic that gives you the output it deems best
Well, if it's available on crates.io, I guess nobody will think that it's from Apple. Also, it could conceivably be used in other software besides Asahi too...
narrator: it wasn't. :)
1 reply →
> Much like with the speakers, Apple are trying way too hard to be fancy here, and implement an adaptive beamformer in userspace to try and isolate the desired signal from background noise.
That's a rather unfair characterization. I've found the array to work very well in practice. It's hardly trying too hard.
Perhaps the author meant "Apple is succeeding too hard". :-)
They are atrocious, IME. I continually get near muted. I.e., if I record the signal, my voice is there, but extremely faint. Unusable for VC audio, and I've moved completely to a headset mic because of it.
/_\/_\
Triforcin'
I have no idea what any of this means
I find this kind of thing a good use case for LLMs, as they can dumb down the technical jargon:
From Gemini:
```
Imagine you're trying to record someone talking in a noisy room using your MacBook's built-in microphones. This software acts like a super-smart filter:
* It knows where the microphones are: Apple laptops have multiple tiny microphones.
* It listens to all of them at once: It takes the input from all the microphones.
* It figures out where the person talking is: It analyzes the sound to find the direction of the voice.
* It focuses on that voice: It boosts the sound coming from that direction.
* It quiets down the other noises: It reduces the sound from other directions, like background chatter.
So, instead of getting a muddy recording with lots of noise, you get a clearer recording of the person you want to hear. Basically, it makes your MacBook's microphones sound much better in noisy environments. And it's designed to work within audio programs that use a specific plugin format called LV2.
```
```