Comment by PaulDavisThe1st

5 hours ago

Theoretically, it makes sense (go look at any of the diagrams of what a "zero crossing" is online, and it totally does).

The problem is that sign(x[n-1]) != sign(x[n]) describes a place where two successive samples differ in sign, but no sample actually has a value of zero. Thus, to perform an edit there, if your goal is to avoid a click by truncating with a non-zero sample value, you need to add/assign a value of zero to a sample. This introduces distortion - you are artificially changing the shape of the waveform, which implies the introduction of all kinds of frequency artifacts.

Zero crossings are not computed by finding a minimum between two consecutive samples - that would almost never involve a sign change. And if they are computed by finding the minimum between two consecutive samples that also involves a sign change, there's a very good chance that you'll be a long way from your desired cut point, even if you ignore the distortion issue.

It really was a completely misguided idea. If the situation was:

     sign(x[n-2]) != sign(x[n]) && x[n-1] == 0

then it would be great. But this essentially never happens in real audio.
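To make the distinction concrete, here is a rough Python sketch (not taken from any real editor) that counts both cases on a synthetic tone. The 441.7 Hz frequency, the 48 kHz rate, the phase offset and the helper names are all made up for illustration; real recordings, especially integer PCM containing digital silence, will differ in detail.

```python
import math

def sign_change_indices(x):
    """Indices n where x[n-1] and x[n] are both non-zero and differ in sign."""
    return [n for n in range(1, len(x)) if x[n - 1] * x[n] < 0]

def exact_zero_crossing_indices(x):
    """Indices n-1 where x[n-1] == 0 and its two neighbours differ in sign."""
    return [
        n - 1
        for n in range(2, len(x))
        if x[n - 1] == 0 and x[n - 2] * x[n] < 0
    ]

# Hypothetical test signal: one second of a sine that does not line up
# with the sample grid (assumed values, chosen only for illustration).
SAMPLE_RATE = 48000
FREQ = 441.7
x = [math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE + 0.3) for n in range(SAMPLE_RATE)]

sign_changes = sign_change_indices(x)
exact_zeros = exact_zero_crossing_indices(x)

print("sign changes between neighbouring samples:", len(sign_changes))
print("samples that are exactly zero at a crossing:", len(exact_zeros))
```

On a signal like this the first count is in the hundreds and the second is empty, which is the gap between the diagram and the sample data.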

> Thus, to perform an edit there, if your goal is to avoid a click by truncating with a non-zero sample value, you need to add/assign a value of zero to a sample.

No, you (the editor, not an algorithm) look at the waveform and see where the amplitude begins to significantly oscillate and place the edit at a reasonable point, like where the signal is near the noise floor and at a point where it crosses zero. There's no zero stuffing.

This kind of thing isn't computed; a human being is looking at the waveform and listening back to choose where to drop the edit point. You don't always get it pop-free but it's much better than an arbitrary point as the sample is rising.

I mean, you could use an algorithm for this. It would be a pair of averaging filters, like a VAD but with lookahead, picking an arbitrary point some distance before activity is detected (peak - noise_floor > threshold), which could be where avg(x[n-N..n]) ~= noise_floor && sign(x[n]) != sign(x[n-1]).
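For what it's worth, a very rough Python sketch of that idea could look like the following. It is only one reading of the description above: `find_edit_point`, `window`, `threshold` and `tolerance` are hypothetical names and values, the noise floor is assumed to be measurable from the first `window` samples, and the average is taken over absolute values.

```python
import math
import random

def find_edit_point(x, window=64, threshold=0.1, tolerance=2.0):
    """Return an index n just before activity where the local level is near
    the noise floor and x[n-1], x[n] differ in sign, or None if none exists."""
    if len(x) <= window:
        return None
    # Assumption: the signal starts with background noise only.
    noise_floor = sum(abs(v) for v in x[:window]) / window

    # Detect activity: short-term peak rises above the noise floor.
    onset = None
    for n in range(window, len(x)):
        peak = max(abs(v) for v in x[n - window:n + 1])
        if peak - noise_floor > threshold:
            onset = n
            break
    if onset is None:
        return None

    # "Lookahead" done offline: walk backwards from the onset.
    for n in range(onset, window - 1, -1):
        avg = sum(abs(v) for v in x[n - window:n + 1]) / (window + 1)
        if avg <= tolerance * noise_floor and x[n - 1] * x[n] < 0:
            return n
    return None

# Hypothetical input: low-level noise followed by a 440 Hz tone at 48 kHz.
rng = random.Random(0)
noise = [rng.uniform(-0.001, 0.001) for _ in range(1000)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(2000)]
x = noise + tone

cut = find_edit_point(x)
print("edit point:", cut, "sample value there:", x[cut])
```

The point it returns sits in the noise just ahead of the tone, at a sign change, and nothing in the signal is overwritten with zeros.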

  • > You don't always get it pop-free but it's much better than an arbitrary point as the sample is rising.

    I agree with this, but that doesn't invalidate anything I've said. When you or a bit of software decide to make the cut at x[n], you are faced with the near certainty that x[n] != 0. If you set it (or x[n+1]) to zero, you add distortion; if you don't, the risk of a pop is significant.

    By contrast, if you apply a fade, the risk of getting a pop is negligible and you can make the cut anywhere you want without paying attention to 1 sample-per-pixel or finer zoom level and the details of the waveform.
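As a minimal illustration of the fade argument, the Python sketch below cuts a synthetic tone at a deliberately bad spot (a peak) and compares a hard cut with a short linear fade-out. The function names, the 256-sample fade length and the test tone are assumptions for the example; real editors typically offer other fade shapes and lengths.

```python
import math

def cut_with_fade(x, cut, fade_len=256):
    """Keep x[:cut] and apply a linear fade-out over its last fade_len samples."""
    out = list(x[:cut])
    fade_len = min(fade_len, len(out))
    start = len(out) - fade_len
    for i in range(fade_len):
        gain = 1.0 - (i + 1) / fade_len  # reaches exactly 0 on the last sample
        out[start + i] *= gain
    return out

def largest_step(signal):
    """Largest jump between neighbouring samples, with silence appended after the cut."""
    padded = list(signal) + [0.0]
    return max(abs(b - a) for a, b in zip(padded, padded[1:]))

# Hypothetical input: a 440 Hz tone at 48 kHz, cut right after a peak.
x = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4000)]
cut = max(range(2000, 2200), key=lambda n: abs(x[n])) + 1

hard = x[:cut]
faded = cut_with_fade(x, cut)

print("largest step, hard cut:", largest_step(hard))
print("largest step, with fade:", largest_step(faded))
```

The hard cut ends with a jump of roughly the full tone amplitude, while the faded version never steps by much more than the tone itself does from sample to sample, regardless of where the cut lands.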