Preliminary experiments for a time-domain noisiness estimator based on zero-crossing rate

Zero-crossing rate (ZCR) has been used successfully to distinguish voiced from unvoiced speech (see for example [Shete et al. 2014]) and to detect percussive sounds (see for example [Gouyon et al. 2000]). The reason is that ZCR correlates with noisiness: noisy sounds are usually broadband signals with high-frequency components, which result in a higher ZCR than that of voiced speech (typically within 2 kHz) or non-percussive sounds.

Noisiness is another sound feature that I am planning to use in my future work, so I started experimenting with ZCR to see whether that information by itself, given its correlation with noisy sounds, would be enough to extract that feature and obtain a reasonably representative index.

The first test was to compare the ZCR of a sinusoid at a given frequency with that of band-limited noise centred around the same frequency. I expected a higher ZCR for the second signal, but what I found is that, in some cases, the ZCR of the noisy signal was actually slightly lower than that of the sinusoid. That was initially surprising, but on reflection it makes sense, since the noisy signal has components below the centre frequency too. In short, the ZCR by itself is not enough to estimate noisiness in the general case, although it still contains the information needed to do so.
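The comparison can be reproduced outside Pure Data with a few lines of Python. This is only a minimal sketch, not the test I actually ran: the sample rate, the 1 kHz centre frequency, the 200 Hz bandwidth, the FFT-masking method for band-limiting the noise, and the helper names `zero_crossings_per_second` and `band_limited_noise` are all assumptions made for illustration.

```python
import numpy as np

def zero_crossings_per_second(x, sr):
    """Count sign changes between consecutive samples, scaled to one second."""
    signs = np.signbit(x)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * sr / (len(x) - 1)

def band_limited_noise(centre_hz, bandwidth_hz, sr, n, rng):
    """White Gaussian noise with every FFT bin outside the band set to zero."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[np.abs(freqs - centre_hz) > bandwidth_hz / 2] = 0.0
    return np.fft.irfft(spectrum, n)

sr = 44100            # assumed sample rate
n = 2 * sr            # two seconds of signal
centre_hz = 1000.0    # assumed test frequency
t = np.arange(n) / sr

# A small phase offset keeps samples away from exact zeros.
sine = np.sin(2 * np.pi * centre_hz * t + 0.1)
noise = band_limited_noise(centre_hz, 200.0, sr, n, np.random.default_rng(0))

sine_zcr = zero_crossings_per_second(sine, sr)
noise_zcr = zero_crossings_per_second(noise, sr)
print("sinusoid:", sine_zcr, "crossings/s")
print("band-limited noise:", noise_zcr, "crossings/s")
```

A sinusoid crosses zero twice per period, so its rate should sit at about twice its frequency. The rate of the noise should land close to the same value, which is consistent with the observation that ZCR alone does not separate the two signals.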

What characterises noise is its non-periodicity, and this shows up in the variations of the ZCR rather than in the ZCR itself. I thought that the first derivative of the ZCR could provide an acceptable result, and what follows is a description of a Pure Data patch that works satisfactorily at this preliminary stage.

The patch is unfortunately not compatible with the Vanilla release of Pure Data because I needed the ZCR object [zerox~], which is part of Pure Data Extended. [zerox~] outputs the number of zero-crossings in each vector from its left outlet, and a full-amplitude impulse each time a zero-crossing is detected from its right outlet. Here I am using the impulses in order to have the ZCR in the range [0;1], namely by averaging the impulses using a low-pass filter over a 10-msec window. (By the way, the average of the impulses will also tell you the frequency of a sinusoidal sound with a good degree of accuracy.)

In order to obtain the degree of change in the ZCR, I use a differentiator over successive averaging windows, which is nothing more than subtracting the previous average from the current one. Since the direction of the variation is not relevant, the absolute value is taken. The resulting signal is then smoothed using a 1-Hz low-pass filter, normalised to approximately 1 using white noise as a reference input signal, and clipped to prevent any value from exceeding 1. This way the resulting signal should be a [0;1] index where, in theory, 0 represents a perfectly periodic signal and 1 represents a totally non-periodic one. Lastly, this range is mapped onto a logarithmic scale to bring it closer to what the perception seemed to be.
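For readers without Pure Data Extended, the signal chain can be approximated offline in Python. This is a rough sketch under several assumptions, not a port of the patch: the 10-msec averaging is done with a plain block mean instead of a low-pass filter, the 1-Hz smoothing uses a simple one-pole filter running at the window rate, the final logarithmic mapping is left out because its exact form is not given above, and the names `one_pole_lowpass`, `raw_noisiness` and `noisiness` are hypothetical.

```python
import numpy as np

def one_pole_lowpass(x, cutoff_hz, rate_hz):
    """Simple one-pole smoother: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / rate_hz)
    y = np.empty(len(x))
    state = 0.0
    for i, value in enumerate(x):
        state += a * (value - state)
        y[i] = state
    return y

def raw_noisiness(x, sr, window_s=0.010, smooth_hz=1.0):
    """Smoothed absolute change of the windowed ZCR, before normalisation."""
    # One impulse (1.0) per detected zero crossing, 0.0 elsewhere.
    signs = np.signbit(x)
    impulses = (signs[1:] != signs[:-1]).astype(float)
    # Average the impulses over consecutive 10 ms windows: ZCR in [0, 1].
    win = int(round(window_s * sr))
    n_windows = len(impulses) // win
    zcr = impulses[:n_windows * win].reshape(n_windows, win).mean(axis=1)
    # Difference between successive averages, direction discarded.
    change = np.abs(np.diff(zcr))
    # Smooth at the window rate (one value per window).
    return one_pole_lowpass(change, smooth_hz, 1.0 / window_s)

def noisiness(x, sr, reference):
    """Normalise by a white-noise reference value and clip to [0, 1]."""
    return np.clip(raw_noisiness(x, sr) / reference, 0.0, 1.0)

sr = 44100                               # assumed sample rate
rng = np.random.default_rng(0)
t = np.arange(3 * sr) / sr
white = rng.standard_normal(3 * sr)      # reference input signal
sine = np.sin(2 * np.pi * 440.0 * t + 0.1)

# Settled value for white noise, used as the normalisation reference.
reference = raw_noisiness(white, sr)[-1]
sine_index = noisiness(sine, sr, reference)[-1]
noise_index = noisiness(white, sr, reference)[-1]
print("sinusoid:", sine_index)
print("white noise:", noise_index)
```

With these settings the sinusoid should settle near the bottom of the range and the white noise at the top, since it is the signal used for normalisation. Signals in between are where the choice of window length and smoothing cutoff will matter most.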

The algorithm is still being tested and will surely need some adjustments, although it seems to behave correctly in most cases. I’m not sure whether the screenshot of the patch is readable in this post; if not, you can have a look at the patch here.