Audio acquisition and analysis with ARM Cortex-M0 and I2S microphone


Dear Clemens,

LoRa and LoRaWAN (via LMIC) are totally different beasts when it comes to scheduling and task lifecycles. Please take that into account. It’s not about parallelism (there isn’t any); it’s all about getting the timing right.



Thanks for letting us know. I will be happy to support you further once we are through this. Is there anything around reading from I2S that I can assist with better? If not: Good luck figuring things out and please don’t hesitate to contact me for even more rubber duck debugging - it worked really well last time, right?


I assume it’s really just that you are calling I2S.begin() right after I2S.end(), which most probably makes things go south when done too quickly. Please get rid of this for the first tests, so that this building block gives stable and deterministic results. Those are absolutely required before assembling things into larger systems.

I understand your motivation for doing this, but I still think it falls into the premature optimization category regarding deep sleep mode - which you really should approach differently than by putting I2S.begin() / I2S.end() into the measurement cycle.


Hi Andreas. I continued some tests with the Adafruit_ZeroFFT and Adafruit_ZeroI2S libraries. I was utterly confused for a while, but then it turned out that something was wrong with the hardware (I should perhaps improve my ESD-safe working habits): the microphone output no longer had any relation to the sound coming into it. I had another microphone (an ICS43434 this time) sitting around, and with that one things made sense again.

This one doesn’t show the jumps in the spectrum seen earlier. It looks promising, though I don’t yet understand the difference. contains the test code, if you’d like to have a look. I’d welcome comments!

Hi @clemens and @Andreas,

A small update: I did extensive testing with the Adafruit_ZeroI2S library and the new ICS43434 and I don’t see any of the issues noticed earlier with the Arduino I2S library:

  • The spectrum looks nice and clean, without these half, quarter, etc, jumps shown earlier in this thread.
  • The spectrum looks the same for code w/ and w/o LMIC.
  • LMIC joins the network nicely, even if I don’t disable the I2S code temporarily.

W.r.t. the last point: on this particular hardware the issue of not being able to join TTN with I2S active was not seen anymore. This is in contrast to an earlier set of hardware with a different I2S microphone and perhaps a different crystal on the MCU. So I can’t say whether Adafruit_ZeroI2S.h is better or worse in that respect.

One more comment: Adafruit_ZeroI2S.h doesn’t implement DMA for the SAMD21. I would imagine that for small sample sizes, recorded in an intermittent, non-streaming manner, DMA is not really needed. @Andreas, does that thinking make sense?

Finally, Adafruit_ZeroFFT.h does its magic on 16 bit data, whereas the microphone delivers 24 bit data. Compressing the range of the microphone data into 16 bits is easy enough, so not really an issue. For the moment I am quite happy that it now finally seems to work fine, so I won’t yet start looking into the FFT algorithms provided in arm_math.h that operate on q31_t type data.
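For illustration, the 24 bit to 16 bit compression can be as simple as dropping the eight least significant bits. The snippet below is a host-side Python sketch of that arithmetic, not the code running on the node. The function name is made up, and it assumes the sample has already been unpacked into a signed 24 bit integer.

```python
def s24_to_s16(sample24: int) -> int:
    """Reduce a signed 24-bit sample to signed 16 bits by dropping the 8 LSBs.

    Assumption: sample24 is already a sign-extended integer in the range
    -2**23 .. 2**23 - 1 (how it gets unpacked from the I2S word is not shown).
    """
    if not -(1 << 23) <= sample24 < (1 << 23):
        raise ValueError("sample is outside the signed 24-bit range")
    # Python's >> on negative integers is an arithmetic (sign-preserving) shift.
    return sample24 >> 8


print(s24_to_s16(0x7FFFFF), s24_to_s16(-0x800000))
```

Dropping eight bits gives up roughly 48 dB at the quiet end of the range. If the recorded signal is weak, shifting by fewer bits and saturating the result may be the better trade-off.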

UPDATE: GitHub - wjmb/FeatherM0_BEEP is now public. Please note that it is work-in-progress code. Please comment if you feel it could be improved in one way or another!

Best regards,



Dear Diren and others,

@Diren, thanks for the BeABee files! I tested my code quite a bit and feel confident that it works as needed, so yesterday I played “miks boden-04-may-16-00.wav” via headphones to my node in order to check how the thing would respond to real bee sounds. While looking at the data coming out, I was wondering about the signal-to-noise ratio and how to deal with it.

Although I have no way to calibrate the sound pressure level coming out of the headset into the microphone, I do believe I played it at a reasonable volume. I sample the microphone at 6250 Hz and take 1024 samples. On that data I do an FFT. The spectrum looks quite noisy; in fact, too noisy to compare different bins and get consistent results. Here is a picture, to give you a sense of the signal:
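For reference, the numbers that follow from these two settings can be worked out as below. This is plain arithmetic in Python; the variable names are only for illustration.

```python
SAMPLE_RATE_HZ = 6250   # sampling rate stated above
N_SAMPLES = 1024        # samples per capture stated above

bin_width_hz = SAMPLE_RATE_HZ / N_SAMPLES   # spacing of the FFT bins
nyquist_hz = SAMPLE_RATE_HZ / 2             # highest frequency that can be represented
capture_s = N_SAMPLES / SAMPLE_RATE_HZ      # duration of one capture
n_bins = N_SAMPLES // 2                     # bins from DC up to just below Nyquist

print(f"bin width {bin_width_hz:.2f} Hz, Nyquist {nyquist_hz:.0f} Hz, "
      f"capture {capture_s * 1000:.1f} ms, {n_bins} bins")
```

So each spectrum covers about 164 ms of sound, with bins roughly 6.1 Hz apart up to 3125 Hz.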

In previous tests, my remedy for this was to take a number of spectra and simply average over them. But how many spectra should I take and average over? The picture below shows averaging over 10 and 100 spectra.

Is there some way to determine when the signal-to-noise ratio is decent enough? Are there other methods than simply taking n spectra and lumping them together? Something that doesn’t require a lot of memory.
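On the memory side, averaging does not require keeping the individual spectra: a running mean needs only one array the size of a single spectrum plus a counter. Below is a minimal Python sketch of that idea. The class name is made up, and on the node the same update would be done in C on a fixed-size array.

```python
class RunningSpectrumMean:
    """Incremental mean of spectra, using memory for one spectrum only."""

    def __init__(self, n_bins: int):
        self.count = 0
        self.mean = [0.0] * n_bins

    def add(self, spectrum):
        if len(spectrum) != len(self.mean):
            raise ValueError("spectrum length does not match")
        self.count += 1
        for i, value in enumerate(spectrum):
            # Standard incremental-mean update: move the mean towards the new value.
            self.mean[i] += (value - self.mean[i]) / self.count


avg = RunningSpectrumMean(3)
avg.add([1.0, 2.0, 3.0])
avg.add([3.0, 4.0, 5.0])
print(avg.count, avg.mean)
```

As a rule of thumb, if the sound is stationary and the captures are independent, the random scatter per bin shrinks roughly with the square root of the number of spectra averaged. Going from 10 to 100 spectra then buys about another factor of three.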

Probably this is simple stuff for anyone skilled in the art of dealing with noisy data … I would love to get a few pointers on what to try!

Best regards,



Dear Wouter,

after some trips to other clearings in our ecosystem I’m just coming back to this. Thank you so much for the effort and spirit you are putting into this and great to hear you have been able to make that kind of progress.

Unfortunately, I’m not able to go through all the details right now and I don’t even know if I would have appropriate answers on the analysis aspects. If you feel there are still infrastructural aspects you might need my assistance for, please let me know.

Thanks for that!

About the other things you were asking:

I hear you. However, I’m not a signal guy either. Would computing the RMS level make some sense at this place? Maybe @weef or @tiz could contribute some suggestions here? Thanks already!
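For what it is worth, an RMS level is cheap to compute from the raw samples. The following Python sketch shows one way to do it, expressed in dB relative to full scale. The function names and the 16 bit full-scale value are assumptions for illustration, and without a calibrated microphone the result is only a relative level.

```python
import math


def rms(samples):
    """Root-mean-square of the samples after removing the DC offset."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))


def level_db(samples, full_scale=32768.0):
    """Uncalibrated level in dB relative to full scale (assumed 16-bit here)."""
    value = rms(samples)
    if value == 0:
        return float("-inf")   # silence or a pure DC offset
    return 20 * math.log10(value / full_scale)


print(round(level_db([16384, -16384, 16384, -16384]), 2))
```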

With kind regards,


Hi Wouter and everybody,

thanks for your good work!

In fact, the results of my FFT also look quite noisy (black line).
Especially for plotting, I only save the mean value of 20 Hz frequency bands (red dots).

The plot shows the spectrum of the first 10 seconds of “miks boden-04-may-16-00.wav”.
I think we did not use the same audio snippets, but it seems that your approach works well and the results make sense, even though you rerecorded the audio!

I would say this really depends on the method you want to use. As you showed, it is important for the algorithm developed by the people in Kursk.
It also really depends on the amount of data that can be sent via LoRa and on whether you want to run a machine learning algorithm on the node or on some server. Deep learning algorithms should be able to deal with a “bad” (low) signal-to-noise ratio. If they are pretrained, they could even run on the node.
Also, if you just use the spectrum to calculate something like the mean frequency, the signal to noise ratio is not that important.
Potentially, the noise even contains some information.

Probably yes, but I guess if your approach works, it works ;)
Not sure how much memory you want to use, but in general possible approaches could be bin smoothing, polynomials or some sort of splines, e.g. penalised regression splines.
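Of these, bin smoothing is probably the lightest on memory. A minimal Python sketch, assuming a plain moving average over neighbouring bins (the function name and the window size are made up for illustration):

```python
def smooth_bins(spectrum, half_width=2):
    """Moving average over neighbouring bins; the window shrinks at the edges."""
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = spectrum[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(smooth_bins([0, 0, 9, 0, 0], half_width=1))
```

The price is frequency resolution: a window of 2 * half_width + 1 bins smears narrow peaks over that many bins.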



Hi @Diren and others,

In the end I decided to use simple averaging of spectrograms. The node records at 6250 Hz for 1024 data points. From this audio data the FFT is calculated (in 6.1 Hz wide bins). This is done some number of times (e.g. 100) and these FFTs are averaged. In the figure below this is shown as “fine spectrogram”, which is kept internal to the node and not sent over LoRaWAN. Instead, from that average spectrogram, several quantities are calculated, as listed in the graph. Among these is the “coarse spectrogram”, which is obtained by lumping eight “fine” bins together into one “coarse” bin.
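The lumping step can be sketched as follows. This is a Python illustration, not the node’s actual code; whether the eight bins are summed or averaged is an assumption here (summed), and the function name is made up.

```python
def lump_bins(fine, group=8):
    """Combine every `group` consecutive fine bins into one coarse bin (by summing).

    Any leftover bins at the top that do not fill a whole group are dropped.
    """
    usable = len(fine) - len(fine) % group
    return [sum(fine[i:i + group]) for i in range(0, usable, group)]


fine = [1.0] * 512            # stand-in for an averaged 512-bin spectrum
coarse = lump_bins(fine)
print(len(coarse), coarse[0])
```

With 6.1 Hz fine bins, each coarse bin is about 48.8 Hz wide, and 512 fine bins reduce to 64 coarse ones.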


Somehow I have pretty high hopes for the s_fmax quantities, which represent the one global and another four local maxima in the spectrogram. It will be interesting to see how these evolve over time after I insert the node into one of my hives.
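How the maxima are picked is not spelled out here, so the following Python sketch shows just one plausible definition: a bin counts as a local maximum if it is higher than its left neighbour and not lower than its right one, and the strongest few are kept. The function name and the exact rule are assumptions.

```python
def top_peaks(spectrum, bin_width_hz, n_peaks=5):
    """Return the frequencies of the n_peaks strongest local maxima, strongest first."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]:
            peaks.append((spectrum[i], i * bin_width_hz))
    peaks.sort(reverse=True)   # strongest peak first
    return [freq for _, freq in peaks[:n_peaks]]


print(top_peaks([0, 1, 0, 5, 0, 3, 0], bin_width_hz=10.0, n_peaks=2))
```

On a noisy spectrum nearly every other bin is a local maximum, so this kind of search only makes sense on the averaged (or smoothed) spectrum.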

Best regards,


Hi, here is some first data coming out of my new node put together in one overview by means of GnuPlot. I just put all there is in there, so it has become a bit complex.

20190506_Feather_M0_LoRa_data.pdf (207.8 KB)

A few comments:

  • the top graph shows the spectrum in false colours, as before.

  • the points superimposed on it are the local maxima in the spectrum at a given time: light coloured open square is the global maximum, the somewhat darker and smaller open circle is the second maximum, the yet darker/smaller diamond is the third, smaller pentagon the fourth and the darkest dot the fifth maximum. The continuous line is the spectral centroid.

  • The middle pane shows the (uncalibrated) sound pressure level in dB. Warble is the fraction of signal in the 225-285 Hz spectral range. Rugosity is a measure for the “roughness” of the sound. Entropy is the spectral entropy in bits.

  • In the bottom pane the pressure is just the air pressure. Temperature and humidity are measured inside the hive too.
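Three of the quantities above have fairly standard textbook definitions, sketched below in Python on a power spectrum. These are the usual formulas and not necessarily what the node computes; the function names are made up. Rugosity is left out because its definition is not given here.

```python
import math


def _total(power):
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no power")
    return total


def spectral_centroid(power, bin_width_hz):
    """Power-weighted mean frequency."""
    total = _total(power)
    return sum(i * bin_width_hz * p for i, p in enumerate(power)) / total


def band_fraction(power, bin_width_hz, f_lo, f_hi):
    """Fraction of total power in bins whose frequency lies in [f_lo, f_hi]."""
    total = _total(power)
    band = sum(p for i, p in enumerate(power) if f_lo <= i * bin_width_hz <= f_hi)
    return band / total


def spectral_entropy_bits(power):
    """Shannon entropy of the normalised spectrum, in bits."""
    total = _total(power)
    probs = [p / total for p in power if p > 0]
    return -sum(q * math.log2(q) for q in probs)


flat = [1.0, 1.0, 1.0, 1.0]
print(spectral_centroid(flat, 100.0), band_fraction(flat, 100.0, 100, 200),
      spectral_entropy_bits(flat))
```

With these definitions, the warble value would be band_fraction(power, bin_width_hz, 225, 285). A flat spectrum gives the maximum entropy (log2 of the number of bins), and a single sharp tone gives an entropy close to zero.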

The weather was quite dark and cold last week. I think it is nice to see that a bump in humidity seems related to nectar flow (as is also known from the literature). Furthermore, some of the spikes in the audio data are related to occasional hail or rain.