Developing Saraswati: Getting started with GStreamer

Introduction

This post is about how to acquire and process audio data from hardware or other sources.

Thoughts

There are several valid ways to design such a system. In the following posts, we want to give an overview of some of them and finally converge on the path we want to take.

To give the discussion some structure, let’s split the approaches to audio acquisition into two families:

  • Variant A: On top of Linux
  • Variant B: Just the bare metal

Variant A.1: Use standard Unix tools

First, we looked at the obvious, basic variant based on arecord and optionally sox and flac, as can already be seen in opensourcebeehives/DataLogger on GitHub (a long-term audio and video datalogger) and in similar projects. We know others running such a setup and have heard about mixed results.

Considerations

Pros

The software is very basic, can be adapted to one’s own needs, and the whole system can be brought into the field very quickly.

Possible improvements

To make such a setup multi-channel and platform-grade, we would need to wrap the Bash program AVrecord.sh and its interaction with arecord in a more solid process manager. This would certainly be doable. Additionally, we would improve the upload side a bit by using rsync or similar tools. Easy.
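To make the process-manager idea a bit more concrete, here is a minimal sketch in Python. It is not taken from AVrecord.sh; the function names arecord_command and supervise, the restart policy and the default sample format are assumptions made for illustration. It only builds the arecord command line and restarts processes that exit with an error.

```python
import subprocess
import time


def arecord_command(device, out_path, seconds, rate=44100, channels=1):
    """Build the argument list for one fixed-length arecord run."""
    return [
        "arecord",
        "-D", device,        # ALSA capture device, e.g. "hw:1,0"
        "-f", "S16_LE",      # 16-bit little-endian samples (assumed format)
        "-r", str(rate),     # sample rate in Hz
        "-c", str(channels),
        "-d", str(seconds),  # stop after this many seconds
        "-t", "wav",
        out_path,
    ]


def supervise(commands, poll_interval=1.0, max_restarts=3):
    """Run one process per named command; restart those exiting non-zero.

    Returns a dict mapping each name to the number of restarts it needed.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    restarts = {name: 0 for name in commands}
    while procs:
        time.sleep(poll_interval)
        for name, proc in list(procs.items()):
            code = proc.poll()
            if code is None:
                continue  # still running
            if code != 0 and restarts[name] < max_restarts:
                restarts[name] += 1
                procs[name] = subprocess.Popen(commands[name])
            else:
                del procs[name]  # finished cleanly or gave up
    return restarts
```

With one entry per sound card, for example supervise({"hive1": arecord_command("hw:1,0", "hive1.wav", 600)}), this would keep a handful of recorders alive. It does nothing yet for remote control or monitoring.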

Cons

But then, how to control this thing when rolled out onto 20 machines or so? SSHing into each machine? Further, how do we monitor internals of the system regarding detected hardware, loss of connectivity or other parameters? After operating this for a while, one might think about sending control commands to reduce the recording time and just send chunks of data to the downstream data collector.

Variant A.2: Use a software library

When using a software library, we don’t need to juggle external processes, which could get tedious when running such a chain of programs for 16 audio channels or so.

However, we would probably have to step into multithreading. Helpfully, y0va/hummingpy on GitHub (a Python sound spy on RPI3 for bee hives) has already done this in humming.py.

This approach, though, runs the audio data through Python itself. Would it be possible to handle 16 channels or more in this manner?
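To put a number on that question, a quick back-of-the-envelope calculation helps. The sample rate (44.1 kHz) and sample width (16 bit) below are assumptions, not values from any of the projects mentioned. The result only sizes the raw data stream and does not show whether Python can keep up with it.

```python
def raw_data_rate(channels, sample_rate_hz, bytes_per_sample):
    """Uncompressed PCM data rate in bytes per second."""
    return channels * sample_rate_hz * bytes_per_sample


# Assumed: 16 channels, 44.1 kHz, 16-bit (2-byte) samples.
rate = raw_data_rate(16, 44100, 2)
print(rate)                          # 1411200 bytes per second
print(round(rate * 3600 / 1e9, 2))   # 5.08 GB per hour, uncompressed
```

So under these assumptions, the Python process would have to move roughly 1.4 MB per second before any encoding takes place.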

Variant A.3: Use a software framework

After @weef brought up GStreamer, which would essentially meet all our requirements without further ado, we didn’t hesitate to start some evaluations in this direction.

One could easily run a flexible recording, transcoding and submission pipeline through gst-launch, the Swiss Army knife of GStreamer. However, we would like to control this at runtime.

In the source code repository of Saraswati, we started with an example using the GStreamer Python bindings.
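For readers who have not used the Python bindings yet, the following is a minimal sketch of what driving a pipeline from Python can look like. It is not the example from the Saraswati repository; the function names, the device string and the choice of elements (alsasrc, audioconvert, flacenc, filesink) are assumptions for illustration. It requires PyGObject and GStreamer 1.0 to be installed, and the output path must not contain spaces.

```python
def pipeline_description(device, out_path):
    """Build a gst-launch style description: ALSA capture to a FLAC file."""
    return (
        f"alsasrc device={device} ! audioconvert ! flacenc ! "
        f"filesink location={out_path}"
    )


def record(device, out_path, seconds):
    """Record for a fixed time, then shut the pipeline down cleanly."""
    import time

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(pipeline_description(device, out_path))
    pipeline.set_state(Gst.State.PLAYING)
    try:
        time.sleep(seconds)
    finally:
        # Send end-of-stream so the encoder can finalize the file,
        # then wait up to five seconds for it to reach the bus.
        pipeline.send_event(Gst.Event.new_eos())
        bus = pipeline.get_bus()
        bus.timed_pop_filtered(
            5 * Gst.SECOND,
            Gst.MessageType.EOS | Gst.MessageType.ERROR,
        )
        pipeline.set_state(Gst.State.NULL)
```

The same description string could be passed to the gst-launch command line tool. The difference is that here the program holds the pipeline object, so it can decide at runtime when to start, stop or rebuild it.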

Some updates