Since 2017, Espressif has been working on integrated hardware and software for audio acquisition and processing, available through a range of boards and kits and a software framework. With this, Espressif is targeting smart speaker applications like Alexa and her friends.
The ESP32 is a 2.4 GHz Wi-Fi and Bluetooth combo chip with a 32-bit dual-core CPU running at up to 240 MHz, designed for mobile devices, wearable electronics, and Internet-of-Things (IoT) applications.
It has several peripherals on board, including I2S interfaces that make it easy to integrate with dedicated audio chips. Together with the Espressif Audio Development Framework (ESP-ADF) software, these hardware features provide a powerful platform for implementing audio applications, including native wireless networking and a powerful user interface.
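To give a feel for the data rates an I2S link has to carry, the bit clock can be estimated as sample rate × bits per sample × channels. The sketch below is plain Python for illustration only, not ESP-ADF code. The function names are hypothetical, and it assumes each channel slot is exactly as wide as one sample, which is common but not universal.

```python
def i2s_bit_clock_hz(sample_rate_hz: int, bits_per_sample: int, channels: int = 2) -> int:
    """Estimate the I2S bit clock (BCLK) in Hz.

    Assumption: the slot width equals bits_per_sample, so one frame
    carries exactly bits_per_sample * channels bits.
    """
    if sample_rate_hz <= 0 or bits_per_sample <= 0 or channels <= 0:
        raise ValueError("all parameters must be positive")
    return sample_rate_hz * bits_per_sample * channels


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 2) -> int:
    """Raw PCM throughput in bytes per second for the same stream."""
    return i2s_bit_clock_hz(sample_rate_hz, bits_per_sample, channels) // 8


if __name__ == "__main__":
    # CD-quality stereo: 44.1 kHz, 16 bit, 2 channels
    print(i2s_bit_clock_hz(44100, 16, 2))      # 1411200
    print(pcm_bytes_per_second(44100, 16, 2))  # 176400
```

Even CD-quality stereo therefore needs only about 1.4 Mbit/s of raw PCM.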
The ESP-ADF provides a range of API components, including Audio Streams, Codecs and Services organized in an Audio Pipeline, all integrated with the audio hardware through the Media HAL and with the peripherals on board the ESP32.
ESP32-WROOM module series
The modules from the ESP32 WROOM Series are powerful, generic Wi-Fi+BT+BLE MCU modules that target a wide variety of applications, ranging from low-power sensor networks to the most demanding tasks, such as voice encoding, music streaming and MP3 decoding. At the core of this module is the ESP32-D0WDQ6 chip.
Modules in the ESP32 WROOM Series contain the ESP32 SoC, flash memory, precision discrete components and a PCB/IPEX antenna which achieves outstanding RF performance in space-constrained applications.
All modules in the ESP32 WROOM Series are suitable for commercial application development, with a robust 4-layer design that is FCC, CE (RED), IC, TELEC, SRRC and KCC compliant and a wide operating temperature range of -40°C to 85°C.
The ESP32-LyraT development board is a hardware platform designed for audio applications on the dual-core ESP32, e.g., Wi-Fi or BT audio speakers, speech-based remote controllers, smart-home appliances with audio functionality, etc.
The ESP32-WROVER module contains the ESP32 chip, which provides Wi-Fi / BT connectivity and data processing power, and integrates 32 Mbit SPI flash and 32 Mbit PSRAM for flexible data storage. It is a step up from the ESP32-WROOM-32x modules, with an additional 8MB SPI PSRAM (pseudo-static RAM).
Audio Codec Chip
The Audio Codec Chip, an ES8388, is a low-power stereo audio codec with a headphone amplifier. It consists of a 2-channel ADC, a 2-channel DAC, a microphone amplifier, a headphone amplifier, digital sound effects, and analog mixing and gain functions. It is interfaced with the ESP32-WROVER module over I2S and I2C buses to provide audio processing in hardware, independently of the audio application.
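A 2-channel ADC and DAC exchange stereo audio as alternating left and right samples. As a rough illustration (plain Python, not codec driver code), the sketch below interleaves two mono sample lists into one buffer of signed 16-bit little-endian PCM and splits it again. The helper names are hypothetical, and the sample format is an assumption for the example, not something the ES8388 description above specifies.

```python
import struct
from typing import List, Sequence, Tuple


def interleave_stereo(left: Sequence[int], right: Sequence[int]) -> bytes:
    """Pack left/right samples as L0 R0 L1 R1 ... (signed 16-bit, little-endian)."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of samples")
    buf = bytearray()
    for l_sample, r_sample in zip(left, right):
        # "<hh" = two little-endian signed 16-bit integers (one stereo frame)
        buf += struct.pack("<hh", l_sample, r_sample)
    return bytes(buf)


def deinterleave_stereo(buf: bytes) -> Tuple[List[int], List[int]]:
    """Split an interleaved stereo buffer back into left and right sample lists."""
    if len(buf) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes (one stereo frame)")
    left, right = [], []
    for l_sample, r_sample in struct.iter_unpack("<hh", buf):
        left.append(l_sample)
        right.append(r_sample)
    return left, right
```

Each stereo frame occupies 4 bytes in this layout, so buffer sizes are naturally chosen as multiples of 4.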
ESP32-LyraT V4.3 Board Layout
ESP32-LyraT V4.3 Electrical Block Diagram
ESP32-LyraT Functional Block Diagram
In the press
ESP32-A1S audio development kit
The kit is based on the Ai-Thinker ESP32-A1S module, which is Ai-Thinker’s equivalent to Espressif’s ESP32-WROVER series modules. It is clocked at up to 240 MHz and has 520 KB of SRAM and 8 MB of on-module PSRAM.
- 3.5mm headphone jack for stereo audio output
- 3.5mm LINE-IN jack
- 2x 2-pin header for left and right speakers up to 4Ω/3W output
- 2x built-in microphones
In the press
The hardware is supported by Espressif's Audio Development Framework (ESP-ADF) for the ESP32.
The audio data is typically acquired using an input Stream, processed with Codecs and Filters, and finally output with another Stream. There is an Event Interface to facilitate communication of the application events. Interfacing with specific hardware is done using Peripherals.
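The flow described above can be modelled in a few lines. This is a conceptual sketch in plain Python, not the ESP-ADF C API: the `Pipeline` class, the toy decoder and filter elements, and the event callback are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Chunk = bytes
Element = Callable[[Iterator[Chunk]], Iterator[Chunk]]
EventHandler = Callable[[str, str], None]


def toy_decoder(chunks: Iterator[Chunk]) -> Iterator[Chunk]:
    """Stand-in for a codec: strips a made-up 2-byte frame prefix b'F:'."""
    for chunk in chunks:
        yield chunk[2:] if chunk.startswith(b"F:") else chunk


def toy_filter(chunks: Iterator[Chunk]) -> Iterator[Chunk]:
    """Stand-in for a filter: transforms each chunk (here, upper-casing)."""
    for chunk in chunks:
        yield chunk.upper()


class Pipeline:
    """Chains elements so data flows from the input stream to the output."""

    def __init__(self, on_event: Optional[EventHandler] = None) -> None:
        self._elements: List[Tuple[str, Element]] = []
        self._on_event: EventHandler = on_event or (lambda kind, detail: None)

    def link(self, name: str, element: Element) -> "Pipeline":
        self._elements.append((name, element))
        return self

    def run(self, input_stream: Iterable[Chunk]) -> List[Chunk]:
        data: Iterator[Chunk] = iter(input_stream)
        for name, element in self._elements:
            data = element(data)            # connect the next stage lazily
            self._on_event("linked", name)  # notify the application
        output = list(data)                 # output stream: pull everything through
        self._on_event("finished", f"{len(output)} chunks")
        return output


if __name__ == "__main__":
    pipeline = Pipeline(on_event=lambda kind, detail: print(kind, detail))
    pipeline.link("decoder", toy_decoder).link("filter", toy_filter)
    print(pipeline.run([b"F:ab", b"F:cd"]))
```

Here the list passed to `run` plays the role of the input stream, the returned list plays the role of the output stream, and the callback stands in for the event interface.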
The GStreamer framework is based on plugins that provide the various codecs and other functionality. The plugins can be linked and arranged in a pipeline, which defines the flow of the data.
Elements of the Audio Development Framework
Sample Organization of Elements in Audio Pipeline