Introduction
Since 2017, Espressif has been working on integrated hardware and software for audio acquisition and processing, available through a range of boards and kits and a software framework. With this, Espressif is targeting smart speaker applications like Alexa and her friends.
The ESP32 is a 2.4 GHz Wi-Fi and Bluetooth combo chip with a 32-bit dual-core CPU running at up to 240 MHz, designed for mobile devices, wearable electronics, and Internet-of-Things (IoT) applications.
It has several peripherals on board, including I2S interfaces that make it easy to integrate with dedicated audio chips. Together with the Espressif Audio Development Framework (ESP-ADF), these hardware features provide a powerful platform for implementing audio applications, including native wireless networking and a powerful user interface.
The ESP-ADF provides a range of API components, including Audio Streams, Codecs and Services, organized in an Audio Pipeline, all integrated with the audio hardware through the Media HAL and with the Peripherals onboard the ESP32.
Hardware
ESP32-WROOM module series
The modules from the ESP32 WROOM Series are powerful, generic Wi-Fi+BT+BLE MCU modules that target a wide variety of applications, ranging from low-power sensor networks to the most demanding tasks, such as voice encoding, music streaming and MP3 decoding. At the core of these modules is the ESP32-D0WDQ6 chip.
Modules in the ESP32 WROOM Series contain the ESP32 SoC, flash memory, precision discrete components and a PCB/IPEX antenna which achieves outstanding RF performance in space-constrained applications.
All modules in the ESP32 WROOM Series are suitable for commercial application development with a robust 4-layer FCC, CE (RED), IC, TELEC, SRRC & KCC-compliant design and a wide operating temperature range of -40°C to 85°C.
– ESP32-WROOM Datasheet
– ESP32 Modules and Boards » ESP32-WROOM-32D / ESP32-WROOM-32U
ESP32-LyraT board
Introduction
The ESP32-LyraT development board is a hardware platform designed for audio applications on the dual-core ESP32, e.g., Wi-Fi or BT audio speakers, speech-based remote controllers, smart-home appliances with audio functionality, etc.
– https://docs.espressif.com/projects/esp-adf/en/latest/design-guide/board-esp32-lyrat-v4.3.html
– https://docs.espressif.com/projects/esp-adf/en/latest/get-started/get-started-esp32-lyrat.html
ESP32-WROVER module
The ESP32-WROVER module contains the ESP32 chip to provide Wi-Fi / BT connectivity and data processing power, and integrates 32 Mbit SPI flash and 32 Mbit PSRAM for flexible data storage. This is a step upgrade of the ESP32-WROOM-32x modules with an additional 8 MB SPI PSRAM (pseudo-static RAM).
– ESP32-WROVER Datasheet
– ESP32 Modules and Boards » ESP32-WROVER
Audio Codec Chip
The Audio Codec Chip, an ES8388, is a low-power stereo audio codec with a headphone amplifier. It consists of a 2-channel ADC, a 2-channel DAC, a microphone amplifier, a headphone amplifier, digital sound effects, and analog mixing and gain functions. It is interfaced with the ESP32-WROVER module over I2S and I2C buses to provide audio processing in hardware independently from the audio application.
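To get a feeling for what the I2S link between the ESP32 and the codec has to carry, here is a rough back-of-the-envelope sketch in plain C. The function names are made up for illustration and are not part of ESP-ADF or ESP-IDF. It assumes the I2S slot width equals the sample width; real configurations may use wider slots and therefore a faster bit clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: minimum I2S bit clock (BCLK) in Hz for raw PCM audio,
 * assuming one clock per bit and a slot width equal to the sample width. */
static uint32_t i2s_bclk_hz(uint32_t sample_rate_hz,
                            uint32_t bits_per_sample,
                            uint32_t channels)
{
    assert(bits_per_sample > 0 && channels > 0);
    return sample_rate_hz * bits_per_sample * channels;
}

/* Hypothetical helper: raw PCM payload in bytes per second for the same stream. */
static uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz,
                                     uint32_t bits_per_sample,
                                     uint32_t channels)
{
    return i2s_bclk_hz(sample_rate_hz, bits_per_sample, channels) / 8;
}
```

For CD-quality stereo (44100 Hz, 16 bit, 2 channels), i2s_bclk_hz() yields 1411200 Hz and pcm_bytes_per_second() yields 176400 bytes per second.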
Imagery
ESP32-LyraT V4.3 Board Layout
ESP32-LyraT V4.3 Electrical Block Diagram
ESP32-LyraT Functional Block Diagram
In the press
ESP32-A1S audio development kit
The kit is based on the Ai-Thinker ESP32-A1S module, which is Ai-Thinker’s equivalent to Espressif’s ESP32-WROVER series modules. It is clocked at up to 240 MHz and comes with 520 KB SRAM and 8 MB on-module PSRAM.
Audio hardware
- 3.5mm headphone jack for stereo audio output
- 3.5mm LINE-IN jack
- 2x 2-pin header for left and right speakers up to 4Ω/3W output
- 2x built-in microphones
Imagery
In the press
Software
The hardware is supported by the Espressif Audio Development Framework (ESP-ADF) for ESP32.
Documentation
The Espressif Audio Development Framework (ESP-ADF).
Introduction
The API provides a way to develop audio applications using Elements like Codecs, Streams or Filters.
The application is developed by combining the Elements into a Pipeline. The audio data is typically acquired using an input Stream, processed with Codecs and Filters, and finally output with another Stream. There is an Event Interface to facilitate communication of the application events. Interfacing with specific hardware is done using Peripherals.
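The idea of chaining Elements into a Pipeline can be sketched in a few lines of plain C. This is only a toy model to illustrate the concept, not the ESP-ADF API: all type and function names below (element_t, pipeline_link, run_demo and so on) are invented for this example, and the "streams" are stand-ins that work on a buffer in memory instead of real audio hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element: processes a block of 16-bit samples in place. */
typedef struct element {
    const char *tag;
    void (*process)(struct element *self, int16_t *samples, size_t count);
    void *ctx;              /* element-specific state, may be NULL */
    struct element *next;   /* downstream neighbour in the pipeline */
} element_t;

typedef struct {
    element_t *head;
    element_t *tail;
} pipeline_t;

/* Append an element to the end of the pipeline. */
static void pipeline_link(pipeline_t *p, element_t *e)
{
    e->next = NULL;
    if (p->tail) {
        p->tail->next = e;
    } else {
        p->head = e;
    }
    p->tail = e;
}

/* Pass one block of samples through every element, in link order. */
static void pipeline_run(pipeline_t *p, int16_t *samples, size_t count)
{
    for (element_t *e = p->head; e != NULL; e = e->next) {
        e->process(e, samples, count);
    }
}

/* Input "stream": fills the block with a ramp 0, 100, 200, ... */
static void ramp_source(element_t *self, int16_t *s, size_t n)
{
    (void)self;
    for (size_t i = 0; i < n; i++) {
        s[i] = (int16_t)(i * 100);
    }
}

/* Filter: halves the amplitude of every sample. */
static void half_gain_filter(element_t *self, int16_t *s, size_t n)
{
    (void)self;
    for (size_t i = 0; i < n; i++) {
        s[i] = (int16_t)(s[i] / 2);
    }
}

/* Output "stream": adds all samples to a running total kept in ctx. */
static void sum_sink(element_t *self, int16_t *s, size_t n)
{
    long *total = (long *)self->ctx;
    for (size_t i = 0; i < n; i++) {
        *total += s[i];
    }
}

/* Build source -> filter -> sink, run one block, return the sink's total. */
static long run_demo(int16_t *buf, size_t n)
{
    long total = 0;
    pipeline_t p = { NULL, NULL };
    element_t src = { "source", ramp_source, NULL, NULL };
    element_t flt = { "filter", half_gain_filter, NULL, NULL };
    element_t snk = { "sink", sum_sink, &total, NULL };

    pipeline_link(&p, &src);
    pipeline_link(&p, &flt);
    pipeline_link(&p, &snk);
    pipeline_run(&p, buf, n);
    return total;
}
```

Calling run_demo() with a buffer of four samples leaves 0, 50, 100, 150 in the buffer and returns 300. The point of the design is that the application only decides which elements to link and in which order; each element knows nothing about its neighbours.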
Reading about the pipeline-based architecture of the ESP-ADF software framework, it really feels like a trimmed-down, verbatim copy of GStreamer:
The [GStreamer] framework is based on plugins that will provide the various codec and other functionality. The plugins can be linked and arranged in a pipeline. This pipeline defines the flow of the data.
Pretty cool.