This project is a stage-4 audio generator for the Speech_Capture pipeline, running on a Raspberry Pi Pico (RP2040).
- It joins the same I2C bus used by stage-2 and stage-3 devices.
- It operates as an I2C slave at address `0x65`.
- It receives control/features from stage-3 (`Speech_Recognition_Translator`).
- It runs a single-channel reverse neural network on Core 1.
- It reconstructs a 16 kHz audio stream from 40 spectral bins on Core 0.
- It stores generated spectra in a 40 x 100 byte output image (40 bins × 100 lines).
- It outputs audio through an 8-bit GPIO DAC (R-2R ladder or resistor network).
- It supports spoken system prompts that can be used by robots as a voice interface.
- It also supports the new-word learning loop, where the system can ask the user how an unrecognized word is spelled and which language it belongs to.
- Because this reverse network is trained from data aligned to `Speech_Process_8bit_relu`, it is expected to develop a speaking accent similar to the dominant accent in that stage-2 training set.
Think of this module as the inverse of `Speech_Process_8bit_relu`:
- Stage 2 (`Speech_Process_8bit_relu`) maps spectral features → phoneme-like outputs.
- Stage 4 (`Speech_Generation`) maps control/features → spectral bins → time-domain audio output.
This implementation is intentionally beginner-friendly and modular. It provides a complete scaffold and register interface so you can improve model quality, DSP quality, and waveform reconstruction over time.
In addition to speech output, this stage is part of the interactive vocabulary-learning process for unknown words: it can generate prompts that help collect spelling and language metadata for dictionary updates.
- I2C slave register handling (`0x65`)
- Register-based configuration and live data I/O
- Per-line inverse-transform rendering (40 bins → 256 samples)
- Playback pointer stepping at 62.5 lines/s (256 samples/line)
- 8-bit sample output across GPIO 0-7
- Receives input feature buffer from Core 0
- Runs 1-hidden-layer 8-bit fixed-point NN
- Produces 40 output bins (500-5500 Hz)
- Returns bins to Core 0
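The Core 1 list above can be made concrete with a minimal sketch of one 8-bit fixed-point dense layer with ReLU. The Q7-style `>>7` rescale, the function name, and the sizes are illustrative assumptions for learners, not the firmware's actual layout.

```c
#include <stdint.h>
#include <stddef.h>

// One dense layer in 8-bit fixed point: int8 inputs and weights,
// 32-bit accumulator, >>7 rescale (Q7-style), ReLU clamped to 0..127.
static void dense_relu_8bit(const int8_t *in, size_t n_in,
                            const int8_t *w, const int8_t *bias,
                            int8_t *out, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {
        int32_t acc = (int32_t)bias[o] << 7;   // bias at accumulator scale
        for (size_t i = 0; i < n_in; i++)
            acc += (int32_t)in[i] * (int32_t)w[o * n_in + i];
        acc >>= 7;                             // back to int8 scale
        if (acc < 0)   acc = 0;                // ReLU
        if (acc > 127) acc = 127;              // saturate
        out[o] = (int8_t)acc;
    }
}
```

In a 1-hidden-layer network this would run twice per inference: input features → hidden activations, then hidden activations → the 40 output bins.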
| GPIO | Function | Notes |
|---|---|---|
| 20 | SDA | I2C0 SDA |
| 21 | SCL | I2C0 SCL |
| GPIO | Bit |
|---|---|
| 0 | DAC bit 0 (LSB) |
| 1 | DAC bit 1 |
| 2 | DAC bit 2 |
| 3 | DAC bit 3 |
| 4 | DAC bit 4 |
| 5 | DAC bit 5 |
| 6 | DAC bit 6 |
| 7 | DAC bit 7 (MSB) |
Use these pins with an R-2R resistor ladder or weighted resistor DAC network, then route into an audio amplifier/filter.
Use a simple R-2R ladder so GPIO 0-7 become one analog output.
Recommended starter values:
- R = 10 kΩ
- 2R = 20 kΩ
ASCII concept diagram (each GPIO drives the ladder through 2R, series R links the nodes, and the MSB node sits next to the output):

```
DAC_OUT ---+--R--+--R--+--R--+--R--+--R--+--R--+--R--+--2R--- GND
           |     |     |     |     |     |     |     |
          2R    2R    2R    2R    2R    2R    2R    2R
           |     |     |     |     |     |     |     |
         GPIO7 GPIO6 GPIO5 GPIO4 GPIO3 GPIO2 GPIO1 GPIO0
         (MSB)                                     (LSB)
```
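As a quick sanity check on the ladder, an ideal, unloaded R-2R DAC produces a binary-weighted fraction of the drive voltage (the Pico's 3.3 V rail). A small helper, purely illustrative:

```c
#include <stdint.h>

// Ideal, unloaded R-2R ladder output: Vout = Vdd * code / 256.
// A real output sags slightly with load and GPIO drive impedance.
static double r2r_vout(uint8_t code, double vdd)
{
    return vdd * (double)code / 256.0;
}
```

Mid-scale (code 128) lands at exactly half the rail; full scale (255) stops one LSB short of 3.3 V.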
Ladder end termination: a final 2R from the LSB-end node to GND.

Practical analog output chain for robots:
- DAC_OUT → small RC low-pass filter (for example 1 kΩ + 10 nF)
- Filter output → audio amplifier input (or powered speaker module)
- Keep grounds shared between Pico and amplifier
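The cutoff of the suggested RC filter follows from the standard first-order formula fc = 1 / (2πRC); with 1 kΩ and 10 nF that is roughly 16 kHz, which knocks down DAC switching hash while leaving the 500-5500 Hz speech band untouched. A larger capacitor lowers the cutoff for stronger smoothing.

```c
// First-order RC low-pass cutoff frequency: fc = 1 / (2 * pi * R * C).
static double rc_cutoff_hz(double r_ohm, double c_farad)
{
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohm * c_farad);
}
```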
Safety note for beginners:
- Do not connect speaker directly to GPIO pins.
- Always use the resistor network first, then an amplifier/buffer stage.
The device uses a register system similar to Speech_Process_8bit_relu.
See REGISTER_MAP.md for the full table.
Key registers:
| Register | Function |
|---|---|
| `0x00` | Control/Status (16-bit) |
| `0x02` | Input pointer |
| `0x03` | Input data (auto-increment) |
| `0x04` | Trigger NN run |
| `0x06` | Bin pointer |
| `0x07` | Bin data (auto-increment) |
| `0x10` | Image line pointer |
| `0x11` | Image line data (40-byte row) |
| `0x12` | Phoneme ID for generation |
| `0x13` | Generation/training command |
| `0x14` | Feedback score |
| `0x15` | Training target phoneme |
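The pointer/data pairs (`0x02`/`0x03`, `0x06`/`0x07`) follow a common auto-increment idiom: set the pointer once, then stream bytes through the data register. Here is a simplified device-side model of the bin pair so the behavior can be seen without hardware; the wrap-at-40 behavior is an assumption for this sketch.

```c
#include <stdint.h>

#define REG_BIN_PTR  0x06
#define REG_BIN_DATA 0x07
#define N_BINS       40

typedef struct {
    uint8_t bins[N_BINS];  // 40 spectral bins (500-5500 Hz)
    uint8_t bin_ptr;       // current index for auto-increment access
} gen_regs_t;

// Host writes: set the pointer, or store one bin and advance.
static void reg_write(gen_regs_t *r, uint8_t reg, uint8_t val)
{
    if (reg == REG_BIN_PTR) {
        r->bin_ptr = val % N_BINS;
    } else if (reg == REG_BIN_DATA) {
        r->bins[r->bin_ptr] = val;
        r->bin_ptr = (uint8_t)((r->bin_ptr + 1) % N_BINS);
    }
}

// Host reads: return the current bin and advance.
static uint8_t reg_read(gen_regs_t *r, uint8_t reg)
{
    if (reg == REG_BIN_DATA) {
        uint8_t v = r->bins[r->bin_ptr];
        r->bin_ptr = (uint8_t)((r->bin_ptr + 1) % N_BINS);
        return v;
    }
    return 0;  // other registers omitted in this sketch
}
```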
Training flow (host-driven):
- Write one phoneme ID and trigger image generation.
- Read 100 lines × 40 bins from image buffer.
- Feed lines to `Speech_Process_8bit_relu` channel 2 for scoring.
- If target confidence is below 80%, write feedback and trigger one backprop step.
- Repeat until confidence threshold is reached.
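The control flow of that loop can be sketched host-side. `score_phoneme()` and `backprop_step()` stand in for the real I2C transactions (generate plus stage-2 scoring, then one training step); their bodies here are stubs invented so the loop runs anywhere.

```c
#include <stdint.h>

static int g_confidence = 40;                  // stub state: pretend start score

static int score_phoneme(uint8_t phoneme_id)   // stub: generate + stage-2 score
{
    (void)phoneme_id;
    return g_confidence;
}

static void backprop_step(uint8_t phoneme_id)  // stub: one backprop step
{
    (void)phoneme_id;
    g_confidence += 10;                        // pretend steady improvement
}

// Train one phoneme until the scorer reports >= 80% confidence.
// Returns how many backprop steps were taken.
static int train_phoneme(uint8_t phoneme_id)
{
    int steps = 0;
    while (score_phoneme(phoneme_id) < 80) {
        backprop_step(phoneme_id);
        steps++;
    }
    return steps;
}
```

A real host should also cap the step count so a phoneme that fails to converge cannot stall the loop forever.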
Playback conversion details:
- Each image line is converted into a 256-sample audio block.
- Output samples are stepped at 16 kHz, so each line consumes 16 ms.
- This yields exactly 62.5 lines per second.
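The timing arithmetic above is worth pinning down in integer math, since firmware scheduling typically works in whole microseconds; the constants come straight from the bullets.

```c
#include <stdint.h>

enum {
    SAMPLE_RATE_HZ   = 16000,  // output sample rate
    SAMPLES_PER_LINE = 256,    // one image line -> one audio block
};

// Microseconds spent playing one image line: 256 / 16000 s = 16 ms.
static uint32_t line_duration_us(void)
{
    return (uint32_t)SAMPLES_PER_LINE * 1000000u / SAMPLE_RATE_HZ;
}

// Lines per second, scaled by 10 so the .5 stays exact: 62.5 -> 625.
static uint32_t lines_per_second_x10(void)
{
    return 10u * SAMPLE_RATE_HZ / SAMPLES_PER_LINE;
}
```

So a full 100-line output image plays for 1.6 seconds.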
```
cd Speech_Generation
mkdir -p build
cd build
cmake ..
ninja
```

UF2 output:

```
build/Speech_Generation.uf2
```

Flash with picotool:

```
picotool load build/Speech_Generation.uf2 -fx
```

Project files:
- `speech_generation.c` - Main firmware (I2C slave, NN, synthesis, DAC output)
- `REGISTER_MAP.md` - I2C register reference
- `ARCHITECTURE.md` - Deeper design notes for learners
- `QUICKSTART.md` - Practical bring-up steps
- Start by writing known values into the spectral bin registers (`0x06`, `0x07`).
- Observe the GPIO DAC output with an oscilloscope.
- Then trigger NN inference using register `0x04`.
- Iterate on weight values through the weight registers to understand how the output changes.
- `Speech_Recognition_AudioCapture` - beamforming + FFT
- `Speech_Process_8bit_relu` - feature/phoneme NN processing
- `Speech_Recognition_Translator` - sequence/word translation + control
- `Speech_Generation` (this project) - reverse synthesis to audio output