
Cyber-One/Speech_Recognition_Speech_Generation


Speech Generation

Part 4 of the Speech Recognition System Series

This project is a stage-4 audio generator for the Speech_Capture pipeline, running on a Raspberry Pi Pico (RP2040).

  • It joins the same I2C bus used by stage-2 and stage-3 devices.
  • It operates as an I2C slave at address 0x65.
  • It receives control/features from stage-3 (Speech_Recognition_Translator).
  • It runs a single-channel reverse neural network on Core 1.
  • It reconstructs a 16 kHz audio stream from 40 spectral bins on Core 0.
  • It stores generated spectra in a 40 × 100-byte output image (40 bins × 100 lines, 4000 bytes in total).
  • It outputs audio through an 8-bit GPIO DAC (R-2R ladder or resistor network).
  • It supports spoken system prompts that can be used by robots as a voice interface.
  • It also supports the new-word learning loop, where the system can ask the user how an unrecognized word is spelled and which language it belongs to.
  • Because this reverse network is trained from data aligned to Speech_Process_8bit_relu, it is expected to develop a speaking accent similar to the dominant accent present in that stage-2 training set.
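The 40 × 100 output image can be pictured as a plain byte array. The sketch below is minimal and makes two assumptions: one byte per bin, and row-major storage (line first, then bin). That layout matches the 40-byte rows exposed through register 0x11. The type and helper names are hypothetical and are not taken from speech_generation.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Dimensions from this README: 40 spectral bins per line, 100 lines. */
#define IMG_BINS  40
#define IMG_LINES 100

/* Hypothetical container for the generated spectral image (one byte per bin). */
typedef struct {
    uint8_t px[IMG_LINES][IMG_BINS];  /* row-major: line index first, then bin */
} spectral_image_t;

/* Flat byte offset of (line, bin) when the image is read as one contiguous buffer. */
static inline size_t image_offset(size_t line, size_t bin) {
    return line * IMG_BINS + bin;
}
```

With this layout, reading line n through register 0x11 corresponds to bytes image_offset(n, 0) through image_offset(n, 39).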

What This Stage Does

Think of this module as the inverse of Speech_Process_8bit_relu:

  • Stage 2 (Speech_Process_8bit_relu) maps spectral features → phoneme-like outputs.
  • Stage 4 (Speech_Generation) maps control/features → spectral bins → time-domain audio-like output.

This implementation is intentionally beginner-friendly and modular. It provides a complete scaffold and register interface so you can improve model quality, DSP quality, and waveform reconstruction over time.

In addition to speech output, this stage is part of the interactive vocabulary-learning process for unknown words: it can generate prompts that help collect spelling and language metadata for dictionary updates.


Core Architecture

Core 0 (I2C + Synthesis + GPIO DAC)

  • I2C slave register handling (0x65)
  • Register-based configuration and live data I/O
  • Per-line inverse-transform rendering (40 bins -> 256 samples)
  • Playback pointer stepping at 62.5 lines/s (256 samples/line)
  • 8-bit sample output across GPIO 0-7

Core 1 (Reverse Neural Network)

  • Receives input feature buffer from Core 0
  • Runs 1-hidden-layer 8-bit fixed-point NN
  • Produces 40 output bins (500-5500 Hz)
  • Returns bins to Core 0
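Below is a minimal sketch of a one-hidden-layer, 8-bit fixed-point forward pass that ends in 40 output bins. Several details are assumptions for illustration:

  • the input and hidden sizes
  • the Q1.7 weight format (rescale by shifting right 7 bits)
  • the 32-bit biases
  • ReLU with 8-bit saturation on both layers

The real layout is defined by speech_generation.c and REGISTER_MAP.md.

```c
#include <stdint.h>

/* Hypothetical sizes; only NN_OUT = 40 comes from this README. */
#define NN_IN    16
#define NN_HID   32
#define NN_OUT   40
#define NN_SHIFT 7   /* assumed Q1.7 weights: 127 ~ +0.99, -128 = -1.0 */

typedef struct {
    int8_t  w1[NN_HID][NN_IN];
    int32_t b1[NN_HID];
    int8_t  w2[NN_OUT][NN_HID];
    int32_t b2[NN_OUT];
} reverse_nn_t;

/* ReLU, then rescale by the fixed-point shift and saturate to 0..255. */
static uint8_t relu_sat_u8(int32_t acc) {
    if (acc <= 0) return 0;
    int32_t v = acc >> NN_SHIFT;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Forward pass: 8-bit features in, 40 8-bit spectral bins out. */
static void reverse_nn_forward(const reverse_nn_t *nn, const uint8_t in[NN_IN],
                               uint8_t out[NN_OUT]) {
    uint8_t hid[NN_HID];
    for (int h = 0; h < NN_HID; h++) {
        int32_t acc = nn->b1[h];
        for (int i = 0; i < NN_IN; i++) acc += (int32_t)nn->w1[h][i] * in[i];
        hid[h] = relu_sat_u8(acc);
    }
    for (int o = 0; o < NN_OUT; o++) {
        int32_t acc = nn->b2[o];
        for (int h = 0; h < NN_HID; h++) acc += (int32_t)nn->w2[o][h] * hid[h];
        out[o] = relu_sat_u8(acc);   /* bins are magnitudes, so negatives clip to 0 */
    }
}
```

Accumulating in 32 bits is the key design choice. A worst-case 16-term sum of 127 × 255 products fits easily, so only the final rescale loses precision.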

Hardware Pinout

I2C Slave Bus (to Stage 3 master)

| GPIO | Function | Notes    |
|------|----------|----------|
| 20   | SDA      | I2C0 SDA |
| 21   | SCL      | I2C0 SCL |

8-bit DAC GPIO Output

| GPIO | Bit             |
|------|-----------------|
| 0    | DAC bit 0 (LSB) |
| 1    | DAC bit 1       |
| 2    | DAC bit 2       |
| 3    | DAC bit 3       |
| 4    | DAC bit 4       |
| 5    | DAC bit 5       |
| 6    | DAC bit 6       |
| 7    | DAC bit 7 (MSB) |

Use these pins with an R-2R resistor ladder or weighted resistor DAC network, then route into an audio amplifier/filter.

Beginner R-2R DAC Wiring (8-bit, GitHub-friendly diagram)

Use a simple R-2R ladder so GPIO 0-7 become one analog output.

Recommended starter values:

  • R = 10 kΩ
  • 2R = 20 kΩ

ASCII concept diagram (MSB node at the top, LSB node at the bottom):

GPIO7 (MSB) ---2R---+----> DAC_OUT
                    |
                    R
                    |
GPIO6 ---------2R---+
                    |
                    R
                    |
GPIO5 ---------2R---+
                    |
                    R
                    |
GPIO4 ---------2R---+
                    |
                    R
                    |
GPIO3 ---------2R---+
                    |
                    R
                    |
GPIO2 ---------2R---+
                    |
                    R
                    |
GPIO1 ---------2R---+
                    |
                    R
                    |
GPIO0 (LSB) ---2R---+
                    |
                    2R
                    |
                   GND

Each GPIO feeds its own node through 2R, neighboring nodes are joined by R, and DAC_OUT is taken from the GPIO7 (MSB) node.

Ladder end termination: a final 2R from the GPIO0 (LSB) node to GND.
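The ladder's transfer function follows from a simple model. Assume an ideal, unloaded 8-bit R-2R ladder whose GPIO pins swing cleanly between 0 V and a high level V_high, with negligible pin resistance. Under those assumptions:

  • GPIO k contributes V_high / 2^(8-k) when it is high.
  • The output is therefore V_high × code / 256, about 12.9 mV per step at 3.3 V.
  • The ladder's output resistance is R.

The sketch below computes the output both ways so the bit weighting is visible. The function names are hypothetical.

```c
#include <stdint.h>

/* Closed form for an ideal, unloaded 8-bit R-2R ladder. */
static double r2r_output_volts(uint8_t code, double v_high) {
    return v_high * (double)code / 256.0;
}

/* Same result built bit by bit: GPIO k adds v_high / 2^(8 - k) when high,
 * so GPIO7 (MSB) adds v_high / 2 and GPIO0 (LSB) adds v_high / 256. */
static double r2r_output_by_bits(uint8_t code, double v_high) {
    double v = 0.0;
    for (int k = 0; k < 8; k++) {
        if (code & (1u << k)) {
            v += v_high / (double)(1u << (8 - k));
        }
    }
    return v;
}
```

For example, code 128 (only GPIO7 high) gives half of V_high. Code 255 gives V_high × 255/256, which is just under the rail.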

Practical analog output chain for robots:

  1. DAC_OUT → small RC low-pass filter (for example 1 kΩ + 10 nF)
  2. Filter output → audio amplifier input (or powered speaker module)
  3. Keep grounds shared between Pico and amplifier
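The RC filter's corner frequency is f_c = 1 / (2πRC). The example values (1 kΩ + 10 nF) give about 15.9 kHz on their own. However, the R-2R ladder drives the filter with its own output resistance, which equals R (10 kΩ with the starter values). That resistance adds in series with the 1 kΩ. Assuming a high-impedance amplifier input, the combination lowers the corner to roughly 1.45 kHz, which sits inside the 500-5500 Hz band this stage produces. The helper names below are hypothetical.

```c
/* Corner frequency of a first-order RC low-pass: f_c = 1 / (2 * pi * R * C). */
static double rc_cutoff_hz(double r_ohms, double c_farads) {
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohms * c_farads);
}

/* Corner when the filter resistor is fed directly from the ladder output,
 * whose Thevenin resistance is the ladder's R (assumes an unloaded filter). */
static double ladder_rc_cutoff_hz(double ladder_r, double filter_r, double c_farads) {
    return rc_cutoff_hz(ladder_r + filter_r, c_farads);
}
```

For example, ladder_rc_cutoff_hz(10e3, 1e3, 10e-9) is about 1447 Hz. Keeping 10 kΩ + 1 kΩ and reducing the capacitor to 2.2 nF moves the corner to about 6.6 kHz, below the 8 kHz Nyquist limit of 16 kHz playback. Another option is to buffer the ladder output before the filter.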

Safety note for beginners:

  • Do not connect a speaker directly to the GPIO pins.
  • Always use the resistor network first, then an amplifier/buffer stage.

Register Control Interface

The device uses a register system similar to Speech_Process_8bit_relu.

See REGISTER_MAP.md for the full table.

Key registers:

  • 0x00 Control/Status (16-bit)
  • 0x02 Input pointer
  • 0x03 Input data (auto-increment)
  • 0x04 Trigger NN run
  • 0x06 Bin pointer
  • 0x07 Bin data (auto-increment)
  • 0x10 Image line pointer
  • 0x11 Image line data (40-byte row)
  • 0x12 Phoneme ID for generation
  • 0x13 Generation/training command
  • 0x14 Feedback score
  • 0x15 Training target phoneme
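The pointer/auto-increment pattern can be tried on a PC with the sketch below. It assumes the common I2C register convention:

  • The first byte of a master write selects a register, and later bytes are data.
  • Reads return bytes from the most recently selected register.

Only the pointer/data pairs for the input buffer, the bins, and the image lines are modeled. The input-buffer length, the wrap behavior, and all names are hypothetical, and REGISTER_MAP.md remains the authority.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_INPUT_PTR   0x02
#define REG_INPUT_DATA  0x03
#define REG_BIN_PTR     0x06
#define REG_BIN_DATA    0x07
#define REG_LINE_PTR    0x10
#define REG_LINE_DATA   0x11

#define N_INPUT 16    /* hypothetical input-buffer length */
#define N_BINS  40
#define N_LINES 100

typedef struct {
    uint8_t reg;                                /* register selected by the last write */
    uint8_t input_ptr, bin_ptr, line_ptr, row_pos;
    uint8_t input[N_INPUT];
    uint8_t bins[N_BINS];
    uint8_t image[N_LINES][N_BINS];
} regfile_t;

/* Handle one master write: buf[0] selects the register, the rest is data. */
static void regfile_write(regfile_t *rf, const uint8_t *buf, size_t len) {
    if (len == 0) return;
    rf->reg = buf[0];
    if (rf->reg == REG_LINE_DATA) rf->row_pos = 0;   /* assumption: row restarts */
    for (size_t i = 1; i < len; i++) {
        uint8_t v = buf[i];
        switch (rf->reg) {
        case REG_INPUT_PTR:
            rf->input_ptr = (uint8_t)(v % N_INPUT);
            break;
        case REG_INPUT_DATA:                         /* write, then auto-increment */
            rf->input[rf->input_ptr] = v;
            rf->input_ptr = (uint8_t)((rf->input_ptr + 1) % N_INPUT);
            break;
        case REG_BIN_PTR:
            rf->bin_ptr = (uint8_t)(v % N_BINS);
            break;
        case REG_BIN_DATA:                           /* write, then auto-increment */
            rf->bins[rf->bin_ptr] = v;
            rf->bin_ptr = (uint8_t)((rf->bin_ptr + 1) % N_BINS);
            break;
        case REG_LINE_PTR:
            rf->line_ptr = (uint8_t)(v % N_LINES);
            rf->row_pos = 0;
            break;
        default:                                     /* other registers not modeled */
            break;
        }
    }
}

/* Return one byte to the master from the selected register. */
static uint8_t regfile_read_byte(regfile_t *rf) {
    uint8_t v = 0;
    switch (rf->reg) {
    case REG_BIN_DATA:
        v = rf->bins[rf->bin_ptr];
        rf->bin_ptr = (uint8_t)((rf->bin_ptr + 1) % N_BINS);
        break;
    case REG_LINE_DATA:                              /* stream the 40-byte row */
        v = rf->image[rf->line_ptr][rf->row_pos];
        rf->row_pos = (uint8_t)((rf->row_pos + 1) % N_BINS);
        break;
    default:
        break;
    }
    return v;
}
```

In this model the data pointers wrap at the end of their buffers. The real firmware may stop, wrap, or advance to the next line instead.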

Training flow (host-driven):

  1. Write one phoneme ID (0x12) and trigger image generation (0x13).
  2. Read 100 lines × 40 bins from the image buffer (0x10, 0x11).
  3. Feed the lines to Speech_Process_8bit_relu channel 2 for scoring.
  4. If the target confidence is below 80%, write the feedback score (0x14) and trigger one backprop step.
  5. Repeat until confidence threshold is reached.
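The host-driven loop can be sketched as follows. The three hooks are hypothetical stand-ins for the steps listed above:

  • generation via 0x12/0x13
  • scoring on Speech_Process_8bit_relu channel 2
  • feedback via 0x14 plus a training command

Confidence is assumed to arrive as an integer percentage. The max_steps limit is an addition, so a phoneme that never converges cannot stall the host. A small mock device lets the control flow be checked on a PC.

```c
#include <stdint.h>

#define TARGET_CONFIDENCE_PCT 80

/* Hypothetical hooks; on real hardware each wraps a sequence of I2C accesses. */
typedef struct {
    void (*generate)(void *ctx, uint8_t phoneme_id);        /* steps 1-2 */
    uint8_t (*score)(void *ctx, uint8_t phoneme_id);         /* step 3, 0-100 % */
    void (*backprop_step)(void *ctx, uint8_t feedback);      /* step 4 */
    void *ctx;
} train_hooks_t;

/* Returns the number of backprop steps taken, or -1 if max_steps ran out. */
static int train_phoneme(const train_hooks_t *h, uint8_t phoneme_id, int max_steps) {
    for (int step = 0; step <= max_steps; step++) {
        h->generate(h->ctx, phoneme_id);
        uint8_t conf = h->score(h->ctx, phoneme_id);
        if (conf >= TARGET_CONFIDENCE_PCT) return step;   /* step 5: threshold met */
        if (step == max_steps) break;
        h->backprop_step(h->ctx, conf);                   /* feedback = score (assumed) */
    }
    return -1;
}

/* Mock device for desk-testing: confidence rises 10 points per backprop step. */
typedef struct { uint8_t conf; int generated; } mock_dev_t;
static void mock_generate(void *ctx, uint8_t id) { (void)id; ((mock_dev_t *)ctx)->generated++; }
static uint8_t mock_score(void *ctx, uint8_t id) { (void)id; return ((mock_dev_t *)ctx)->conf; }
static void mock_backprop(void *ctx, uint8_t fb) { (void)fb; ((mock_dev_t *)ctx)->conf += 10; }
```

Starting the mock at 50% confidence, the loop runs three backprop steps and stops on the fourth generation, when confidence reaches 80%.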

Playback conversion details:

  • Each image line is converted into a 256-sample audio block.
  • Output samples are stepped at 16 kHz, so each line consumes 16 ms.
  • This yields exactly 62.5 lines per second.
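The timing follows from the sample counts:

  • 256 samples / 16,000 samples per second = 16 ms per line.
  • 1000 ms / 16 ms = 62.5 lines per second.
  • A full 100-line image lasts 100 × 16 ms = 1.6 s (25,600 samples).

The hypothetical playback cursor below advances once per 16 kHz sample tick. Real firmware would likely drive this from a timer or interrupt.

```c
#include <stdint.h>

#define SAMPLE_RATE_HZ   16000u
#define SAMPLES_PER_LINE 256u
#define IMAGE_LINES      100u

/* Hypothetical playback cursor over the 100-line image. */
typedef struct {
    uint32_t sample_in_line;   /* 0..255 within the current line */
    uint32_t line;             /* current image line */
    uint32_t total_samples;    /* samples emitted so far */
} playback_t;

/* Advance by one sample tick; returns 1 once every line has been played. */
static int playback_tick(playback_t *p) {
    p->total_samples++;
    if (++p->sample_in_line == SAMPLES_PER_LINE) {
        p->sample_in_line = 0;
        p->line++;             /* a new line starts every 16 ms */
    }
    return p->line >= IMAGE_LINES;
}
```

Because 62.5 is not an integer, stepping the line pointer by sample count, rather than by a lines-per-second timer, keeps audio and image position exactly in sync.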

Build

cd Speech_Generation
mkdir -p build
cd build
cmake -G Ninja ..
ninja

UF2 output:

  • build/Speech_Generation.uf2

Flash

cd ..
picotool load -f -x build/Speech_Generation.uf2

Files

  • speech_generation.c - Main firmware (I2C slave, NN, synthesis, DAC output)
  • REGISTER_MAP.md - I2C register reference
  • ARCHITECTURE.md - Deeper design notes for learners
  • QUICKSTART.md - Practical bring-up steps

Beginner Notes

  1. Start by writing known values into the spectral bin registers (0x06, 0x07).
  2. Observe the DAC output with an oscilloscope.
  3. Then trigger NN inference using register 0x04.
  4. Experiment with weight values through the weight registers (see REGISTER_MAP.md) to understand how the output changes.
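For step 1, the sketch below builds the two I2C write frames that load a known pattern. The first frame rewinds the bin pointer (0x06). The second bursts 40 values into bin data (0x07) with only one bin set. The sketch assumes the same register-address-first write convention as a typical I2C register device. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_BIN_PTR  0x06
#define REG_BIN_DATA 0x07
#define N_BINS       40

/* Two-byte write that sets the bin pointer before the data burst. */
static size_t build_bin_ptr_frame(uint8_t frame[2], uint8_t index) {
    frame[0] = REG_BIN_PTR;
    frame[1] = index;
    return 2;
}

/* Register byte plus 40 bin values where only `bin` is set to `level`;
 * returns the number of bytes to send. */
static size_t build_single_bin_frame(uint8_t frame[1 + N_BINS], unsigned bin, uint8_t level) {
    frame[0] = REG_BIN_DATA;
    memset(&frame[1], 0, N_BINS);
    if (bin < N_BINS) frame[1 + bin] = level;
    return 1 + N_BINS;
}
```

A Pico acting as master could send each frame with the Pico SDK call i2c_write_blocking(i2c0, 0x65, frame, len, false). A single active bin should produce a steady tone on the oscilloscope. Its pitch depends on the firmware's bin-to-frequency mapping.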

Part of the Speech Recognition System

  1. Speech_Recognition_AudioCapture - beamforming + FFT
  2. Speech_Process_8bit_relu - feature/phoneme NN processing
  3. Speech_Recognition_Translator - sequence/word translation + control
  4. Speech_Generation (this project) - reverse synthesis to audio output
