
Cyber-One/Speech_Recognition_Process_8bit_relu


Speech Process 8-bit ReLU

Overview

This sub-project implements a small 8-bit fixed-point neural network on the RP2040. It receives input data over I2C1 in slave mode, runs a single-hidden-layer ReLU network, and exposes control and output over I2C0, also in slave mode.

  • Input side: I2C1 slave (raw packet input)
  • Output/control side: I2C0 slave (register map)
  • Network: 41 input bytes → 100 hidden → 200 output, ReLU activation
  • Backprop: Single-pass update when control bit 2 is set
  • Output FIFO: 16 entries, each entry = 5 bytes (ID + max value + female value + male value + user ID)

Pin Configuration

I2C1 (Input)

GPIO Function Description
6 SDA I2C1 Data
7 SCL I2C1 Clock

I2C0 (Output / Control)

GPIO Function Description
20 SDA I2C0 Data
21 SCL I2C0 Clock

I2C Address Selection

GPIO Function Description
2 A0 Address bit 0
3 A1 Address bit 1
4 A2 Address bit 2

Address = $0x60 + (A2\ll2) + (A1\ll1) + (A0\ll0)$ (the address pins are pulled up by default).
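A minimal host-side sketch of the address arithmetic, assuming each pin level is read as 0 or 1 and that a floating pin reads as 1 because of the pull-up. The function name `i2c_slave_address` is hypothetical, not taken from the firmware. With all three pins left high the address works out to 0x67; tying all three low gives 0x60.

```c
#include <stdint.h>

/* Hypothetical helper: combine the A2..A0 pin levels (0 or 1)
 * into the 7-bit slave address 0x60 + (A2 << 2) + (A1 << 1) + A0. */
uint8_t i2c_slave_address(int a2, int a1, int a0)
{
    return (uint8_t)(0x60 + ((a2 & 1) << 2) + ((a1 & 1) << 1) + (a0 & 1));
}
```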

Diagnostic GPIOs

GPIO Function Description
10 NN_PASS Toggles each NN pass
11 FEMALE Female detection (thresholded)
12 MALE Male detection (thresholded)
14 SIL_W Silence inter-word (thresholded)
15 SIL_L Silence inter-sentence (thresholded)
16 WORD_RDY Word data ready (see logic below)

Input Packet (I2C1)

  • Packet size: 41 bytes
  • Byte 0: Header 0xAA
  • Bytes 1–40: input data bytes
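The packet layout can be checked with a small sketch like the following. `input_packet_valid` and the two constant names are illustrative assumptions, not identifiers from the firmware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INPUT_PACKET_LEN 41   /* header byte + 40 data bytes */
#define INPUT_PACKET_HDR 0xAA /* value expected in byte 0 */

/* Returns true when buf holds exactly one 41-byte packet starting with 0xAA. */
bool input_packet_valid(const uint8_t *buf, size_t len)
{
    return buf != NULL && len == INPUT_PACKET_LEN && buf[0] == INPUT_PACKET_HDR;
}
```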

Frequency bin centers (Hz)

Derived from $f_s=16000$, $N=256$, start bin 8 (500 Hz), with 40 output bins formed by summing pairs of FFT bins across the 500–5500 Hz range. Approximate bin centers are:

$$f_{\text{bin}}(n) = 500 + 125n\;\text{Hz},\quad n = 0,\ldots,39$$

40 Bin Center Frequencies (Hz)

Bin Center (Hz)
0 500
1 625
2 750
3 875
4 1000
5 1125
6 1250
7 1375
8 1500
9 1625
10 1750
11 1875
12 2000
13 2125
14 2250
15 2375
16 2500
17 2625
18 2750
19 2875
20 3000
21 3125
22 3250
23 3375
24 3500
25 3625
26 3750
27 3875
28 4000
29 4125
30 4250
31 4375
32 4500
33 4625
34 4750
35 4875
36 5000
37 5125
38 5250
39 5375
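The table can be regenerated from the FFT parameters: the FFT bin spacing is $f_s/N = 62.5$ Hz, and output bin $n$ starts at FFT bin $8 + 2n$, so its listed frequency is $62.5\,(8 + 2n) = 500 + 125n$ Hz. The sketch below (function name hypothetical) computes that value. If the two summed FFT bins are taken together, their midpoint lies 31.25 Hz higher, which is why the listed centers are only approximate.

```c
/* Approximate frequency (Hz) of output bin n, as listed in the table:
 * FFT bin spacing fs/N, starting at FFT bin 8, two FFT bins per output bin. */
double bin_center_hz(int n)
{
    const double fs = 16000.0;   /* sample rate (Hz) */
    const double nfft = 256.0;   /* FFT length */
    const int start_bin = 8;     /* first FFT bin used (500 Hz) */
    return (fs / nfft) * (start_bin + 2 * n);
}
```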

The 41-byte input buffer matches the output packet of Speech_Recognition_AudioCapture and the input buffer size of I2C_TestDevice.

Output FIFO

  • Capacity: 16 entries
  • Stored when the highest output neuron reaches 80% of full scale (value ≥ 204 out of 255)
  • Each FIFO entry is 5 bytes:
    • Byte 1: highest-value output neuron ID
    • Byte 2: highest-value neuron output
    • Byte 3: female neuron output value
    • Byte 4: male neuron output value
    • Byte 5: user ID (0 = unknown, 1-20 = known users)
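A sketch of the store decision and the 5-byte entry layout. It assumes the 200 outputs are held as unsigned 8-bit values indexed by neuron ID, that the caller has already resolved the user ID, and that the argmax runs over all 200 outputs (the firmware may restrict this). The type and function names are hypothetical. The threshold 204 is 80% of 255.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_OUTPUTS     200
#define STORE_THRESHOLD 204  /* 80% of 255 */
#define NEURON_FEMALE   0x00
#define NEURON_MALE     0x01

typedef struct {
    uint8_t id;      /* byte 1: highest-value output neuron ID */
    uint8_t max;     /* byte 2: its output value */
    uint8_t female;  /* byte 3: female neuron output */
    uint8_t male;    /* byte 4: male neuron output */
    uint8_t user;    /* byte 5: user ID, 0 = unknown */
} fifo_entry_t;

/* Finds the highest output neuron; fills *e and returns true only if the
 * winning value reaches the store threshold. Ties keep the lowest ID. */
bool make_fifo_entry(const uint8_t out[NUM_OUTPUTS], uint8_t user_id, fifo_entry_t *e)
{
    int best = 0;
    for (int i = 1; i < NUM_OUTPUTS; i++) {
        if (out[i] > out[best]) best = i;
    }
    if (out[best] < STORE_THRESHOLD) return false;
    e->id = (uint8_t)best;
    e->max = out[best];
    e->female = out[NEURON_FEMALE];
    e->male = out[NEURON_MALE];
    e->user = user_id;
    return true;
}
```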

Output Neuron Mapping

The phoneme IDs match PhonemeList.txt on the microSD card. IDs 0x00–0x01 are reserved for speaker features.

User-ID classification neurons are mapped to output neurons 50-69:

  • Neuron 50 -> user ID 1
  • ...
  • Neuron 69 -> user ID 20

If no user neuron exceeds the threshold, the user ID is reported as 0 (unknown).
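A sketch of the user-neuron mapping, assuming the most active user neuron wins and that the threshold is passed in by the caller, since its value is not stated here. Names are hypothetical.

```c
#include <stdint.h>

#define USER_NEURON_FIRST 50
#define USER_NEURON_LAST  69

/* Returns 1..20 for the strongest user neuron that exceeds threshold, else 0. */
uint8_t user_id_from_outputs(const uint8_t *out, uint8_t threshold)
{
    int best = -1;
    for (int n = USER_NEURON_FIRST; n <= USER_NEURON_LAST; n++) {
        if (out[n] > threshold && (best < 0 || out[n] > out[best])) best = n;
    }
    return best < 0 ? 0 : (uint8_t)(best - USER_NEURON_FIRST + 1);
}
```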

Neuron ID Label
0x00 Female (speaker)
0x01 Male (speaker)
0x02 SIL (inter-word)
0x03 Silence inter-sentence
0x04 Obsolete (silence short)
0x05 AA
0x06 AE
0x07 AH
0x08 AO
0x09 AW
0x0A AY
0x0B B
0x0C CH
0x0D D
0x0E DH
0x0F EH
0x10 ER
0x11 EY
0x12 F
0x13 G
0x14 HH
0x15 IH
0x16 IY
0x17 JH
0x18 K
0x19 L
0x1A M
0x1B N
0x1C NG
0x1D OW
0x1E OY
0x1F P
0x20 R
0x21 S
0x22 SH
0x23 T
0x24 TH
0x25 UH
0x26 UW
0x27 V
0x28 W
0x29 Y
0x2A Z
0x2B ZH
0x2C AX

Word-data-ready GPIO

WORD_RDY is asserted when either inter-word or inter-sentence silence neurons are active and the output FIFO contains non-silence data.
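Expressed as a boolean, assuming the caller tracks how many non-silence entries the FIFO currently holds (the counter and function name are hypothetical):

```c
#include <stdbool.h>

/* WORD_RDY = (inter-word silence OR inter-sentence silence active)
 *            AND the FIFO holds at least one non-silence entry. */
bool word_ready(bool sil_word, bool sil_sentence, int fifo_nonsilence_count)
{
    return (sil_word || sil_sentence) && fifo_nonsilence_count > 0;
}
```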

Silence FIFO rules

  • Silence-word and silence-sentence entries are only added if no non-silence data is already in the FIFO.
  • If the last FIFO entry is silence-word and a silence-sentence entry is generated, the silence-word entry is replaced by the silence-sentence entry.
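The two rules can be sketched as a push routine on a 16-entry ring buffer. This is an illustration, not the firmware's code: it tracks only neuron IDs, and dropping new entries when the FIFO is full is an assumption, since the behavior on overflow is not documented.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_CAP    16
#define ID_SIL_WORD 0x02
#define ID_SIL_SENT 0x03

typedef struct {
    uint8_t id[FIFO_CAP];  /* only the neuron ID is tracked in this sketch */
    int head;              /* index of the oldest entry */
    int count;
} id_fifo_t;

static bool is_silence(uint8_t id) { return id == ID_SIL_WORD || id == ID_SIL_SENT; }

static bool fifo_has_nonsilence(const id_fifo_t *f)
{
    for (int i = 0; i < f->count; i++)
        if (!is_silence(f->id[(f->head + i) % FIFO_CAP])) return true;
    return false;
}

/* Applies the silence rules; returns true if the FIFO changed. */
bool fifo_push_id(id_fifo_t *f, uint8_t id)
{
    if (is_silence(id)) {
        /* Rule 1: silence is only added when no non-silence data is queued. */
        if (fifo_has_nonsilence(f)) return false;
        /* Rule 2: a sentence silence replaces a trailing word silence. */
        if (id == ID_SIL_SENT && f->count > 0) {
            int last = (f->head + f->count - 1) % FIFO_CAP;
            if (f->id[last] == ID_SIL_WORD) {
                f->id[last] = ID_SIL_SENT;
                return true;
            }
        }
    }
    if (f->count == FIFO_CAP) return false;  /* assumed: drop when full */
    f->id[(f->head + f->count) % FIFO_CAP] = id;
    f->count++;
    return true;
}
```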

Register Map (I2C0)

I2C0 always receives a register byte first. Read/write behavior depends on the selected register.

Reg Size R/W Description
0x00 16-bit R/W Control/Status register
0x01 16-bit R/W Output FIFO length (write 0 to clear FIFO)
0x02 16-bit R/W Input buffer address pointer
0x03 8-bit R/W Input buffer data (auto-increment pointer)
0x04 8-bit R/W Target neuron ID for backprop
0x05 40-bit R/O Oldest FIFO entry (read pops FIFO)
0x06 8-bit R/W Output weight: neuron index
0x07 8-bit R/W Output weight: weight index
0x08 8-bit R/W Output weight: weight value (signed)
0x09 8-bit R/W Hidden weight: neuron index
0x0A 16-bit R/W Hidden weight: weight index
0x0B 8-bit R/W Hidden weight: weight value (signed)
0x0C 8-bit R/W Page mode (target select)
0x0D 16-bit R/W Page address
0x0E 16-bit R/W Page length
0x0F 8-bit R/W Page data stream
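A host-side sketch of framing a register write: the register byte goes first, followed by the value. The width table is taken directly from the register map; the byte order for 16-bit registers is not documented, so sending the least-significant byte first is an assumption. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Register widths in bytes, from the I2C0 register map (index = register). */
static const uint8_t REG_WIDTH[16] = {
    2, 2, 2, 1, 1, 5, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1
};

/* Builds a register write: [reg][value bytes]. Returns the frame length,
 * or 0 for an unknown or read-only register.
 * ASSUMPTION: 16-bit values are sent least-significant byte first. */
size_t build_reg_write(uint8_t reg, uint16_t value, uint8_t frame[3])
{
    if (reg >= 16 || reg == 0x05) return 0;  /* 0x05 is read-only */
    frame[0] = reg;
    frame[1] = (uint8_t)(value & 0xFF);
    if (REG_WIDTH[reg] == 1) return 2;
    frame[2] = (uint8_t)(value >> 8);
    return 3;
}
```

A read would write the register byte and then read `REG_WIDTH[reg]` bytes, for example 5 bytes for register 0x05.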

Register 0 (Control/Status)

Bit Access Description
0 R Data is incoming (set when input buffer updates)
1 R/W Freeze incoming data (use for training)
2 R/W Start backprop (auto-clears when done)
3 R/W Pause neural network processing
4–15 R/W Reserved (ignored)
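A read-modify-write sketch for register 0x00, assuming the host has already read the current 16-bit value. The macro and function names are hypothetical. Clearing the read-only bit and the reserved bits on write is a conservative choice, since the reserved bits are ignored anyway.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks for register 0x00 (Control/Status). */
#define CTRL_DATA_INCOMING  (1u << 0)  /* R: input buffer was updated */
#define CTRL_FREEZE_INPUT   (1u << 1)  /* R/W: freeze incoming data */
#define CTRL_START_BACKPROP (1u << 2)  /* R/W: start backprop, auto-clears */
#define CTRL_PAUSE_NN       (1u << 3)  /* R/W: pause NN processing */

/* Value to write to request backprop: keep writable bits 1 and 3,
 * drop read-only bit 0 and reserved bits 4-15, then set bit 2. */
uint16_t ctrl_request_backprop(uint16_t current)
{
    return (uint16_t)((current & (CTRL_FREEZE_INPUT | CTRL_PAUSE_NN)) | CTRL_START_BACKPROP);
}

/* Backprop is finished once bit 2 reads back as 0. */
bool ctrl_backprop_done(uint16_t status)
{
    return (status & CTRL_START_BACKPROP) == 0;
}
```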

Register 3 (Input Data)

  • Read: returns byte at Register 2 pointer, then increments pointer
  • Write: writes byte to pointer, then increments pointer
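The pointer behavior of registers 0x02 and 0x03 can be modeled as below. This is a host-side model, not firmware code; what happens past the last byte is not documented, so the bounds handling is an assumption.

```c
#include <stdint.h>

#define INPUT_BUF_LEN 41

typedef struct {
    uint8_t buf[INPUT_BUF_LEN];
    uint16_t ptr;  /* value of register 0x02 */
} input_port_t;

/* Register 0x03 read: byte at ptr, then ptr++.
 * ASSUMPTION: out-of-range reads return 0 and leave the pointer unchanged. */
uint8_t reg3_read(input_port_t *p)
{
    if (p->ptr >= INPUT_BUF_LEN) return 0;
    return p->buf[p->ptr++];
}

/* Register 0x03 write: store at ptr, then ptr++.
 * ASSUMPTION: out-of-range writes are ignored. */
void reg3_write(input_port_t *p, uint8_t v)
{
    if (p->ptr >= INPUT_BUF_LEN) return;
    p->buf[p->ptr++] = v;
}
```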

Register 5 (Output FIFO)

  • Read: returns oldest FIFO entry and removes it
  • Byte order:
    • Byte 1: neuron ID
    • Byte 2: max neuron value
    • Byte 3: female neuron value
    • Byte 4: male neuron value
    • Byte 5: user ID (0 = unknown)
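On the host, one register 0x05 read can be unpacked as follows. `fifo_record_t`, `decode_fifo_read`, and `is_phoneme` are illustrative names; the phoneme range 0x05–0x2C comes from the neuron ID table.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t neuron_id;
    uint8_t max_value;
    uint8_t female;
    uint8_t male;
    uint8_t user_id;  /* 0 = unknown, 1-20 = known user */
} fifo_record_t;

/* Decodes the 5 bytes returned by one read of register 0x05, in the listed order. */
fifo_record_t decode_fifo_read(const uint8_t raw[5])
{
    fifo_record_t r = { raw[0], raw[1], raw[2], raw[3], raw[4] };
    return r;
}

/* True for the phoneme IDs in the neuron table (0x05 AA through 0x2C AX). */
bool is_phoneme(uint8_t id)
{
    return id >= 0x05 && id <= 0x2C;
}
```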

Output Weight Registers

Use 0x06 to select the output neuron (0–199) and 0x07 to select its weight index (0–99). Read/write 0x08 to access the signed 8-bit weight value.

Hidden Weight Registers

Use 0x09 to select the hidden neuron (0–99). Use 0x0A to set the 16-bit input index (0–40). Read/write 0x0B to access the signed 8-bit weight value.
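A sketch of the selector ranges for both weight register groups, plus how a raw register byte maps to a signed weight. The flat-index helpers assume W1 and W2 are stored row-major (one row per destination neuron), which the README does not confirm; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_IN     41
#define N_HIDDEN 100
#define N_OUT    200

/* Range checks for the selectors written to 0x09/0x0A and 0x06/0x07. */
bool hidden_sel_valid(uint8_t neuron, uint16_t input_idx)
{
    return neuron < N_HIDDEN && input_idx < N_IN;
}

bool output_sel_valid(uint8_t neuron, uint8_t weight_idx)
{
    return neuron < N_OUT && weight_idx < N_HIDDEN;
}

/* ASSUMPTION: row-major storage, so a selector pair maps to this flat offset. */
uint16_t hidden_flat_index(uint8_t neuron, uint16_t input_idx)
{
    return (uint16_t)(neuron * N_IN + input_idx);
}

uint16_t output_flat_index(uint8_t neuron, uint8_t weight_idx)
{
    return (uint16_t)(neuron * N_HIDDEN + weight_idx);
}

/* Registers 0x08 and 0x0B carry a signed weight in an unsigned I2C byte
 * (two's complement, e.g. 0xFF is -1). */
int8_t weight_from_byte(uint8_t raw) { return (int8_t)raw; }
```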

Page Read/Write Mode

Page mode provides fast bulk access: the address auto-increments and the remaining length auto-decrements with each byte.

Modes (REG 0x0C):

  • 0x01 = W1 (hidden weights, size $100\times41$)
  • 0x02 = B1 (hidden bias, size 100)
  • 0x03 = W2 (output weights, size $200\times100$)
  • 0x04 = B2 (output bias, size 200)
  • 0x05 = INPUT (input buffer, size 41)

Usage:

  1. Set 0x0C (mode)
  2. Set 0x0D (start address, 16-bit)
  3. Set 0x0E (length, 16-bit)
  4. Stream bytes via 0x0F (read or write). Each byte auto-increments the address and decrements length.
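The usage steps can be modeled as a small state machine. This is a sketch, not the firmware's code; ignoring bytes sent after the length reaches 0 is an assumption, and the names are hypothetical. The sizes are the ones listed for each mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-mode targets (register 0x0C). */
enum { PAGE_W1 = 1, PAGE_B1, PAGE_W2, PAGE_B2, PAGE_INPUT };

/* Size of each target in bytes; 0 for an unknown mode. */
uint16_t page_size(uint8_t mode)
{
    switch (mode) {
    case PAGE_W1:    return 100 * 41;
    case PAGE_B1:    return 100;
    case PAGE_W2:    return 200 * 100;
    case PAGE_B2:    return 200;
    case PAGE_INPUT: return 41;
    default:         return 0;
    }
}

typedef struct {
    uint8_t mode;   /* register 0x0C */
    uint16_t addr;  /* register 0x0D */
    uint16_t len;   /* register 0x0E */
} page_state_t;

/* Checks that len bytes starting at addr stay inside the selected target. */
bool page_range_ok(const page_state_t *s)
{
    uint32_t end = (uint32_t)s->addr + s->len;
    return page_size(s->mode) != 0 && end <= page_size(s->mode);
}

/* One byte streamed through 0x0F: address +1, length -1.
 * ASSUMPTION: bytes sent after length reaches 0 are ignored. */
bool page_step(page_state_t *s)
{
    if (s->len == 0) return false;
    s->addr++;
    s->len--;
    return true;
}
```

For example, writing all 200 output biases means setting 0x0C = 0x04, 0x0D = 0, and 0x0E = 200, then streaming 200 bytes to 0x0F.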

Build

cd Speech_Process_8bit_relu
mkdir -p build
cd build
cmake -G Ninja ..
ninja

Notes

  • All math is fixed-point, with 8-bit weights and activations.
  • Backprop runs a single pass and clears control bit 2 when complete.
  • Neural network runs on Core 1; I2C and control logic run on Core 0.
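A minimal sketch of one layer of the fixed-point forward pass. The README does not document the scaling, so the int32 accumulator, the unscaled bias, the right shift by `shift`, and the clamp to 0..255 are all assumptions; `dense_relu_u8` is a hypothetical name.

```c
#include <stdint.h>

/* One fully connected layer with ReLU in 8-bit fixed point.
 * ASSUMPTIONS: int32 accumulation, bias added unscaled, ReLU applied,
 * then a right shift by `shift` and saturation to 0..255. */
void dense_relu_u8(const uint8_t *in, int n_in,
                   const int8_t *w,  /* n_out rows of n_in weights */
                   const int8_t *b,  /* n_out biases */
                   int n_out, int shift, uint8_t *out)
{
    for (int j = 0; j < n_out; j++) {
        int32_t acc = b[j];
        for (int i = 0; i < n_in; i++)
            acc += (int32_t)w[j * n_in + i] * in[i];
        if (acc < 0) acc = 0;      /* ReLU (before shifting, so no negative shifts) */
        acc >>= shift;             /* rescale back toward 8 bits */
        if (acc > 255) acc = 255;  /* saturate to uint8 */
        out[j] = (uint8_t)acc;
    }
}
```

For this network the layer would be applied twice: 41 → 100 with W1/B1, then 100 → 200 with W2/B2.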
