Speech Recognition Translator

Overview

This stage aggregates phoneme outputs from five Speech_Process_8bit_relu units, resolves phoneme sequences into text, and emits translated words with gender and direction tags. It also manages microSD storage for the stage-2 neural-network weights and biases and for the translation dictionary.

I2C0 master reads FIFO data from stage‑2 devices at 0x60–0x64
GPIO inputs (one per stage‑2) indicate valid FIFO data
SPI microSD stores dictionary + stage‑2 weights/bias
Output mode selectable via 2 GPIO pins (USB / TTL / I2C)
Diagnostic GPIOs indicate stage‑2 device failure

Pin Configuration

I2C0 (Stage‑2 Read)

GPIO	Function	Description
20	SDA	I2C0 Data
21	SCL	I2C0 Clock

Word‑Ready Inputs (one per stage‑2)

GPIO	Function	Description
6	WR0	Stage‑2 @ 0x60 word‑ready
7	WR1	Stage‑2 @ 0x61 word‑ready
8	WR2	Stage‑2 @ 0x62 word‑ready
9	WR3	Stage‑2 @ 0x63 word‑ready
10	WR4	Stage‑2 @ 0x64 word‑ready

Diagnostic Outputs (stage‑2 not responding)

GPIO	Function	Description
11	FAULT0	Stage‑2 @ 0x60 fault
12	FAULT1	Stage‑2 @ 0x61 fault
13	FAULT2	Stage‑2 @ 0x62 fault
14	FAULT3	Stage‑2 @ 0x63 fault
15	FAULT4	Stage‑2 @ 0x64 fault

Output Mode Select (2 GPIOs)

GPIO	Function	Description
2	MODE0	Output select bit 0 (pull‑up)
3	MODE1	Output select bit 1 (pull‑up)

Mode table:

00 = USB serial
01 = TTL UART
10 = I2C output (future)
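The two mode pins can be decoded into the table's modes with a small pure function. This is a sketch, assuming MODE1 (GPIO 3) is the high bit and MODE0 (GPIO 2) the low bit; on the Pico the two levels would come from gpio_get() after gpio_pull_up(). The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Output modes from the mode table. OUT_UNDEFINED covers the unlisted 11 case. */
typedef enum {
    OUT_USB_SERIAL = 0, /* 00 */
    OUT_TTL_UART   = 1, /* 01 */
    OUT_I2C        = 2, /* 10 (future) */
    OUT_UNDEFINED  = 3  /* 11: not listed in the table */
} output_mode_t;

/* Assumption: MODE1 is the high bit, MODE0 the low bit; a pin left at its
 * pull-up reads true (1). */
static output_mode_t decode_output_mode(bool mode1, bool mode0)
{
    unsigned bits = ((unsigned)mode1 << 1) | (unsigned)mode0;
    return (output_mode_t)bits;
}
```

Because both pins have pull-ups, leaving them unconnected reads as 11, a combination the table does not assign.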

TTL UART (Serial Output)

GPIO	Function	Description
0	TX	UART0 TX
1	RX	UART0 RX

microSD (SPI0)

GPIO	Function	Description
18	SCK	SPI0 Clock
19	MOSI	SPI0 MOSI
16	MISO	SPI0 MISO
17	CS	SD Chip Select

LCD + Keypad (shared I2C expanders)

Both the LCD and keypad use PCF8574 I/O expanders on the same I2C0 bus.

Device	I2C Address	Purpose
LCD backpack (PCF8574)	`0x27`	20x4 text display
Keypad expander (PCF8574)	`0x26`	4x4 matrix keypad scan

Beginner wiring notes:

Keep all grounds common: Pico GND, stage-2 boards, LCD module, keypad module, SD module.
Use 3.3 V compatible I2C devices.
Ensure SDA/SCL have pull-ups (many modules include them already).
Avoid address conflicts on I2C0 (0x26, 0x27, 0x60-0x64).

LCD Menu System (20x4)

Default Screen (Screen 0)

Line 0: System status + microSD error status
Lines 1-3: Wrapped history containing the last 10 recognized words

Press # from Screen 0 to open the menu.

Key Mapping (Menu/Input)

A = Up
B = Down
C = Left
D = Right

Main Menu (Paged)

Line 0: page title (Main Menu Pg 0, Main Menu Pg 1, or Main Menu Pg 2)
Page 0:
1. Add New User
2. User Menu
3. Start Training
Page 1:
4. Select Word from Unrecognized List
5. Initiate Speech Generator Training
6. Stage 2 ANN Training
Page 2:
7. Save ANN
8. Load Speech ANN

Navigation behavior:

From Page 0, A/C/D have no effect.
From Page 0, B switches to Page 1.
From Page 1, A switches back to Page 0 and B switches to Page 2.
From Page 2, A switches back to Page 1.
Press * to exit menu screens back to Screen 0.

User Menu (from Main Menu option 2)

Line 0: User Menu
Line 1: User ID: <id> <name> for the current selection
A/B cycle up/down through configured users
Only assigned users are shown (unassigned/default entries are excluded)
Press # to select the displayed user and return to Screen 0
Press * to return to Main Menu

Training Menu (Main Menu option 3)

This menu is available only when a user is selected.

Press 3 from Main Menu Page 0 to enter training.
If no user is selected, training does not open and the status message is set to prompt for a user selection.

Display behavior:

Line 0: Training Memnu X/Y (spelled "Memnu" in the firmware), where X is the current word index and Y is the total number of words
Line 1: Current training word (centered)
Line 2: Recording: Present or Recording: Missing
Line 3 (idle): A/B:Scroll #:Train
Line 3 (armed/capturing): Speak When Ready (centered)

Key behavior:

A / C = previous word
B / D = next word
# = arm capture for currently displayed word
* = abort training and return to Main Menu Page 1

Stage 2 ANN Training (Main Menu Page 1, option 6)

Entry point: Main Menu Page 1 -> 6
Scope: training is applied only to Stage-2 channel/unit 2
Confirmation required before start:
- Line 0: Stage 2 ANN Train
- Line 1: Are you sure?
- # = Yes, start ANN training
- * = No, return to main menu

During training, the LCD shows:

Current word being trained
Last training result with certainty/epoch info
Running count and latest inferred user ID

ANN training logs are also appended to microSD at:

microsd/logs/<username>_ann_train.log

Training loop behavior:

Incoming stage-1 stream is frozen before ANN training begins.
Each training word (.dat) is replayed into stage-2 and backprop is triggered.
Stage-2 telemetry is read after each pass to check phoneme sequence, gender, and user correctness.
The same word is retrained across epochs until all pass criteria are met (or max epoch limit is hit).

Per-word pass criteria:

Target certainty >= 80%
Phoneme order match >= 80% against the dictionary sequence for that word
Gender check passes (female or male output >= 80% for selected user gender)
User check passes (returned user ID matches selected user and user-neuron confidence >= 80%)
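The four per-word criteria combine as a single AND. The sketch below assumes the stage-2 values have already been scaled to percent; the struct, its field names, and the function are hypothetical, not the firmware's own.

```c
#include <stdbool.h>
#include <stdint.h>

#define PASS_THRESHOLD_PCT 80u

/* Hypothetical summary of one evaluation pass, all values in percent (0-100). */
typedef struct {
    uint8_t target_certainty_pct;    /* certainty of the target word */
    uint8_t phoneme_order_match_pct; /* order match vs. dictionary sequence */
    uint8_t female_pct;              /* female output */
    uint8_t male_pct;                /* male output */
    uint8_t user_id;                 /* returned user ID (0 = unknown) */
    uint8_t user_confidence_pct;     /* user-neuron confidence */
} train_result_t;

/* True only when every per-word pass criterion holds. */
static bool word_passes(const train_result_t *r, bool user_is_female,
                        uint8_t selected_user_id)
{
    bool certainty_ok = r->target_certainty_pct >= PASS_THRESHOLD_PCT;
    bool order_ok = r->phoneme_order_match_pct >= PASS_THRESHOLD_PCT;
    uint8_t gender_pct = user_is_female ? r->female_pct : r->male_pct;
    bool gender_ok = gender_pct >= PASS_THRESHOLD_PCT;
    bool user_ok = r->user_id == selected_user_id &&
                   r->user_confidence_pct >= PASS_THRESHOLD_PCT;
    return certainty_ok && order_ok && gender_ok && user_ok;
}
```

The retraining loop would call this after each epoch and move to the next word only when it returns true (or the epoch limit is hit).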

At completion, the system returns to Main Menu.

Speech Generator Training (Main Menu Page 1, option 5)

Entry point: Main Menu Page 1 -> 5
Scope: trains Stage 4 (Speech_Generation, I2C 0x65) against Stage 2 channel 2 (0x62)

Training loop behavior:

Send a single phoneme ID to Stage 4 and trigger image generation.
Capture generated Stage-4 image buffer (40 bins x 100 lines).
Replay each 40-byte line into Stage-2 channel 2 input.
Read Stage-2 target confidence for that phoneme.
If confidence < 80%, trigger one Stage-4 backprop step and retry.

Per-phoneme pass condition:

Stage-2 target confidence for the requested phoneme reaches >= 80%.
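The per-phoneme loop can be sketched with the I2C traffic hidden behind hypothetical callbacks (the real firmware talks to Stage 4 at 0x65 and Stage-2 channel 2 at 0x62). The max_epochs cap is an assumption added so the sketch always terminates, and the demo fakes at the end stand in for hardware by raising confidence 30 points per backprop step.

```c
#include <stdint.h>
#include <string.h>

#define IMG_BINS 40
#define IMG_LINES 100
#define TARGET_CONF_PCT 80u

/* Hypothetical hardware hooks. */
typedef struct {
    void (*stage4_generate)(uint8_t phoneme_id, uint8_t img[IMG_LINES][IMG_BINS]);
    void (*stage2_feed_line)(const uint8_t line[IMG_BINS]);
    uint8_t (*stage2_target_conf_pct)(uint8_t phoneme_id);
    void (*stage4_backprop_step)(uint8_t phoneme_id);
} gen_hw_t;

/* Returns the epoch on which the phoneme passed, or -1 after max_epochs. */
static int train_phoneme(const gen_hw_t *hw, uint8_t phoneme_id, int max_epochs)
{
    uint8_t img[IMG_LINES][IMG_BINS];
    for (int epoch = 1; epoch <= max_epochs; epoch++) {
        hw->stage4_generate(phoneme_id, img);      /* send ID, capture image */
        for (int line = 0; line < IMG_LINES; line++)
            hw->stage2_feed_line(img[line]);       /* replay each 40-byte line */
        if (hw->stage2_target_conf_pct(phoneme_id) >= TARGET_CONF_PCT)
            return epoch;                          /* pass condition met */
        hw->stage4_backprop_step(phoneme_id);      /* one step, then retry */
    }
    return -1;
}

/* Demo-only fakes. */
static uint8_t fake_conf = 20;
static int fed_lines = 0;
static void fake_generate(uint8_t id, uint8_t img[IMG_LINES][IMG_BINS])
{
    (void)id;
    memset(img, 0, IMG_LINES * IMG_BINS);
}
static void fake_feed(const uint8_t line[IMG_BINS]) { (void)line; fed_lines++; }
static uint8_t fake_conf_pct(uint8_t id) { (void)id; return fake_conf; }
static void fake_backprop(uint8_t id) { (void)id; fake_conf += 30; }
```

With the fakes, confidence goes 20, 50, 80, so the phoneme passes on the third epoch after 300 replayed lines.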

During this process, the LCD shows phoneme ID, epoch progress, and running status.

Save ANN (Main Menu Page 2, option 7)

Entry point: Main Menu Page 2 -> 7
Scope: save is performed from Stage-2 channel/unit 2 only
Confirmation required before save:
- Line 0: Save ANN
- Line 1: Are you sure?
- # = Yes, start save
- * = No, return to main menu

During save, LCD displays:

New ANN version ID (vXX)
Current save phase (W1/B1/W2/B2)
Save progress percentage

Saved ANN file behavior:

Path format: microsd/RecognizerANNXX.dat
XX is auto-incremented version number (00 to 99)
File payload contains full ANN weights and bias blocks (W1, B1, W2, B2)
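Both saving (next free XX) and the Load Speech ANN default (highest XX) reduce to parsing the two-digit suffix. A minimal sketch, assuming the file names come from a directory scan (for example FatFs f_opendir/f_readdir) and that the next save uses highest + 1; behavior once v99 exists is not documented, so the sketch simply refuses. All function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define ANN_PREFIX "RecognizerANN"
#define ANN_SUFFIX ".dat"
#define ANN_MAX_VERSION 99

/* Returns XX for a name of the form RecognizerANNXX.dat, else -1. */
static int ann_version_from_name(const char *name)
{
    size_t p = strlen(ANN_PREFIX);
    if (strncmp(name, ANN_PREFIX, p) != 0) return -1;
    if (!isdigit((unsigned char)name[p]) || !isdigit((unsigned char)name[p + 1]))
        return -1;
    if (strcmp(name + p + 2, ANN_SUFFIX) != 0) return -1;
    return (name[p] - '0') * 10 + (name[p + 1] - '0');
}

/* Highest saved version (the load default), or -1 if none match. */
static int ann_highest_version(const char *const names[], size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        int v = ann_version_from_name(names[i]);
        if (v > best) best = v;
    }
    return best;
}

/* Writes the path for the next save; returns its version, or -1 if full. */
static int ann_next_save_path(const char *const names[], size_t count,
                              char *out, size_t out_len)
{
    int next = ann_highest_version(names, count) + 1;
    if (next > ANN_MAX_VERSION) return -1;
    snprintf(out, out_len, "microsd/" ANN_PREFIX "%02d" ANN_SUFFIX, next);
    return next;
}
```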

On completion, display returns to Main Menu.

For a concise operator checklist, see Save ANN quick workflow.

Load Speech ANN (Main Menu Page 2, option 8)

Entry point: Main Menu Page 2 -> 8
Loads from saved files at microsd/RecognizerANNXX.dat
Default selection is the highest available version (XX max)

Selection screen behavior:

Line 0: Load Speech ANN
Line 1: selected version (ANN vXX)
Line 2: selection index (current/total)
Line 3: A/B:Sel #:Load *:Bk

Key behavior before load starts:

A/B = move selection up/down through available saved versions
* = return to Main Menu without loading
# = load selected ANN into all 5 Speech_Process_8bit_relu devices

During load, LCD displays version and upload progress per device.

Upload behavior:

Each target stage-2 device is paused/frozen while its ANN is written, then resumed
Process repeats across all 5 stage-2 device addresses
On completion, display returns to Main Menu

Capture lifecycle when # is pressed in the Training Menu:

Neural network input is cleared.
System waits for speech (peak > moving average threshold).
Capture continues until speech ends (peak <= moving average threshold) or frame limit is reached.
Input is frozen, buffered data is saved, input is cleared, and processing resumes.

Each word capture is buffered as 40 bytes × 100 frames (4,000 bytes).
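The capture lifecycle behaves like a small state machine fed one 40-byte frame at a time. This sketch assumes the "peak" is the largest byte in a frame and that the moving average is an exponential average (alpha = 1/8) of frame peaks, seeded by the caller and updated only while waiting so it tracks the background level. The firmware's actual threshold rule may differ, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 40
#define MAX_FRAMES 100

typedef enum { CAP_WAITING, CAP_RECORDING, CAP_DONE } cap_state_t;

typedef struct {
    cap_state_t state;
    uint32_t avg_x8;                      /* moving average of peaks, x8 fixed point */
    int frames;                           /* frames stored so far */
    uint8_t buf[MAX_FRAMES][FRAME_BYTES]; /* 100 x 40 = 4000 bytes */
} capture_t;

/* Clear the buffer and start waiting for speech. */
static void capture_reset(capture_t *c, uint8_t initial_avg)
{
    memset(c, 0, sizeof *c);
    c->state = CAP_WAITING;
    c->avg_x8 = (uint32_t)initial_avg * 8u;
}

static uint8_t frame_peak(const uint8_t f[FRAME_BYTES])
{
    uint8_t peak = 0;
    for (int i = 0; i < FRAME_BYTES; i++)
        if (f[i] > peak) peak = f[i];
    return peak;
}

/* Feed one frame; returns the state after processing it. */
static cap_state_t capture_feed(capture_t *c, const uint8_t frame[FRAME_BYTES])
{
    uint8_t peak = frame_peak(frame);
    uint32_t avg = c->avg_x8 / 8u;

    switch (c->state) {
    case CAP_WAITING:
        if (peak <= avg) {
            /* Still silence: fold this peak into the average (alpha = 1/8). */
            c->avg_x8 = c->avg_x8 - c->avg_x8 / 8u + peak;
            break;
        }
        c->state = CAP_RECORDING; /* speech started: record this frame too */
        /* fall through */
    case CAP_RECORDING:
        if (peak <= avg) { c->state = CAP_DONE; break; }  /* speech ended */
        memcpy(c->buf[c->frames++], frame, FRAME_BYTES);
        if (c->frames == MAX_FRAMES) c->state = CAP_DONE; /* frame limit */
        break;
    case CAP_DONE:
        break;
    }
    return c->state;
}
```

Once the state reaches CAP_DONE, the caller would freeze input, save c->frames × 40 bytes to the word's .dat file, and reset.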

Training data file behavior:

Saved path: microsd/<username>/<word>.dat
Existing file for that word is overwritten by new capture.

Stage‑2 FIFO Read Protocol

The Translator reads each stage‑2 device via I2C0 (master):

0x01 → FIFO length (16‑bit)
0x05 → FIFO entry (40‑bit):
- Byte 1: neuron ID
- Byte 2: max value
- Byte 3: female value
- Byte 4: male value
- Byte 5: user ID (0 = unknown, 1-20 = user)
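Decoding the raw register reads is straightforward once the bytes are in hand. A sketch, assuming a write-register-then-read transaction (on the Pico, i2c_write_blocking() with nostop = true followed by i2c_read_blocking()) and, as a further assumption, that the 16-bit length arrives high byte first. Names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t neuron_id; /* byte 1 */
    uint8_t max_value; /* byte 2 */
    uint8_t female;    /* byte 3 */
    uint8_t male;      /* byte 4 */
    uint8_t user_id;   /* byte 5: 0 = unknown, 1-20 = user */
} fifo_entry_t;

/* Decode the 5 bytes read from register 0x05, in the documented order. */
static fifo_entry_t fifo_entry_decode(const uint8_t raw[5])
{
    fifo_entry_t e = { raw[0], raw[1], raw[2], raw[3], raw[4] };
    return e;
}

/* Decode register 0x01 (assumed big-endian). */
static uint16_t fifo_length_decode(const uint8_t raw[2])
{
    return (uint16_t)((raw[0] << 8) | raw[1]);
}

static bool fifo_user_known(const fifo_entry_t *e)
{
    return e->user_id >= 1 && e->user_id <= 20;
}
```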

Phoneme Buffering

When a silence packet is detected (SIL inter‑word or inter‑sentence), the translator checks a 15‑entry phoneme buffer and attempts a dictionary match.
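The 15-entry buffer and its silence trigger can be sketched as follows. The caller decides whether an entry is silence (the SIL neuron IDs are firmware-defined); the padding value for unused slots and the drop-on-overflow policy are assumptions, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PHON_BUF_LEN 15
#define PHON_PAD 0x00 /* hypothetical filler for unused slots */

typedef struct {
    uint8_t ids[PHON_BUF_LEN];
    int count;
} phon_buf_t;

/* Push one entry. Returns true when silence closes a non-empty word; word_out
 * then holds the padded 15-entry sequence to look up, and the buffer is cleared. */
static bool phon_push(phon_buf_t *b, uint8_t id, bool is_silence,
                      uint8_t word_out[PHON_BUF_LEN])
{
    if (is_silence) {
        if (b->count == 0) return false; /* repeated silence: nothing to match */
        memset(word_out, PHON_PAD, PHON_BUF_LEN);
        memcpy(word_out, b->ids, (size_t)b->count);
        b->count = 0;
        return true;
    }
    if (b->count < PHON_BUF_LEN)
        b->ids[b->count++] = id; /* phonemes beyond 15 are dropped */
    return false;
}
```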

Dictionary Storage (microSD)

The microSD card holds:

Weights/Bias for stage‑2 devices
Translation dictionary
UserList.txt for user ID → name mapping

Dictionary entries:

ID number
phoneme sequence
text word

The dictionary must be kept sorted by phoneme sequence (lexicographic order) so lookups can use binary search.

Translation Output

Each translated word includes:

Direction (beam index 0‑4)
Gender tag (female/male)
Confidence (from max/female/male values)

microSD Capacity

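For 256 GB cards, the driver must support SDHC/SDXC (block-addressed) cards. Cards of this size are SDXC and ship formatted as exFAT, so either reformat them as FAT32 (some built-in OS formatters refuse to create FAT32 volumes larger than 32 GB, so a third-party tool may be needed) or enable exFAT in FatFs (FF_FS_EXFAT in ffconf.h). A FatFs-based driver is recommended.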

Command Set (Stage‑2 Control)

The Translator issues commands to stage‑2 units for configuration:

Load/Save weights & biases (bulk page mode)
Set target neuron for training
Freeze input / pause processing

Build

cd Speech_Recognition_Translator
mkdir -p build
cd build
cmake -G Ninja ..
ninja

Status

This module provides a working stage-3 translator pipeline with:

Stage-2 FIFO reads over I2C
15-phoneme sequence buffering and silence-triggered lookup
microSD FatFs dictionary storage
Unknown-word capture (NewWords.dat)
20x4 LCD + keypad menu flow
User profile selection via UserList.txt

Multilingual Support & Unknown Word Tracking

The translator supports multiple languages through a language-ID system and tracks unrecognized phoneme sequences using a two-file dictionary design:

Dictionary.dat: Main sorted dictionary (73 bytes/record, fixed-width text)
NewWords.dat: Sequential unknown-word file (73 bytes/record, fixed-width text)
Language.dat: Language ID ↔ name mapping (text records, 20 entries)

Two-File Architecture

Dictionary.dat (Sorted)         NewWords.dat (Sequential)
├─ Maintains sort order         ├─ Unknown words appended
├─ Binary search capable        │  without sort overhead
├─ Stable, pre-validated data   └─ Linear search within file
└─ Primary lookup source           (typically <100 entries)

Lookup Flow:
1. Search Dictionary.dat
2. If not found, search NewWords.dat
3. If still not found, append to NewWords.dat
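The lookup order above can be sketched over records already read into memory: binary search on the sorted Dictionary.dat records, then a linear scan of NewWords.dat. Record i of either file starts at byte offset i × 73, so on the card the same indexing maps to FatFs f_lseek()/f_read(). The key is assumed to be the 45-character hex field exactly as stored; these helpers are hypothetical and not the documented dict_lookup_word().

```c
#include <stddef.h>
#include <string.h>

#define DICT_HEX_FIELD_CHARS 45
#define DICT_RECORD_SIZE 73

typedef enum { LOOKUP_DICTIONARY, LOOKUP_NEWWORDS, LOOKUP_NOT_FOUND } lookup_src_t;

/* Binary search of sorted records; returns the record index or -1. */
static long find_sorted(const char *recs, size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(recs + mid * DICT_RECORD_SIZE, key, DICT_HEX_FIELD_CHARS);
        if (cmp == 0) return (long)mid;
        if (cmp < 0) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}

/* Linear scan of unsorted records; returns the record index or -1. */
static long find_linear(const char *recs, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(recs + i * DICT_RECORD_SIZE, key, DICT_HEX_FIELD_CHARS) == 0)
            return (long)i;
    return -1;
}

/* Dictionary.dat first, then NewWords.dat. */
static lookup_src_t lookup_two_file(const char *dict, size_t n_dict,
                                    const char *new_words, size_t n_new,
                                    const char *key, long *index_out)
{
    if ((*index_out = find_sorted(dict, n_dict, key)) >= 0) return LOOKUP_DICTIONARY;
    if ((*index_out = find_linear(new_words, n_new, key)) >= 0) return LOOKUP_NEWWORDS;
    return LOOKUP_NOT_FOUND; /* caller would append to NewWords.dat */
}
```

Comparing the hex text directly works only because every ID is exactly two digits in one consistent case, so text order matches numeric order.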

Unknown Word Lifecycle

Phoneme Sequence Detected
    ↓
dict_lookup_word(seq) → NOT FOUND
    ↓
dict_add_unknown_word(seq)
    ├─ Generate label: UnRecognisedXX
    ├─ Set language_id = 0
    └─ Append to NewWords.dat

Note: the label text in files uses British spelling (UnRecognisedXX), while UI text may use Unrecognized.

File Specifications

Language.dat

Purpose: Language ID mapping
Format: Text records (HH LanguageName\r\n)
- Chars 0-1: Language ID (2-digit hex)
- Char 2: Space separator
- Chars 3..N: Language name
- Line ending: CRLF
Entries: 20 predefined languages
Size: Variable (~235 bytes for 20 default entries)
Location: /microsd/Language.dat

Dictionary.dat

Purpose: Primary sorted lookup table
Format: 73 bytes/record (fixed-width text)
- Chars 0-44: 15 two-digit hex phoneme IDs separated by spaces (includes trailing space)
- Chars 45-70: Word field (26 chars, space padded)
- Chars 71-72: CRLF
Sort Order: Lexicographic by phoneme sequence
Location: /microsd/Dictionary.dat
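A record in this layout can be produced as follows. This is a sketch: uppercase hex digits are an assumption, how unused phoneme slots are padded is not specified here (the function formats whatever 15 IDs it is given), and words longer than 26 characters are rejected rather than truncated. The compile-time checks restate the 45 + 26 + 2 = 73 arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DICT_PHONEMES 15
#define DICT_HEX_FIELD_CHARS 45
#define DICT_WORD_SIZE 26
#define DICT_RECORD_SIZE 73

_Static_assert(DICT_PHONEMES * 3 == DICT_HEX_FIELD_CHARS, "15 x \"HH \"");
_Static_assert(DICT_HEX_FIELD_CHARS + DICT_WORD_SIZE + 2 == DICT_RECORD_SIZE,
               "hex field + word + CRLF");

/* Fill one 73-byte record (not NUL-terminated). Returns false if the word
 * does not fit in the 26-character field. */
static bool dict_format_record(const uint8_t seq[DICT_PHONEMES], const char *word,
                               char out[DICT_RECORD_SIZE])
{
    static const char hex[] = "0123456789ABCDEF"; /* uppercase: an assumption */
    size_t len = strlen(word);
    if (len > DICT_WORD_SIZE) return false;

    for (int i = 0; i < DICT_PHONEMES; i++) {
        out[3 * i] = hex[seq[i] >> 4];
        out[3 * i + 1] = hex[seq[i] & 0x0F];
        out[3 * i + 2] = ' '; /* separator, including after the 15th ID */
    }
    memset(out + DICT_HEX_FIELD_CHARS, ' ', DICT_WORD_SIZE); /* space padding */
    memcpy(out + DICT_HEX_FIELD_CHARS, word, len);
    out[DICT_RECORD_SIZE - 2] = '\r';
    out[DICT_RECORD_SIZE - 1] = '\n';
    return true;
}
```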

NewWords.dat

Purpose: Unknown-word accumulation during runtime
Format: 73 bytes/record (same as Dictionary.dat)
Word Label: UnRecognised00 ... UnRecognised99
Location: /microsd/NewWords.dat
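Because every NewWords.dat record is exactly 73 bytes, the next label can be derived from the file size (for example FatFs f_size()). This sketch assumes XX equals the number of records already stored; that numbering rule and the helper name are assumptions, not documented firmware behavior.

```c
#include <stdbool.h>
#include <stdio.h>

#define DICT_RECORD_SIZE 73
#define UNRECOGNISED_MAX 99

/* Returns false on a partial trailing record or when 00-99 are all used. */
static bool next_unrecognised_label(unsigned long file_size, char *out, size_t out_len)
{
    unsigned long count = file_size / DICT_RECORD_SIZE; /* records already stored */
    if (file_size % DICT_RECORD_SIZE != 0 || count > UNRECOGNISED_MAX) return false;
    snprintf(out, out_len, "UnRecognised%02lu", count);
    return true;
}
```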

Language IDs

#define LANG_UNKNOWN      0
#define LANG_ENGLISH      1
#define LANG_SPANISH      2
#define LANG_FRENCH       3
#define LANG_GERMAN       4
#define LANG_ITALIAN      5
#define LANG_PORTUGUESE   6
#define LANG_RUSSIAN      7
#define LANG_CHINESE      8
#define LANG_JAPANESE     9
#define LANG_KOREAN      10
#define LANG_ARABIC      11
#define LANG_HINDI       12
#define LANG_DUTCH       13
#define LANG_SWEDISH     14
#define LANG_TURKISH     15
#define LANG_POLISH      16
#define LANG_GREEK       17
#define LANG_HEBREW      18
#define LANG_VIETNAMESE  19

API Reference

dict_lookup_word(const uint8_t *seq, char *word_out, size_t word_out_len)
- Searches Dictionary.dat, then NewWords.dat
dict_add_unknown_word(const uint8_t *seq)
- Appends unknown sequences into NewWords.dat
dict_merge_new_words(void)
- Appends NewWords.dat into Dictionary.dat, then deletes NewWords.dat
create_language_file(void)
- Creates Language.dat if missing, without overwriting existing data

Usage Examples

Generate initial dictionary files:

python3 generate_dicts.py /path/to/microsd

Expected outputs:

/path/to/microsd/Language.dat (text format, ~235 bytes with defaults)
/path/to/microsd/Dictionary.dat (empty template)

Constants

#define DICT_HEX_FIELD_CHARS 45
#define DICT_WORD_SIZE 26
#define DICT_RECORD_SIZE 73

Performance Notes

Dictionary lookup: O(log n) binary search on sorted Dictionary.dat
NewWords lookup: O(m), where m is unknown-word count
Unknown append: O(1)
Merge operation: O(k), where k is merge record count

File Size Summary

File	Count	Size	Bytes/Entry
Language.dat	20	~235 B	variable
Dictionary.dat (1000 words)	1000	73 KB	73
NewWords.dat (100 words)	100	7.3 KB	73
Total	1120	~80 KB	-

Testing Checklist

Language.dat created with 20 entries (text format)
Dictionary.dat created (initially empty)
Unknown word added to NewWords.dat
Output includes [NEW] tag for unknown words
Same unknown sequence recognized on second occurrence
dict_merge_new_words() merges entries successfully
NewWords.dat removed after merge

Troubleshooting

Problem	Cause	Solution
No Language.dat	SD not ready	Check SD mount status
NewWords not created	SD write failed	Check that the card is not write-protected (adapter lock switch) and is mounted read/write
Words not found	Phoneme mismatch	Check phoneme sequence
Merge fails	Append error	Free up SD space
Performance slow	NewWords too large	Run merge operation
