This stage aggregates phoneme outputs from five Speech_Process_8bit_relu units, resolves phoneme sequences into text, and emits translated words with gender and direction tags. It also manages microSD storage for neural weights/bias and the translation dictionary.
- I2C0 master reads FIFO data from stage‑2 devices at 0x60–0x64
- GPIO inputs (one per stage‑2) indicate valid FIFO data
- SPI microSD stores dictionary + stage‑2 weights/bias
- Output mode selectable via 2 GPIO pins (USB / TTL / I2C)
- Diagnostic GPIOs indicate stage‑2 device failure
- Pin Configuration
- LCD Menu System (20x4)
- Stage‑2 FIFO Read Protocol
- Phoneme Buffering
- Dictionary Storage (microSD)
- Translation Output
- microSD Capacity
- Command Set (Stage‑2 Control)
- Build
- Status
- Multilingual Support & Unknown Word Tracking
| GPIO | Function | Description |
|---|---|---|
| 20 | SDA | I2C0 Data |
| 21 | SCL | I2C0 Clock |
| GPIO | Function | Description |
|---|---|---|
| 6 | WR0 | Stage‑2 @ 0x60 word‑ready |
| 7 | WR1 | Stage‑2 @ 0x61 word‑ready |
| 8 | WR2 | Stage‑2 @ 0x62 word‑ready |
| 9 | WR3 | Stage‑2 @ 0x63 word‑ready |
| 10 | WR4 | Stage‑2 @ 0x64 word‑ready |
| GPIO | Function | Description |
|---|---|---|
| 11 | FAULT0 | Stage‑2 @ 0x60 fault |
| 12 | FAULT1 | Stage‑2 @ 0x61 fault |
| 13 | FAULT2 | Stage‑2 @ 0x62 fault |
| 14 | FAULT3 | Stage‑2 @ 0x63 fault |
| 15 | FAULT4 | Stage‑2 @ 0x64 fault |
| GPIO | Function | Description |
|---|---|---|
| 2 | MODE0 | Output select bit 0 (pull‑up) |
| 3 | MODE1 | Output select bit 1 (pull‑up) |
Mode table:
00= USB serial01= TTL UART10= I2C output (future)
| GPIO | Function | Description |
|---|---|---|
| 0 | TX | UART0 TX |
| 1 | RX | UART0 RX |
| GPIO | Function | Description |
|---|---|---|
| 18 | SCK | SPI0 Clock |
| 19 | MOSI | SPI0 MOSI |
| 16 | MISO | SPI0 MISO |
| 17 | CS | SD Chip Select |
Both the LCD and keypad use PCF8574 I/O expanders on the same I2C0 bus.
| Device | I2C Address | Purpose |
|---|---|---|
| LCD backpack (PCF8574) | 0x27 |
20x4 text display |
| Keypad expander (PCF8574) | 0x26 |
4x4 matrix keypad scan |
Beginner wiring notes:
- Keep all grounds common: Pico GND, stage-2 boards, LCD module, keypad module, SD module.
- Use 3.3 V compatible I2C devices.
- Ensure SDA/SCL have pull-ups (many modules include them already).
- Avoid address conflicts on I2C0 (
0x26,0x27,0x60-0x64).
- Line 0: System status + microSD error status
- Lines 1-3: Wrapped history containing the last 10 recognized words
Press # from Screen 0 to open the menu.
- A = Up
- B = Down
- C = Left
- D = Right
- Line 0:
Main Menu Pg 0,Main Menu Pg 1, orMain Menu Pg 2 - Page 0:
- Add New User
- User Menu
- Start Training
- Page 1: 4. Select Word from Unrecognized List 5. Initiate Speech Generator Training 6. Stage 2 ANN Training
- Page 2: 7. Save ANN 8. Load Speech ANN
Navigation behavior:
- From Page 0, A/C/D have no effect.
- From Page 0, B switches to Page 1.
- From Page 1, A switches back to Page 0 and B switches to Page 2.
- From Page 2, A switches back to Page 1.
- Press
*to exit menu screens back to Screen 0.
- Line 0:
User Menu - Line 1:
User ID: <id> <name>for the current selection - A/B cycles up/down through configured users
- Only assigned users are shown (unassigned/default entries are excluded)
- Press
#to select the displayed user and return to Screen 0 - Press
*to return to Main Menu
This menu is available only when a user is selected.
- Press
3from Main Menu Page 0 to enter training. - If no user is selected, training does not open and status is set to
select user.
Display behavior:
- Line 0:
Training Memnu X/Y(firmware text), whereXis current word index andYis total words - Line 1: Current training word (centered)
- Line 2:
Recording: PresentorRecording: Missing - Line 3 (idle):
A/B:Scroll #:Train - Line 3 (armed/capturing):
Speak When Ready(centered)
Key behavior:
- A / C = previous word
- B / D = next word
- # = arm capture for currently displayed word
- * = abort training and return to Main Menu Page 1
- Entry point: Main Menu Page 1 -> 6
- Scope: training is applied only to Stage-2 channel/unit 2
- Confirmation required before start:
- Line 0:
Stage 2 ANN Train - Line 1:
Are you sure? #= Yes, start ANN training*= No, return to main menu
- Line 0:
During training, the LCD shows:
- Current word being trained
- Last training result with certainty/epoch info
- Running count and latest inferred user ID
ANN training logs are also appended to microSD at:
microsd/logs/<username>_ann_train.log
Training loop behavior:
- Incoming stage-1 stream is frozen before ANN training begins.
- Each training word (
.dat) is replayed into stage-2 and backprop is triggered. - Stage-2 telemetry is read after passes to evaluate sequence/gender/user correctness.
- The same word is retrained across epochs until all pass criteria are met (or max epoch limit is hit).
Per-word pass criteria:
- Target certainty >= 80%
- Phoneme order match >= 80% against the dictionary sequence for that word
- Gender check passes (
femaleormaleoutput >= 80% for selected user gender) - User check passes (returned user ID matches selected user and user-neuron confidence >= 80%)
At completion, the system returns to Main Menu.
- Entry point: Main Menu Page 1 -> 5
- Scope: trains Stage 4 (
Speech_Generation, I2C0x65) against Stage 2 channel 2 (0x62)
Training loop behavior:
- Send a single phoneme ID to Stage 4 and trigger image generation.
- Capture generated Stage-4 image buffer (
40 bins x 100 lines). - Replay each 40-byte line into Stage-2 channel 2 input.
- Read Stage-2 target confidence for that phoneme.
- If confidence < 80%, trigger one Stage-4 backprop step and retry.
Per-phoneme pass condition:
- Stage-2 target confidence for the requested phoneme reaches >= 80%.
During this process, the LCD shows phoneme ID, epoch progress, and running status.
- Entry point: Main Menu Page 2 -> 7
- Scope: save is performed from Stage-2 channel/unit 2 only
- Confirmation required before save:
- Line 0:
Save ANN - Line 1:
Are you sure? #= Yes, start save*= No, return to main menu
- Line 0:
During save, LCD displays:
- New ANN version ID (
vXX) - Current save phase (W1/B1/W2/B2)
- Save progress percentage
Saved ANN file behavior:
- Path format:
microsd/RecognizerANNXX.dat XXis auto-incremented version number (00to99)- File payload contains full ANN weights and bias blocks (W1, B1, W2, B2)
On completion, display returns to Main Menu.
For a concise operator checklist, see Save ANN quick workflow.
- Entry point: Main Menu Page 2 -> 8
- Loads from saved files at
microsd/RecognizerANNXX.dat - Default selection is the highest available version (
XXmax)
Selection screen behavior:
- Line 0:
Load Speech ANN - Line 1: selected version (
ANN vXX) - Line 2: selection index (
current/total) - Line 3:
A/B:Sel #:Load *:Bk
Key behavior before load starts:
- A/B = move selection up/down through available saved versions
*= return to Main Menu without loading#= load selected ANN into all 5Speech_Process_8bit_reludevices
During load, LCD displays version and upload progress per device.
Upload behavior:
- Each target stage-2 device is paused/frozen while its ANN is written, then resumed
- Process repeats across all 5 stage-2 device addresses
- On completion, display returns to Main Menu
Capture lifecycle when # is pressed:
- Neural network input is cleared.
- System waits for speech (
peak > moving average threshold). - Capture continues until speech ends (
peak <= moving average threshold) or frame limit is reached. - Input is frozen, buffered data is saved, input is cleared, and processing resumes.
Captured training buffer size is 40 bytes × 100 frames per word capture.
Training data file behavior:
- Saved path:
microsd/<username>/<word>.dat - Existing file for that word is overwritten by new capture.
The Translator reads each stage‑2 device via I2C0 (master):
0x01→ FIFO length (16‑bit)0x05→ FIFO entry (40‑bit):- Byte 1: neuron ID
- Byte 2: max value
- Byte 3: female value
- Byte 4: male value
- Byte 5: user ID (0 = unknown, 1-20 = user)
When a silence packet is detected (SIL inter‑word or inter‑sentence), the translator checks a 15‑entry phoneme buffer and attempts a dictionary match.
The microSD card holds:
- Weights/Bias for stage‑2 devices
- Translation dictionary
- UserList.txt for user ID → name mapping
Dictionary entries:
- ID number
- phoneme sequence
- text word
Dictionary must be sorted by phoneme order for fast matching.
Each translated word includes:
- Direction (beam index 0‑4)
- Gender tag (female/male)
- Confidence (from max/female/male values)
For 256GB cards, ensure SDHC/SDXC support and format as FAT32. A FatFs‑based driver is recommended.
The Translator issues commands to stage‑2 units for configuration:
- Load/Save weights & biases (bulk page mode)
- Set target neuron for training
- Freeze input / pause processing
cd Speech_Recognition_Translator
mkdir -p build
cd build
cmake ..
ninjaThis module provides a working stage-3 translator pipeline with:
- Stage-2 FIFO reads over I2C
- 15-phoneme sequence buffering and silence-triggered lookup
- microSD FatFs dictionary storage
- Unknown-word capture (
NewWords.dat) - 20x4 LCD + keypad menu flow
- User profile selection via
UserList.txt
The translator supports multiple languages through a language-ID system and tracks unrecognized phoneme sequences using a two-file dictionary design:
- Dictionary.dat: Main sorted dictionary (73 bytes/record, fixed-width text)
- NewWords.dat: Sequential unknown-word file (73 bytes/record, fixed-width text)
- Language.dat: Language ID ↔ name mapping (text records, 20 entries)
Dictionary.dat (Sorted) NewWords.dat (Sequential)
├─ Maintains sort order ├─ Unknown words appended
├─ Binary search capable │ without sort overhead
├─ Stable, pre-validated data └─ Linear search within file
└─ Primary lookup source (typically <100 entries)
Lookup Flow:
1. Search Dictionary.dat
2. If not found, search NewWords.dat
3. If still not found, append to NewWords.datPhoneme Sequence Detected
↓
dict_lookup_word(seq) → NOT FOUND
↓
dict_add_unknown_word(seq)
├─ Generate label: UnRecognisedXX
├─ Set language_id = 0
└─ Append to NewWords.datNote: the label text in files uses British spelling (UnRecognisedXX), while UI text may use Unrecognized.
- Purpose: Language ID mapping
- Format: Text records (
HH LanguageName\r\n)- Chars 0-1: Language ID (2-digit hex)
- Char 2: Space separator
- Chars 3..N: Language name
- Line ending: CRLF
- Entries: 20 predefined languages
- Size: Variable (~235 bytes for 20 default entries)
- Location:
/microsd/Language.dat
- Purpose: Primary sorted lookup table
- Format: 73 bytes/record (fixed-width text)
- Chars 0-44: 15 two-digit hex phoneme IDs separated by spaces (includes trailing space)
- Chars 45-70: Word field (26 chars, space padded)
- Chars 71-72: CRLF
- Sort Order: Lexicographic by phoneme sequence
- Location:
/microsd/Dictionary.dat
- Purpose: Unknown-word accumulation during runtime
- Format: 73 bytes/record (same as Dictionary.dat)
- Word Label:
UnRecognised00...UnRecognised99 - Location:
/microsd/NewWords.dat
#define LANG_UNKNOWN 0
#define LANG_ENGLISH 1
#define LANG_SPANISH 2
#define LANG_FRENCH 3
#define LANG_GERMAN 4
#define LANG_ITALIAN 5
#define LANG_PORTUGUESE 6
#define LANG_RUSSIAN 7
#define LANG_CHINESE 8
#define LANG_JAPANESE 9
#define LANG_KOREAN 10
#define LANG_ARABIC 11
#define LANG_HINDI 12
#define LANG_DUTCH 13
#define LANG_SWEDISH 14
#define LANG_TURKISH 15
#define LANG_POLISH 16
#define LANG_GREEK 17
#define LANG_HEBREW 18
#define LANG_VIETNAMESE 19dict_lookup_word(const uint8_t *seq, char *word_out, size_t word_out_len)- Searches
Dictionary.dat, thenNewWords.dat
- Searches
dict_add_unknown_word(const uint8_t *seq)- Appends unknown sequences into
NewWords.dat
- Appends unknown sequences into
dict_merge_new_words(void)- Appends
NewWords.datintoDictionary.dat, then deletesNewWords.dat
- Appends
create_language_file(void)- Creates
Language.datif missing, without overwriting existing data
- Creates
Generate initial dictionary files:
python3 generate_dicts.py /path/to/microsdExpected outputs:
/path/to/microsd/Language.dat(text format, ~235 bytes with defaults)/path/to/microsd/Dictionary.dat(empty template)
#define DICT_HEX_FIELD_CHARS 45
#define DICT_WORD_SIZE 26
#define DICT_RECORD_SIZE 73- Dictionary lookup: O(log n) binary search on sorted
Dictionary.dat - NewWords lookup: O(m), where m is unknown-word count
- Unknown append: O(1)
- Merge operation: O(k), where k is merge record count
| File | Count | Size | Bytes/Entry |
|---|---|---|---|
| Language.dat | 20 | ~235 B | variable |
| Dictionary.dat (1000 words) | 1000 | 73 KB | 73 |
| NewWords.dat (100 words) | 100 | 7.3 KB | 73 |
| Total | 1120 | ~80 KB | - |
- Language.dat created with 20 entries (text format)
- Dictionary.dat created (initially empty)
- Unknown word added to NewWords.dat
- Output includes
[NEW]tag for unknown words - Same unknown sequence recognized on second occurrence
-
dict_merge_new_words()merges entries successfully - NewWords.dat removed after merge
| Problem | Cause | Solution |
|---|---|---|
| No Language.dat | SD not ready | Check SD mount status |
| NewWords not created | SD write failed | Verify write permissions |
| Words not found | Phoneme mismatch | Check phoneme sequence |
| Merge fails | Append error | Free up SD space |
| Performance slow | NewWords too large | Run merge operation |