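A locally run voice conversation assistant: it supports real-time recording, end-of-utterance speech recognition (STT), and automatic extraction of customer questions, and it calls an LLM to generate bank-loan sales talking points and a post-call review report.
- Real-time conversation (main.py): iFLYTEK WebSocket end-of-utterance recognition is tried first; on failure it falls back automatically to Qwen ASR / Whisper / Vosk
- hybrid preview mode: live transcription preview (Vosk) plus high-quality Q&A at the end of each utterance
- Speaker role detection (customer/sales): by default, questions that appear in the salesperson's own pitch are filtered out, and only customer questions are answered
- Question refinement: short colloquial utterances and questions with unclear references are rewritten, using context, into complete, self-contained questions
- AI output modes: brief (efficiency) / rigorous (quality) / auto (fast or slow path chosen by a length threshold)
- Audio preprocessing: trailing-silence trimming, webrtcvad speech-frame filtering, optional denoising
- File tool (test_file_mode.py): transcribes a single audio file and answers the questions in it
- Batch processing (process_folder.py): processes all audio files in a folder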
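
The fallback order in the first feature (iFLYTEK first, then Qwen ASR / Whisper / Vosk) can be pictured as a simple ordered chain. The sketch below is illustrative only and is not taken from main.py: the `Recognizer` type, `transcribe_with_fallback`, and the two stand-in engines are hypothetical names, and treating an empty transcript as a failure is an assumption.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a recognizer takes raw audio bytes and returns text.
Recognizer = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes,
    engines: Sequence[Tuple[str, Recognizer]],
) -> Tuple[Optional[str], str]:
    """Try each engine in order and return (engine_name, text) on first success.

    Assumption: an engine has failed if it raises or returns empty text.
    Returns (None, "") when every engine fails.
    """
    for name, recognize in engines:
        try:
            text = recognize(audio).strip()
        except Exception as exc:  # network error, service-side rejection, ...
            logging.warning("STT engine %s failed: %s; trying next", name, exc)
            continue
        if text:
            return name, text
    return None, ""


# Stand-in engines for illustration; real ones would call the actual services.
def fake_iflytek(audio: bytes) -> str:
    raise RuntimeError("simulated service error")


def fake_whisper(audio: bytes) -> str:
    return "What is the loan interest rate?"


engine, text = transcribe_with_fallback(
    b"\x00\x00", [("iflytek", fake_iflytek), ("whisper", fake_whisper)]
)
```

Keeping the chain as plain data (a list of name/function pairs) makes the order easy to change and lets each engine be swapped for a stub during testing.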
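- Install Python 3.10+ (a virtual environment is recommended)
- Install the dependencies: `pip install -r requirements.txt`
- Download the Vosk model (used for offline recognition and the hybrid preview): `python download_model.py`
- Configure `.env` (see below), then start: `python main.py`

The project reads `.env`. You can create `.env` from `.env.example` (on Windows, copy the file and then edit it).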
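
Many of the settings listed below are left empty (for example `MIC_SILENCE_MS=`) or carry an inline comment (for example `MIC_MODE=vad # vad / poll`). The project's own loader is not shown in this README, so the following is only a minimal standard-library sketch of how such lines could be parsed and how an empty value could fall back to a default. `parse_env_text` and `env_int` are hypothetical helper names, and the default of 500 in the usage lines is purely illustrative.

```python
import re
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines.

    An inline comment is recognized only when '#' follows whitespace,
    so a value that itself contains '#' without a space is left intact.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" and surrounding whitespace.
        value = re.split(r"\s#", value, maxsplit=1)[0].strip()
        # Remove one layer of matching-style quotes around the value.
        values[key.strip()] = value.strip('"').strip("'")
    return values


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Return an integer setting, using the default if unset, empty, or invalid."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


sample = """
# microphone settings
MIC_MODE=vad # vad / poll
MIC_SILENCE_MS=
MIC_ENERGY_THRESHOLD=450
"""
cfg = parse_env_text(sample)
silence_ms = env_int(cfg, "MIC_SILENCE_MS", 500)        # empty, so default
threshold = env_int(cfg, "MIC_ENERGY_THRESHOLD", 0)     # parsed from the text
```

Treating an empty value the same as a missing one keeps a copied `.env.example` usable even when optional keys are left blank.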
IFLYTEK_APP_ID=your_app_id
IFLYTEK_API_SECRET=your_api_secret
IFLYTEK_API_KEY=your_api_key

Optional parameters (defaults apply):
IFLYTEK_VAD_EOS=2000
IFLYTEK_FRAME_SIZE=8000
IFLYTEK_SEND_INTERVAL=
IFLYTEK_PING_INTERVAL=20
IFLYTEK_PING_TIMEOUT=10
IFLYTEK_WSS_RETRIES=2

DASHSCOPE_API_KEY=your_dashscope_key # or QWEN_API_KEY
QWEN_ASR_MODEL=qwen3-asr-flash-realtime
QWEN_ASR_LANGUAGE=zh
QWEN_ASR_TIMEOUT=20
QWEN_ASR_URL=

WHISPER_MODEL=base
WHISPER_DEVICE=cpu
WHISPER_LANGUAGE=

VOSK_MODEL_PATH=model

Basic configuration (OpenAI-compatible API):
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_TIMEOUT=30
OPENAI_MAX_RETRIES=1

Model selection:
LLM_MODEL=qwen3-omni-flash-realtime

Per-model routing by model name (recommended; allows a different key, base_url, timeout, and so on for each model):
LLM_QWEN3_OMNI_FLASH_REALTIME_API_KEY=...
LLM_QWEN3_OMNI_FLASH_REALTIME_BASE_URL=...
LLM_QWEN3_OMNI_FLASH_REALTIME_TIMEOUT=30
LLM_QWEN3_OMNI_FLASH_REALTIME_MAX_RETRIES=1

LLM_ANSWER_MODE=auto # auto / brief / rigorous
LLM_FAST_ENABLE=1
LLM_FAST_THRESHOLD=140
LLM_FAST_TEMPERATURE=0.2
LLM_FAST_MAX_TOKENS=220
LLM_BRIEF_TEMPERATURE=0.2
LLM_BRIEF_MAX_TOKENS=220
LLM_RIGOROUS_TEMPERATURE=0.4
LLM_RIGOROUS_MAX_TOKENS=650
LLM_TEMPERATURE=0.7
LLM_MAX_TOKENS=500
LLM_TOP_P=1

AI_QUEUE_MAX=50
AI_WORKERS=1
AI_USE_CONTEXT=1
AI_REFINE_SHORT_LEN=20
AI_REFINE_ALWAYS=0
CTX_KEEP_SEGMENTS=20
CTX_FOR_AI_SEGMENTS=8
CTX_MAX_CHARS=900

ROLE_FILTER_ENABLE=1
ROLE_CONTEXT_INCLUDE_SALES=1
ROLE_PREFIX_ENABLE=1
ROLE_ASSUME_UNKNOWN_AS_CUSTOMER=1
ROLE_SALES_MARKERS=
ROLE_CUSTOMER_MARKERS=

MIC_MODE=vad # vad / poll
MIC_DEVICE_INDEX=
MIC_ENERGY_THRESHOLD=450
MIC_SILENCE_CHUNKS=25
MIC_MIN_CHUNKS=8
MIC_SILENCE_MS=
MIC_MIN_MS=
MIC_VAD_ENGINE=energy # energy / silero
SILERO_VAD_THRESHOLD=0.5
SILERO_VAD_THREADS=1
MIC_TRIM_ENABLE=1
MIC_TRIM_PAD_CHUNKS=2
MIC_AUTO_THRESHOLD=1
MIC_CALIBRATE_SECONDS=1.0
MIC_THRESHOLD_MULTIPLIER=3.0
MIC_THRESHOLD_OFFSET=50
MIC_THRESHOLD_MIN=200
MIC_THRESHOLD_MAX=2000
MIC_PRINT_THRESHOLD=0
MIC_DEBUG=0
MIC_DEBUG_INTERVAL=10
PRINT_TRANSCRIPT=0
SUMMARY_ON_EXIT=1
SUMMARY_INCLUDE_STATEMENTS=1

CHAT_MODE=final # final / hybrid / preview / stream
PREVIEW_INTERVAL=0.25
PREVIEW_MIN_CHARS=2
AUDIO_WEBRTCVAD_ENABLE=0
AUDIO_WEBRTCVAD_MODE=1
AUDIO_WEBRTCVAD_FRAME_MS=20
AUDIO_WEBRTCVAD_PAD_MS=60
AUDIO_DENOISE_ENABLE=0
AUDIO_DENOISE_NOISE_SECONDS=0.4

KEYWORD_CORRECTION_ENABLE=1
KEYWORD_CORRECTIONS=

FILE_WSS_PREPROCESS=0
AI_MAX_QUESTIONS_PER_FILE=10
IFLYTEK_SECRET_KEY=

python main.py

python test_file_mode.py
python test_file_mode.py guangfeng/motor-saas-call-center-7496353756104101950.mp3

python process_folder.py

Efficiency first (brief answers, low latency):
$env:MIC_MODE="vad"
$env:MIC_VAD_ENGINE="silero"
$env:MIC_SILENCE_MS="350"
$env:MIC_MIN_MS="180"
$env:MIC_TRIM_ENABLE="1"
$env:MIC_TRIM_PAD_CHUNKS="1"
$env:AI_WORKERS="2"
$env:AI_USE_CONTEXT="0"
$env:ROLE_FILTER_ENABLE="1"
$env:LLM_MODEL="qwen3-omni-flash-realtime"
$env:LLM_ANSWER_MODE="brief"
$env:LLM_FAST_ENABLE="1"
$env:LLM_FAST_THRESHOLD="220"
$env:LLM_BRIEF_MAX_TOKENS="220"
$env:LLM_BRIEF_TEMPERATURE="0.2"
python main.py

Quality first (rigorous answers, strong context):
$env:MIC_MODE="vad"
$env:MIC_VAD_ENGINE="silero"
$env:MIC_SILENCE_MS="850"
$env:MIC_MIN_MS="260"
$env:MIC_TRIM_ENABLE="1"
$env:MIC_TRIM_PAD_CHUNKS="2"
$env:AUDIO_WEBRTCVAD_ENABLE="1"
$env:AUDIO_WEBRTCVAD_MODE="2"
$env:AUDIO_DENOISE_ENABLE="1"
$env:AI_WORKERS="1"
$env:AI_USE_CONTEXT="1"
$env:AI_REFINE_SHORT_LEN="30"
$env:AI_REFINE_ALWAYS="1"
$env:CTX_FOR_AI_SEGMENTS="10"
$env:CTX_KEEP_SEGMENTS="30"
$env:ROLE_FILTER_ENABLE="1"
$env:ROLE_CONTEXT_INCLUDE_SALES="1"
$env:LLM_MODEL="qwen3-coder-plus"
$env:LLM_ANSWER_MODE="rigorous"
$env:LLM_RIGOROUS_MAX_TOKENS="650"
$env:LLM_RIGOROUS_TEMPERATURE="0.4"
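python main.py

- Vosk model not found: run `python download_model.py` first.
- iFLYTEK reports 11201 / "licc failed": the script automatically tries switching to Qwen ASR / Whisper / Vosk.
- Unsupported audio format: the tool uses ffmpeg to convert the audio to 16 kHz mono WAV; if the conversion fails, first confirm that ffmpeg can be invoked on this machine (the project uses the ffmpeg binary bundled with imageio-ffmpeg).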
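
To check the ffmpeg step from the last item in isolation, a conversion to 16 kHz mono WAV can be reproduced by hand. This is a minimal sketch and not the project's own conversion code: `build_ffmpeg_cmd` and `convert_to_wav16k` are hypothetical helpers, and the exact ffmpeg flags the tool passes are an assumption. `imageio_ffmpeg.get_ffmpeg_exe()` is the imageio-ffmpeg call that returns the path of its bundled binary.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(ffmpeg_exe: str, src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes a 16 kHz, single-channel WAV file."""
    return [
        ffmpeg_exe,
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input file
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        "-f", "wav",     # force WAV container
        dst,
    ]


def convert_to_wav16k(src: str, dst: str) -> bool:
    """Convert src to 16 kHz mono WAV; return True on success."""
    try:
        import imageio_ffmpeg  # provides a bundled ffmpeg binary

        exe = imageio_ffmpeg.get_ffmpeg_exe()
    except Exception as exc:  # package missing or no binary available
        print(f"ffmpeg not available: {exc}")
        return False
    try:
        result = subprocess.run(
            build_ffmpeg_cmd(exe, src, dst), capture_output=True, text=True
        )
    except OSError as exc:  # binary could not be executed
        print(f"could not run ffmpeg: {exc}")
        return False
    if result.returncode != 0:
        # The last lines of stderr usually name the actual problem.
        print(result.stderr[-500:])
        return False
    return Path(dst).exists()


cmd = build_ffmpeg_cmd("ffmpeg", "call.mp3", "call.wav")
```

If `convert_to_wav16k` returns False with "ffmpeg not available", the problem lies with the imageio-ffmpeg installation rather than with the audio file.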