doubao-asr

Agent Skill for transcribing audio files via ByteDance Volcengine Seed-ASR 2.0 (豆包录音文件识别模型2.0)

Best-in-class Chinese speech recognition — Mandarin, Cantonese, Sichuan dialect, and 13+ languages. With speaker diarization, up to 5 hours / 512MB per file.

中文语音识别准确率业界领先——支持普通话、粤语、四川话等方言及 13+ 种语言。自带说话人分离，单文件最长 5 小时 / 512MB。

Why this skill?

Setting up Volcengine's Doubao Audio File Recognition 2.0 (豆包录音文件识别模型2.0, recorded audio → text) from scratch involves 4 environment variables across 3 different console pages (Speech console, IAM, TOS). Without guidance, this typically takes 1-2 hours of doc-hunting and trial-and-error.

This skill provides step-by-step bilingual setup instructions (中英双语) baked into SKILL.md, so your AI agent can walk you through the entire process in ~10 minutes.

从零配置火山引擎豆包录音文件识别模型2.0（录音转文字）涉及 3 个不同控制台页面的 4 个环境变量。没有引导的话通常需要 1-2 小时翻文档踩坑。本 skill 在 SKILL.md 中内置了中英双语分步引导，AI agent 可以在约 10 分钟内带你完成全部配置。

Install

Claude Code

claude /install-skill https://github.com/vahnxu/doubao-asr

Or manually copy to ~/.claude/skills/doubao-asr/.

OpenClaw / ClawHub

Available on ClawHub as doubao-asr.

Manual

git clone https://github.com/vahnxu/doubao-asr.git
cd doubao-asr
pip install requests
python3 scripts/transcribe.py /path/to/audio.m4a

Quick start

# Basic transcription
python3 scripts/transcribe.py /path/to/audio.m4a

# Save to file
python3 scripts/transcribe.py /path/to/audio.m4a --out /tmp/transcript.txt

# JSON output
python3 scripts/transcribe.py /path/to/audio.m4a --json --out /tmp/result.json

# Direct URL (skip upload)
python3 scripts/transcribe.py https://example.com/audio.mp3

# Disable speaker diarization
python3 scripts/transcribe.py /path/to/audio.m4a --no-speakers

Credentials

See the detailed setup guide in SKILL.md — with step-by-step instructions for each environment variable.

Variable	Required	Description
`VOLCENGINE_API_KEY`	Yes	ASR API key (UUID) from Speech console
`VOLCENGINE_ACCESS_KEY_ID`	Yes	IAM Access Key ID (starts with `AKLT`)
`VOLCENGINE_SECRET_ACCESS_KEY`	Yes	IAM Secret Access Key
`VOLCENGINE_TOS_BUCKET`	Yes	TOS bucket name
`VOLCENGINE_TOS_REGION`	Yes	TOS region code, must match bucket region. Overseas: e.g. `cn-hongkong`, `ap-southeast-1`; China: `cn-beijing`

Supported formats

WAV, MP3, MP4, M4A, OGG, FLAC — up to 5 hours, 512MB max.

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
references		references
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

doubao-asr

Why this skill?

Install

Claude Code

OpenClaw / ClawHub

Manual

Quick start

Credentials

Supported formats

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Languages

Folders and files

Latest commit

History

Repository files navigation

doubao-asr

Why this skill?

Install

Claude Code

OpenClaw / ClawHub

Manual

Quick start

Credentials

Supported formats

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Languages

Packages