SourceShift/youtube-transcriber

✨ YouTube Transcript Downloader ✨

This is a TypeScript/Node.js command-line tool and library to download YouTube video transcripts and subtitles. It supports multiple languages, translation, and various output formats without requiring a headless browser.

Features

  • Fetches transcripts for any YouTube video.
  • Supports manually created and automatically generated subtitles.
  • Allows translation to any language supported by YouTube.
  • No headless browser required (unlike Selenium-based solutions).
  • Usable as a CLI tool and a Node.js library.
  • Multiple output formats via formatters (e.g., plain text, JSON, SRT, WebVTT).
  • Proxy support (Generic HTTP/HTTPS and Webshare).
  • Cookie authentication for age-restricted videos.

Install

Install globally to use the CLI:

npm install -g youtube-transcriber

Or add to your project as a dependency:

npm install youtube-transcriber
# or
yarn add youtube-transcriber

CLI Usage

Get the transcript for a video (defaults to English):

youtube-transcriber <video_id>
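
The commands in this section take a video ID rather than a full URL. If you start from a URL, a small helper along these lines could extract the ID first. This is a minimal sketch and not part of youtube-transcriber: the function name `extractVideoId` is hypothetical, and it assumes the usual 11-character ID shape and a handful of common URL layouts.

```typescript
// Hypothetical helper, not part of youtube-transcriber: pull the video ID
// out of common YouTube URL shapes, or accept a bare ID unchanged.
// Assumption: IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID_PATTERN.test(trimmed)) return trimmed;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate: string | null = null;
  if (host === 'youtu.be') {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split('/')[1] ?? null;
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get('v');
    } else {
      // Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
      const match = url.pathname.match(/^\/(?:embed|shorts|live)\/([^/]+)/);
      candidate = match ? match[1] : null;
    }
  }
  return candidate !== null && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}
```

The result can then be passed to the CLI or to the library calls in place of `<video_id>`; a `null` result means the input was not recognized.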

Specify languages (descending priority):

youtube-transcriber <video_id> --languages es en
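
To illustrate what "descending priority" means here, the sketch below picks the first preferred language code that is actually available. It is only a model of the behavior described above; the function name `pickLanguage` is hypothetical, and the library's real selection logic may differ in details.

```typescript
// Illustrative model of priority-ordered language selection.
// Not the library's implementation; `pickLanguage` is a hypothetical name.
function pickLanguage(
  available: string[],
  preferred: string[] = ['en'],
): string | undefined {
  const have = new Set(available);
  // Array.prototype.find returns the first match, so earlier entries win.
  return preferred.find((code) => have.has(code));
}

// With `--languages es en`, Spanish is tried first, then English.
console.log(pickLanguage(['en', 'de'], ['es', 'en']));
```

In this model, a video with only English and German transcripts falls back to English, and a video with neither preferred language yields `undefined`.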

Translate to a specific language (e.g., German):

youtube-transcriber <video_id> --languages en --translate de

List available transcripts for a video:

youtube-transcriber <video_id> --list-transcripts

Output in JSON format:

youtube-transcriber <video_id> --format json > transcript.json

Exclude auto-generated transcripts:

youtube-transcriber <video_id> --exclude-generated

Exclude manually-created transcripts:

youtube-transcriber <video_id> --exclude-manually-created

Using proxies:

# Generic HTTP/HTTPS proxy
youtube-transcriber <video_id> --http-proxy http://user:pass@host:port --https-proxy https://user:pass@host:port

# Webshare rotating residential proxies
youtube-transcriber <video_id> --webshare-proxy-username <your_username> --webshare-proxy-password <your_password>
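
Proxy URLs of the form `http://user:pass@host:port` break if the username or password contains characters such as `@` or `:`. A small helper like the one below could percent-encode the credentials before they are passed to `--http-proxy` or `--https-proxy`. This is a sketch under assumptions: `buildProxyUrl` is a hypothetical name, and the README does not state how the tool itself parses these URLs.

```typescript
// Hypothetical helper: build a proxy URL with percent-encoded credentials.
// Assumption: the consumer of the URL decodes the userinfo part as usual.
function buildProxyUrl(
  scheme: 'http' | 'https',
  host: string,
  port: number,
  user: string,
  pass: string,
): string {
  // encodeURIComponent escapes reserved characters such as "@", ":" and "/".
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(pass)}`;
  return `${scheme}://${credentials}@${host}:${port}`;
}

console.log(buildProxyUrl('http', 'proxy.example.com', 8080, 'user', 'p@ss:word'));
```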

Using cookies for authentication (e.g., for age-restricted videos):

youtube-transcriber <video_id> --cookies /path/to/your/cookies.txt
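
The README does not say which cookie file format is expected. Assuming it is the Netscape `cookies.txt` format that browser export extensions commonly produce (seven tab-separated fields per line), a quick sanity check of the file contents might look like this. The names `CookieEntry` and `parseNetscapeCookies` are hypothetical and not part of the library.

```typescript
// Sketch only. Assumption: the cookie file uses the Netscape cookies.txt
// layout: domain, subdomain flag, path, secure flag, expiry, name, value.
interface CookieEntry {
  domain: string;
  path: string;
  secure: boolean;
  expires: number; // Unix timestamp in seconds; 0 means a session cookie
  name: string;
  value: string;
}

function parseNetscapeCookies(text: string): CookieEntry[] {
  const entries: CookieEntry[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    // "#HttpOnly_" marks an HttpOnly cookie; other "#" lines are comments.
    const line = rawLine.startsWith('#HttpOnly_')
      ? rawLine.slice('#HttpOnly_'.length)
      : rawLine;
    if (line.trim() === '' || line.startsWith('#')) continue;

    const fields = line.split('\t');
    if (fields.length !== 7) continue; // skip malformed rows

    const [domain, , path, secure, expires, name, value] = fields;
    entries.push({
      domain,
      path,
      secure: secure === 'TRUE',
      expires: Number(expires),
      name,
      value,
    });
  }
  return entries;
}
```

If the parsed list is empty, or contains no entries for a YouTube domain, the file is unlikely to help with age-restricted videos.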

API Usage

import { YouTubeTranscriptApi, GenericProxyConfig, WebshareProxyConfig } from 'youtube-transcriber';

async function getTranscript(videoId: string) {
  try {
    // Simple fetch (defaults to English)
    const transcript = await YouTubeTranscriptApi.fetch(videoId);
    console.log(JSON.stringify(transcript, null, 2));

    // Fetch with specific languages
    const transcriptInSpanish = await YouTubeTranscriptApi.fetch(videoId, { languages: ['es', 'en'] });
    console.log(transcriptInSpanish);

    // List available transcripts
    const api = new YouTubeTranscriptApi(); // Instantiate for list or advanced proxy/cookie use
    const transcriptList = await api.list(videoId);

    // Find a specific transcript from the list and fetch it
    const specificTranscript = transcriptList.findTranscript(['de', 'en']);
    if (specificTranscript) {
      const fetched = await specificTranscript.fetch();
      console.log(fetched);

      // Translate it
      const translated = await specificTranscript.translate('fr').fetch();
      console.log(translated);
    }
  } catch (error) {
    console.error(error);
  }
}

// Example with Webshare proxy
async function getTranscriptWithWebshareProxy(videoId: string) {
  const proxyConfig = new WebshareProxyConfig('YOUR_WEBSHARE_USERNAME', 'YOUR_WEBSHARE_PASSWORD');
  const api = new YouTubeTranscriptApi(undefined, proxyConfig);
  try {
    const transcriptList = await api.list(videoId);
    const transcript = await transcriptList.findTranscript(['en'])?.fetch();
    console.log(transcript);
  } catch (error) {
    console.error(error);
  }
}

// getTranscript('dQw4w9WgXcQ');

Formatters

This library supports different output formatters for the transcript data. Usage details will be added here once the formatters are implemented, modeled on the Python version.

Available formatters (planned/included):

  • PlainTextFormatter
  • JSONFormatter
  • SRTFormatter
  • WebVTTFormatter

Example (conceptual):

// import { YouTubeTranscriptApi, JSONFormatter } from 'youtube-transcriber';

// const transcriptData = await YouTubeTranscriptApi.fetch(videoId);
// const formatter = new JSONFormatter();
// const formattedOutput = formatter.formatTranscript(transcriptData, { indent: 2 });
// console.log(formattedOutput);
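
Until the formatter API is documented, the conversion an SRT formatter performs can be sketched independently of the library. The example below assumes each transcript snippet carries `text`, `start`, and `duration` (both in seconds); that shape is an assumption, not something this README confirms, and `toSrt` is a hypothetical function rather than the library's `SRTFormatter`.

```typescript
// Standalone sketch of SRT output. Assumption: snippets look like
// { text, start, duration } with times in seconds.
interface Snippet {
  text: string;
  start: number;
  duration: number;
}

// Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds).
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Each cue is: 1-based index, time range, text; cues are blank-line separated.
function toSrt(snippets: Snippet[]): string {
  const cues = snippets.map((snippet, index) => {
    const end = snippet.start + snippet.duration;
    return `${index + 1}\n${srtTimestamp(snippet.start)} --> ${srtTimestamp(end)}\n${snippet.text}`;
  });
  return cues.join('\n\n') + '\n';
}

console.log(toSrt([{ text: 'Hello', start: 0, duration: 1.5 }]));
```

A WebVTT variant would differ mainly in the `WEBVTT` header line and in using a period instead of a comma before the milliseconds.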

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature-name).
  3. Make your changes.
  4. Ensure tests pass (npm test).
  5. Ensure code is formatted and linted (npm run format and npm run lint).
  6. Commit your changes (git commit -am 'feat: Add some feature').
  7. Push to the branch (git push origin feature/your-feature-name).
  8. Open a Pull Request.

To set up the project locally:

npm install

Useful commands:

  • npm run build: Compile TypeScript to JavaScript.
  • npm run lint: Lint the codebase.
  • npm run format: Format the codebase with Prettier.
  • npm test: Run tests with Jest.
  • npm run coverage: Generate a coverage report.
  • npm run precommit: Run lint, format, test, and build (useful before committing).

License

This project is licensed under the MIT License - see the LICENSE file for details.
