
Vid2Txt

A Python package for transcribing video and audio files to text using speech-to-text services. AssemblyAI is currently the only supported backend.

Features

  • Download and transcribe media from YouTube or any other URL supported by yt-dlp
  • Extract audio from video files using FFmpeg
  • Direct support for audio formats (MP3, WAV, M4A, AAC, FLAC, OGG, WMA)
  • Transcribe audio using AssemblyAI API
  • Export transcripts in multiple formats:
    • Plain text (.txt)
    • SubRip subtitles (.srt)
    • Interactive HTML (.html) with embedded video/audio player
  • Option to force the transcription language
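The FFmpeg audio-extraction step listed above can be approximated by hand. The sketch below builds an FFmpeg command that drops the video stream and writes a mono 16 kHz WAV file. The helper name `build_ffmpeg_extract_cmd` and the chosen channel and sample-rate settings are assumptions for illustration, not necessarily what vid2txt runs internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_extract_cmd(video: Path, audio_out: Path) -> list[str]:
    """Hypothetical helper: build an FFmpeg command that extracts the audio track."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(video),  # input video file
        "-vn",             # drop the video stream
        "-ac", "1",        # downmix to mono
        "-ar", "16000",    # resample to 16 kHz
        str(audio_out),    # output container is inferred from the extension (.wav)
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_extract_cmd(Path("video.mp4"), Path("video.wav"))
    print(" ".join(cmd))
    # Only run when ffmpeg is on PATH and the input file exists
    if shutil.which("ffmpeg") and Path("video.mp4").exists():
        subprocess.run(cmd, check=True)
```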

Installation

pip install vid2txt

Setup

Set your AssemblyAI API key as an environment variable:

# On macOS/Linux:
export ASSEMBLYAI_API_KEY="your-api-key-here"

# On Windows (PowerShell):
$env:ASSEMBLYAI_API_KEY="your-api-key-here"

Get a free API key from: https://www.assemblyai.com/dashboard/signup
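Because the setup relies on the ASSEMBLYAI_API_KEY environment variable, a script can fail early with a clear message when it is missing. A minimal sketch; the helper `require_api_key` is hypothetical and not part of vid2txt.

```python
import os


def require_api_key(var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Hypothetical helper: return the API key or raise a descriptive error."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set; get a key at "
            "https://www.assemblyai.com/dashboard/signup and export it first."
        )
    return key
```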

Usage

Command Line Interface

vid2txt MEDIA_PATH [OPTIONS]

Where MEDIA_PATH can be:

  • A local video file (.mp4, .mkv, .mov, ...)
  • A local audio file (.mp3, .wav, ...)
  • A YouTube or other URL
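To illustrate how the three kinds of MEDIA_PATH differ, the sketch below sorts an input into URL, audio, or video. The function `classify_media_path` is hypothetical and is not vid2txt's actual dispatch logic; the audio extension set mirrors the formats listed under Features, and any other local file is assumed to be video.

```python
from pathlib import Path
from urllib.parse import urlparse

# Audio formats listed under Features
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg", ".wma"}


def classify_media_path(media_path: str) -> str:
    """Hypothetical helper: return 'url', 'audio', or 'video' for a MEDIA_PATH."""
    scheme = urlparse(media_path).scheme.lower()
    if scheme in ("http", "https"):
        return "url"  # downloaded with yt-dlp
    suffix = Path(media_path).suffix.lower()
    if suffix in AUDIO_EXTS:
        return "audio"  # transcribed directly
    return "video"  # audio is extracted with FFmpeg first
```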

Options

  • -o, --output-dir: Output directory (default: the directory containing the input file)
  • -l, --language: Force transcription language (e.g., 'en', 'ar', 'es')
  • --model: Speech-to-text model to use (currently only 'assemblyai')
  • --force-audio-extract: Force re-extraction of audio from video files
  • --audio: For YouTube URLs, download only the audio track (faster)
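For scripting, the options above can be assembled into a vid2txt command from Python. A minimal sketch that uses only the flags documented here; the helper `build_vid2txt_cmd` and its keyword names are hypothetical.

```python
from typing import Optional


def build_vid2txt_cmd(
    media_path: str,
    output_dir: Optional[str] = None,
    language: Optional[str] = None,
    audio_only: bool = False,
    force_audio_extract: bool = False,
) -> list[str]:
    """Hypothetical helper: translate keyword arguments into vid2txt CLI flags."""
    cmd = ["vid2txt", media_path]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if language is not None:
        cmd += ["-l", language]
    if audio_only:
        cmd.append("--audio")
    if force_audio_extract:
        cmd.append("--force-audio-extract")
    return cmd


cmd = build_vid2txt_cmd("podcast.mp3", output_dir="./output", language="en")
print(cmd)  # ['vid2txt', 'podcast.mp3', '-o', './output', '-l', 'en']
```

Passing the resulting list to `subprocess.run(cmd, check=True)` bypasses the shell, so URLs containing `?` or `&` need no extra quoting.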

Examples

# Transcribe a local video file
vid2txt video.mp4

# Transcribe a local audio file
vid2txt podcast.mp3

# Transcribe from YouTube (downloads the best available video and audio)
vid2txt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Download and transcribe audio only (faster)
vid2txt "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --audio

# Specify output directory and language
vid2txt "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o ./output -l en

# Force re-extraction of audio even if cached
vid2txt video.mp4 --force-audio-extract

# Show help
vid2txt -h

Python API

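import os
from pathlib import Path

from vid2txt import Transcriber

# A local file as a Path, e.g. Path("video.mp4") or Path("audio.mp3").
# Pass a URL as a plain string: Path() would collapse "https://" to "https:/".
media_path = Path("video.mp4")
output_dir = Path("output")  # directory for the generated transcripts
output_dir.mkdir(parents=True, exist_ok=True)
api_key = os.environ["ASSEMBLYAI_API_KEY"]  # key exported during Setup; needed for model="assemblyai"

# Configure the transcriber (language="en" forces English)
transcriber = Transcriber(
    output_dir=output_dir,
    language="en",
    model="assemblyai",
    api_key=api_key,
)

# Transcribe the media into segments
segments = transcriber.transcribe(media_path=media_path)

# Save in different formats
transcriber.save_plain_text(
    segments=segments,
    out_path=output_dir / "transcript.txt",
)
transcriber.save_srt(
    segments=segments,
    out_path=output_dir / "transcript.srt",
)
# The HTML export embeds a player for the original media
transcriber.save_html(
    segments=segments,
    out_path=output_dir / "transcript.html",
    media_path=media_path,
)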
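`save_srt` writes SubRip, where each cue is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, with cues separated by blank lines. The sketch below shows that layout for reference. It is not vid2txt's implementation, and the `(start_ms, end_ms, text)` tuple shape is an assumption, since the structure of the returned `segments` is not documented here.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: list[tuple[int, int, str]]) -> str:
    """Render hypothetical (start_ms, end_ms, text) tuples as SubRip cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues
    return "\n".join(cues)


print(to_srt([(0, 2500, "Hello there."), (2500, 61_040, "Welcome to the show.")]))
```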
