Twelve Labs

Free | Freemium | Paid | AI Video

Overview

Twelve Labs is a video intelligence platform that gives developers API access to AI models capable of understanding video content the way humans do. Instead of relying on metadata or transcripts alone, Twelve Labs' models, Marengo and Pegasus, simultaneously process video frames, audio, spoken words, and on-screen text to build a rich, multimodal understanding of each video. Developers use this to build applications that search video libraries with natural language queries ('find all moments where a person enters through the left door'), generate summaries, auto-create clips for social media, and classify visual events at scale. With 30,000+ developers on the platform and backers including NVIDIA, Samsung, Intel, Databricks, and Snowflake, Twelve Labs positions itself as a foundational infrastructure layer for video-native AI applications. Industries using it include media and entertainment, advertising, automotive, and government security, with use cases ranging from content discovery to real-time threat detection.

Features

  • Multimodal video understanding -- indexes visual, audio, speech, and on-screen text together for human-like video comprehension
  • Natural language video search -- query video libraries in plain English and get back precise timestamps and clips (a code sketch follows this list)
  • Marengo model -- purpose-built video search and embedding model with fast, accurate multimodal retrieval
  • Pegasus model -- generative video AI for summaries, chapter markers, highlight reels, and content analysis
  • Embed API -- create multimodal embeddings from video, audio, image, or text for custom AI applications
  • Automated clip generation -- extract shareable highlight clips from long-form content automatically
  • Content summarization -- generate concise summaries and headlines from any video with a single API call
  • Scene classification -- categorize and tag video content at the scene level without manual review
  • Video question answering -- ask natural language questions about video content and get timestamp-level answers
  • Real-time processing -- index and analyze video content as part of live or near-live workflows
  • AWS Bedrock integration -- deploy Twelve Labs models on Amazon Bedrock for enterprise infrastructure
  • 30,000+ developer community -- active developer ecosystem with SDKs, sample apps, and documentation
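
To make the search feature concrete, here is a minimal sketch of a natural-language query in Python against the REST API. It is illustrative only: the endpoint path, version prefix, request fields, option names, and response keys are assumptions drawn from the platform's public documentation, not confirmed signatures, so consult the official API reference before relying on them.

```python
import os

import requests

# Minimal natural-language search sketch. Endpoint path, version prefix,
# request fields, and response keys are assumptions, not confirmed
# signatures -- check the official API reference before use.
API_KEY = os.environ["TWELVE_LABS_API_KEY"]  # hypothetical env var name
BASE = "https://api.twelvelabs.io/v1.2"      # assumed version prefix

resp = requests.post(
    f"{BASE}/search",
    headers={"x-api-key": API_KEY},
    json={
        "index_id": "YOUR_INDEX_ID",  # placeholder for a real index id
        "query": "a person enters through the left door",
        "search_options": ["visual", "conversation"],  # assumed option names
    },
    timeout=30,
)
resp.raise_for_status()

# Each hit is assumed to carry the owning video id, start/end offsets in
# seconds, and a relevance score -- the 'timestamps and clips' described above.
for hit in resp.json().get("data", []):
    print(f"{hit['video_id']}: {hit['start']:.1f}s-{hit['end']:.1f}s (score {hit['score']})")
```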

Best For

  • Developers building video search and discovery features for media platforms or content libraries
  • Product teams adding video understanding capabilities to their applications via API
  • Media and entertainment companies automating content tagging, clipping, and summarization at scale
  • Advertising platforms using contextual video analysis for intelligent ad placement
  • Security and government teams building real-time visual event detection and threat monitoring systems

How It Works

Start by indexing your video content through the Twelve Labs API. The Marengo model processes and indexes each video for fast multimodal search; you can then run natural language queries against your video library and get back timestamps, clips, and relevance scores. The Pegasus model adds generative capabilities: analyze a video to get summaries, highlight reels, chapter markers, or answers to specific questions about the content. The Embed API lets you create multimodal embeddings from video, audio, image, or text inputs for use in vector databases and recommendation systems. Everything runs through a REST API, with Python and Node.js SDKs available. The Free plan includes 600 minutes of indexing to test the platform, the Developer plan bills per usage with no upfront commitment, and Enterprise accounts get dedicated infrastructure, custom rate limits, and model fine-tuning options.
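
To make that flow concrete, here is a minimal end-to-end sketch in Python against the REST API: create an index, upload a video, wait for indexing, then request a summary. It is illustrative only; the endpoint paths, version prefix, engine name, request fields, and response keys are assumptions based on the platform's public documentation, and the official API reference and SDKs should be treated as authoritative.

```python
import os
import time

import requests

API_KEY = os.environ["TWELVE_LABS_API_KEY"]  # hypothetical env var name
BASE = "https://api.twelvelabs.io/v1.2"      # assumed version prefix
HEADERS = {"x-api-key": API_KEY}

# 1. Create an index backed by the Marengo engine (engine name assumed).
index_resp = requests.post(
    f"{BASE}/indexes",
    headers=HEADERS,
    json={
        "index_name": "demo-index",
        "engines": [
            {"engine_name": "marengo2.6", "engine_options": ["visual", "conversation"]},
        ],
    },
    timeout=30,
)
index_resp.raise_for_status()
index_id = index_resp.json()["_id"]  # response key is an assumption

# 2. Upload a local video as an indexing task and poll until it is ready.
with open("talk.mp4", "rb") as f:
    task_resp = requests.post(
        f"{BASE}/tasks",
        headers=HEADERS,
        data={"index_id": index_id},
        files={"video_file": f},
        timeout=300,
    )
task_resp.raise_for_status()
task_id = task_resp.json()["_id"]

video_id = None
for _ in range(90):  # poll for up to ~15 minutes
    task = requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS, timeout=30).json()
    if task.get("status") == "ready":  # status value is an assumption
        video_id = task["video_id"]
        break
    time.sleep(10)

# 3. Ask Pegasus for a summary of the indexed video (body fields assumed).
if video_id:
    summary_resp = requests.post(
        f"{BASE}/summarize",
        headers=HEADERS,
        json={"video_id": video_id, "type": "summary"},
        timeout=60,
    )
    summary_resp.raise_for_status()
    print(summary_resp.json().get("summary"))
```

The Embed API follows the same request shape and, per the description above, returns multimodal embeddings suitable for vector databases and recommendation systems.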

Visit Twelve Labs