Agentic Video Editor Automates Ad Production Using Gemini and FFmpeg

A new open-source project demonstrates how AI agents can transform raw footage into polished video advertisements without manual editing. The agentic-video-editor, launched on GitHub on April 14, 2026, uses an ensemble of specialized AI agents coordinated with Google Gemini and FFmpeg to automate the complex multi-step process of video production.

The system takes high-level creative direction—such as "turn this 90-minute webinar into a 3-minute highlight reel"—and autonomously handles scene detection, shot selection, timing optimization, and final rendering. Within four days of launch, the repository accumulated 240 stars, signaling strong community interest in AI-powered video automation.

Five-Stage Agent Architecture Handles Complex Editing Workflows

The agentic-video-editor employs a pipeline of specialized agents, each responsible for distinct editing tasks:

Preprocessing agent: Performs scene detection, transcription, and shot indexing on raw footage
Director agent: Selects appropriate shots and creates an EditPlan based on the creative brief
Trim Refiner: Optimizes cut timing for tight pacing and narrative flow
Editor agent: Renders the final video using FFmpeg, handling concatenation and effects
Reviewer agent: Scores output quality and generates improvement suggestions

The system leverages Gemini 3's ability to process up to an hour of video in a single context window, enabling comprehensive analysis of long-form content without chunking.
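The hand-offs between the planning stages can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the Shot and EditPlan fields and the direct, refine_trims, and review functions are hypothetical stand-ins, and simple heuristics take the place of the Gemini calls the article describes. Only the EditPlan name comes from the project description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """One indexed shot from the raw footage (hypothetical schema)."""
    source: str
    start: float  # seconds
    end: float    # seconds
    transcript: str = ""


@dataclass
class EditPlan:
    """Ordered shots chosen for the final cut (fields are assumptions)."""
    brief: str
    shots: List[Shot] = field(default_factory=list)

    def duration(self) -> float:
        return sum(s.end - s.start for s in self.shots)


def direct(brief: str, index: List[Shot], max_seconds: float) -> EditPlan:
    """Stand-in for the Director agent: keep shots, in order, that fit the budget.

    A real director would ask an LLM to choose; this greedy rule is a placeholder.
    """
    plan = EditPlan(brief=brief)
    for shot in index:
        if plan.duration() + (shot.end - shot.start) <= max_seconds:
            plan.shots.append(shot)
    return plan


def refine_trims(plan: EditPlan, pad: float = 0.25) -> EditPlan:
    """Stand-in for the Trim Refiner: shave a fixed pad off both ends of each shot."""
    trimmed = []
    for s in plan.shots:
        if s.end - s.start > 2 * pad:
            trimmed.append(Shot(s.source, s.start + pad, s.end - pad, s.transcript))
        else:
            trimmed.append(s)  # too short to trim safely
    return EditPlan(brief=plan.brief, shots=trimmed)


def review(plan: EditPlan, target_seconds: float) -> float:
    """Stand-in for the Reviewer agent: score in [0, 1] by closeness to target length."""
    if target_seconds <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(plan.duration() - target_seconds) / target_seconds)


if __name__ == "__main__":
    index = [Shot("raw.mp4", 0.0, 10.0), Shot("raw.mp4", 20.0, 50.0), Shot("raw.mp4", 60.0, 65.0)]
    plan = refine_trims(direct("short highlight reel", index, max_seconds=20.0))
    print(len(plan.shots), plan.duration(), review(plan, target_seconds=14.0))
```

Passing a plain data object from stage to stage keeps each agent replaceable: a heuristic can be swapped for a model call without touching the stages around it.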

Domain Tools and LLMs Work in Coordination

Rather than relying on a language model alone, the agentic approach combines LLM reasoning with specialized domain tools. Agents invoke Gemini for planning and evaluation tasks while calling FFmpeg directly for rendering operations. This hybrid architecture lets the system handle both creative decisions (shot selection, pacing) and technical execution (video encoding, transitions) within a single workflow.
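To make the split between planning and rendering concrete, the sketch below turns a list of planned cuts into FFmpeg command lines. It is an illustration under stated assumptions, not the project's renderer: the function names, the segment file naming, and the choice of libx264/aac encoding followed by FFmpeg's concat demuxer are all assumptions. The code only builds the argument lists and does not run FFmpeg.

```python
from typing import List, Tuple

# (source path, start seconds, end seconds); a hypothetical, simplified cut format
Cut = Tuple[str, float, float]


def trim_command(cut: Cut, out_path: str) -> List[str]:
    """Build an ffmpeg call that extracts one cut and re-encodes it."""
    source, start, end = cut
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek to the start of the cut (input option)
        "-t", f"{end - start:.3f}",  # read only the cut's duration
        "-i", source,
        "-c:v", "libx264", "-c:a", "aac",
        out_path,
    ]


def concat_list(segment_paths: List[str]) -> str:
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in segment_paths)


def concat_command(list_path: str, out_path: str) -> List[str]:
    """Build an ffmpeg call that joins the segments without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",
        out_path,
    ]


def build_render_commands(
    cuts: List[Cut], out_path: str, list_path: str = "segments.txt"
) -> Tuple[List[List[str]], str]:
    """Return every command to run, in order, plus the concat list file contents."""
    segments = [f"seg_{i:03d}.mp4" for i in range(len(cuts))]
    commands = [trim_command(c, s) for c, s in zip(cuts, segments)]
    commands.append(concat_command(list_path, out_path))
    return commands, concat_list(segments)


if __name__ == "__main__":
    cmds, listing = build_render_commands(
        [("raw.mp4", 1.0, 3.5), ("raw.mp4", 10.0, 12.0)], "ad.mp4"
    )
    for cmd in cmds:
        print(" ".join(cmd))
    print(listing, end="")
```

Re-encoding each segment with the same codecs before joining them is one way to keep a stream-copy concat safe; a caller would write the list contents to segments.txt and pass each argument list to subprocess.run.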

The project joins a growing ecosystem of agentic video tools, including OpenMontage's 12-pipeline system with 500+ agent skills and n8n's workflow templates for extracting viral clips from YouTube content.

Video Editing Bottleneck Drives Automation Innovation

Video editing has traditionally required significant human expertise and time, creating bottlenecks for content-heavy workflows like social media marketing, highlight reels, and advertising. Agentic systems that accept high-level creative direction and autonomously execute technical production represent a productivity leap for repetitive video formats.

The approach demonstrates how vision models and agent orchestration can tackle domain-specific automation challenges that were impractical with earlier AI generations.

Key Takeaways

The agentic-video-editor uses five specialized AI agents to automate video production from raw footage to final render
The system leverages Gemini 3's hour-long video context window and FFmpeg for professional-grade rendering
The project gained 240 GitHub stars within four days of launching on April 14, 2026
Agents coordinate LLM reasoning (creative decisions) with domain tools (FFmpeg rendering) in a hybrid architecture
The tool targets repetitive video formats like ads, highlight reels, and social media clips where automation can provide significant productivity gains
