WAN VACE: All-in-One Video Creation and Editing

WAN VACE is a unified framework for video creation and editing. A single model handles reference-to-video generation, video-to-video editing, masked video-to-video editing, and free combinations of these tasks.

WAN VACE Video Showcase

A young boy rises from his chair and walks briskly toward the right edge of the sun-drenched frame, as if chasing a new adventure. His eyes are bright, and the corners of his mouth are slightly upturned, revealing curiosity and excitement about the unknown...

The video shows a person riding a horse across a wide grassland. He has long, light purple hair and is dressed in traditional clothing, with a white top and black pants. The animated modeling style gives the impression that he is engaged in some outdoor activity or performance...

In a documentary style, a row of four meerkats dances together in the African savanna at noon...

In the style of a classical oil painting, the background is a river, and in the center of the picture is a mature and elegant woman, wearing a long skirt and sitting on a chair. She takes the red heart-shaped sunglasses from her arms with both hands and puts them on...

An elegant lady is passionately playing the violin, with an entire symphony orchestra behind her...

Anime-style, hot-blooded teenager wearing bright orange sportswear with long sleeves and long pants, standing on a surfboard and facing the golden sunshine in the rough sea. The teenager's short yellow hair is flying in the wind, his gaze is steady, and a confident smile plays at the corner of his mouth...

All videos were generated with the WAN VACE framework.

Try WAN VACE Live Demo

Experience WAN VACE's capabilities through this interactive demo. Generate, edit, and transform videos with powerful multimodal inputs.

Note: This demo runs on shared resources. For best performance, install WAN VACE locally.

Why choose WAN VACE?

Unified Framework

WAN VACE integrates multiple video tasks into a single model, reducing deployment complexity and providing a streamlined user experience.

Task Composition

Combine different video tasks (reference, editing, masking) in flexible ways to create complex creative scenarios previously impossible with single-task models.

Spatiotemporal Consistency

Advanced Video Condition Unit (VCU) technology maintains consistency across both temporal and spatial dimensions for high-quality outputs.

Who is WAN VACE for?

Content Creators

Generate diverse video content from reference images, edit videos with precise controls, and seamlessly blend different elements.

Researchers & Developers

Build upon the unified architecture to create new video synthesis models and applications with the flexible Diffusion Transformer framework.

Studios & Agencies

Create and edit promotional videos, transform existing footage, and generate new content with unprecedented control and efficiency.

WAN VACE features you'll love

Video Condition Unit (VCU)

Unified interface that integrates diverse input conditions including text, frames, and masks for flexible video generation.
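
As a rough illustration of how text, frames, and masks could travel together as one input, here is a minimal pure-Python sketch. The `VideoConditionUnit` class, the helper names, and the mask convention (1.0 = generate, 0.0 = keep) are assumptions made for this example; they are not the WAN VACE API.

```python
from dataclasses import dataclass
from typing import List

# One grayscale frame as rows of pixel values in [0, 1] (simplified for illustration).
Frame = List[List[float]]


def filled(height: int, width: int, value: float) -> Frame:
    """Return a height x width frame filled with a constant value."""
    return [[value] * width for _ in range(height)]


@dataclass
class VideoConditionUnit:
    """Hypothetical container for the three inputs: text, frames, masks."""
    text: str
    frames: List[Frame]
    masks: List[Frame]  # assumed convention: 1.0 = generate, 0.0 = keep

    def __post_init__(self) -> None:
        if len(self.frames) != len(self.masks):
            raise ValueError("frames and masks must have the same length")


def text_to_video_unit(text: str, n: int, height: int, width: int) -> VideoConditionUnit:
    # Assumed default: no visual context, so every frame is blank
    # and every mask pixel is marked "generate".
    return VideoConditionUnit(
        text=text,
        frames=[filled(height, width, 0.0) for _ in range(n)],
        masks=[filled(height, width, 1.0) for _ in range(n)],
    )


def video_to_video_unit(text: str, frames: List[Frame]) -> VideoConditionUnit:
    # Assumed default: whole-frame editing keeps the input frames as context
    # and marks every pixel as "generate".
    masks = [filled(len(f), len(f[0]), 1.0) for f in frames]
    return VideoConditionUnit(text=text, frames=frames, masks=masks)
```

Under these assumptions, every task reduces to filling the same three fields with different defaults, which is what lets one model accept all of them.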

Concept Decoupling

Separates visual modalities in editing and reference tasks, allowing the model to understand what to retain and what to modify.
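
One way to picture this separation is a mask-based split of each frame into a part to modify and a part to retain. The sketch below assumes a binary mask where 1.0 marks pixels to change; the function name `decouple` and the single-channel frame layout are illustrative choices, not taken from the WAN VACE code.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def decouple(frame: Frame, mask: Frame) -> Tuple[Frame, Frame]:
    """Split a frame into (to_modify, to_retain) using a binary mask.

    Assumed convention: mask value 1.0 = modify this pixel, 0.0 = retain it.
    """
    to_modify = [
        [p * m for p, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
    to_retain = [
        [p * (1.0 - m) for p, m in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]
    return to_modify, to_retain


frame = [[0.2, 0.8], [0.5, 1.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]
to_modify, to_retain = decouple(frame, mask)
```

With a binary mask the two parts never overlap, and adding them pixel by pixel gives back the original frame, so no information is lost by the split.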

Context Adapter

Pluggable architecture that injects different task concepts into the model through collaborative spatiotemporal representation.
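
The "pluggable" idea can be sketched as a side branch whose outputs are added to the outputs of the main model's blocks. The toy below uses plain Python lists in place of tensors and assumes additive injection with a scale factor; the names `run_backbone`, `hints`, and `scale` are hypothetical, and this is not a description of the actual WAN VACE implementation.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_backbone(
    x: Vector,
    blocks: List[Block],
    hints: Optional[List[Optional[Vector]]] = None,
    scale: float = 1.0,
) -> Vector:
    """Run the main blocks in order, optionally adding a per-block hint.

    hints[i] stands in for the output of a context branch aligned with
    block i; None means no injection at that block.
    """
    for i, block in enumerate(blocks):
        x = block(x)
        if hints is not None and hints[i] is not None:
            x = [xi + scale * hi for xi, hi in zip(x, hints[i])]
    return x


# Two stand-in "blocks" for the main model.
blocks = [lambda v: [2.0 * e for e in v], lambda v: [e + 1.0 for e in v]]

base = run_backbone([1.0, 2.0], blocks)
adapted = run_backbone([1.0, 2.0], blocks, hints=[[0.5, 0.5], None])
unplugged = run_backbone([1.0, 2.0], blocks, hints=[[0.5, 0.5], None], scale=0.0)
```

In this sketch, removing the hints or setting `scale` to zero reproduces the base model's output exactly, which is the property that makes such an adapter safe to plug in and out.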

Multiple Task Support

Handles reference-to-video generation, video-to-video editing, masked editing, and all their combinations in a single framework.

DiT Architecture

Built on Diffusion Transformer architecture for better scalability and handling of long video sequences.

Performance Parity

Achieves results comparable to task-specific models while providing the flexibility of a unified framework.

WAN VACE Models

Model Variants

WAN VACE provides multiple model variants to balance performance, speed, and resource requirements:

  • LTX-Video-based VACE: Faster generation with 2B parameters, suitable for quick iterations
  • Wan-T2V-based VACE: Higher-quality outputs with 14B parameters, supporting resolutions up to 720p
  • Context Adapter Tuning: Parameter-efficient training method for faster convergence

HuggingFace Models

VACE Model Hub on HuggingFace

Implementation

git clone https://github.com/ali-vilab/VACE.git && cd VACE
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124 # If PyTorch is not installed.
pip install -r requirements.txt
pip install wan@git+https://github.com/Wan-Video/Wan2.1 # If you want to use Wan2.1-based VACE.
pip install ltx-video@git+https://github.com/Lightricks/LTX-Video@ltx-video-0.9.1 sentencepiece --no-deps # If you want to use LTX-Video-0.9-based VACE. It may conflict with Wan.
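
Because the two optional backends may conflict, it can help to check which one is importable in the current environment before running anything. The sketch below uses only the standard library; the import names `wan` and `ltx_video` are assumptions about what the two pip packages install, so adjust them if your environment differs.

```python
import importlib.util
from typing import Dict

# Assumed import names for the two optional backends (not confirmed by this page).
BACKEND_MODULES = {
    "Wan2.1": "wan",
    "LTX-Video": "ltx_video",
}


def available_backends() -> Dict[str, bool]:
    """Report which optional backend packages can be imported."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```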

The WAN VACE framework is also available through various integration methods:

  • Python API for custom integration
  • Gradio web interface for visual interaction
  • HuggingFace Spaces for cloud-based testing
  • Pre-built Docker containers for deployment

WAN VACE FAQ

What makes WAN VACE different from other video generation models?

WAN VACE uniquely integrates multiple video tasks into a single unified framework, allowing for task composition and flexible creative workflows not possible with single-task models.

What hardware is required to run WAN VACE?

The LTX-Video-based version needs a modern GPU with at least 8 GB of VRAM. The Wan-T2V-based version performs best with 16 GB or more of VRAM for high-resolution outputs.
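
The figures above can be turned into a simple rule of thumb. The helper below is a hypothetical convenience function, not part of WAN VACE; it treats the 8 GB and 16 GB figures as hard thresholds, which is a simplification of "performs best with".

```python
from typing import Optional


def suggest_variant(vram_gb: float) -> Optional[str]:
    """Suggest a model variant from available GPU memory in GB.

    Thresholds come from the FAQ answer above; treating them as hard
    cutoffs is an assumption made for this sketch.
    """
    if vram_gb >= 16:
        return "Wan-T2V-based VACE"
    if vram_gb >= 8:
        return "LTX-Video-based VACE"
    return None  # below the stated minimum for either variant


if __name__ == "__main__":
    for gb in (6, 8, 12, 24):
        print(gb, "GB ->", suggest_variant(gb))
```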

Can WAN VACE be used for commercial projects?

Yes, WAN VACE is built for both research and commercial applications. Check the specific license terms for the model variant you're using.

Does WAN VACE support long video generation?

Yes, WAN VACE can handle longer video sequences thanks to its efficient spatiotemporal representation that maintains consistency throughout.

Can I fine-tune WAN VACE models?

Yes, WAN VACE supports both full fine-tuning and the more efficient Context Adapter Tuning method for adapting the model to specific domains.

How does task composition work in WAN VACE?

The Video Condition Unit (VCU) unifies inputs from different tasks, allowing you to combine reference images, editing instructions, and masks in the same generation process.
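
To make the combination concrete, the sketch below merges reference images with a masked editing task into one set of inputs. It assumes that reference images are placed ahead of the video frames and receive all-zero masks (keep as given), while the video frames carry the user's editing masks. The function name and this layout are illustrative assumptions, not the WAN VACE API.

```python
from typing import Dict, List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def compose_reference_and_masked_edit(
    text: str,
    reference_frames: List[Frame],
    video_frames: List[Frame],
    video_masks: List[Frame],
) -> Dict[str, object]:
    """Combine reference images and a masked edit into one input bundle."""
    if len(video_frames) != len(video_masks):
        raise ValueError("each video frame needs exactly one mask")
    # Assumed convention: references are kept untouched, so their masks are all 0.0.
    reference_masks = [[[0.0] * len(row) for row in ref] for ref in reference_frames]
    return {
        "text": text,
        "frames": list(reference_frames) + list(video_frames),
        "masks": reference_masks + list(video_masks),
    }


unit = compose_reference_and_masked_edit(
    text="replace the masked region with the referenced object",
    reference_frames=[[[0.9, 0.9]]],
    video_frames=[[[0.1, 0.2]], [[0.3, 0.4]]],
    video_masks=[[[1.0, 0.0]], [[1.0, 0.0]]],
)
```

Here one reference image and a two-frame masked edit become a single three-frame input with matching masks, so the same generation call can serve both tasks at once.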