A Joint Project by Zhejiang University & vivo AI
MagicTryOn: The New Standard in Video Virtual Try-On
Effortlessly dress any person in any video with any piece of clothing. Experience unparalleled realism, temporal consistency, and detail clarity with MagicTryOn.
Solving the Three Core Challenges of Virtual Try-On
Traditional methods fail where MagicTryOn excels.
Garment Deformation & Glitches
U-Net-based models struggle with fine detail, causing clothes to warp, disappear, or render incorrectly.
Motion Inconsistency
Poor temporal modeling means garments jitter, lag, or misalign during rapid movements like dancing.
Lack of Realism
Failure to reproduce textures, outlines, and lighting results in a video that looks artificial and unconvincing.
The MagicTryOn Advantage
Where true creative freedom meets cutting-edge technology.
Universal Compatibility
Use any person's video with any garment image. No need for specific body models, templates, or pose libraries. Total flexibility.
Robust in High-Motion Video
Even with intense movements like dancing or turning, MagicTryOn ensures the garment remains stable, coherent, and perfectly tracked.
Photorealistic Detail & Clarity
Renders textures (lace, prints), contours (seams, collars), and structure with such precision that the output looks like a real-world video recording.
Core Technology Highlights
A brief look at the innovations powering MagicTryOn.
Diffusion Transformer (DiT) Backbone
MagicTryOn replaces the traditional U-Net with a powerful Diffusion Transformer. This allows it to model long-range dependencies across video frames, ensuring spatiotemporal consistency.
The result: clothes that move naturally with the person, without jitter, lag, or artifacts, while the diffusion process guarantees high-fidelity, detailed image generation.
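For intuition, here is a minimal PyTorch sketch of a DiT-style block that attends jointly over tokens from all frames. It illustrates the general mechanism only; the dimensions and module names are assumptions, not the actual MagicTryOn architecture.

```python
import torch
import torch.nn as nn

class SpatioTemporalDiTBlock(nn.Module):
    """Toy DiT-style block: every token attends to every other token across frames."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, frames * patches, dim). Flattening all frames into one
        # sequence is what lets attention model long-range spatiotemporal dependencies.
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens
```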
# How it works:
1. Extract pose & body shape from video.
2. Extract multi-level features from garment image.
3. Feed video data + noise into DiT model.
4. The DiT denoises the whole sequence through iterative diffusion.
5. Output a new video of the person wearing the garment.
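The five steps above, as a hedged end-to-end sketch. Every helper here (extract_pose_and_shape, encode_garment, ToyVideoDiT) is a hypothetical placeholder, not the repository's actual API.

```python
import torch

# Hypothetical stand-ins for the real preprocessing and encoding modules.
def extract_pose_and_shape(video):              # step 1: pose + body-shape conditions
    b, f = video.shape[0], video.shape[1]
    return torch.zeros(b, f, 17, 2), torch.zeros(b, 10)

def encode_garment(garment_image):              # step 2: multi-level garment features
    return torch.zeros(garment_image.shape[0], 77, 256)

class ToyVideoDiT(torch.nn.Module):             # stand-in for the video DiT backbone
    def denoise(self, latents, t, pose, shape, garment):
        return 0.98 * latents                   # placeholder update, not a real denoiser

def try_on_video(person_video, garment_image, dit, steps=50):
    pose, body_shape = extract_pose_and_shape(person_video)     # 1. pose & shape
    garment_feats = encode_garment(garment_image)               # 2. garment features
    latents = torch.randn_like(person_video)                    # 3. video data + noise
    for t in reversed(range(steps)):                            # 4. iterative diffusion
        latents = dit.denoise(latents, t, pose, body_shape, garment_feats)
    return latents                                              # 5. dressed output video

# Example call with dummy tensors: one clip of 16 frames at 3x64x64, one garment image.
# try_on_video(torch.randn(1, 16, 3, 64, 64), torch.randn(1, 3, 64, 64), ToyVideoDiT())
```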
A Two-Stage Control Strategy:
- 1. Coarse Guidance: A "garment token" provides a strong initial signal about what clothes to wear.
- 2. Fine-Grained Conditioning: CLIP features, texture maps, and outlines are injected to define material, color, and fit.
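One rough way such two-stage conditioning could be wired up, shown as a PyTorch sketch. The module and feature names are assumptions for illustration, not the published implementation.

```python
import torch
import torch.nn as nn

class GarmentConditioner(nn.Module):
    """Toy two-stage conditioning: one coarse garment token plus fine-grained features."""
    def __init__(self, dim: int = 512, clip_dim: int = 768):
        super().__init__()
        self.fine_proj = nn.Linear(clip_dim, dim)                 # CLIP / texture / outline features
        self.cross_attn = nn.MultiheadAttention(dim, 8, batch_first=True)

    def forward(self, video_tokens: torch.Tensor, garment_features: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N, dim); garment_features: (B, M, clip_dim), e.g. CLIP patch features.
        fine = self.fine_proj(garment_features)                   # stage 2: fine-grained conditioning
        coarse = fine.mean(dim=1, keepdim=True)                   # stage 1: one pooled "garment token"
        cond = torch.cat([coarse, fine], dim=1)
        out, _ = self.cross_attn(video_tokens, cond, cond)        # inject garment condition
        return video_tokens + out
```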
Coarse-to-Fine Garment Preservation
To ensure the garment's structure is faithfully preserved, MagicTryOn uses the two-stage control strategy outlined above. This tells the model not only *what* to wear, but exactly *how* it should look and behave, from overall shape down to the finest wrinkle.
Additionally, a unique **mask-aware loss function** focuses training on the garment region, preventing the model from being distracted by the background and dramatically improving clothing fidelity.
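One plausible form of such a loss, as a hedged sketch; the exact formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def mask_aware_loss(pred_noise: torch.Tensor, target_noise: torch.Tensor,
                    garment_mask: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # garment_mask: 1 inside the clothing region, 0 elsewhere (broadcastable to the noise tensors).
    per_pixel = F.mse_loss(pred_noise, target_noise, reduction="none")
    mask = garment_mask.expand_as(per_pixel)
    # Average the denoising error over garment pixels only, so background regions
    # do not dilute the gradient signal that drives clothing fidelity.
    return (per_pixel * mask).sum() / (mask.sum() + eps)
```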
Get Started with MagicTryOn
Bring MagicTryOn to your own projects. Follow these steps to set up the environment.
1. Create Conda Environment & Install Requirements
conda create -n magictryon python==3.10
conda activate magictryon
pip install -r requirements.txt
2. Download Model Weights
cd Magic-TryOn
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn --local-dir ./weights/MagicTryOn_14B_V1
3. Run Demo Inference
For image or video try-on, use the provided prediction scripts. Full instructions for custom try-on are available on GitHub.
# Example for Image Try-On
CUDA_VISIBLE_DEVICES=0 python predict_image_tryon_up.py
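Video try-on presumably follows the same pattern; the script name below is an assumption, so check the GitHub instructions for the exact entry point.
# Example for Video Try-On (script name assumed)
CUDA_VISIBLE_DEVICES=0 python predict_video_tryon_up.py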
Project News
- June 9, 2025: Code & weights released on HuggingFace!
- May 27, 2025: Our technical paper is now available on arXiv.
Release Roadmap
- ✅ Source Code
- ✅ Inference Demo & Pretrained Weights
- ✅ Customized Try-On Utilities
- ⬜️ Testing & Training Scripts
- ⬜️ V2 Model Weights
- ⬜️ Gradio App Update