The Complete Guide to Seedance 2.0: Multimodal AI Video Creation from Scratch
Seedance 2.0 is ByteDance's multimodal AI video model that generates cinematic video from text, images, video clips, and audio. It offers two creation modes, an @ reference system for precise asset control, and native audio generation — all in one workflow. Here's how to use every feature.
Two Creation Modes
Seedance 2.0 provides two entry points, each suited to different workflows:
First/Last Frame Mode
- Upload one image as the opening or closing frame
- Add a text description of the desired motion and scene
- Best for: simple animations, image-to-video conversions, quick tests
All-in-One Reference Mode (Recommended)
- Combine images + video clips + audio + text in a single generation
- Supports up to 12 reference files simultaneously
- Best for: complex multi-asset productions, music videos, character-driven narratives
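The two modes can also be thought of as two differently shaped inputs. Here is a minimal Python sketch of that shape; the field names and file names are illustrative assumptions, not the schema of any official Seedance interface or API.

```python
# First/Last Frame mode: one image anchoring the clip, plus a text prompt.
# All field names ("mode", "frame_image", ...) are hypothetical.
first_last_frame_request = {
    "mode": "first_last_frame",
    "frame_image": "sunset_portrait.png",   # used as the opening frame
    "frame_position": "first",              # or "last" for a closing frame
    "prompt": "The character turns toward the camera as the sun sets.",
}

# All-in-One Reference mode: up to 12 mixed files plus a prompt that
# assigns each one a role with @ (covered in the next sections).
all_in_one_request = {
    "mode": "all_in_one_reference",
    "references": [
        {"type": "image", "file": "hero.png"},
        {"type": "video", "file": "dolly_shot.mp4"},
        {"type": "audio", "file": "theme.mp3"},
    ],
    "prompt": "@image1 as the main character, reference @video1 for camera "
              "movement, use @audio1 for background music.",
}
```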
Input Specifications
| Input Type | Limit | What It Controls |
|---|---|---|
| Images | Up to 9 | Character appearance, scene style, product details |
| Video clips | Up to 3 (total ≤15s) | Camera movement, action rhythm, transition effects |
| Audio files | Up to 3 MP3 files (total ≤15s) | Background music, sound effects, voiceover tone |
| Text | Natural language | Scene description, action instructions, mood |
Total file limit: 12 reference files per generation.
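If you script your asset preparation, these limits are easy to check before uploading. The following is a minimal pre-flight sketch based only on the numbers above; the Asset structure and check_limits helper are illustrative, not part of any official SDK.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    kind: str               # "image", "video", or "audio"
    duration_s: float = 0.0 # only meaningful for video and audio

def check_limits(assets: list[Asset]) -> list[str]:
    """Return a list of limit violations (empty list means the set is valid)."""
    errors = []
    images = [a for a in assets if a.kind == "image"]
    videos = [a for a in assets if a.kind == "video"]
    audios = [a for a in assets if a.kind == "audio"]

    if len(assets) > 12:
        errors.append(f"{len(assets)} files uploaded; the limit is 12.")
    if len(images) > 9:
        errors.append(f"{len(images)} images; the limit is 9.")
    if len(videos) > 3:
        errors.append(f"{len(videos)} video clips; the limit is 3.")
    if sum(v.duration_s for v in videos) > 15:
        errors.append("Combined video length exceeds 15 seconds.")
    if len(audios) > 3:
        errors.append(f"{len(audios)} audio files; the limit is 3.")
    if sum(a.duration_s for a in audios) > 15:
        errors.append("Combined audio length exceeds 15 seconds.")
    return errors
```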
The @ Reference System
This is the most important feature to learn. The @ system lets you assign a specific role to each uploaded file — the model follows your assignments precisely instead of guessing.
How to Use @
- Upload your assets (images, videos, audio)
- In the prompt box, type @ to open the asset picker
- Select a file and describe its role in the generation
Example Prompt with @ References
@image1 as the opening frame character,
reference @video1 for camera movement (slow push-in to close-up),
use @audio1 for background music,
@image2 as the environment reference.
The character walks toward the camera under warm sunset lighting.
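The same prompt can be assembled from a tag-to-role mapping if you keep your shot plans as data. A minimal sketch, assuming the picker labels files as @image1, @video1, and so on; the helper is illustrative, not a product feature.

```python
# Assemble the example prompt above from a tag -> role mapping.
roles = {
    "@image1": "as the opening frame character",
    "@video1": "for camera movement (slow push-in to close-up)",
    "@audio1": "for background music",
    "@image2": "as the environment reference",
}
scene = "The character walks toward the camera under warm sunset lighting."

prompt = ",\n".join(f"{tag} {role}" for tag, role in roles.items()) + ".\n" + scene
print(prompt)
```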
Key Rules
- Every uploaded file should be explicitly assigned a role with @
- Hover over assets to preview and verify you're referencing the correct file
- The model executes exactly what you assign — no guessing
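One way to enforce the first rule is to confirm that every uploaded file's tag actually appears in the prompt before generating. A small sketch with placeholder upload tags and prompt text:

```python
# Catch uploads that were never assigned a role with @.
# "uploads" and "prompt" stand in for your own asset tags and prompt text.
uploads = ["@image1", "@image2", "@video1", "@audio1"]
prompt = (
    "@image1 as the opening frame character, "
    "reference @video1 for camera movement, "
    "use @audio1 for background music."
)

unassigned = [tag for tag in uploads if tag not in prompt]
if unassigned:
    print("Not referenced in the prompt:", ", ".join(unassigned))  # -> @image2
```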
Prompt Writing Techniques
1. Write by Timeline
Break your prompt into time segments for precise control:
- 0–3s: "Wide shot of a city skyline at dawn, slow pan right"
- 4–8s: "Cut to medium shot, character enters from the left, walking"
- 9–12s: "Push-in to close-up on character's face, soft focus background"
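If you build prompts in a script, the segments can stay as structured data until the end. A minimal sketch using the example segments above; the timestamp format is one reasonable convention, not a required syntax.

```python
# Keep timeline segments as (start, end, description) and join them at the end.
segments = [
    (0, 3, "Wide shot of a city skyline at dawn, slow pan right"),
    (4, 8, "Cut to medium shot, character enters from the left, walking"),
    (9, 12, "Push-in to close-up on character's face, soft focus background"),
]

prompt = "\n".join(f"{start}-{end}s: {desc}" for start, end, desc in segments)
print(prompt)
```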
2. Use Specific Camera Language
The model understands professional cinematography terms:
- Push-in / Pull-out — camera moves toward or away from the subject
- Pan — horizontal camera movement
- Tilt — vertical camera movement
- Tracking shot — camera follows the subject's movement
- Orbit — camera circles around the subject
- One-take — continuous unbroken shot
3. Describe Transitions
When creating multi-shot sequences, specify how scenes connect:
- "Fade from outdoor scene to indoor close-up"
- "Match cut from spinning coin to spinning globe"
- "Whip pan transition to the next scene"
4. Distinguish Reference vs. Instruction
- Reference: "@video1 for camera movement" — the model extracts and replicates the camera work
- Instruction: "slow push-in from wide to close-up" — the model generates the movement from your text description
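The same camera move can therefore be requested either way. A short sketch of the two phrasings, assuming @image1 is an uploaded character image and @video1 is an uploaded clip that already contains the move:

```python
# The same push-in, written as a reference vs. as a plain instruction.
reference_prompt = (
    "@image1 as the main character, "
    "reference @video1 for camera movement."    # replicate the clip's camera work
)
instruction_prompt = (
    "@image1 as the main character, "
    "slow push-in from wide shot to close-up."  # generate the move from text alone
)
```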
Core Capabilities
Image Quality
- Physics-accurate motion (gravity, fabric draping, fluid dynamics)
- Smooth, natural human and animal movement
- Precise prompt adherence
- Consistent visual style throughout
Multimodal Combination
- Extract camera movement from a reference video
- Extract character appearance from reference images
- Extract musical rhythm from reference audio
- Combine all three in a single generation
Character Consistency
- Face, clothing, and expression preservation across shots
- Brand element consistency (logos, colors, typography)
- Scene style consistency (lighting, atmosphere)
Camera and Motion Replication
- Replicate specific cinematography techniques from reference videos
- Hitchcock zoom, orbit tracking, one-take sequences
- Precise motion speed and rhythm matching
Output Specifications
- Duration: 4–15 seconds (selectable)
- Resolution: Up to 2K / 1080p
- Aspect ratios: 16:9 (landscape), 9:16 (portrait), 1:1 (square)
- Audio: Native — includes dialogue sync, background music, sound effects
- Cost and speed: ~30 points per 15-second video; generation is roughly 10x faster than the previous model generation
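For planning, the output settings and point cost can be estimated up front. The sketch below assumes the cost scales roughly linearly from the ~30 points per 15-second figure, which is an extrapolation; the helper is illustrative rather than an official pricing API.

```python
# Validate output settings against the published ranges and estimate point cost.
VALID_ASPECTS = {"16:9", "9:16", "1:1"}
POINTS_PER_SECOND = 30 / 15  # assumed linear rate: ~2 points per second of output

def estimate_points(duration_s: int, aspect: str) -> float:
    if not 4 <= duration_s <= 15:
        raise ValueError("Duration must be between 4 and 15 seconds.")
    if aspect not in VALID_ASPECTS:
        raise ValueError(f"Aspect ratio must be one of {sorted(VALID_ASPECTS)}.")
    return duration_s * POINTS_PER_SECOND

print(estimate_points(12, "16:9"))  # -> 24.0 points (approximate)
```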
Important Notes
- No real human faces — uploads containing clear real human faces are blocked by content moderation
- Quality over quantity — upload only the assets that have the strongest impact on your desired output
- Verify @ assignments — hover over each asset reference to confirm correct file mapping
- Model randomness — results vary between generations; generate multiple times and pick the best
- Available on: Jimeng (即梦), Doubao (豆包), Volcano Engine (火山引擎)
Frequently Asked Questions
What are the two creation modes?
First/Last Frame mode (one image + text) for simple generations, and All-in-One Reference mode (up to 12 multimodal files) for complex productions.
How does the @ reference system work?
Type @ in the prompt box, select an uploaded file, and describe its role. Example: "@image1 as character reference, @video1 for camera movement." The model follows your assignments precisely.
What are the input limits?
Up to 9 images, 3 video clips (≤15s total), 3 audio files (≤15s total), and text. Maximum 12 files per generation.
What output does it produce?
4–15 seconds of video at up to 2K resolution with native audio, in 16:9, 9:16, or 1:1 aspect ratios.
Can I use real human photos?
No. Uploads with clear real human faces are blocked by content moderation. Use stylized or illustrated character references.
Ready to start creating? Try Seedance 2.0 now — free trial available.