Gemini Omni Prompt Guide
A practical field guide: what works, what breaks, and what to avoid. Based on DeepMind's official prompt guide plus community testing from PixVerse, Atlas Cloud, Chrome Unboxed, and the Gemini Omni Prompt Playbook on Medium.
The base formula
Most working Gemini Omni prompts follow this structure:
[Subject] + [Action] + [Setting] + [Camera] + [Lighting] + [Style]
Example: "A red panda chef tossing pizza dough, in a cozy mountain kitchen, low-angle close-up, warm tungsten light, Pixar-style 3D animation"
This shares its DNA with Sora and Veo prompts, but Omni adds two elements that most other models lack:
- The "@username" character summon syntax (for the Avatar feature)
- The conversational editing chain ("now change X, keep Y identical")
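As a minimal sketch, the base formula is just ordered string assembly. The function `build_base_prompt` and its parameter names are hypothetical (this is not a Gemini API); the output is plain text you would paste into the prompt box. Empty slots are skipped, which fits the advice later in this guide that natural language beats rigid templates.

```python
def build_base_prompt(subject: str, action: str, setting: str,
                      camera: str, lighting: str, style: str) -> str:
    """Join the six slots of the base formula into one comma-separated prompt."""
    # Subject and action read as one clause; the other slots follow as modifiers.
    parts = [f"{subject} {action}", setting, camera, lighting, style]
    # Drop empty slots so a missing element doesn't leave a dangling comma.
    return ", ".join(p.strip() for p in parts if p.strip())


print(build_base_prompt(
    subject="A red panda chef",
    action="tossing pizza dough",
    setting="in a cozy mountain kitchen",
    camera="low-angle close-up",
    lighting="warm tungsten light",
    style="Pixar-style 3D animation",
))
```

Called with the guide's own example slots, it reproduces the example prompt word for word.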
The opening-line trick
Lock these three things in the very first sentence:
Create a [duration]-second [aspect-ratio] [genre] video in one continuous shot.
Why: Omni interprets "one continuous shot" as "no cuts" and respects the duration and aspect ratio you specify up front. Stating these inline in the prompt works better than putting them in metadata.
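A small helper can generate the opening line so it is never forgotten. This is a sketch under stated assumptions: `opening_line` and `with_opening_line` are hypothetical names, and the 10-second cap is the Omni Flash per-clip limit reported in this guide's duration section, enforced locally rather than by any API.

```python
MAX_CLIP_SECONDS = 10  # Omni Flash per-clip limit as reported in this guide


def opening_line(duration_s: int, aspect_ratio: str, genre: str) -> str:
    """Build the first sentence that fixes duration, aspect ratio, and continuity."""
    if not 1 <= duration_s <= MAX_CLIP_SECONDS:
        raise ValueError(f"duration must be 1-{MAX_CLIP_SECONDS} s, got {duration_s}")
    # Within 1-10 s, only 8 takes "an" ("an 8-second").
    article = "an" if duration_s == 8 else "a"
    return (f"Create {article} {duration_s}-second {aspect_ratio} {genre} video "
            "in one continuous shot.")


def with_opening_line(duration_s: int, aspect_ratio: str, genre: str, scene: str) -> str:
    """Prepend the opening line to the rest of the scene description."""
    return f"{opening_line(duration_s, aspect_ratio, genre)} {scene.strip()}"


print(with_opening_line(8, "16:9", "cinematic",
                        "A lighthouse keeper climbs the spiral stairs at dusk."))
```

Raising on an out-of-range duration surfaces the problem before you spend a generation on it.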
Camera vocabulary Omni understands
According to DeepMind's prompt guide, Omni explicitly parses these terms:
Camera motion verbs
- Push: push in / punch in / dolly zoom
- Pull: pull-back / pull-up / pull-down
- Reveal: ascend revealing / pull-back and rotate
- Orbit: orbit around / sweep around / circle
- Pan: pan left / pan right / vertical pan
- Static: locked off / fixed / oner / continuous shot
Style references
- natural smartphone zoom
- film camera (warm, slight grain)
- webcam style (compressed, slightly soft)
- handheld (micro-jitter, organic)
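To check that a draft actually names a camera move from this vocabulary, the motion terms can be kept as a lookup table. This is a hypothetical self-check: `CAMERA_MOTION_TERMS` and `camera_categories` are made-up names, and whole-word matching is an assumption for illustration that says nothing about how Omni itself parses prompts.

```python
import re

# Motion terms copied from the lists above, grouped by category.
CAMERA_MOTION_TERMS = {
    "push": ("push in", "punch in", "dolly zoom"),
    "pull": ("pull-back", "pull-up", "pull-down"),
    "reveal": ("ascend revealing", "pull-back and rotate"),
    "orbit": ("orbit around", "sweep around", "circle"),
    "pan": ("pan left", "pan right", "vertical pan"),
    "static": ("locked off", "fixed", "oner", "continuous shot"),
}


def camera_categories(prompt: str) -> list[str]:
    """Return the motion categories whose terms appear as whole words in the prompt."""
    text = prompt.lower()
    found = []
    for category, terms in CAMERA_MOTION_TERMS.items():
        # Word boundaries stop "oner" from matching inside words like "commissioner".
        if any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms):
            found.append(category)
    return found
```

An empty result is a hint that the prompt leaves the camera up to the model.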
Duration sweet spots
| Goal | Best duration | Aspect ratio |
|---|---|---|
| Mood / cinematic | 8-10s | 16:9 or 2.39:1 |
| Product hero | 6-8s | 1:1 or 16:9 |
| Reels / TikTok / Shorts | 5-7s | 9:16 |
| Slow-motion impact | 5-6s | 1:1 or 16:9 |
| Timelapse | 10s (max) | 16:9 |
| Avatar talking head | 10s (max) | 16:9 or 9:16 |
Gemini Omni Flash hard limit: 10 seconds per clip. Source: TechCrunch launch coverage.
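The table above can also be encoded as data and used to sanity-check a request before generating. In this sketch the goal keys (`"mood"`, `"product_hero"`, and so on), `ClipSpec`, and `check_clip` are hypothetical names, and rows listed as "10s (max)" are read as "use the full 10 seconds", which is an interpretation rather than something the sources state.

```python
from typing import NamedTuple


class ClipSpec(NamedTuple):
    min_s: int
    max_s: int
    aspect_ratios: tuple[str, ...]


MAX_CLIP_SECONDS = 10  # reported Omni Flash per-clip limit
# Sweet spots transcribed from the table; "10s (max)" read as exactly 10 s.
SWEET_SPOTS = {
    "mood": ClipSpec(8, 10, ("16:9", "2.39:1")),
    "product_hero": ClipSpec(6, 8, ("1:1", "16:9")),
    "short_form": ClipSpec(5, 7, ("9:16",)),
    "slow_motion": ClipSpec(5, 6, ("1:1", "16:9")),
    "timelapse": ClipSpec(10, 10, ("16:9",)),
    "avatar": ClipSpec(10, 10, ("16:9", "9:16")),
}


def check_clip(goal: str, duration_s: int, aspect_ratio: str) -> list[str]:
    """Return notes on a planned clip; an empty list means it matches the table."""
    spec = SWEET_SPOTS[goal]
    notes = []
    if duration_s > MAX_CLIP_SECONDS:
        notes.append(f"{duration_s} s exceeds the {MAX_CLIP_SECONDS} s per-clip limit")
    elif not spec.min_s <= duration_s <= spec.max_s:
        span = (f"{spec.min_s} s" if spec.min_s == spec.max_s
                else f"{spec.min_s}-{spec.max_s} s")
        notes.append(f"{duration_s} s is outside the {span} sweet spot")
    if aspect_ratio not in spec.aspect_ratios:
        notes.append(f"{aspect_ratio} is not a suggested ratio for {goal}")
    return notes
```

Returning notes instead of raising keeps the sweet spots advisory; only the 10-second cap is treated as a hard limit.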
The "Keep X identical" lock
When using conversational editing (Omni's flagship feature), every follow-up turn should explicitly list what to preserve. Pattern:
[Change instruction]. Keep [X, Y, Z] exactly the same.
Without this lock, Omni may restyle the entire scene when you ask it to change one element, losing the consistency that is the whole point of conversational editing. (Documented in Atlas Cloud's hands-on testing.)
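One way to make the lock hard to forget is to build every follow-up turn through a helper that refuses an empty keep list. `edit_turn` is a hypothetical name, and the sketch only formats text; it does not call Gemini.

```python
def edit_turn(change: str, keep: list[str]) -> str:
    """Build a follow-up editing turn: one change plus an explicit preservation list."""
    if not keep:
        raise ValueError("list at least one element to keep identical")
    change = change.strip().rstrip(".")
    # Join the keep list as natural English: "a", "a and b", "a, b, and c".
    if len(keep) == 1:
        kept = keep[0]
    elif len(keep) == 2:
        kept = f"{keep[0]} and {keep[1]}"
    else:
        kept = ", ".join(keep[:-1]) + f", and {keep[-1]}"
    return f"{change}. Keep {kept} exactly the same."


print(edit_turn("Change the sky to a stormy dusk",
                ["the lighthouse", "the camera angle", "the color grade"]))
```

Taking a single `change` string per call also nudges you toward one change per turn.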
The trigger pattern (for VFX)
This is one of Omni's strongest patterns, used in Google's own viral demos (the mirror-arm transformation, the bubble sculpture, the origami ships):
[Base scene]. When [specific trigger action], [specific transformation]. Keep [list] identical.
Example: "A woman reaches toward a mirror. When her fingertips touch the glass, make the mirror ripple like liquid and her arm turn to reflective mirror material. Keep the parlor and lighting identical."
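The trigger pattern has three fixed parts, so it can be assembled mechanically. `trigger_prompt` is a hypothetical helper written for illustration; fed the pieces of the mirror example, it reproduces that prompt exactly.

```python
def trigger_prompt(base_scene: str, trigger: str, transformation: str,
                   keep: list[str]) -> str:
    """Assemble the VFX trigger pattern: base scene, When-clause, keep list."""
    def sentence(text: str) -> str:
        # Normalize each piece to end with exactly one period.
        return text.strip().rstrip(".") + "."

    if len(keep) <= 2:
        kept = " and ".join(keep)
    else:
        kept = ", ".join(keep[:-1]) + f", and {keep[-1]}"
    parts = [sentence(base_scene), sentence(f"When {trigger}, {transformation}")]
    if kept:
        parts.append(f"Keep {kept} identical.")
    return " ".join(parts)


print(trigger_prompt(
    "A woman reaches toward a mirror",
    "her fingertips touch the glass",
    "make the mirror ripple like liquid and her arm turn to reflective mirror material",
    ["the parlor", "lighting"],
))
```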
Known failure modes (be honest about these)
Text rendering
Any on-screen text (labels, signage, captions, brand logos) degrades. Avoid mentioning text overlays in your prompt. Verified in PixVerse's hands-on review.
Hand articulation
Fine articulation drifts when hands hold objects, sign, or type. Frame the shot to hide hands when possible, or accept some imperfection.
Multi-shot character consistency
Per Atlas Cloud's multi-turn review, Omni scores 3/5 on character consistency across 4+ shots. For best results, use @character_name with a reference image, and accept drift past shot 4.
Complex motion
Per digit.in's test, complex actions (dancing, gymnastics, playing an instrument) show more AI artifacts than static shots. Simple actions (walking, standing, talking) work best.
Word count over 50
Per Seaart's analysis: prompts longer than ~50 words dilute focus and reduce output quality. Be specific but concise.
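A quick word count is enough to keep a draft under the ~50-word guideline. In this sketch `word_count`, `over_budget`, and `WORD_BUDGET` are hypothetical names, and a "word" is assumed to be any whitespace-separated token containing a letter or digit, so stray punctuation such as "+" is not counted.

```python
import re

WORD_BUDGET = 50  # rough ceiling reported in this guide, not a hard model limit


def word_count(prompt: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in prompt.split() if re.search(r"\w", token))


def over_budget(prompt: str, budget: int = WORD_BUDGET) -> bool:
    """True when the prompt is longer than the word budget."""
    return word_count(prompt) > budget


example = ("A red panda chef tossing pizza dough, in a cozy mountain kitchen, "
           "low-angle close-up, warm tungsten light, Pixar-style 3D animation")
print(word_count(example), over_budget(example))
```

The base-formula example comes in well under budget, which leaves room for the opening line and a keep list.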
What NOT to do
- Don't write Veo-style adjective stacks — DeepMind explicitly says Omni "doesn't need to be as prescriptive as Veo". Natural language beats template formulas.
- Don't change multiple variables per turn — split into separate conversational turns.
- Don't reference copyrighted IPs by name — "Studio Ghibli style" is risky; "hand-painted watercolor animation" is safer.
- Don't use Veo / Sora-specific syntax — Omni parses some of it but optimizes for its own conversational style.
- Don't request hardware brand names ("shot on DJI Mavic Pro") — Omni doesn't parse camera brands; use motion verbs instead.
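Three of these rules are easy to approximate with a crude lint pass before you submit a prompt. Everything here is an assumption for illustration: `lint_prompt` is a hypothetical helper, the patterns are simple heuristics (for example, "a shot on the beach" would be a false positive), and the default IP list holds only the one name used above; extend it with your own.

```python
import re

# Heuristic checks for three "What NOT to do" items; not rules Omni publishes.
HARDWARE_PATTERN = re.compile(r"\bshot on\b", re.IGNORECASE)
CHANGE_VERB_PATTERN = re.compile(r"\b(change|replace|swap)\b", re.IGNORECASE)


def lint_prompt(prompt: str,
                named_ips: tuple[str, ...] = ("Studio Ghibli",)) -> list[str]:
    """Return warnings for patterns the guide advises against."""
    warnings = []
    # "shot on <camera>" phrasing: the guide says camera brands aren't parsed.
    if HARDWARE_PATTERN.search(prompt):
        warnings.append("camera hardware named: describe the motion instead")
    # Named IPs: describe the look in plain words instead.
    lowered = prompt.lower()
    for name in named_ips:
        if name.lower() in lowered:
            warnings.append(f"named IP '{name}': describe the visual style instead")
    # More than one change verb suggests several variables in a single turn.
    if len(CHANGE_VERB_PATTERN.findall(prompt)) > 1:
        warnings.append("several changes in one turn: split them into separate turns")
    return warnings
```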
Avatar feature hard rules
- Age requirement: 18+
- Geography: NOT available in EEA / Switzerland / UK at launch
- Language: English only at launch (May 2026)
- Watermark: every video carries a Google SynthID watermark embedded in the pixels; it cannot be turned off
- Reference recording: clear eyes / nose / mouth, no sunglasses / masks / hats covering face, no other faces in background
Source: Google Gemini Avatar help page.
Where to test
- Gemini app — main interface (subscription-tier required for Omni Flash)
- Google Flow — full editing workspace
- YouTube Shorts / YouTube Create — free access to Omni Flash for short-form video
Sources
- Official: DeepMind Gemini Omni Prompt Guide
- Official launch: blog.google announcement
- Avatar specifics: Google Gemini Avatar Help
- Hands-on: Atlas Cloud Features Overview
- Hands-on: PixVerse Model Review
- Avatar hands-on: Chrome Unboxed Avatar Test
- Playbook: Medium — Gemini Omni Prompt Playbook
- Limitations review: digit.in hands-on