Gemini Omni Prompt Guide

A practical field guide: what works, what breaks, and what to avoid. Based on DeepMind's official prompt guide and on community testing from PixVerse, Atlas Cloud, Chrome Unboxed, and Medium's Gemini Omni Prompt Playbook.

The base formula

Most working Gemini Omni prompts follow this structure:

[Subject] + [Action] + [Setting] + [Camera] + [Lighting] + [Style]

Example: "A red panda chef tossing pizza dough, in a cozy mountain kitchen, low-angle close-up, warm tungsten light, Pixar-style 3D animation".
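The formula can be treated as six named slots filled in a fixed order. The sketch below is only an illustration of that idea: `PromptParts` and `build_prompt` are hypothetical names, not part of any Gemini tooling, and the comma-joined layout is an assumption taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The six slots of the base formula (names are illustrative only)."""
    subject: str
    action: str
    setting: str
    camera: str
    lighting: str
    style: str


def build_prompt(parts: PromptParts) -> str:
    """Join the slots in formula order: subject + action, then the rest."""
    # Subject and action read as one clause; the other slots are comma-separated.
    lead = f"{parts.subject.strip()} {parts.action.strip()}"
    tail = [parts.setting, parts.camera, parts.lighting, parts.style]
    # Skip any slot left empty so the prompt has no dangling commas.
    return ", ".join([lead] + [t.strip() for t in tail if t.strip()])


example = build_prompt(PromptParts(
    subject="A red panda chef",
    action="tossing pizza dough",
    setting="in a cozy mountain kitchen",
    camera="low-angle close-up",
    lighting="warm tungsten light",
    style="Pixar-style 3D animation",
))
print(example)
```

Keeping the slots separate makes it easy to vary one element (say, the camera) while holding the others fixed between test runs.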

This is the same DNA as Sora / Veo prompts, but Omni adds two elements that most other models lack:

The opening-line trick

Lock these three things (duration, aspect ratio, and genre) in the very first sentence:

Create a [duration]-second [aspect-ratio] [genre] video in one continuous shot.

Why: Omni interprets "one continuous shot" as "no cuts" and respects the time / aspect ratio you specify upfront. Specifying these inline beats putting them in metadata.
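As a small sketch, the opening line can be generated from the three values so it is never forgotten or mistyped. The function name `opening_line` is hypothetical; the wording is copied from the template above.

```python
def opening_line(duration_s: int, aspect_ratio: str, genre: str) -> str:
    """Return the first sentence of a prompt, following the guide's template."""
    if duration_s <= 0:
        raise ValueError("duration_s must be a positive number of seconds")
    # Template text is taken verbatim from the guide; only the slots vary.
    return (
        f"Create a {duration_s}-second {aspect_ratio} {genre} video "
        "in one continuous shot."
    )


print(opening_line(6, "9:16", "product"))
```

The rest of the prompt (subject, action, setting, and so on) would follow this sentence.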

Camera vocabulary Omni understands

From DeepMind's prompt guide, these terms are explicitly parsed:

Camera motion verbs

Style references

Duration sweet spots

Goal | Best duration | Aspect ratio
Mood / cinematic | 8-10s | 16:9 or 2.39:1
Product hero | 6-8s | 1:1 or 16:9
Reels / TikTok / Shorts | 5-7s | 9:16
Slow-motion impact | 5-6s | 1:1 or 16:9
Timelapse | 10s (max) | 16:9
Avatar talking head | 10s (max) | 16:9 or 9:16

Gemini Omni Flash hard limit: 10 seconds per clip. Source: TechCrunch launch coverage.
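The table and the 10-second limit can be encoded as a lookup that nudges a requested duration into the recommended range. This is a minimal sketch: the dictionary keys and the `pick_duration` helper are made-up names, and the numbers are copied from the table above rather than from any API.

```python
MAX_CLIP_SECONDS = 10  # hard limit quoted in the guide for Gemini Omni Flash

# goal -> ((min seconds, max seconds), suggested aspect ratios), from the table.
SWEET_SPOTS = {
    "mood": ((8, 10), ["16:9", "2.39:1"]),
    "product_hero": ((6, 8), ["1:1", "16:9"]),
    "reels": ((5, 7), ["9:16"]),
    "slow_motion": ((5, 6), ["1:1", "16:9"]),
    "timelapse": ((10, 10), ["16:9"]),
    "avatar": ((10, 10), ["16:9", "9:16"]),
}


def pick_duration(goal: str, requested_s: int) -> int:
    """Clamp a requested duration into the goal's range and the hard limit."""
    (low, high), _ratios = SWEET_SPOTS[goal]  # raises KeyError for unknown goals
    return min(max(requested_s, low), high, MAX_CLIP_SECONDS)


print(pick_duration("reels", 12))
print(pick_duration("mood", 3))
```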

The "Keep X identical" lock

When using conversational editing (Omni's flagship feature), every follow-up turn should explicitly list what to preserve. Pattern:

[Change instruction]. Keep [X, Y, Z] exactly the same.

Without this lock, Omni may re-style the entire scene when you ask it to change one element — losing the consistency that's the whole point of conversational editing. (Documented by Atlas Cloud's hands-on testing.)
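One way to make the lock hard to forget is to build every follow-up turn through a helper that refuses an empty preserve list. The sketch below assumes nothing about Omni's interface; `edit_turn` is a hypothetical name and only produces the text of the turn.

```python
def edit_turn(change: str, keep: list[str]) -> str:
    """Format a follow-up edit as '[Change]. Keep [X, Y, Z] exactly the same.'"""
    if not keep:
        # The whole point of the pattern is an explicit preserve list.
        raise ValueError("list at least one element to keep unchanged")
    change = change.rstrip(". ")  # avoid a doubled period
    return f"{change}. Keep {', '.join(keep)} exactly the same."


print(edit_turn(
    "Change the jacket to red",
    ["the face", "the background", "the camera angle"],
))
```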

The trigger pattern (for VFX)

One of Omni's strongest patterns — used in Google's own viral demos (mirror-arm-transformation, bubble sculpture, origami ships):

[Base scene]. When [specific trigger action], [specific transformation]. Keep [list] identical.

Example: "A woman reaches toward a mirror. When her fingertips touch the glass, make the mirror ripple like liquid and her arm turn to reflective mirror material. Keep the parlor and lighting identical."
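The trigger pattern has four parts (base scene, trigger, transformation, keep list), so it can be sketched as a four-argument template. `trigger_prompt` and `join_items` are hypothetical helper names; the sentence layout is copied from the pattern above.

```python
def join_items(items: list[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def trigger_prompt(base_scene: str, trigger: str,
                   transformation: str, keep: list[str]) -> str:
    """Fill '[Base scene]. When [trigger], [transformation]. Keep [list] identical.'"""
    if not keep:
        raise ValueError("list at least one element to keep identical")
    return (
        f"{base_scene.rstrip('. ')}. "
        f"When {trigger}, {transformation.rstrip('. ')}. "
        f"Keep {join_items(keep)} identical."
    )


print(trigger_prompt(
    "A woman reaches toward a mirror",
    "her fingertips touch the glass",
    "make the mirror ripple like liquid and her arm turn to "
    "reflective mirror material",
    ["the parlor", "lighting"],
))
```

Called with these arguments, the helper reproduces the mirror example word for word.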

Known failure modes (be honest about these)

Text rendering

Any onscreen text — labels, signage, captions, brand logos — degrades. Avoid mentioning text overlays in your prompt. Verified by PixVerse hands-on.

Hand articulation

Hands holding objects, sign language, typing — fine articulation drifts. Frame to hide hands when possible, or accept some imperfection.

Multi-shot character consistency

Per Atlas Cloud's multi-turn review: Omni scores 3/5 on character consistency across 4+ shots. Use @character_name with a reference image for best results, and accept drift past shot 4.

Complex motion

Per digit.in's test: complex actions (dancing, gymnastics, instrument playing) show AI artifacts more than static shots. Simple actions (walking, standing, talking) work best.

Word count over 50

Per Seaart's analysis: prompts longer than ~50 words dilute focus and reduce output quality. Be specific but concise.
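A quick length check before submitting can catch prompts that drift past the limit. This is a rough sketch: it counts whitespace-separated tokens, which is only an approximation of "words", and the 50-word threshold is the approximate figure cited above, not a documented cutoff. `check_length` is a hypothetical helper name.

```python
WORD_LIMIT = 50  # approximate threshold cited in the guide


def word_count(prompt: str) -> int:
    """Count whitespace-separated tokens (hyphenated terms count as one)."""
    return len(prompt.split())


def check_length(prompt: str, limit: int = WORD_LIMIT) -> tuple[int, bool]:
    """Return (word count, True if the prompt is within the limit)."""
    count = word_count(prompt)
    return count, count <= limit


prompt = ("A red panda chef tossing pizza dough, in a cozy mountain kitchen, "
          "low-angle close-up, warm tungsten light, Pixar-style 3D animation")
count, ok = check_length(prompt)
print(f"{count} words, within limit: {ok}")
```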

What NOT to do

Avatar feature hard rules

Source: Google Gemini Avatar help page.

Where to test

Sources