How to Create the Viral AI Empire State Building Flag Video Using Just 2 Prompts
If you've been scrolling through TikTok, Instagram Reels, YouTube Shorts, or X lately, you've probably seen the viral AI Empire State Building Flag videos. They look incredibly realistic, featuring two people standing on top of a towering antenna while holding a giant black flag with custom text. The cinematic camera movement, realistic wind, and helicopter fly-by make the videos look like real documentary footage.
The best part? You can recreate this trend yourself in just a few minutes.
All you need are two AI prompts: one for generating the image and another for turning that image into a realistic cinematic video. In this guide, I'll show you exactly how it works and where to use each prompt.
Step 1: Generate the AI Image
The first prompt creates the viral image.
It generates an ultra-realistic documentary-style photograph of a young man and a young woman balancing on top of a tall communication tower above a modern city skyline while securing a massive black flag.
The most interesting part is the flag.
Inside the prompt you'll find this placeholder:
[FILL HERE]
Simply replace it with any text you'd like to display on the flag.
Some examples include:
GO VIRAL WITH AI!
FOLLOW FOR MORE
LEARN AI TODAY
YOUR BRAND NAME
ANY MOTIVATIONAL QUOTE
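If you plan to make several versions with different messages, you can script the replacement instead of editing the prompt by hand. Here is a minimal Python sketch, assuming you have pasted the image prompt into a string. The function name fill_flag_text and the decision to uppercase the message are my own assumptions (the prompt asks for uppercase typography); they are not part of any tool.

```python
PLACEHOLDER = "[FILL HERE]"


def fill_flag_text(prompt_template: str, flag_text: str) -> str:
    """Return the prompt with the placeholder replaced by the flag text."""
    # Collapse stray whitespace and uppercase the message (an assumption,
    # chosen to match the "uppercase typography" the prompt asks for).
    text = " ".join(flag_text.split()).upper()
    if not text:
        raise ValueError("flag text must not be empty")
    if PLACEHOLDER not in prompt_template:
        raise ValueError(f"template does not contain {PLACEHOLDER}")
    return prompt_template.replace(PLACEHOLDER, text)


# Shortened stand-in for the full image prompt.
template = "The flag features bold white uppercase typography that reads:\n[FILL HERE]"
print(fill_flag_text(template, "learn ai today"))
```

The two checks are there to catch the most common slip: sending the prompt with the placeholder still in it, or with an empty message.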
The prompt also specifies:
hyper-realistic photography
Sony A1 camera
600mm telephoto lens
Reuters-style documentary look
realistic wind
HDR lighting
8K quality
cinematic atmosphere
These details help the AI produce an image that closely resembles a real photograph instead of digital artwork.
Image Prompt
Ultra-realistic telephoto documentary photograph of a daring rooftop urban marketing stunt. A young man and a young woman, both dressed in all-black tactical outfits with black face coverings, are standing and balancing on top of a very tall metal communication tower high above a modern city skyline. One is carefully holding onto the pole while the other is helping secure a large black flag that dramatically waves in the wind.
The flag is the main focus and features large bold white uppercase sans-serif typography that reads:
[FILL HERE]
The text should be perfectly centered, highly legible, professionally printed, and occupy most of the flag surface.
Background shows luxury modern high-rise apartment buildings with glass balconies, compressed by a 600mm telephoto lens, creating a layered urban look. Slight atmospheric haze, warm daylight, realistic city pollution, muted colors, cinematic contrast.
The communication tower has antennas, cables, climbing ladder, and a red aviation warning light near the top. The man appears to be climbing while the woman is standing confidently holding the edge of the flag. Their faces are mostly hidden, emphasizing anonymity.
The flag is stretched naturally by strong wind with realistic fabric folds and dynamic motion. Extreme sense of height, danger, and scale.
Composition: Vertical portrait (9:16), tower centered, flag extending toward the right side, subjects positioned near the very top of the frame, background softly compressed with shallow atmospheric perspective.
Style: Hyper-realistic, documentary photography, Reuters-style, editorial, ultra-detailed, natural lighting, RAW photo quality, HDR, realistic textures, sharp focus on subjects and flag, subtle film grain.
Camera Settings: Sony A1, 600mm telephoto lens, f/5.6, ISO 200, 1/2000s, ultra-high resolution, 8K.
Negative Prompt:
cartoon, anime, CGI, illustration, painting, fantasy, extra people, duplicate person, blurry, low resolution, watermark, logo, oversaturated colors, unrealistic lighting, distorted anatomy, extra limbs, bad hands, cropped flag, unreadable text, floating objects, low quality, artifacts, motion blur.
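Some image generators have a separate field for the negative prompt, while others only offer a single text box. If your tool has a separate field, you can split the prompt above at the "Negative Prompt:" line. The sketch below is one way to do that in Python. The helper name split_prompt is hypothetical, and it assumes the marker text appears exactly as written in this guide.

```python
MARKER = "Negative Prompt:"


def split_prompt(full_prompt: str) -> tuple[str, list[str]]:
    """Split a combined prompt into (positive text, list of negative terms)."""
    positive, found, negative = full_prompt.partition(MARKER)
    if not found:
        # No marker: treat everything as the positive prompt.
        return full_prompt.strip(), []
    # Negative terms are comma-separated; drop whitespace and the final period.
    terms = [term.strip().rstrip(".") for term in negative.split(",")]
    return positive.strip(), [term for term in terms if term]


# Shortened stand-in for the full prompt text.
example = (
    "Camera Settings: Sony A1, 600mm telephoto lens.\n"
    "Negative Prompt:\n"
    "cartoon, anime, motion blur."
)
positive, negative_terms = split_prompt(example)
print(negative_terms)  # ['cartoon', 'anime', 'motion blur']
```

If your tool only has one text box, skip this step and paste the whole prompt as it is.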
Step 2: Turn the Image into a Cinematic Video
Once your image looks good, it's time to animate it.
Instead of creating a completely new scene, the second prompt tells the AI to preserve the original composition while adding realistic movement.
The generated video includes:
a helicopter circling the tower
stronger wind
realistic flag movement
cinematic camera orbit
telephoto lens compression
natural daylight
subtle atmospheric haze
documentary-style camera shake
Everything is designed to look like authentic aerial footage filmed near the Empire State Building.
Video Prompt
Create a fully realistic cinematic 9:16 vertical video from this image, set at the Empire State Building, New York City. Keep the same high-rise urban background, the tall antenna mast, the two people standing on top, and the large black flag with the text clearly visible. The scene should look like a real drone-shot action sequence at the Empire State Building, not CGI.
A realistic helicopter slowly circles around the antenna tower from left to right, flying at medium distance behind and beside the flag. The helicopter blades spin fast with natural motion blur, creating strong wind that makes the black flag wave harder. The flag should wave naturally and realistically in the wind, with authentic fabric movement, folds, and flow like a real flag. The people remain balanced on the mast, slightly reacting to the wind, with clothes moving naturally.
Camera movement: start with a slow cinematic push-in from a distance, then smoothly orbit around the mast in a dramatic parallax motion. The camera should feel like a professional helicopter/drone camera, handheld but stable, with slight natural vibration from height and wind. Keep the flag as the main focus, with the text staying readable as the fabric moves.
Make the skyline and architecture feel like midtown Manhattan, with the recognizable setting and atmosphere of the Empire State Building. Lighting should be natural daylight with a premium cinematic tone, slight haze between buildings, realistic shadows, telephoto lens compression, high detail, real-world physics, realistic wind, realistic helicopter movement, no cartoon style, no over-animation, no extra people, no text changes, no distorted flag text, no camera cuts.
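Both prompts ask for a 9:16 vertical frame, so it can help to confirm that your generated image really has that shape before you animate it. The arithmetic is simple: height = width × 16 / 9, so a 1080-pixel-wide frame should be 1920 pixels tall. Here is a small Python sketch of that check. The function names and the 1% tolerance are my own assumptions, not requirements of any video tool.

```python
from fractions import Fraction

TARGET = Fraction(9, 16)  # width : height for a vertical 9:16 frame


def height_for_width(width: int) -> int:
    """Height in pixels that gives a 9:16 frame at the given width."""
    return round(width / TARGET)


def is_vertical_9_16(width: int, height: int,
                     tolerance: Fraction = Fraction(1, 100)) -> bool:
    """True if width/height is within the tolerance (assumed 1%) of 9:16."""
    if width <= 0 or height <= 0:
        return False
    return abs(Fraction(width, height) - TARGET) <= tolerance * TARGET


print(height_for_width(1080))        # 1920
print(is_vertical_9_16(1080, 1920))  # True
print(is_vertical_9_16(1920, 1080))  # False (landscape)
```

If the check fails, regenerate or crop the image first so the video model does not have to invent the missing parts of the frame.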
Tutorial Video
If you'd like to see the entire process step by step, the complete tutorial video walks through generating the image, creating the video, selecting the best settings, and achieving a cinematic result similar to the viral examples circulating on social media.
Final Thoughts
The AI Empire State Building Flag trend is proof that you don't need advanced editing skills to create eye-catching cinematic content anymore. With just one image prompt and one video prompt, you can produce realistic videos that capture the same dramatic style seen across social media.
Whether you're promoting a brand, sharing motivational messages, or simply experimenting with AI-generated visuals, this workflow makes it easy to create personalized videos in just a few minutes.
So replace the [FILL HERE] placeholder with your own message, generate the image, animate it with the video prompt, and see what creative ideas you can bring to life. Your next AI creation could be the one everyone starts sharing.
