On June 11th, 2025, history was made during an NBA game when Kalshi - the event betting platform that lets users trade on the outcome of real-world events - aired a 30-second, 100% AI-generated ad on national television. The ad was created in 2 days on a shoestring budget of $2,000. An ad like this would normally take over 2 months and cost more than $400K. The moment created an incredible buzz, further legitimizing the impact AI is having on filmmaking. It's just a matter of time before a 2-person Hollywood AI film studio produces an Oscar-winning picture. I bet you can trade on the outcome of that event on Kalshi.
Though I'm not a betting guy, I'd take the over on AI-generated films winning Oscars in the coming years. But as someone who spends considerable time thinking about how small businesses can harness AI effectively, this Kalshi moment sparked a different question for me: if AI is good enough to create content for national television, can it help small business owners produce professional quality video ads for their social media campaigns?
AI Claim vs Reality
As I discussed last week, many small businesses rely on social media as their primary distribution channel, spending countless hours and sometimes thousands of dollars creating advertising content. A tool that could help business owners produce more content faster and cheaper would be a complete game changer.
AI video generation works similarly to the image models we discussed last week, but instead of creating a single picture, it makes smart guesses about how things should move and change over time. When you type "create a video of a dog running through a park," the AI doesn't actually understand what dogs, running, or parks are. Instead, it has been trained on millions of video clips paired with text descriptions, learning what these things typically look like in motion.
Think of it this way: imagine you're a video editor and I showed you hundreds of thousands of cake decorating videos while describing each technique out loud. Eventually, you'd start recognizing patterns such as how frosting flows, how decorations are placed and how finished cakes should look. When a client asks you to create content for their custom gourmet treat company, you'd have context to create the best approximation of what it should look like in motion. That's exactly what AI does, but at a massive scale across millions of video clips.
My goal this week was to explore whether AI is already good enough to help small business owners create video content for social media. Take Kabsy, the owner of a Houston-based custom gourmet treat company called KabsyKreations. Kabsy spends a considerable amount of time creating content for her social media pages. The clip below was produced by a real video editor from real footage, provided by Kabsy, of her cake preparation and assembly.
My goal was to try to recreate this video using Sora or Veo 2, two very popular AI video generation tools created by OpenAI and Google respectively.
Original clip created by human video editor
1st Shot
For my first shot, I created a prompt to replicate the scene in the original clip above. I started by describing my best interpretation of each scene in the video. Based on what I saw, I identified a three-stage process: preparation and batter mixing, cake assembly, and decoration. I described each phase in my own words, then passed my description to ChatGPT and asked it to act as a film director and video editor to convert my description into a script for high-fidelity video generation.
After iterating on the script a few times, I broke it down into three prompts, one for each stage of cake preparation (see prompt below). I then passed each prompt to both Sora and Veo 2 to generate video clips and used CapCut to stitch the clips together.
Prompt Generated by ChatGPT
Scene 1: Begin with a cinematic close-up of real butter stacked in a yellow bowl on a marble kitchen countertop. The lighting is warm, with soft shadows and natural tones. At 3 seconds, cut to a side view of a Black baker operating a stand mixer. The mixer blends butter and flour while the paddle spins and flour dust gently rises. Emphasize motion, texture, and a cozy kitchen atmosphere with shallow depth of field.
Scene 2: At 7 seconds, cut to a top-down close-up of silky white batter swirling inside a metal bowl. The batter is glossy and smooth. At 11 seconds, transition to a Black baker’s hand folding vibrant red velvet batter using a red silicone spatula in a stainless steel bowl. Focus on the spatula’s slow swirl and the deep, rich red texture. Highlight contrast between batter and bowl using natural daylight from the side.
Scene 3: Start with a wide-angle shot of multiple cake layers wrapped in plastic and stacked on a metal tray. Slowly pan across. At 16s, show the Black baker’s hands placing pastel-colored cake tiers on a spinning turntable. One hand adjusts the top tier while the other spreads pink frosting with an offset spatula. At 18s, cut to a slow push-in on the finished cake: pastel frosting, candy sprinkles, a waffle cone topper, and colorful lollipops. Emphasize handcrafted detail and visual delight.
Sora clip output
Veo 2 clip output
Though both clips looked different from the original KabsyKreations video, I was particularly impressed by the Veo 2 output. Its ability to understand context, camera angles, and produce hyper-realistic video that follows the laws of physics was promising. Veo also naturally included relevant sound even though I didn't prompt it to do so. The Sora output was decent, but some movements looked awkward and unnatural, making it far easier to tell the clip was AI-generated.
2nd Shot
For my second shot, I tried something different. I kept the same prompts but extracted still frames from the original KabsyKreations clip (see below) and attached them as additional context to my Sora and Veo prompts. I was very specific about which frames to reference for each scene. The output from both tools improved significantly. Sora's result was much closer to the original video, though some object movements still looked perceptibly unnatural. Veo 2 consistently created superior quality that was nearly indistinguishable from human-generated clips.
Still frames from original KabsyKreations clip
Sora clip output
Veo 2 clip output
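If you want to pull reference stills like the ones used in this second shot, one option is the free command-line tool ffmpeg. This is an assumption for illustration, not necessarily the tool used for the frames above. The sketch below only builds the commands and prints them. It runs them only if you pass `run=True` and ffmpeg is installed. The file name `original_clip.mp4`, the frame names, and the timestamps are all hypothetical placeholders.

```python
from __future__ import annotations

import shutil
import subprocess


def frame_command(video: str, at_seconds: float, out_png: str) -> list[str]:
    """Build an ffmpeg command that saves one frame near `at_seconds`."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-ss", str(at_seconds),  # seek to this timestamp in the input
        "-i", video,
        "-frames:v", "1",        # write exactly one video frame
        out_png,
    ]


def extract_frames(video: str, timestamps: dict[str, float],
                   run: bool = False) -> list[list[str]]:
    """Return one command per named frame; execute them only when asked."""
    commands = [frame_command(video, t, f"{name}.png")
                for name, t in timestamps.items()]
    if run and shutil.which("ffmpeg"):
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical file name and timestamps: one reference frame per scene.
commands = extract_frames(
    "original_clip.mp4",
    {"scene1_butter": 0, "scene2_batter": 7, "scene3_tiers": 16},
)
for cmd in commands:
    print(" ".join(cmd))
```

Naming each frame after the scene it belongs to makes it easier to tell the video tool exactly which image to reference for which prompt.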
The Verdict
So, is it accurate to claim that AI can generate professional video clips that small business owners like Kabsy can use in social media ads? Yes! However, you'll need patience and a willingness to experiment extensively with prompting to find great clips. You'll also need post-production work to stitch the clips together into a coherent video.
Veo 2 from Google Gemini was clearly superior, though Sora can create respectable quality with enough prompt iteration. This process took me about 3-5 hours using three paid tools: Sora, Veo 2, and CapCut.
I learned three key lessons for optimizing your output. First, break it down: don't one-shot your video production. Long prompts often produce worse results, and some models like Sora truncate them. Break your video into distinct 5-10 second scenes and generate each separately. Second, be precise. For a 10-second clip, clearly specify what happens first and what happens next, breaking it down by timeframe to guide the AI (see sample below). Third, provide context. I included still frames from Kabsy's actual kitchen preparation, and the output improved dramatically. The additional context makes a material difference in quality.
Sample prompt
Scene 1: Begin with a cinematic close-up of real butter stacked in a yellow bowl on a marble kitchen countertop. The lighting is warm, with soft shadows and natural tones. At 3 seconds, cut to a side view of a Black baker operating a stand mixer. The mixer blends butter and flour while the paddle spins and flour dust gently rises. Emphasize motion, texture, and a cozy kitchen atmosphere with shallow depth of field.
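One way to keep prompts short and time-ordered is to treat each scene as a small list of timed beats and render the prompt text from that list. The sketch below is a minimal illustration of that idea, not a tool used in this experiment. `Beat` and `build_scene_prompt` are hypothetical names, and nothing here calls Sora or Veo 2. It only produces text you would paste into them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Beat:
    """One timed instruction inside a scene (hypothetical helper type)."""
    start_s: int      # seconds from the start of the clip
    description: str  # what the camera should show from that moment


def build_scene_prompt(scene_no: int, beats: list[Beat], style: str) -> str:
    """Render one scene as a single prompt with beats in time order."""
    parts = []
    for beat in sorted(beats, key=lambda b: b.start_s):
        if beat.start_s == 0:
            parts.append(f"Begin with {beat.description}.")
        else:
            parts.append(f"At {beat.start_s} seconds, cut to {beat.description}.")
    return f"Scene {scene_no}: " + " ".join(parts) + f" {style}"


# Beats can be listed in any order; they are sorted by start time.
scene_1 = build_scene_prompt(
    1,
    [
        Beat(3, "a side view of a Black baker operating a stand mixer"),
        Beat(0, "a cinematic close-up of real butter stacked in a yellow bowl"),
    ],
    style="Emphasize motion, texture, and a cozy kitchen atmosphere.",
)
print(scene_1)
```

Keeping scenes as data like this also makes it easy to save them as a reusable prompt library and to swap a single beat without rewriting the whole prompt.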
So, for small business owners, my recommendation is to use Sora and CapCut to generate videos and ChatGPT to iterate on your prompts. You will have to invest time upfront, but once you get good at this and build some prompt libraries you can reuse, it can save you significant time and money and let you churn out content at a much higher velocity.
Pro Tip
When generating videos with AI, think about your storyboard first and describe each part in plain English. Then pass this storyboard to ChatGPT and ask it to create prompts for video generation. A sample prompt you could use is: "Act as a film director and video producer and convert this storyboard into cinematic prompts to generate a video using popular models like Sora and Veo 2. Please keep the length of description for each prompt to clips of 5-10 seconds each."
Once you get your first iteration of prompts, run them through Veo 2, assess the output, and go back to ChatGPT to iterate on the parts you didn't like or want to improve. Repeat this process until you get your desired output.
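The storyboard-then-iterate workflow above can be written down as a small loop. The sketch below is only a way to picture the process, under stated assumptions: `build_request` and `refine` are hypothetical helpers, and the `generate`, `review`, and `revise` callbacks stand in for steps you would do by hand in the video tool and in ChatGPT. No real API is called.

```python
from __future__ import annotations

from typing import Callable

DIRECTOR_INSTRUCTION = (
    "Act as a film director and video producer and convert this storyboard "
    "into cinematic prompts to generate a video using popular models like "
    "Sora and Veo 2. Please keep the length of description for each prompt "
    "to clips of 5-10 seconds each."
)


def build_request(storyboard: list[str]) -> str:
    """Combine the instruction with a numbered plain-English storyboard."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(storyboard, start=1))
    return f"{DIRECTOR_INSTRUCTION}\n\nStoryboard:\n{steps}"


def refine(prompt: str,
           generate: Callable[[str], str],
           review: Callable[[str], str],
           revise: Callable[[str, str], str],
           max_rounds: int = 5) -> tuple[str, int]:
    """Generate, review, and revise until there are no notes or rounds run out."""
    for round_no in range(1, max_rounds + 1):
        clip = generate(prompt)
        notes = review(clip)          # empty string means "happy with it"
        if not notes:
            return prompt, round_no
        prompt = revise(prompt, notes)
    return prompt, max_rounds


# Stand-in callbacks so the sketch runs offline.
final_prompt, rounds = refine(
    "Close-up of butter in a bowl.",
    generate=lambda p: f"clip for: {p}",
    review=lambda clip: "" if "warm lighting" in clip else "add warm lighting",
    revise=lambda p, notes: f"{p} Revision: {notes}.",
)
print(rounds, final_prompt)
```

The `max_rounds` cap is a design choice worth keeping in real life too: set a limit on iterations per scene so the time you save on production is not spent on endless prompt tweaking.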