Writer Ed Zitron recently published a comprehensive look (read: takedown) of Sora, the new generative video platform from OpenAI. Functionally, Sora is not much different from text generation via ChatGPT or image generation via Midjourney—which is to say, the so-called artificial intelligence is hardly intelligent, even when it produces the kind of passable content that wows some people despite its absolute hollowness.
Like all generative AI tools, Sora's intended purpose is primarily to replace workers so that executives can take an even higher cut of the profits (read: rent extraction) from their captive audiences. But Zitron points to a crucial problem with this idea, particularly as it relates to filmmaking:
The very nature of filmmaking is taking different shots of the same thing, something that I anticipated Sora would be incapable of doing as each shot is generated fresh, as Sora itself (much like all generative AI) does not "know" anything. When one asks for a man with a yellow balloon as his head, Sora must then look over the parameters spawned during its training process and create an output, guessing what a man looks like, what a balloon looks like, what color yellow is, and so on.
It repeats this process for each shot, with each "man with balloon as his head" character subtly (or not-so-subtly) different with each repetition, forcing users to pick the outputs that are the most consistent. Achieving a perfect like-for-like copy isn't guaranteed, but rather, filmmakers must pick the ones that are "good enough."
This becomes extremely problematic when you're working in film or television, where viewers are far more likely to see when something doesn't look right — a problem exacerbated by moving images, high-resolution footage, and big TV screens.
This, as it turns out, is exactly what happens when you work with Sora, as the film production company Shy Kids explained in an interview with fxguide. For one thing, the AI doesn't understand filmmaking terminology, which means it's incapable of producing multiple takes of the same shot:
With cinematic shots, the ideas of 'tracking', 'panning', 'tilting' or 'pushing in' are all not terms or concepts captured by metadata. As much as object permanency is critical for shot production, so is being able to describe a shot, which Patrick noted was not initially in SORA. "Nine different people will have nine different ideas of how to describe a shot on a film set. And the (OpenAI) researchers, before they approached artists to play with the tool, hadn't really been thinking like filmmakers."
And then there's the render time:
Clips can be rendered in varying segments of time, such as 3 sec, 5 sec, 10 sec, 20 sec, up to a minute. Render times vary depending on the time of day and the demand for cloud usage. "Generally, you're looking at about 10 to 20 minutes per render," Patrick recalls. "From my experience, the duration that I choose to render has a small effect on the render time. If it's 3 to 20 seconds, the render time tends not to vary too much from between a 10 to 20-minute range. We would generally do that because if you get the full 20 seconds, you hope you have more opportunities to slice/edit stuff out and increase your chances of getting something that looks good."
[…]
For the minute and a half of footage that ended up in the film, Patrick estimated that they generated "hundreds of generations at 10 to 20 seconds apiece," adding, "My math is bad, but I would guess probably 300:1 in terms of the amount of source material to what ended up in the final."
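Those numbers are worth running, if only as a sanity check. Here's a quick back-of-envelope sketch in Python, taking the midpoints of the quoted ranges and using 300 generations as a stand-in for "hundreds"—my assumptions, not figures from the production:

```python
# Back-of-envelope math on the figures quoted above. All inputs are
# estimates: clip length and render time are midpoints of the quoted
# 10-20 ranges, and 300 generations stands in for "hundreds".
clip_seconds = 15      # midpoint of the 10-20 second clip lengths
render_minutes = 15    # midpoint of the 10-20 minute render times
generations = 300      # "hundreds of generations"
final_seconds = 90     # the minute and a half that made the cut

source_seconds = generations * clip_seconds
total_render_hours = generations * render_minutes / 60

print(f"source footage:  {source_seconds / 60:.0f} minutes")        # ~75 minutes
print(f"render time:     {total_render_hours:.0f} hours")           # ~75 hours
print(f"source-to-final: {source_seconds / final_seconds:.0f}:1")   # ~50:1
```

Under those assumptions, that's roughly 75 hours of rendering for 90 seconds of screen time, and a footage ratio closer to 50:1 than 300:1—though Patrick's guess may be counting generations per usable shot rather than raw seconds.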
Sure, filmmaking is already a tedious process. And it is of course perfectly common for hours of unused footage to get left on the metaphorical floor of the editing suite. But it sounds like Sora isn't actually offering a solution to that; it's just manifesting the problem in a different way. What's the point of taking the same amount of time to make a movie while having less fine creative control over the end result?
Oh yeah: not having to pay workers.
Expectations Versus Reality [Ed Zitron / Where's Your Ed At?]
Actually using SORA [Mike Seymour / fxguide]