How Lionsgate Revealed the Limits of Fine-Tuning in Machine Learning

In September 2024, Lionsgate announced a partnership with AI video company Runway to fine-tune a machine learning model on the studio’s catalog of more than 20,000 film and television titles. The studio’s vice chair expected to save “millions and millions of dollars.” A year later, the project had hit a wall. The reason, according to sources familiar with the situation: “The Lionsgate catalog is too small to create an AI video model. In fact, the Disney catalog is too small to create a model.”

That’s a striking claim. Lionsgate owns John Wick, The Hunger Games, and La La Land. If their library isn’t enough, what is? The answer has to do with how machine learning models are trained, and the limits of fine-tuning.


How Runway’s AI Video Models Work


To understand why, it helps to know what video generation models are doing at a basic level. Runway’s models are built on a class of architectures called diffusion models, which we cover in detail here. At generation time, the model starts from random noise and iteratively removes it, guided by a text description or a reference image, until a coherent video clip emerges. Ask it for a car chase through a rainy city at night and it constructs the scene from scratch, frame by frame, guided by everything it learned during pre-training. The reason Runway’s publicly available models work as well as they do is that they were trained on an enormous and diverse collection of video – far more than any single studio’s library could provide.
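The denoising loop at the heart of this process can be sketched in a few lines. This is a toy illustration, not Runway’s actual architecture: the `toy_denoiser` function, the prompt representation, and the tiny 4-frame, 8×8 “clip” are all invented stand-ins for a trained neural network and real video tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, t, prompt_embedding):
    # Stand-in for the trained network: estimate the "noise" left in x.
    # A real video model conditions on a learned text encoding; here the
    # prompt embedding is just a target array the sample drifts toward.
    return x - prompt_embedding

def generate(prompt_embedding, steps=50):
    # Start from pure random noise and iteratively remove it.
    x = rng.standard_normal(prompt_embedding.shape)  # noise "clip": (frames, h, w)
    for t in reversed(range(steps)):
        predicted_noise = toy_denoiser(x, t, prompt_embedding)
        x = x - predicted_noise / steps              # one small denoising step
    return x

prompt = np.zeros((4, 8, 8))   # hypothetical embedded prompt: 4 frames of 8x8
clip = generate(prompt)
```

The key point survives the simplification: every denoising step leans on what the model learned in pre-training, which is why the breadth of that training data matters so much.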


Where Fine-Tuning Fits In, and Where It Breaks Down

Fine-tuning is what happens next. Once a video model has built up that broad understanding of the visual world, you can take it and continue training it on a much smaller, targeted dataset, nudging its patterns toward something more specific. A model fine-tuned on footage from a particular studio starts to reproduce that studio’s visual tendencies: its color grading, its lighting choices, its approach to action sequences. It keeps everything it learned during pre-training and adds a layer of specialization on top. Fine-tuning is one of the most practically useful ideas in machine learning. It is also, as Lionsgate recently discovered, one of the easiest to get wrong.

Fine-tuning is worth understanding on its own terms because it shows up constantly in production. A large pre-trained model has already done the computationally expensive work of learning general structure from enormous amounts of data; fine-tuning continues training on a smaller, targeted dataset, adjusting the model’s parameters to specialize for a particular domain, style, or task. Modern parameter-efficient techniques such as LoRA (low-rank adaptation) make this cheaper still, training only a small set of additional parameters on top of the frozen base model rather than updating everything, which keeps costs manageable.
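A minimal sketch of that parameter-efficient idea, in the LoRA style: the pre-trained weight matrix is frozen, and only two small low-rank matrices are trained. The layer sizes here are hypothetical, chosen just to make the parameter counts easy to read.

```python
import numpy as np

rng = np.random.default_rng(42)

d_in, d_out, rank = 512, 512, 8      # hypothetical layer sizes; rank << d_in

# Pre-trained weight: frozen during fine-tuning, never updated.
W_base = rng.standard_normal((d_in, d_out)) * 0.02

# Low-rank adapter: the only trainable parameters.
A = rng.standard_normal((d_in, rank)) * 0.01
B = np.zeros((rank, d_out))          # zero init, so the adapter starts as a no-op

def forward(x):
    # Base model output plus the low-rank correction learned in fine-tuning.
    return x @ W_base + (x @ A) @ B

x = rng.standard_normal((1, d_in))
assert np.allclose(forward(x), x @ W_base)  # identical to the base model at init

base_params = W_base.size            # 262,144 frozen parameters
adapter_params = A.size + B.size     # 8,192 trainable – about 3% of the base
```

Training touches only `A` and `B`, which is why fine-tuning is cheap. It is also why it cannot replace pre-training: the small adapter can only steer knowledge the frozen base model already has.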


The analogy that holds up reasonably well: a cinematographer with twenty years of experience across many genres picks up the visual language of a specific director quickly, because so much foundational knowledge is already in place. Fine-tuning does something structurally similar. It works well when the base model already understands the domain broadly, and the fine-tuning data adds specificity on top of that foundation.

The problem Lionsgate ran into is that fine-tuning is not a substitute for pre-training. When Runway trains its general model on vast amounts of diverse video, the model learns how fire behaves, how water moves, how human faces animate, how lighting changes across different environments. These are things it can only learn from having seen them in enormous variety. A studio’s film catalog, however large it seems, contains a relatively narrow slice of that variety. A John Wick film shows a lot of action sequences. It doesn’t show nearly enough of everything else to teach a model the physics and visual dynamics it needs to generate compelling new footage from scratch.

Fine-tuning on top of that foundation can help the model reproduce a studio’s color grading tendencies or cinematographic preferences. It cannot make up for gaps in the underlying model’s general understanding. And generating convincing explosions, realistic crowd scenes, or physically plausible stunts requires exactly the kind of broad foundational knowledge that only comes from pre-training on a much larger and more varied dataset.


What Lionsgate Says Happened


Lionsgate’s official position is more optimistic than the reporting suggests. The studio’s chief communications officer told Gizmodo: “We view AI as an important tool for serving our filmmakers, and we have already successfully applied it to multiple film and television projects to enhance quality, increase efficiency, and create exciting new storytelling opportunities.” The studio didn’t name the projects.

The gap between that statement and The Wrap’s reporting points to something common in early ML deployments: a model can work well enough for limited tasks while falling well short of the more ambitious applications its backers were hoping for. Using Runway’s tools to clean up a shot or generate a simple background is a different problem than using a custom-trained model to produce entire sequences in a recognizable studio style. The first is tractable with existing technology. The second, it turns out, requires more data than any studio currently has.


The Pattern This Reveals


The failure mode here is instructive beyond Hollywood. Fine-tuning on proprietary data is a genuinely powerful technique, and it’s being deployed successfully across industries: medical imaging companies fine-tune general vision models on their own radiology archives, legal tech companies fine-tune language models on case law and contracts. In those cases, the proprietary data is dense, consistent, and large enough relative to the task that the fine-tuning adds real value.

The lesson from Lionsgate is that proprietary data has to be evaluated against what the task actually requires. A radiology archive with millions of annotated scans is a meaningful addition to a vision model’s understanding of medical images. A studio’s film library, however impressive as a cultural artifact, is a thin slice of the visual world a video generation model needs to understand. Studios, like many large organizations, assumed their proprietary data was an asset in itself; it wasn’t large enough or varied enough to substitute for pre-training. Lionsgate found that out the hard way.
