Show HN: Fine-tuned Qwen2.5-7B on 100 films for probabilistic story graphs

9 hours ago (cinegraphs.ai)

Hi HN, I'm a computer systems engineering student in Mexico who switched from film school. I built CineGraphs because my filmmaker friends and I kept hitting the same wall—we'd have a vague idea for a film but no structured way to explore where it could go. Every AI writing tool we tried output generic, formulaic slop. I didn't want to build another ChatGPT wrapper, so I went a different route.

The idea is simple: you input a rough concept, and the tool generates branching narrative paths visualized as a graph. You can sculpt those branches into a structured screenplay format and export to Fountain for use in professional screenwriting software.
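For illustration only, here is a minimal sketch of how a branching story graph and a Fountain export could fit together. It is not CineGraphs' actual code: the `StoryNode` class, its fields, and `path_to_fountain` are hypothetical names, and the export covers only a title line, scene headings, and action paragraphs.

```python
from dataclasses import dataclass, field


@dataclass
class StoryNode:
    """One beat in the branching graph (hypothetical schema)."""
    node_id: str
    heading: str  # scene heading, e.g. "INT. KITCHEN - NIGHT"
    action: str   # action/description paragraph
    children: list["StoryNode"] = field(default_factory=list)


def path_to_fountain(title: str, path: list[StoryNode]) -> str:
    """Render one chosen root-to-leaf path as minimal Fountain text."""
    blocks = [f"Title: {title}"]
    for node in path:
        # Fountain treats lines starting with INT. or EXT. as scene headings.
        blocks.append(node.heading.upper())
        blocks.append(node.action)
    # Fountain elements are separated by blank lines.
    return "\n\n".join(blocks) + "\n"


# Two alternative continuations of the same opening beat.
root = StoryNode("n0", "INT. BUS STATION - NIGHT", "MARA misses the last bus.")
left = StoryNode("n1", "EXT. HIGHWAY - NIGHT", "She starts walking.")
right = StoryNode("n2", "INT. WAITING ROOM - NIGHT", "She waits for morning.")
root.children = [left, right]

# The writer picks the "left" branch; only that path is exported.
script = path_to_fountain("Last Bus", [root, left])
print(script)
```

Keeping the unchosen branch (`right`) in the graph while exporting only the selected path is the point of the structure: alternatives stay available until the writer commits.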

Most AI writing tools are trained on generic internet text, which is why they output generic results. I wanted something that understood actual cinematic storytelling—not plot summaries or Wikipedia synopses, but the actual structural DNA of films. So I spent a month curating 100 films I consider high-quality cinema. Not just popular films, but works with distinctive narrative structures: Godard's jump cuts and essay-film digressions, Kurosawa's parallel character arcs, Brakhage's non-linear visual poetry, Tarkovsky's slow-burn temporal structures. The selection was deliberately eclectic because I wanted the model to learn that "story" can mean many things.

Getting useful training data from films is harder than it sounds. I built a 1000+ line Python pipeline using Qwen3-VL to analyze each film with subtitles enabled. The pipeline extracts scene-level narrative beats, character relationships and how they evolve, thematic threads, and dialogue patterns. The tricky part was getting Qwen3-VL to understand cinematic structure rather than just summarizing plot. I had to iterate on the prompts extensively to get it to identify things like "this scene functions as a mirror to the opening" or "this character's arc inverts the protagonist's." That took weeks and I'm still not fully satisfied with it, but it's good enough to produce useful training data.
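As a rough sketch of what one scene-level extraction record might look like once the vision-language model's output has been parsed, assuming a flat JSONL file as the intermediate format. The `SceneBeat` fields and the `to_jsonl` helper are assumptions made for illustration, and the film data is a placeholder, not output from the real pipeline.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SceneBeat:
    """One extracted scene (hypothetical schema)."""
    film: str
    scene_index: int
    summary: str
    structural_role: str  # e.g. "mirrors the opening"
    characters: list[str]


def to_jsonl(beats: list[SceneBeat]) -> str:
    """Serialize beats as one JSON object per line."""
    return "\n".join(json.dumps(asdict(b), ensure_ascii=False) for b in beats)


# Placeholder records; the film and scenes are invented for this example.
beats = [
    SceneBeat("Example Film", 1, "A woman waits at an empty station.",
              "opening image", ["MARA"]),
    SceneBeat("Example Film", 42, "The station again, now crowded.",
              "mirrors the opening", ["MARA", "IVO"]),
]

jsonl = to_jsonl(beats)
print(jsonl)
```

A separate `structural_role` field keeps the structural reading ("this scene mirrors the opening") apart from the plot summary, which is the distinction the prompts were iterated on.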

From those extractions I generated a 10K example dataset of prompt-to-branching-narrative pairs, then fine-tuned Qwen2.5-7B-Instruct with a LoRA optimized for probabilistic story branching. The LoRA handles the graph generation—exploring possible narrative directions—while the full 7B model generates the actual technical screenplay format when you export. I chose the 7B model because I wanted something that could run affordably at scale while still being capable enough for nuanced generation. The whole thing is served on a single 4090 GPU using vLLM. The frontend uses React Flow for the graph visualization. The key insight was that screenwriting is fundamentally about making choices—what if the character goes left instead of right?—but most writing tools force you into a linear document too early. The graph structure lets you explore multiple paths before committing, which matches how writers actually think in early development.
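The post does not say how branch probabilities are represented, so the following is only one possible reading of "probabilistic story branching": each candidate direction carries a score, scores are normalized into probabilities, and a branch is sampled from that distribution. `Branch`, `branch_probabilities`, and `sample_branch` are hypothetical names, and the weights are made up.

```python
import random
from dataclasses import dataclass


@dataclass
class Branch:
    label: str
    weight: float  # unnormalized score (assumed, e.g. from the model)


def branch_probabilities(branches: list[Branch]) -> dict[str, float]:
    """Normalize non-negative weights so they sum to 1."""
    total = sum(b.weight for b in branches)
    if total <= 0:
        raise ValueError("at least one branch needs a positive weight")
    return {b.label: b.weight / total for b in branches}


def sample_branch(branches: list[Branch], rng: random.Random) -> str:
    """Pick one branch label at random, proportionally to its probability."""
    probs = branch_probabilities(branches)
    labels = list(probs)
    return rng.choices(labels, weights=[probs[l] for l in labels], k=1)[0]


options = [Branch("goes left", 3.0), Branch("goes right", 1.0)]
probs = branch_probabilities(options)
choice = sample_branch(options, random.Random(0))
print(probs, choice)
```

Sampling rather than always taking the highest-scoring branch is what keeps less likely directions reachable, which fits the goal of exploring multiple paths before committing.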

The biggest surprise was how much the film selection mattered. Early versions trained on more mainstream films produced much more formulaic outputs. Adding experimental and international cinema dramatically improved the variety and interestingness of the generations. The model seemed to learn that narrative structure is a design space, not a formula.

We've been using it ourselves to break through second-act problems—when you know where you want to end up but can't figure out how to get there. The branching format forces you to think in possibilities rather than committing too early.

You can try it at https://cinegraphs.ai/ — no signup required to test it out. You get a full project with up to 50 branches without registering, though you'll need to create an account to save it. Registered users get 3 free projects. I'd love feedback on whether the generation quality feels meaningfully different from generic AI tools, and whether the graph interface adds value or just friction.

I _really_ think you have an interesting tool, but the workflow loop isn't fully there.

Please let me revise or remix a suggested node. I find them extremely engaging, and I can envision ways of sort of "spinning off" even further than its suggestions. Think Brian Eno's Oblique Strategies.

To me, this is hinting at really interesting creative processes that feel much more humane than how most LLMs work today.

  • The Oblique Strategies comparison is exactly the vibe I was going for—generative constraints that open up possibilities rather than closing them down. Node editing is the top priority right now. You're right that the loop isn't complete without it.

Can I edit a node to modify the next branch, or add my own if the offerings are not quite right? Do you foresee this being useful for evaluating scripts by pulling out the story structure and 'grading' the resulting story graph?

  • Not yet, but both are on the roadmap. Manual node editing is the next priority—letting you tweak branches or add your own directions.

    The script evaluation idea is interesting—you're not the first to suggest it. The extraction pipeline already identifies structural elements like scene mirroring and character arc inversions, so exposing that as an analysis tool isn't a huge leap. Definitely something I want to explore.

The output looks pretty useful. It got a bit weird when I wanted to explore alternative branches and nodes started to overlap each other (I tried free/unregistered, if that helps).

  • Thanks for flagging that—the node overlap issue is a known bug when the graph gets dense. I'm working on better auto-layout. Appreciate the report.

I have a writing assistant working on a script and I’d like to give it access to your tool. Is this a mode of operation you want to support?

  • Interesting use case. I don't have a public API yet, but it's something I'd consider if there's demand. What kind of integration are you imagining—feeding prompts in and getting branches back, or something else?

    • Enthusiastic amateur here. I have no film school or real writing experience.

      I’d use it to explore branching first. My system has a rich knowledge tree and could write some very detailed prompts. We would discuss the results and integrate them into our tree.

      I want to have that visual graph representation for myself as well, but it seems like my text repo really needs some structural formality.

      I also think your system would be a good consultant for mine. We have questions about story writing, you have an endpoint for that.

      My current story is being written for the page, no film contemplated yet, so I haven’t thought about accessing your screenplay tools yet.

It's cool.

My one issue is that stories can't end in the tool. That should be an option instead of more branches appearing.

  • Good point—you're right, there should be a way to mark a branch as an endpoint instead of forcing more branches. Adding that to the list. Thanks.

Awesome work. Would be cool if you could publish the list of movies that you chose for finetuning. Just out of curiosity.

  • Thanks! Here's the full list: 2001: A Space Odyssey, 8½, Aguirre the Wrath of God, Ali: Fear Eats the Soul, All That Heaven Allows, Apocalypse Now, Ashes and Diamonds, A Woman Under the Influence, Barry Lyndon, Bicycle Thieves, Breathless, Casablanca, Céline and Julie Go Boating, Chinatown, Chinese Roulette, Citizen Kane, City Lights, City of Pirates, Contempt, Daisies, Damnation, Dishonored, Earth, Electra My Love, El Topo, Eraserhead, Eyes Wide Shut, Film Socialisme, Fitzcarraldo, Fuego en Castilla, Hiroshima Mon Amour, Holy Motors, Inauguration of the Pleasure Dome, In a Year with 13 Moons, India Song, Inland Empire, Irma Vep, Koyaanisqatsi, La Dolce Vita, La Jetée, Late Spring, L'Eau de la Seine, Le Voyage dans la Lune, Lolita, Los Olvidados, Lost Highway, Lucifer Rising, Man with a Movie Camera, Metropolis, Mirror, Mulholland Drive, Night Music, Ordet, Orpheus, Persona, Pickpocket, Playtime, Psycho, Rebecca, Rosemary's Baby, Rumble Fish, Scarface, Seven Samurai, Sherlock Jr., Singin' in the Rain, Stalker, Sunset Boulevard, Taste of Cherry, Taxi Driver, Testament of Orpheus, The 400 Blows, The Blood of a Poet, The Cabinet of Dr. Caligari, The Color of Pomegranates, The Green Ray, The Holy Mountain, The Isle, The Lady from Shanghai, The Night of the Hunter, The Passion of Joan of Arc, The Seventh Seal, The Spirit of the Beehive, The Tales of Hoffmann, The Tree of Life, The Turin Horse, Time of the Gypsies, Tokyo Story, Touch of Evil, Trans-Europ-Express, Ugetsu, Un Chien Andalou, Uncle Boonmee Who Can Recall His Past Lives, Vampyr, Videodrome, Wavelength, Werckmeister Harmonies, Wild Strawberries, Moonlight Sonata, Stellar, The Haunted House.

    The list is biased toward European art cinema, experimental work, and directors who broke conventional narrative rules.

Ha, some days ago I made Qwen generate a scenario for a documentary, Seeds in the Web [1]. I put the beginning in the start window, and at some point it suggested moving on to episode 2, with a name compatible with the original scenario. How is that possible?

[1] https://imar.ro/~mbuliga/glc-grok-qwen.html

  • That's wild. Hard to say for sure—could be that your scenario's structure pattern-matched something in Qwen's training data, or it picked up on implicit episodic cues in your writing. The base model has seen a lot of serialized content. What was the original scenario about?

Can you provide more guidance on how to use it? What makes a good first prompt? What if I don't like any of the recommended choices? Seems like I should be able to add my own.

  • There's a demo on the landing page that walks through it. Basically you input any idea—no matter how vague—and the system generates branching directions you could take it. You explore the branches, and when you're satisfied you can export and the system generates a technical screenplay based on your choices. There's no "right" first prompt—I've thrown some of my dumbest ideas at it just to see where the system takes me. That's kind of the point. Regarding adding your own branches—yes, that's on my roadmap. Letting users create their own options and shape the graph more directly. Still a work in progress!

The first group to package this idea into a smooth, usable, repeatable tool and pitch it to a big studio is going to make a lot of money and destroy media forever. I'm sure there are already a ton of people working on it. Does anyone know of any?