Lara's Storytelling Resources


Here is a non-exhaustive list of resources you might want if you're interested in automated story generation, interactive fiction (IF), or related research areas such as tabletop roleplaying games (TRPGs). This list was first created when I co-taught Interactive Fiction and Text Generation at UPenn with Chris Callison-Burch.

I also made a list of related researchers, and I try to keep up a list of upcoming conference and workshop deadlines. If you want me to add or update anything on any of these lists, please let me know!

Note: This is not a list of papers in the field, but rather a list of corpora & code, with their corresponding papers where they have one.
If you're looking for paper lists, you might be interested in @arnicas's list of text generation papers found on arXiv, Stephen Ware's Narrative Intelligence Lab reading list, or the Tsinghua Natural Language Processing Group's text generation list.

Story Datasets

| Dataset | Papers | Paper Code (Baselines) | Hugging Face Link | Leaderboard |
| Deep Dungeons and Dragons (DDD) Corpus – roleplayerguild.com | Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts | | | |
| ROCStories – 5-sentence crowdsourced stories for Story Cloze Test | A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories; LSDSem 2017 Shared Task: The Story Cloze Test | | | https://competitions.codalab.org/competitions/15333 |
| CaTeRS – Causal and temporal relations using ROC Stories | CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures | | | |
| Scifi TV Plots – science fiction episode summaries from Fandom | Story Realization: Expanding Plot Events into Sentences | https://github.com/rajammanabrolu/StoryRealization | https://huggingface.co/datasets/lara-martin/Scifi_TV_Shows | |
| WritingPrompts – r/WritingPrompts | Hierarchical Neural Story Generation | https://github.com/pytorch/fairseq | https://huggingface.co/datasets/rewardsignal/reddit_writing_prompts | |
| Lit Bank – annotated Project Gutenberg | An Annotated Dataset of Literary Entities; Literary Event Detection | | | |
| STORIUM – storium.com (gamified storytelling) | STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation | https://github.com/dojoteef/storium-gpt2 | | |
| ESTER – tagged events from news articles from the TempEval3 (TE3) workshop | ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning | https://github.com/PlusLabNLP/ESTER | | https://eventqa.github.io/ |
| CMU Movie Summary Corpus – Wikipedia movie summaries | Learning Latent Personas of Film Characters | | | |
| The Children’s Book Test – kids' books from Project Gutenberg | The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations | https://github.com/facebookarchive/bAbI-tasks | | |
| Cornell Movie Dialog – movie scripts and metadata | Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs | https://convokit.cornell.edu/documentation/movie.html | https://huggingface.co/datasets/cornell_movie_dialog | |
| ScriptWriter – from GraphMovie, which no longer exists (descriptions of movie plots) | ScriptWriter: Narrative-Guided Script Generation | https://github.com/DaoD/ScriptWriter | | |
| NarrativeQA – movie scripts from various sources and Project Gutenberg books | The NarrativeQA Reading Comprehension Challenge | https://github.com/deepmind/narrativeqa | https://huggingface.co/datasets/narrativeqa | https://paperswithcode.com/sota/question-answering-on-narrativeqa |
| MCTest – 150-300 word stories written by crowdworkers | MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text | | https://huggingface.co/datasets/sagnikrayc/mctest | https://paperswithcode.com/dataset/mctest |
| InSentive – authored stories from BookCorpus | Inspiration through Observation: Demonstrating the Influence of Automatically Generated Text on Creative Writing | https://github.com/roemmele/InSentive | | |
| CoAuthor – collaborative writing dataset | CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities | | | |
| TimeTravel – stories and counterfactual continuations | Counterfactual Story Reasoning and Generation | https://github.com/qkaren/Counterfactual-StoryRW | | |
| TellMeWhy – Q&A for stories | TellMeWhy: A Dataset for Answering Why-Questions in Narratives | | | |
| PerSenT – author sentiment prediction (news articles) | Author's Sentiment Prediction | | | |
| EmotionLines – dialog from the Friends TV show & EmotionPush private chat logs | EmotionLines: An Emotion Corpus of Multi-Party Conversations | | | |
| TVRecap – TV shows from Fandom and TVMegaSite (soap operas) | TVRecap: A Dataset for Generating Stories with Character Descriptions | | | |
| FanFiction Archive – fanfiction.net | Beyond Canonical Texts: A Computational Analysis of Fanfiction | | | |
| HPAC | Harry Potter and the Action Prediction Challenge from Natural Language | https://github.com/aghie/hpac | | |
| SummScreen | SummScreen: A Dataset for Abstractive Screenplay Summarization | | | |
| SQuAD 2.0 (Stanford Question Answering Dataset) – reading comprehension | SQuAD: 100,000+ Questions for Machine Comprehension of Text; Know What You Don't Know: Unanswerable Questions for SQuAD | https://worksheets.codalab.org/worksheets/0x8212d84ca41c4150b555a075b19ccc05/ | | |
| Naive Psychology of Characters in Simple Commonsense Stories – "cause and effect of mental state changes of characters in a story" | Modeling Naive Psychology of Characters in Simple Commonsense Stories | | | |
| Character Relations | | | | |
| TVShowGuess | TVShowGuess: Character Comprehension in Stories as Speaker Guessing | | | |

Mixed Visual & Textual Datasets

| Dataset | Papers | Paper Code | Hugging Face Link | Leaderboard |
| BookCorpus | Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books; Skip-thought vectors | https://github.com/ryankiros/skip-thoughts | https://huggingface.co/datasets/bookcorpus | |
| COIN | COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis | https://github.com/coin-dataset | | |
| WikiHow | WikiHow: A Large Scale Text Summarization Dataset | https://github.com/mahnazkoupaee/WikiHow-Dataset | https://huggingface.co/datasets/wikihow | |
| VIST – Visual storytelling data + task | Visual Storytelling | | | https://paperswithcode.com/dataset/vist |
| MovieGraphs – knowledge graphs, images, and descriptions | MovieGraphs: Towards Understanding Human-Centric Situations from Videos | | | |
| KG-Story | Knowledge-Enriched Visual Storytelling | | | |
| Character-Preserving Coherent Story Visualization (CP-CSV) – character-based story visualization | Character-Preserving Coherent Story Visualization | | | |
| StoryGAN – story visualization | StoryGAN: A Sequential Conditional GAN for Story Visualization | | | |
| Pororo-SV – StoryGAN CLEVR dataset | StoryGAN: A Sequential Conditional GAN for Story Visualization | | | https://paperswithcode.com/sota/story-visualization-on-pororo |
| DramaQA – Video Story Understanding on Korean TV Show "Another Miss Oh" | DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | https://github.com/liveseongho/DramaQA | | |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-answering | https://github.com/makarandtapaswi/MovieQA_CVPR2016/ | | |

Story Evaluation & Cloze Tests

Data Scrapers & Processors

| Dataset | Info |
| Novel Chapter Summaries | full book chapters and their summaries |
| Archive of Our Own Scraper | scraper for Archive of Our Own fanfiction |
| Fanfiction Scraper | scraper for fanfiction.net |
| BookNLP | process your own book data |
| Newspaper3k | news scraper |
| Homemade BookCorpus | recreation of BookCorpus |

Interactive Fiction Environments

Interactive Fiction Agents

Story Planning Systems

| Planner | Papers |
| Glaive - a fast planner for multi-agent stories | Glaive: a state-space narrative planner supporting intentionality and conflict |
| Sabre - next-gen Glaive | Sabre: A Narrative Planner Supporting Intention and Deep Theory of Mind |
| StoryAssembler - "a narrative system for procedurally generating choice-based interactive narratives" | StoryAssembler: An Engine for Generating Dynamic Choice-Driven Narratives |
| Belief and Intentional PDDL | Using Domain Compilation to Add Belief to Narrative Planners |
| Winnow - "declarative domain-specific query language for story sifting" | Winnow: A Domain-Specific Language for Incremental Story Sifting |
| Felt - "simple story sifting and simulation engine for emergent narrative play experiences" | Felt: A Simple Story Sifter |
| Recurve (C++) - decompositional planner | |
| STRIPS Planner (Python) | |
| Partial Order Causal-Link (POCL) Planner (Python) | |

Knowledge Bases & Commonsense Reasoning

| Knowledge Base | Papers | Hugging Face Link |
| VerbNet | VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon | |
| FrameNet | FrameNet II: Extended Theory and Practice | |
| WordNet | WordNet: An Electronic Lexical Database | |
| ConceptNet | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | https://huggingface.co/datasets/conceptnet5 |
| ATOMIC (ATlas Of MachIne Commonsense) | ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning | https://huggingface.co/datasets/atomic |
| COMET (COMmonsEnse Transformers) - uses ATOMIC and ConceptNet | COMET: Commonsense Transformers for Automatic Knowledge Graph Construction | |
| GLUCOSE (GeneraLized and COntextualized Story Explanations) | GLUCOSE: GeneraLized and COntextualized Story Explanations | https://huggingface.co/datasets/glucose |
| Power and Agency in modern films | Connotation Frames of Power and Agency in Modern Films | |
| Eraser - Movie Rationales | ERASER: A Benchmark to Evaluate Rationalized NLP Models | https://huggingface.co/datasets/movie_rationales |
| ECIpedia | | |
| The NOC List | Round Up The Usual Suspects: Knowledge-Based Metaphor Generation | |
| NULEX - combines WordNet, VerbNet, and Wiktionary | NULEX: An Open-License Broad Coverage Lexicon | |
| CausalBank | Guided Generation of Cause and Effect | |
| SCRUPLES - ethical judgements | SCRUPLES: A Corpus of Community Ethical Judgments on 32,000 Real-life Anecdotes | |
| PeKo - event preconditions | PeKo: A Large Scale Precondition Knowledge Dataset | |
| SWAG (Situations With Adversarial Generations) - NLI from video captions | SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | |
| HellaSwag (Harder Endings, Longer contexts, and Low-shot Activities for Situations With Adversarial Generations) - commonsense inference (harder SWAG) | HellaSwag: Can a Machine Really Finish Your Sentence? | https://huggingface.co/datasets/hellaswag |
| CLUTRR (Compositional Language Understanding with Text-based Relational Reasoning) | CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text | https://huggingface.co/datasets/CLUTRR/v1 |

Story Generation Code

Libraries & Toolkits

| Library | Info |
| Hugging Face | Hugging Face provides state-of-the-art general-purpose neural language model architectures like BERT, GPT-2, and others. |
| Hugging Face Transformer Library | |
| AllenNLP | Deep learning for NLP with state-of-the-art models |
| Spacy | "Industrial-Strength Natural Language Processing" in Python |
| NLTK - Natural Language Toolkit | Basic NLP tools for Python & interfacing with some external models |
| Stanford NLP | various NLP models in Java |
| Stanza | Stanford NLP for Python |
| ConvoKit | Cornell Conversation Analysis Toolkit |
| Open IE | information extraction on sentences |

Extras

Programming Languages for Writing Interactive Fiction

Notable IF Games

Tutorials

RPG/IF Inspiration

| Name | Info |
| Polygon's Favorite Actual Play Podcasts | Personal recommendation: The Adventure Zone |
| Actual Play Podcasts | |
| Roll 20 | Play tabletop games with friends virtually |
| chooseyourstory.com | |
| AI Dungeon | |
| Interactive Fiction on Itch.io | Find cool indie IF games |
| Interactive Fiction Database | IMDb for IF |
| Interactive Fiction Wiki | |

Related Courses

| Course | Taught By | Year |
| Interactive Narrative | Nick Montfort | 2019 (Fall) |
| Interactive Fiction and Text Generation | Lara J. Martin & Chris Callison-Burch | 2022 (Spring) |
| AI Storytelling in Virtual Worlds | Mark Riedl | 2022 (Spring) |

Generators for TRPGs and IF

| Name | Info |
| Print graph paper | just blank graph paper! |
| donjon | random generators for tabletop games |
| RPG Maps in Wolfram Language | code to tile hex pieces together to make a map |
| RPG Map Editor 2 | downloadable app for making maps |
| RPGgen | a collection of generators |

Various Tools

| Name | Info |
| Versu | "an engine for telling interactive stories about people" |
| WOOL | "dialogue platform for creating virtual agent conversations" |