StoryPerceptions – crowd annotations of narratives in social media posts | 2024 | The Empirical Variability of Narrative Perceptions of Social Media Texts | | |
Quest-GPT-2 – quests and descriptions from 6 RPGs | 2024 | Generating Role-Playing Game Quests With GPT Language Models | | |
StorySeeker – finding stories in online communities | 2024 | Where Do People Tell Stories Online? Story Detection Across Online Communities | | |
MirrorStories | 2024 | MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models | | |
EmoBench – English and Chinese stories and LLMs' judgments of emotional intelligence | 2024 | EmoBench: Evaluating the Emotional Intelligence of Large Language Models | https://github.com/Sahandfer/EmoBench | |
STORYSUMM – faithful abstractive summarization of stories | 2024 | STORYSUMM: Evaluating Faithfulness in Story Summarization | https://github.com/melaniesubbiah/storysumm | |
Choice-75 – branching scripts | 2024 | Choice-75: A Dataset on Decision Branching in Script Learning | https://github.com/JoeyHou/branching | |
HEART-felt Narratives (Human Empathy and Narrative Taxonomy) – narrative styles that lead to empathy | 2024 | HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs | | |
DnD Spells – structured information for Dungeons and Dragons spells | 2024 | Leveraging Large Language Models for Spell-Generation in Dungeons & Dragons | https://github.com/m-elio/spell_generation | |
STORiCo TTS – story TTS in Hindi | 2024 | STORiCo: Storytelling TTS for Hindi with Character Voice Modulation | | https://huggingface.co/datasets/Pavankalyan/Hindi_story_telling |
SAGA (Story Alternatives and Goal Applicability) – annotated goals in stories from the perspective of the participant | 2024 | SAGA: A Participant-specific Examination of Story Alternatives and Goal Applicability for a Deeper Understanding of Complex Events | | |
StoryNory TTS – expressive text-to-speech for storytelling | 2023 | Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS | https://github.com/tpavankalyan/Storynory | https://huggingface.co/datasets/Pavankalyan/StoryNoryTTS |
CONCOCT (CONCrete Outline ConTrol) – long-form story generation | 2023 | Improving Pacing in Long-Form Story Planning | | https://huggingface.co/datasets/ZachW/GPT-BookSum |
PASTA – participant states in stories | 2023 | PASTA: A Dataset for Modeling PArticipant STAtes in Narratives | https://github.com/StonyBrookNLP/pasta | |
NarrativeXL – long stories | 2023 | NarrativeXL: a Large-scale Dataset for Long-Term Memory Models | https://github.com/r-seny/NarrativeXL | |
r/AmITheAsshole stories | 2023 | Author as Character and Narrator: Deconstructing Personal Narratives from the r/AmITheAsshole Reddit Community | | |
FIREBALL – Avrae Discord bot commands + natural language | 2023 | FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information | | https://huggingface.co/datasets/lara-martin/FIREBALL |
NEAT (Narrative Elements AnnoTation) – text annotated with narrative elements | 2022 | Detecting Narrative Elements in Informational Text | | |
POQue (Participant Outcome Questions) – participant outcomes from events in stories | 2022 | POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events | | |
NarraSum – data for narrative summarization | 2022 | NarraSum: an Abstractive Narrative Summarization Dataset | | |
FairytaleQA | 2022 | Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | https://huggingface.co/datasets/WorkInTheDark/FairytaleQA |
SYMON (SYnopses of MOvie Narratives) – movie synopses from video summaries | 2022 | Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding | | |
TVShowGuess | 2022 | TVShowGuess: Character Comprehension in Stories as Speaker Guessing | https://github.com/YisiSang/TVSHOWGUESS | |
TV Tropes – movie scripts annotated with tropes from TVTropes | 2022 | Computational Support for Trope Analysis of Textual Narratives | https://github.com/mandarsc/TropeAnalysis | |
Possible Stories – questions about stories | 2022 | Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios | https://github.com/nii-cl/possible-stories | |
CoAuthor – collaborative writing dataset | 2022 | CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities | | |
SummScreen | 2022 | SummScreen: A Dataset for Abstractive Screenplay Summarization | https://github.com/mingdachen/SummScreen | |
LiSCU (Literature Summary and Character Understanding) – character descriptions, summaries, and names | 2021 | “Let Your Characters Tell Their Story”: A Dataset for Character-Centric Narrative Understanding | https://github.com/fabrahman/char-centric-story | |
InSentive – authored stories from BookCorpus | 2021 | Inspiration through Observation: Demonstrating the Influence of Automatically Generated Text on Creative Writing | https://github.com/roemmele/InSentive | |
ESTER (Event Semantic Relation Reasoning) – events tagged in news articles from the TempEval-3 (TE3) workshop | 2021 | ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning | https://github.com/PlusLabNLP/ESTER | |
TellMeWhy – Q&A for stories | 2021 | TellMeWhy: A Dataset for Answering Why-Questions in Narratives | | https://huggingface.co/datasets/StonyBrookNLP/tellmewhy |
TVRecap – TV shows from Fandom and TVMegaSite (soap operas) | 2021 | TVRecap: A Dataset for Generating Stories with Character Descriptions | | |
Scifi TV Plots – science fiction episode summaries from Fandom | 2020 | Story Realization: Expanding Plot Events into Sentences | https://github.com/rajammanabrolu/StoryRealization | https://huggingface.co/datasets/lara-martin/Scifi_TV_Shows |
STORIUM – storium.com (gamified storytelling) | 2020 | STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation | https://github.com/dojoteef/storium-gpt2 | |
ScriptWriter – from GraphMovie, which no longer exists (descriptions of movie plots) | 2020 | ScriptWriter: Narrative-Guided Script Generation | https://github.com/DaoD/ScriptWriter | |
PerSenT – author sentiment prediction (news articles) | 2020 | Author's Sentiment Prediction | | https://huggingface.co/datasets/community-datasets/per_sent |
WikiHow Goal-Step | 2020 | Reasoning about Goals, Steps, and Temporal Ordering with WikiHow | https://github.com/zharry29/wikihow-goal-step | |
LitBank – annotated Project Gutenberg texts | 2019 | An Annotated Dataset of Literary Entities; Literary Event Detection | | https://huggingface.co/datasets/coref-data/litbank_raw |
TimeTravel – stories and counterfactual continuations | 2019 | Counterfactual Story Reasoning and Generation | https://github.com/qkaren/Counterfactual-StoryRW | https://huggingface.co/datasets/wza/TimeTravel |
HPAC (Harry Potter's Action prediction Corpus) | 2019 | Harry Potter and the Action Prediction Challenge from Natural Language | https://github.com/aghie/hpac | |
SQuAD 2.0 (Stanford Question Answering Dataset) – reading comprehension | 2018 | SQuAD: 100,000+ Questions for Machine Comprehension of Text; Know What You Don't Know: Unanswerable Questions for SQuAD | | https://huggingface.co/datasets/rajpurkar/squad_v2 |
WritingPrompts – r/WritingPrompts | 2018 | Hierarchical Neural Story Generation | https://github.com/facebookresearch/fairseq/tree/main/examples/stories | https://huggingface.co/datasets/rewardsignal/reddit_writing_prompts |
Naive Psychology of Characters in Simple Commonsense Stories – "cause and effect of mental state changes of characters in a story" | 2018 | Modeling Naive Psychology of Characters in Simple Commonsense Stories | | |
NarrativeQA – movie scripts from various sources and Project Gutenberg books | 2018 | The NarrativeQA Reading Comprehension Challenge | https://github.com/google-deepmind/narrativeqa | https://huggingface.co/datasets/deepmind/narrativeqa |
Deep Dungeons and Dragons (DDD) Corpus – roleplayerguild.com | 2018 | Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts | | |
EmotionLines – dialog from the Friends TV show & EmotionPush private chat logs | 2018 | EmotionLines: An Emotion Corpus of Multi-Party Conversations | | |
RACE (ReAding Comprehension dataset from Examinations) | 2017 | RACE: Large-scale ReAding Comprehension Dataset From Examinations | https://github.com/qizhex/RACE_AR_baselines | https://huggingface.co/datasets/ehovy/race |
CNN/Daily Mail | 2016 | Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond | | https://huggingface.co/datasets/abisee/cnn_dailymail |
FanFiction Archive – fanfiction.net | 2016 | Beyond Canonical Texts: A Computational Analysis of Fanfiction | | |
ROCStories (ROChester stories) – 5-sentence crowdsourced stories for the Story Cloze Test | 2016 | A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories; LSDSem 2017 Shared Task: The Story Cloze Test | | |
CaTeRS (Causal and Temporal Relation Scheme) – causal and temporal relations annotated on ROCStories | 2016 | CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures | | |
Character Relations | 2015 | Annotating Character Relations in Literary Texts | | |
bAbI & The Children’s Book Test (CBT) – kids' books from Project Gutenberg | 2015 | The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations; Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks | https://github.com/facebookarchive/bAbI-tasks | https://huggingface.co/datasets/facebook/babi_qa |
MCTest – stories of 150-300 words written by crowdworkers | 2013 | MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text | | https://huggingface.co/datasets/sagnikrayc/mctest |
CMU Movie Summary Corpus – Wikipedia movie summaries | 2013 | Learning Latent Personas of Film Characters | | |
Cornell Movie Dialog – movie scripts and metadata | 2011 | Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs | https://convokit.cornell.edu/documentation/movie.html | https://huggingface.co/datasets/cornell_movie_dialog |
Various corpora from UCSC's Natural Language and Dialogue Systems (NLDS) lab | | | | |