Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence – dndbeyond.com | Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence | | | |
Deep Dungeons and Dragons (DDD) Corpus – roleplayerguild.com | Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts | | | |
ROCStories – 5-sentence crowdsourced stories for Story Cloze Test | A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories and LSDSem 2017 Shared Task: The Story Cloze Test | | | https://competitions.codalab.org/competitions/15333 |
CaTeRS – Causal and temporal relations using ROC Stories | CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures | | | |
Scifi TV Plots – science fiction episode summaries from Fandom | Story Realization: Expanding Plot Events into Sentences | https://github.com/rajammanabrolu/StoryRealization | https://huggingface.co/datasets/lara-martin/Scifi_TV_Shows | |
WritingPrompts – r/WritingPrompts | Hierarchical Neural Story Generation | https://github.com/facebookresearch/fairseq/tree/main/examples/stories | https://huggingface.co/datasets/rewardsignal/reddit_writing_prompts | |
LitBank – annotated Project Gutenberg | An Annotated Dataset of Literary Entities and Literary Event Detection | | | |
STORIUM – storium.com (gamified storytelling) | STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation | https://github.com/dojoteef/storium-gpt2 | | |
ESTER – tagged events from news articles from the TempEval-3 (TE3) workshop | ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning | https://github.com/PlusLabNLP/ESTER | | https://eventqa.github.io/ |
CMU Movie Summary Corpus – Wikipedia movie summaries | Learning Latent Personas of Film Characters | | | |
The Children’s Book Test – kids' books from Project Gutenberg | The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations and Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks | https://github.com/facebookarchive/bAbI-tasks | | |
Cornell Movie Dialog – movie scripts and metadata | Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs | https://convokit.cornell.edu/documentation/movie.html | https://huggingface.co/datasets/cornell_movie_dialog | |
ScriptWriter – from GraphMovie, which no longer exists (descriptions of movie plots) | ScriptWriter: Narrative-Guided Script Generation | https://github.com/DaoD/ScriptWriter | | |
NarrativeQA – movie scripts from various sources and Project Gutenberg books | The NarrativeQA Reading Comprehension Challenge | https://github.com/deepmind/narrativeqa | https://huggingface.co/datasets/narrativeqa | https://paperswithcode.com/sota/question-answering-on-narrativeqa |
MCTest – 150-300 word stories written by crowdworkers | MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text | | https://huggingface.co/datasets/sagnikrayc/mctest | https://paperswithcode.com/dataset/mctest |
InSentive – authored stories from BookCorpus | Inspiration through Observation: Demonstrating the Influence of Automatically Generated Text on Creative Writing | https://github.com/roemmele/InSentive | | |
CoAuthor – collaborative writing dataset | CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities | | | |
TimeTravel – stories and counterfactual continuations | Counterfactual Story Reasoning and Generation | https://github.com/qkaren/Counterfactual-StoryRW | | |
TellMeWhy – Q&A for stories | TellMeWhy: A Dataset for Answering Why-Questions in Narratives | | | |
PerSenT – author sentiment prediction (news articles) | Author's Sentiment Prediction | | | |
EmotionLines – dialog from the Friends TV show & EmotionPush private chat logs | EmotionLines: An Emotion Corpus of Multi-Party Conversations | | | |
TVRecap – TV shows from Fandom and TVMegaSite (soap operas) | TVRecap: A Dataset for Generating Stories with Character Descriptions | | | |
FanFiction Archive – fanfiction.net | Beyond Canonical Texts: A Computational Analysis of Fanfiction | | | |
HPAC | Harry Potter and the Action Prediction Challenge from Natural Language | https://github.com/aghie/hpac | | |
SummScreen | SummScreen: A Dataset for Abstractive Screenplay Summarization | | | |
SQuAD 2.0 (Stanford Question Answering Dataset) – reading comprehension | SQuAD: 100,000+ Questions for Machine Comprehension of Text and Know What You Don't Know: Unanswerable Questions for SQuAD | | | https://worksheets.codalab.org/worksheets/0x8212d84ca41c4150b555a075b19ccc05/ |
Naive Psychology of Characters in Simple Commonsense Stories – "cause and effect of mental state changes of characters in a story" | Modeling Naive Psychology of Characters in Simple Commonsense Stories | | | |
Character Relations | Annotating Character Relations in Literary Texts | | | |
TVShowGuess | TVShowGuess: Character Comprehension in Stories as Speaker Guessing | | | |
Various corpora from UCSC's Natural Language and Dialogue Systems (NLDS) lab | | | | |