## Introduction
This is the companion notebook to [HW 2](https://laramartin.net/interactive-fiction-class/homeworks/plan-and-write/plan-and-write.html) for CMSC 491/691 Interactive Fiction and Text Generation.

The code snippets for calling Mistral are adapted from two Unsloth notebooks:
* https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb
* https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb

## Installations
Install the necessary libraries.

In [18]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4
!pip install --no-deps trl==0.22.2

**Important: You will probably need to restart your session after installing the libraries above.**
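
After restarting, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch, not part of the original notebook: the helper name `check_versions` is hypothetical, and the pins are simply copied from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell; adjust them if you change that cell.
EXPECTED = {"transformers": "4.55.4", "trl": "0.22.2"}

def check_versions(expected):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches

print(check_versions(EXPECTED) or "All pinned versions match.")
```

An empty result means the installed packages match the pins. Note that this reads the installed package metadata, so it should be run in the fresh session.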

## Getting the Data

This is a modified dataset from the [Plan, Write, and Revise paper](https://aclanthology.org/N19-4016/). It contains five-sentence stories, their extracted keywords, and their titles. We will only use 20 stories from this dataset.

I have written functions below to extract the relevant data.
* `load_data` will return a list of all of the data in the file.
* `get_story` will return a list of the sentences in the story.
* `get_title` will return the title of a story from a given line.
* `get_keywords` will return the keywords of a story from a given line.

In [None]:
!wget https://raw.githubusercontent.com/lara-martin/interactive-fiction-class/refs/heads/master/homeworks/plan-and-write/keyword-story-roc.txt

In [None]:
# Example line from the file (title <EOT> keywords <EOL> sentences separated by </s>):
# fixing my car <EOT> car turn tested alternator bad <EOL> </s> i went to start my car last friday . </s> my car would n't turn over . </s> i took my alternator off to be tested . </s> the parts store said that it was bad . </s> i replaced my alternator with a new one .

from collections import defaultdict

def load_data():
  stories = []
  with open('keyword-story-roc.txt','r') as in_file:
    reader = in_file.readlines()
    for line in reader[1:21]:
      stories.append(line.strip())
  return stories

def get_title(line):
  title, _ = line.split(" <EOT> ")
  return title

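def combine_words(sentence):
  # Undo the tokenizer spacing (e.g. "would n't" -> "wouldn't") and capitalize the first letter
  s = sentence.strip().replace(" .",".").replace(" n't","n't").replace(" '","'").replace(" ,",",").replace(" i "," I ").replace(" !","!")
  return s[:1].upper()+s[1:]

def get_story(line):
  _, story = line.split(" <EOL>")
  # Skip empty pieces and detokenize each sentence, matching the example output below
  sentences = [combine_words(sentence) for sentence in story.split(" </s>") if sentence.strip()]
  return sentences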


def get_keywords(line):
  _, rest = line.split(" <EOT> ")
  keywords,_ = rest.split(" <EOL> ")
  keydict = defaultdict(list)
  index = 0
  for keyword in keywords.split():
    if keyword == "#":
      index+=1
    else:
      keydict[index].append(keyword)
  return keydict

raw_stories = load_data()

In [None]:
# Here are some print statements to show what each of these looks like
print(get_keywords(raw_stories[0])) #dictionary of keywords for each sentence; keys are sentence index, values are lists of keywords
print(get_title(raw_stories[0]))
print(get_story(raw_stories[0]))

defaultdict(<class 'list'>, {0: ['dreams', 'singer'], 1: ['practiced'], 2: ['greet'], 3: ['people'], 4: ['amazement', 'wanted']})
Big Break
['Ben was a struggling actor.', 'He went to several auditions.', 'He finally got a call back on one of them.', 'Ben got the part.', 'He was excited to start getting paid to do what he loved.']
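
To use the keywords in a prompt, the dictionary usually has to be flattened into text first. The sketch below shows one possible way; the helper name `keywords_to_storyline` and the `" -> "` separator are assumptions made for illustration and are not prescribed by the homework.

```python
from collections import defaultdict

def keywords_to_storyline(keydict):
    """Join each sentence's keywords with spaces and the sentences with ' -> '."""
    return " -> ".join(" ".join(keydict[i]) for i in sorted(keydict))

# Same shape and values as the `get_keywords` output printed above.
example = defaultdict(list, {
    0: ["dreams", "singer"],
    1: ["practiced"],
    2: ["greet"],
    3: ["people"],
    4: ["amazement", "wanted"],
})

print(keywords_to_storyline(example))
# dreams singer -> practiced -> greet -> people -> amazement wanted
```

Sorting the keys keeps the keywords in sentence order, so the resulting string reads as a rough outline of the story.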


## Set Up the Model

The next step downloads the model, which takes a couple of minutes even though we load it through Unsloth, a library that speeds up loading.

If you are curious about the model you'll be using, you can check out the [Mistral documentation on HuggingFace](https://huggingface.co/docs/transformers/v4.57.1/model_doc/mistral).

In [34]:
#code from https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb

from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3",
    max_seq_length = 2048, # Choose any!
    dtype = None, # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
    load_in_4bit = True, # Use 4bit quantization to reduce memory usage. Can be False.
)

==((====))==  Unsloth 2025.10.4: Fast Mistral patching. Transformers: 4.55.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## Generating Stories

Now you will use Mistral to generate stories. You can change the `messages` variable to provide the model with a different prompt.

Follow the instructions on [the homework page](https://laramartin.net/interactive-fiction-class/homeworks/plan-and-write/plan-and-write.html) to see what you should generate.

In [4]:
uncontrolled_stories = []
controlled_stories = []

In [35]:
%%capture
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "mistral",
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

In [48]:
def decodeUncontrolled(title):
  #TODO: edit the prompt ("messages") to use the title for generating the story
  messages = [
      {"from": "human", "value": "A long time ago in a galaxy far, far away"},
  ]
  inputs = tokenizer.apply_chat_template(
      messages,
      tokenize = True,
      add_generation_prompt = True, # Must add for generation
      return_tensors = "pt",
  ).to("cuda")

  outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
  return tokenizer.batch_decode(outputs)


In [50]:
def decodeControlled(title, keywords):
  #TODO: edit the prompt ("messages") to use the title and keywords for generating the story
  messages = [
      {"from": "human", "value": "A long time ago in a galaxy far, far away..."},
  ]
  inputs = tokenizer.apply_chat_template(
      messages,
      tokenize = True,
      add_generation_prompt = True, # Must add for generation
      return_tensors = "pt",
  ).to("cuda")

  outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
  return tokenizer.batch_decode(outputs)


In [49]:
#Example output to show how it's formatted
decodeUncontrolled("")

['<s>[INST] A long time ago in a galaxy far, far away [/INST]\n\nThe Star Wars saga is one of the most popular and enduring film franchises of all time. The original trilogy, released between 1977 and 1983, introduced audiences to a galaxy far, far away and its iconic characters, including Luke Skywalker,']
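
As the example output shows, the decoded text echoes the prompt between `[INST]` and `[/INST]` before the generated continuation. If you only want to score the continuation, you could strip the prompt first. This is a minimal sketch under that assumption; the helper name `extract_generation` is hypothetical, and the markers are taken from the output above.

```python
def extract_generation(decoded, end_of_prompt="[/INST]", eos="</s>"):
    """Return the text after the first prompt marker, without EOS tokens."""
    head, marker, tail = decoded.partition(end_of_prompt)
    # If the marker is missing, fall back to the whole string.
    text = tail if marker else head
    return text.replace(eos, "").strip()

# Shortened version of the example output above.
decoded = "<s>[INST] A long time ago in a galaxy far, far away [/INST]\n\nThe Star Wars saga is one of the most popular</s>"
print(extract_generation(decoded))
# The Star Wars saga is one of the most popular
```

Because `tokenizer.batch_decode` returns a list, you would pass its first element, for example `extract_generation(decodeUncontrolled(title)[0])`.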

Run the following two blocks to iterate through the stories and call your prompts.

In [None]:
for story in raw_stories:
  decoded_text = decodeUncontrolled(get_title(story))
  uncontrolled_stories.append(decoded_text)

In [None]:
for story in raw_stories:
  decoded_text = decodeControlled(get_title(story),get_keywords(story))
  controlled_stories.append(decoded_text)

## Evaluation

Now you will compare the uncontrolled generated stories to the original stories, and the controlled generated stories to the original stories, using the following libraries:

BLEU -
[https://www.nltk.org/api/nltk.translate.bleu_score.html](https://www.nltk.org/api/nltk.translate.bleu_score.html)
- Run *modified n-gram precision* with unigrams and bigrams
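
To make the metric concrete, here is a from-scratch sketch of modified n-gram precision for a single reference. It is an illustration only, not the NLTK implementation; the function names are hypothetical, and tokenization is assumed to be a plain whitespace split. NLTK's `modified_precision` additionally accepts several references.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_ngram_precision(hypothesis, reference, n):
    """Clipped n-gram matches divided by the number of hypothesis n-grams."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0  # hypothesis is shorter than n tokens
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total

hyp = "the the the the the the the".split()
ref = "the cat is on the mat".split()
print(modified_ngram_precision(hyp, ref, 1))  # 2 clipped matches out of 7 unigrams
print(modified_ngram_precision(hyp, ref, 2))  # 0.0, no bigram overlaps
```

The clipping step is what keeps a hypothesis from scoring well just by repeating a common word.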

ROUGE -
[https://pypi.org/project/rouge-score/](https://pypi.org/project/rouge-score/)
- Run unigrams, bigrams, and ROUGE-L
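
ROUGE-L is based on the longest common subsequence (LCS) between the hypothesis and the reference. The sketch below computes it from scratch on whitespace tokens to show the idea. The function names are hypothetical, and the linked `rouge-score` package applies its own tokenization and optional stemming, so its numbers can differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(hypothesis, reference):
    """LCS-based precision, recall, and F1 for two token lists."""
    lcs = lcs_length(hypothesis, reference)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = lcs / len(hypothesis)
    recall = lcs / len(reference)
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_l(hyp, ref))  # LCS is "the cat on the mat" (5 of 6 tokens on each side)
```

Unlike ROUGE-1 and ROUGE-2, the LCS rewards words that appear in the same order even when they are not adjacent.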




In [None]:
import statistics
def BLEU(hypothesis, target, n=1):
  # TODO: calculate BLEU by comparing the generated story (hypothesis)
  # to the original story (target) looking at n-grams for multiple n's
  # return the average score across the sentences of the stories
  return

def ROUGE(hypothesis, target, n="L"):
  # TODO: calculate ROUGE by comparing the generated story (hypothesis)
  # to the original story (target) looking at n-grams for multiple n's AND ROUGE-L
  # return the average score across the sentences of the stories
  return


In [None]:
avg_BLEU_1_controlled = []
avg_BLEU_2_controlled = []
avg_BLEU_1_uncontrolled = []
avg_BLEU_2_uncontrolled = []

avg_ROUGE_1_controlled = []
avg_ROUGE_2_controlled = []
avg_ROUGE_L_controlled = []
avg_ROUGE_1_uncontrolled = []
avg_ROUGE_2_uncontrolled = []
avg_ROUGE_L_uncontrolled = []


# Iterate through BLEU and ROUGE for all 20 story prompts & average the scores
for i, line in enumerate(raw_stories):
  print(get_story(line))
  print(controlled_stories[i])
  # The generated story is the hypothesis (first argument); the original story is the target
  avg_BLEU_1_controlled.append(BLEU(controlled_stories[i], get_story(line), 1))
  avg_BLEU_2_controlled.append(BLEU(controlled_stories[i], get_story(line), 2))
  avg_BLEU_1_uncontrolled.append(BLEU(uncontrolled_stories[i], get_story(line), 1))
  avg_BLEU_2_uncontrolled.append(BLEU(uncontrolled_stories[i], get_story(line), 2))

  avg_ROUGE_1_controlled.append(ROUGE(controlled_stories[i], get_story(line), "1"))
  avg_ROUGE_2_controlled.append(ROUGE(controlled_stories[i], get_story(line), "2"))
  avg_ROUGE_L_controlled.append(ROUGE(controlled_stories[i], get_story(line), "L"))
  avg_ROUGE_1_uncontrolled.append(ROUGE(uncontrolled_stories[i], get_story(line), "1"))
  avg_ROUGE_2_uncontrolled.append(ROUGE(uncontrolled_stories[i], get_story(line), "2"))
  avg_ROUGE_L_uncontrolled.append(ROUGE(uncontrolled_stories[i], get_story(line), "L"))


print("\t\t| BLEU-1\t| BLEU-2\t| ROUGE-1\t| ROUGE-2\t| ROUGE-L\t|")
print("Controlled\t| {}\t\t| {}\t\t| {}\t\t| {}\t\t| {}\t\t|".format(statistics.mean(avg_BLEU_1_controlled),statistics.mean(avg_BLEU_2_controlled),statistics.mean(avg_ROUGE_1_controlled),statistics.mean(avg_ROUGE_2_controlled),statistics.mean(avg_ROUGE_L_controlled)))
print("Uncontrolled\t| {}\t\t| {}\t\t| {}\t\t| {}\t\t| {}\t\t|".format(statistics.mean(avg_BLEU_1_uncontrolled),statistics.mean(avg_BLEU_2_uncontrolled),statistics.mean(avg_ROUGE_1_uncontrolled),statistics.mean(avg_ROUGE_2_uncontrolled),statistics.mean(avg_ROUGE_L_uncontrolled)))