The assignment is due on Wednesday, February 28, 2024, before 11:59 PM.
Submission Link: https://classroom.google.com/c/NjUwNDE2MzEwMzQx/a/NjYyODU5Mzg4NDEw/details

Please be sure to double-check the academic integrity and generative AI policies listed on the syllabus.

Homework 1: Being up to the Task

Learning Objectives

  • Searching for basic information about NLP tasks.
  • Exploring a dataset.
  • Coming up with appropriate tasks for an application & explaining your reasoning behind them.
  • Determining appropriate inputs and outputs for the tasks.
  • Creating a system diagram.

Description

You work for SuperDuperAI (SDAI), a start-up company that builds AI tools for its customers. You are their NLP specialist. One of SDAI’s customers recently came to the company with a database of textbooks that they collected. They want SDAI to make them an app that quizzes people on a textbook they select.

The flow of the app will look like this:
a. The user types in a keyword that they’re interested in, and the app finds relevant textbooks.
b. They select the textbook and chapter they want to use.
c. The app displays a question relevant to the chapter.
d. The user answers the question.
e. The app gives a numerical score for how well the user answered the question.

Being the NLP specialist on the team, you are in charge of figuring out what is needed to create parts a, c, and e.
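The flow above can be sketched as a small pipeline skeleton. This is only an illustration of how data moves between steps a through e; the names (Textbook, Chapter, run_quiz, and the five step functions) are hypothetical, and the toy stand-ins at the bottom are plain string checks, not NLP models or suggested answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chapter:
    title: str
    text: str


@dataclass
class Textbook:
    title: str
    chapters: List[Chapter]


def run_quiz(
    keyword: str,
    find_textbooks: Callable[[str], List[Textbook]],      # step a
    choose_chapter: Callable[[List[Textbook]], Chapter],  # step b (user choice)
    make_question: Callable[[Chapter], str],              # step c
    get_answer: Callable[[str], str],                     # step d (user input)
    score_answer: Callable[[Chapter, str, str], float],   # step e
) -> float:
    """Pass data through steps a-e in order and return the numerical score."""
    books = find_textbooks(keyword)
    chapter = choose_chapter(books)
    question = make_question(chapter)
    answer = get_answer(question)
    return score_answer(chapter, question, answer)


# Toy stand-ins so the skeleton runs end to end (hypothetical data).
library = [
    Textbook("Intro Biology", [Chapter("Cells", "Cells are the basic unit of life.")]),
]

demo_score = run_quiz(
    "biology",
    find_textbooks=lambda kw: [b for b in library if kw.lower() in b.title.lower()],
    choose_chapter=lambda books: books[0].chapters[0],
    make_question=lambda ch: f"What is the chapter '{ch.title}' about?",
    get_answer=lambda q: "cells",
    score_answer=lambda ch, q, a: 1.0 if a.lower() in ch.text.lower() else 0.0,
)
print(demo_score)
```

Each step is passed in as a function so that the parts you are responsible for (a, c, and e) can be swapped out independently of the user-driven parts (b and d).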

Question 1: Define the tasks (10 points)

You know about the following tasks, but you forget exactly what they do.

  1. Document Ranking
  2. Information Extraction
  3. Part-of-Speech Tagging
  4. Question Answering
  5. Relation Extraction
  6. Semantic Role Labelling
  7. Sentence Boundary Disambiguation
  8. Sentence Similarity
  9. Summarization
  10. Text Segmentation

For this question, look up each of these NLP tasks, find a source that gives a definition, and give a direct quote of what your source said (and, of course, include where you found it).

For example, if I were defining what UMBC is:
UMBC: “University of Maryland, Baltimore County (UMBC) is a top-ranked national university with an inclusive culture that connects innovative teaching and learning, research across disciplines, and civic engagement.” (UMBC, https://umbc.edu/about/)

Please note that you will be graded on the accuracy of your definition, so make sure your source is reliable.

Important: You will get zero (0) points for this if you do not reference your sources.

Question 2: Select what task(s) are right for each step (18 points, 6 for each part)

There are no particular “right” answers that we are looking for. However, whatever you decide to pick, you must 1) explain why it would be a good fit (2 pts each) and 2) explain what the inputs (2 pts each) & outputs (2 pts each) to a model for the task would be. Select a task (or tasks) and justify your selection for each of a, c, and e from the app description. (Do not do b or d.) You can use 1 or more tasks for each part.

Remember, you are using this corpus of textbooks. Give concrete examples of how your ideal inputs and outputs would look using this data! If you need any other data for your tasks, explain what that dataset might look like—you don’t have to actually find it. For example, if I was training an image recognition model, the inputs would be the images and the outputs would be class labels for what object is in the image.
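To make the image recognition example concrete, here is a minimal sketch of what "inputs" and "outputs" could look like when written down as data. The ImageExample class, the tiny 2x2 pixel grids, and the labels are all made up for illustration; your own examples should use the textbook corpus instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageExample:
    pixels: List[List[int]]  # input: a grayscale image as rows of 0-255 values
    label: str               # output: the class label for the object in the image


# Hypothetical toy data: two 2x2 "images" with their class labels.
examples = [
    ImageExample(pixels=[[0, 255], [255, 0]], label="cat"),
    ImageExample(pixels=[[255, 255], [0, 0]], label="dog"),
]

inputs = [ex.pixels for ex in examples]   # what the model receives
outputs = [ex.label for ex in examples]   # what the model should produce
print(outputs)
```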

Question 3: Draw a diagram (5 points)

Now that you have all of the components, draw a flow diagram of where the data is coming from and going to, what processes you have to do (nodes with your chosen tasks), and how the data would be transformed. Include all steps a-e. You will be graded on whether it includes all the steps and clearly conveys the information.

You can draw your diagram in any app you want. Diagram apps I use often are MS PowerPoint and draw.io. No hand-drawn diagrams, please.

Grading

  • Question 1 - 10 points
  • Question 2 - 18 points
  • Question 3 - 5 points