Assignment 4: Tagging with HMMs

CSC 384

Programming Assignment 4: Tagging with HMMs
Natural Language Processing (NLP) is a subset of AI that focuses on the understanding and generation
of written and spoken language. This involves a series of tasks from low-level speech recognition on
audio signals up to high-level semantic understanding and inferencing on the parsed sentences.
One task within this spectrum is Part-Of-Speech (POS) tagging. Every word and punctuation symbol is
understood to have a syntactic role in its sentence, such as nouns (denoting people, places or things),
verbs (denoting actions), adjectives (which describe nouns) and adverbs (which describe verbs), just to
name a few. Each word in a piece of text is therefore associated with a part-of-speech tag (usually
assigned by hand), where the total number of tags can depend on the organization tagging the text.
While this task falls under the domain of NLP, prior linguistic experience offers no particular
advantage. To minimize the NLP background required, we use a very simple
set of POS tags: {“VERB”, “NOUN”, “PRON”, “ADJ”, “ADV”, “ADP”, “CONJ”, “DET”, “NUM”, “PRT”,
“X”, “.”}. In the end, the main task is to create an HMM that can infer a sequence of
underlying states (the tags) from a sequence of observations (the words).
What You Need To Do:
Your task for this assignment is to create a Hidden Markov Model (HMM) for POS tagging, including:
1. Training the probability tables (i.e., initial, transition and emission) of the HMM from training files
containing text-tag pairs.
2. Doing inference with your trained HMM to predict appropriate POS tags for untagged text.
Your solution will be graded based on the learned probability tables and the accuracy on our (private)
test files, as well as the computational efficiency of your algorithm. See Mark Breakdown for more
details.
Starter Code
The starter code contains one Python starter file and a number of training and test files. You can
download all the code and supporting files as a zip file. In that archive you will find the
following files:
Project File (the file you will edit and submit on MarkUs): The file where you will implement your POS
tagger; this is the only file to be submitted and graded.
8/4/2021 Programming Assignment 4: Tagging with HMMs 3/5
Public Training Files (look, but don’t modify):
data/train-public.txt: Contains a large text with a POS tag on each word.
data/train-public.ind: Contains the starting index of each sentence in the training text.
Public Testing Files (look, but don’t modify):
Test text files: identical to the training files but without the POS tags.
The version with a large text is used for testing the efficiency of
your solution.
Test index files: identical in format to those for training.
Ground-truth files: the ground-truth POS labels for the testing files.
Grader Files (the files used to sanity test your solution):
The public autograding script for testing your solution with provided
training/testing files.
Running the Code
You can run the POS tagger by typing the following at a command line:
$ python3 -d <training file name> -t <test file name>
where the parameters consist of:
-d <training file name>: a training file name without suffix
-t <test file name>: a test file name without suffix
The script will read <training file name>.txt and <training file name>.ind for training, and similarly for
testing. The test output (POS predictions) will be written to <test file name>.pred automatically. Here is
an example:
$ python3 -d data/train-public -t data/test-public-small
Making HMM Tagger
Creating an HMM for POS tagging involves building the probability tables (i.e., initial, transition,
emission) that define it; these are calculated during training. The tables are then used to calculate the
most likely tags for a given sequence of words with an inference algorithm such as the Viterbi algorithm
(or your own variation). You are required to keep the training and inference procedures separate.
In train_HMM , the input is the training file name and it is expected to return 3 objects: the prior and
transition distributions, which will both be tables, and an emission distribution, which will be a
dictionary. Detailed specifications can be found in its docstring.
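The training step can be sketched as relative-frequency counting over tagged sentences. This is only a minimal illustration, not the required implementation: the helper name estimate_tables, the use of nested lists for the tables, the (tag, word) key layout of the emission dictionary and the absence of smoothing are all assumptions. The docstring of train_HMM in the starter code remains the authoritative specification.

```python
from collections import defaultdict

TAGS = ["VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
        "CONJ", "DET", "NUM", "PRT", "X", "."]


def estimate_tables(sentences, tags=TAGS):
    """Hypothetical helper: estimate HMM tables by counting.

    sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (prior, transition, emission), where prior is a list,
    transition is a list of lists indexed [from_tag][to_tag], and
    emission is a dict keyed by (tag, word). These shapes are assumed.
    """
    idx = {t: i for i, t in enumerate(tags)}
    n = len(tags)
    init_counts = [0] * n
    trans_counts = [[0] * n for _ in range(n)]
    emit_counts = defaultdict(int)
    tag_counts = [0] * n

    for sent in sentences:
        if not sent:
            continue
        # The first tag of each sentence feeds the initial distribution.
        init_counts[idx[sent[0][1]]] += 1
        for word, tag in sent:
            emit_counts[(tag, word)] += 1
            tag_counts[idx[tag]] += 1
        # Consecutive tag pairs feed the transition table.
        for (_, a), (_, b) in zip(sent, sent[1:]):
            trans_counts[idx[a]][idx[b]] += 1

    total = sum(init_counts) or 1  # avoid dividing by zero on empty input
    prior = [c / total for c in init_counts]
    transition = []
    for row in trans_counts:
        s = sum(row)
        transition.append([c / s if s else 0.0 for c in row])
    emission = {(t, w): c / tag_counts[idx[t]]
                for (t, w), c in emit_counts.items()}
    return prior, transition, emission
```

Plain counting assigns zero probability to anything unseen in training, so a real solution would likely need some handling of unknown words at inference time.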
In the function tag , you are expected to call train_HMM and then use the trained model to do POS
prediction on test data.
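For the inference step, the following is a rough sketch of the Viterbi algorithm for a single sentence, working in log space to avoid numerical underflow on long sentences. The function name viterbi, the table shapes (a prior list, a transition list of lists indexed [from_tag][to_tag], an emission dictionary keyed by (tag, word)) and the floor probability used for unseen events are assumptions made for illustration, not part of the starter code.

```python
import math


def viterbi(words, tags, prior, transition, emission, floor=1e-10):
    """Hypothetical helper: most likely tag sequence for one sentence."""
    if not words:
        return []

    def lg(p):
        # Floor probabilities so that log(0) never occurs (an assumption
        # standing in for proper unknown-word handling).
        return math.log(max(p, floor))

    n = len(tags)
    # Scores for the first word: prior times emission.
    score = [lg(prior[i]) + lg(emission.get((tags[i], words[0]), 0.0))
             for i in range(n)]
    back = []  # back[t][j] = best previous tag index for tag j at step t+1

    for w in words[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n),
                         key=lambda i: score[i] + lg(transition[i][j]))
            new_score.append(score[best_i] + lg(transition[best_i][j])
                             + lg(emission.get((tags[j], w), 0.0)))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final tag.
    path = [max(range(n), key=lambda i: score[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [tags[i] for i in path]
```

Each sentence costs time proportional to its length times the square of the number of tags, which stays small with only 12 tags.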
It is recommended that during training/testing, the whole text file be split into sentences using the
provided sentence indices (in the *.ind files) for training and prediction.
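As an illustration of the recommended sentence splitting, here is a small sketch. It assumes that the indices are 0-based positions of the first token of each sentence, listed in increasing order; the exact layout of the *.ind files is not specified here, so check it against the provided data. The helper name split_sentences is hypothetical.

```python
def split_sentences(tokens, start_indices):
    """Hypothetical helper: cut a flat token list into sentences.

    tokens: list of tokens (words, or (word, tag) pairs) for the whole text.
    start_indices: assumed 0-based start position of each sentence, sorted.
    """
    # Append the end of the text so the last sentence is closed too.
    bounds = list(start_indices) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(bounds, bounds[1:])]


tokens = ["The", "dog", "barks", ".", "It", "runs", "."]
sentences = split_sentences(tokens, [0, 4])
```

Tagging sentence by sentence keeps each Viterbi pass short and lets the initial distribution apply at every sentence start.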
Mark Breakdown
Your mark for this assignment will be split as:
Correctness of the training procedure (42%): this is based on the learned HMM probability tables
(14% for each).
POS prediction accuracy on a small test file (35%)
POS prediction accuracy on a large test file (23%): this will test the computational efficiency of your
solution. You may lose these marks if your code runs out of time during our grading.
To get full marks on any single accuracy test we want to see your code predict correct tags at least 85%
of the time. If your accuracy is less than 85%, the mark you receive will reflect your accuracy. For
example, if you achieve 72% accuracy on a test worth 10 points you will get 0.72 × 10 = 7.2 points on that
test.
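The accuracy marking rule can be read as the small function below. This is only one interpretation of the description above; the function name accuracy_mark is hypothetical, and the actual grading script may round or scale differently.

```python
def accuracy_mark(accuracy, points, threshold=0.85):
    """Hypothetical reading of the marking rule for one accuracy test.

    accuracy: fraction of correctly predicted tags, between 0 and 1.
    points: number of points the test is worth.
    """
    if accuracy >= threshold:
        return float(points)  # full marks at or above the threshold
    return accuracy * points  # otherwise proportional to accuracy


mark = accuracy_mark(0.72, 10)  # the worked example from the text
```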
Sanity Check Your Solution
We provide a grader file for sanity checking your solution which you can run with:
$ python3
This code is only meant as a sanity check to see how your HMM performs; getting 100% on the grader
means that your HMM is not broken. But it does not mean you will get 100% as your final mark! You will
want to create tests of your own that vary training/testing files to see how your code generalizes.
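When writing your own tests, a small accuracy helper such as the sketch below may be useful. It compares two equally long tag lists; how the predicted and ground-truth tags are read from files is left out, since that depends on the file formats. The name tagging_accuracy is hypothetical.

```python
def tagging_accuracy(predicted, gold):
    """Hypothetical helper: fraction of positions where the tags agree."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


acc = tagging_accuracy(["DET", "NOUN", "VERB", "."],
                       ["DET", "NOUN", "NOUN", "."])
```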
Training and testing files different from those used in the grader file will be used to grade your
solution.
What to Submit
You will be using MarkUs to submit your assignment. You will submit only one file: your modified project file.
Make sure to document the code to help our TAs understand the approach you used.
The assignment is due on August 16 (last day of class) at 10pm!

