Assignment 5: Exercises with ANNs

Deadline: July 13, 2020 @08:00 CEST

This assignment includes a set of exercises with artificial neural networks. In particular, we will experiment with feed-forward (MLP) and recurrent network architectures.

The problem we are trying to solve is a simplified POS tagging problem. We are interested in predicting the POS tag of a word using only the characters (or character sequences) present in the word. In a real-world POS tagging application, it is almost unthinkable not to use the context of the word. However, the way we formulate the problem keeps the computational requirements low and allows for additional experimentation.

For this set of exercises the data comes from Universal Dependencies treebanks. The data is not included in your repository, and you are free to work on any language you like. However, your code should run on any valid CoNLL-U-formatted treebank. You are expected to use Keras for defining the neural networks in this exercise. As usual, please implement all exercises as indicated in the provided template.

Exercises

5.1 Read the data (1p)

Implement the function read_data() in the template, which reads a CoNLL-U-formatted treebank and returns unique word-POS tag pairs, with some optional pre-processing and filtering. In particular:

Note that we are working with unique word-POS combinations. For example, for English, the data you return should contain the pair the-DET only once.

Our usage does not require any complicated processing of CoNLL-U files. You can treat the input as a tab-separated file (after skipping comments and blank lines) and read it without using any special library. However, you can also use an external library if you prefer.
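
For orientation, the following is a minimal sketch of such a reader. It assumes the function returns two parallel lists of words and tags; the exact signature and the optional pre-processing/filtering arguments are specified in the template.

def read_data(treebank_path):
    # Read a CoNLL-U treebank and collect unique (word, POS) pairs.
    pairs = set()
    with open(treebank_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip comments and blank (sentence-boundary) lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Skip multi-word tokens (ID like "1-2") and empty nodes ("1.1").
            if len(fields) != 10 or not fields[0].isdigit():
                continue
            form, upos = fields[1], fields[3]   # FORM and UPOS columns
            pairs.add((form, upos))
    words, tags = zip(*sorted(pairs))
    return list(words), list(tags)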

5.2 Encoding words (3p)

Implement the class WordEncoder, which encodes a set of words as sequences of one-hot representations of the characters in each word. As in earlier assignments, the API defines a fit() method, which collects information from a set of words (the training set), and a transform() method, which encodes a given list of words based on the information collected by fit(). Please follow the instructions in the template for the details of the API.

The transform() method is required to output two related but different encodings for a given word. Assume we have the following one-hot codes for letters a, b, and c:

a [0,0,0,0,1,0,0]
b [0,0,0,1,0,0,0]
c [0,0,1,0,0,0,0]

and suppose we want to encode the words ‘bb’ and ‘acc’ with the maximum word length set to 4 (you are also asked to append beginning- and end-of-sequence symbols; we skip them here for simplicity).
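
To make this concrete, here is a small illustration of one-hot encoding with padding: words shorter than the maximum length are filled up with all-zero vectors. Whether the two required encodings differ in where the padding goes (left vs. right) or in some other respect is specified in the template; the left/right split below is only one plausible reading, not the template's API.

import numpy as np

codes = {
    "a": [0, 0, 0, 0, 1, 0, 0],
    "b": [0, 0, 0, 1, 0, 0, 0],
    "c": [0, 0, 1, 0, 0, 0, 0],
}
pad = [0] * 7       # all-zero padding vector
max_len = 4

def encode(word, pad_left=False):
    # One-hot encode 'word' and pad it to max_len with zero vectors.
    vectors = [codes[ch] for ch in word]
    padding = [pad] * (max_len - len(vectors))
    return np.array(padding + vectors if pad_left else vectors + padding)

encode("bb")                  # rows: b, b, pad, pad  -> shape (4, 7)
encode("acc", pad_left=True)  # rows: pad, a, c, c    -> shape (4, 7)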

5.3 Training and testing an MLP (4p)

Implement the function train_test_mlp(), which trains a simple MLP predicting the POS tags from the encoded words. Your function should follow the steps described in the template.

Note that the input dimensions and the type/size of the final output (classification) layer are determined by the data (and the problem). You can freely choose any options that are not specified, e.g., the mini-batch size or the optimization algorithm, or use the library defaults where applicable.
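
For orientation, a minimal Keras model definition might look like the following sketch. The hidden layer size (128) and the placeholder dimensions are assumptions for illustration only; the required architecture is given in the template.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Placeholder dimensions; in practice these follow from the encoder and data.
max_len, alphabet_size, n_tags = 4, 7, 17

model = Sequential([
    # The MLP sees a flat vector, so the (max_len, alphabet_size)
    # one-hot sequence is flattened first.
    Flatten(input_shape=(max_len, alphabet_size)),
    Dense(128, activation="relu"),          # hidden layer size: an assumption
    Dense(n_tags, activation="softmax"),    # one unit per POS tag
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])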

For computational efficiency, the suggested procedure uses a ‘naive’ early-stopping method to determine when to stop (the best epoch). However, you are encouraged to experiment with tuning other hyperparameters as well.
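
Keras’ EarlyStopping callback is one convenient way to approximate such early stopping. The patience value and the dummy data below are placeholders, continuing the model sketch above:

import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Dummy stand-ins for the encoded training data.
x_train = np.random.random((100, max_len, alphabet_size))
y_train = np.eye(n_tags)[np.random.randint(n_tags, size=100)]

# Stop when validation loss stops improving, keeping the best epoch's weights.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
model.fit(x_train, y_train, validation_split=0.1,
          epochs=100, callbacks=[early_stop])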

The model used in this exercise is simply ‘wrong’ for processing sequences. You should think about why this model is not suitable for the task, and why it nevertheless works as well as it does on this particular problem.

5.4 Training and testing an RNN (2p)

Implement the function train_test_rnn(), which trains a gated recurrent neural network for solving the same problem. Use a gated recurrent network of your choice (e.g., GRU or LSTM), with 64 hidden units. The classifier layer should be trained on the final representation built by the RNN for the whole sequence (this is the default behaviour in Keras’ RNN layers).

Your function should perform the same steps as in Exercise 5.3, but with an RNN instead of an MLP.
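
A minimal sketch of the model definition, assuming the same placeholder dimensions as in the Exercise 5.3 sketch (an LSTM is shown; a GRU works the same way):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # With the default return_sequences=False, the layer outputs only its
    # final state, i.e. the representation of the whole character sequence.
    LSTM(64, input_shape=(max_len, alphabet_size)),
    Dense(n_tags, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])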

In addition to the questions/tasks listed for Exercise 5.3, you are encouraged to experiment with different RNN architectures, including simple (non-gated) RNNs and bidirectional versions of them; see the sketch below.
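
For example, swapping in the following layers (the hidden unit counts are placeholders) covers both variants:

from tensorflow.keras.layers import Bidirectional, LSTM, SimpleRNN

rnn_simple = SimpleRNN(64)           # non-gated baseline
rnn_bidi = Bidirectional(LSTM(64))   # reads the word in both directions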