
EPFL ML Text Classification 2019

Project 2: build your own text classifier system and test its performance.

Introduction

See the detailed instructions on the course GitHub, including the PDF project description.

Dataset

File descriptions:

  • train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes. (Dataset available in the zip file, see link below)
  • train_pos_full.txt and train_neg_full.txt - a complete set of training tweets for each of the two classes, about 1M tweets per class. (Dataset available in the zip file, see link below)
  • test_data.txt - the test set, that is, the tweets for which you have to predict the sentiment label.
  • sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. (Submission of predictions: -1 = negative prediction, 1 = positive prediction)
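
As a rough illustration of how the files listed above might be handled, here is a minimal Python sketch. It assumes one tweet per line in the training files, and it assumes that the submission file uses an "Id,Prediction" header with tweets numbered from 1. The header names and the numbering are guesses, not taken from this page, so check sampleSubmission.csv before relying on them. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_tweets(path):
    """Read one tweet per line (assumed layout) and split it into tokens."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]


def load_training_set(data_dir, full=False):
    """Return (tweets, labels) with label 1 for positive and -1 for negative."""
    suffix = "_full" if full else ""
    data_dir = Path(data_dir)
    pos = load_tweets(data_dir / f"train_pos{suffix}.txt")
    neg = load_tweets(data_dir / f"train_neg{suffix}.txt")
    return pos + neg, [1] * len(pos) + [-1] * len(neg)


def write_submission(path, predictions, header=("Id", "Prediction")):
    """Write numbered predictions; header names and 1-based ids are assumptions."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for tweet_id, label in enumerate(predictions, start=1):
            if label not in (-1, 1):
                raise ValueError(f"label must be -1 or 1, got {label!r}")
            writer.writerow([tweet_id, label])
```

Passing full=True to load_training_set switches to the train_pos_full.txt and train_neg_full.txt files; starting with the small set keeps early experiments fast.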

Note that all tweets have already been tokenized, so words and punctuation are properly separated by whitespace.

Evaluation Criteria

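Your submission will be evaluated in terms of classification accuracy, that is, the fraction of test tweets whose sentiment label is predicted correctly (equivalently, one minus the classification error).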
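
To make the metric concrete, here is a small Python sketch that takes accuracy to be the fraction of predictions matching the true labels, with classification error as its complement. This is the usual definition and not necessarily the platform's exact scoring code; the function name is illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels (labels are -1 or 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Three of the four predictions match, so accuracy is 0.75 and error is 0.25.
acc = accuracy([1, -1, 1, -1], [1, -1, -1, -1])
err = 1 - acc
```

Computing this on a held-out slice of the training tweets gives a local estimate of the score before spending one of the daily submissions.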

Rules

Each participant is allowed to make 5 submissions per day (i.e. up to 15 submissions per team per day). Failed submissions (e.g. wrong submission file format) do not count.