EPFL ML Text Classification
Project 2: build our own text classification system and test its performance.
See the detailed instructions on the course GitHub, including the PDF project description.
# File descriptions
- train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes. (Dataset available in the zip file, see link below)
- train_pos_full.txt and train_neg_full.txt - a complete set of training tweets for each of the two classes, about 1M tweets per class. (Dataset available in the zip file, see link below)
- test_data.txt - the test set, that is, the tweets for which you have to predict the sentiment label.
- sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. (Predictions to submit: -1 = negative prediction, 1 = positive prediction)
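As a rough illustration of how the files listed above might be used, the sketch below loads the two training files into a single labeled list and writes predictions to a CSV. It assumes one tweet per line in the training files. The helper names (`load_tweets`, `load_training_set`, `write_submission`), the `Id`/`Prediction` column headers, and the 1-based numbering are assumptions made for illustration; check `sampleSubmission.csv` for the actual header and numbering.

```python
import csv


def load_tweets(path):
    """Read a text file and return its lines as a list of tweets.

    Assumes one tweet per line (an assumption, not stated in the README).
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_training_set(pos_path, neg_path):
    """Combine positive and negative tweets into (tweets, labels).

    Labels follow the README's convention: 1 = positive, -1 = negative.
    """
    pos = load_tweets(pos_path)
    neg = load_tweets(neg_path)
    tweets = pos + neg
    labels = [1] * len(pos) + [-1] * len(neg)
    return tweets, labels


def write_submission(path, predictions, header=("Id", "Prediction")):
    """Write one row per prediction to a CSV file.

    ASSUMPTION: the header names and 1-based ids are hypothetical;
    match them to sampleSubmission.csv before submitting.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for tweet_id, label in enumerate(predictions, start=1):
            if label not in (-1, 1):
                raise ValueError(f"label must be -1 or 1, got {label!r}")
            writer.writerow([tweet_id, label])
```

A typical call would be `tweets, labels = load_training_set("train_pos.txt", "train_neg.txt")`. Since the tweets are already tokenized, `tweet.split()` should be enough to recover the tokens of each tweet.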
Note that all tweets have already been tokenized, so that words and punctuation are properly separated by whitespace.

# Evaluation Criteria
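Your submission will be evaluated in terms of classification accuracy, that is, the fraction of test tweets whose label is predicted correctly (equivalently, one minus the classification error).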
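Accuracy and classification error are complementary: accuracy is the fraction of tweets labeled correctly, and the error is one minus that. A minimal sketch for checking a model on held-out training data before submitting is shown here; the function names are illustrative and not part of the project materials.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def classification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return 1.0 - accuracy(y_true, y_pred)
```

For example, with true labels `[1, -1, 1, 1]` and predictions `[1, 1, 1, -1]`, two of the four predictions are correct, so both the accuracy and the error are 0.5.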
Each participant is allowed to make 5 submissions per day (i.e. up to 15 submissions per team per day). Failed submissions (e.g. wrong submission file format) do not count.