2019: Completed

EPFL ML Text Classification

Project 2: build your own text classifier system and test its performance.

Introduction

See the detailed instructions on the course GitHub, including the PDF project description.

Dataset

File descriptions:

  • train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes. (Dataset available in the zip file, see link below)
  • train_pos_full.txt and train_neg_full.txt - a complete set of training tweets for each of the two classes, about 1M tweets per class. (Dataset available in the zip file, see link below)
  • test_data.txt - the test set, that is, the tweets for which you have to predict the sentiment label.
  • sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. Predictions are submitted as -1 for a negative prediction and 1 for a positive prediction.
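
The file layout above can be sketched in Python as follows. This is a minimal illustration, not the official starter code. It assumes that each training file holds one tweet per line, that each line of test_data.txt has the form `<number>,<tweet>`, and that the submission header is `Id,Prediction`; none of these details are stated above, so check them against sampleSubmission.csv and the data itself. The function names are hypothetical.

```python
import csv
import io


def load_labeled(pos_lines, neg_lines):
    """Return (tweets, labels): token lists with label 1 (positive) or -1 (negative)."""
    tweets, labels = [], []
    for label, lines in ((1, pos_lines), (-1, neg_lines)):
        for line in lines:
            # Tweets are pre-tokenized, so splitting on whitespace is enough.
            tokens = line.split()
            if tokens:  # skip empty lines
                tweets.append(tokens)
                labels.append(label)
    return tweets, labels


def parse_test_lines(lines):
    """Split '<number>,<tweet>' lines into (number, tweet) pairs (assumed layout)."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split only on the first comma: the tweet itself may contain commas.
        tweet_id, tweet = line.split(",", 1)
        pairs.append((int(tweet_id), tweet))
    return pairs


def write_submission(ids, predictions, out):
    """Write one row per test tweet; the 'Id,Prediction' header is an assumption."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])
    for tweet_id, prediction in zip(ids, predictions):
        if prediction not in (-1, 1):
            raise ValueError(f"prediction must be -1 or 1, got {prediction!r}")
        writer.writerow([tweet_id, prediction])


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file objects instead.
    pairs = parse_test_lines(["1,what a day , friends\n", "2,no way\n"])
    buffer = io.StringIO()
    write_submission([i for i, _ in pairs], [1, -1], buffer)
    print(buffer.getvalue())
```

With the real files, the same functions accept open file handles, for example `load_labeled(open("train_pos.txt", encoding="utf-8"), open("train_neg.txt", encoding="utf-8"))`; the UTF-8 encoding is also an assumption.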

Note that all tweets have already been tokenized, so words and punctuation are separated by whitespace.

Evaluation Criteria

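Your submission will be evaluated by classification accuracy, that is, the fraction of test tweets whose sentiment label is predicted correctly (equivalently, one minus the classification error).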
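
As a rough illustration of this metric, the sketch below computes accuracy for labels in {-1, 1}. It is useful for scoring a held-out part of the training set locally; the platform's own scoring code is not shown here, so the function name and details are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty prediction list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Three of the four predictions match, so accuracy is 0.75 and error is 0.25.
score = accuracy([1, -1, 1, 1], [1, -1, -1, 1])
error = 1 - score
```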

Rules

Each participant is allowed to make 5 submissions per day (i.e. up to 15 submissions per team per day). Failed submissions (e.g. wrong submission file format) do not count.

Leaderboard

Rank  Team  Score
01    TWN1  0.909
02          0.904
03          0.897
04    TuT   0.885