AIcrowd | EPFL ML Text Classification

2019: Completed

2020: Completed

2021: Completed

2022: Completed

2023: Completed

2024: Completed

2025: Completed #classroom

EPFL ML

Introduction

See the detailed instructions on the course GitHub, including the PDF project description.

Dataset

File descriptions:

train_pos.txt and train_neg.txt - a small set of training tweets for each of the two classes. (Dataset available in the zip file, see link below)
train_pos_full.txt and train_neg_full.txt - the complete set of training tweets for each of the two classes, about 1M tweets per class. (Dataset available in the zip file, see link below)
test_data.txt - the test set, i.e. the tweets whose sentiment label you have to predict.
sampleSubmission.csv - a sample submission file in the correct format. Note that each test tweet is numbered. (Predictions are submitted as -1 for negative and 1 for positive.)

Note that all tweets have already been tokenized, so words and punctuation are separated by whitespace.
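
Since the tweets are already whitespace-tokenized and predictions are encoded as -1 / 1, loading the data and writing a submission can be sketched roughly as below. This is a minimal illustration, not the official starter code: the helper names (tokenize, load_labeled, write_submission) are hypothetical, and the CSV header "Id,Prediction" and the 1-based numbering are assumptions that should be checked against sampleSubmission.csv.

```python
import csv
import io


def tokenize(tweet):
    """Split an already-tokenized tweet on whitespace."""
    return tweet.split()


def load_labeled(pos_lines, neg_lines):
    """Build (tokens, label) pairs: 1 for positive tweets, -1 for negative ones.

    pos_lines / neg_lines are iterables of strings, e.g. open file handles
    for train_pos.txt and train_neg.txt. Blank lines are skipped.
    """
    data = [(tokenize(t), 1) for t in pos_lines if t.strip()]
    data += [(tokenize(t), -1) for t in neg_lines if t.strip()]
    return data


def write_submission(predictions, out):
    """Write one row per test tweet to the file-like object `out`.

    ASSUMPTION: the header names and 1-based ids mirror sampleSubmission.csv;
    verify against the real file before submitting.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])
    for tweet_id, label in enumerate(predictions, start=1):
        if label not in (-1, 1):
            raise ValueError(f"label must be -1 or 1, got {label!r}")
        writer.writerow([tweet_id, label])


if __name__ == "__main__":
    # Tiny in-memory demo; in practice pass open file handles instead.
    demo = load_labeled(["i love this !\n"], ["so sad . . .\n"])
    buffer = io.StringIO()
    write_submission([label for _, label in demo], buffer)
    print(buffer.getvalue())
```

With the real files, the same functions would be called on open handles for train_pos.txt and train_neg.txt, and the output written to a file opened with newline="".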

Evaluation Criteria

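Your submission will be evaluated by classification accuracy, i.e. the fraction of test tweets whose sentiment label is predicted correctly (equivalently, one minus the classification error).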
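
To estimate this metric locally on a held-out part of the training data, accuracy can be computed along the following lines. This is a sketch only: the function name accuracy is hypothetical, and the platform's own scoring code is not shown in this description.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Labels use the challenge encoding: 1 = positive, -1 = negative.
    truth = [1, -1, 1, 1]
    preds = [1, -1, -1, 1]
    acc = accuracy(truth, preds)
    print(f"accuracy = {acc:.2f}, error = {1 - acc:.2f}")
```

Holding out a validation split for this check avoids spending the limited daily submissions on model selection.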

Rules

Each participant is allowed to make 5 submissions per day. If you participate as a team, the whole team shares 5 submissions per day, not 15 as the rules page states. Failed submissions (e.g. wrong submission file format) do not count.