"Words are more powerful than actions," said the great speaker Chichnas. Help him spread his wisdom by segregating the words he said.
🤔 Problem Statement
For this challenge, your input will consist of multiple text transcripts covering various domains such as math, news, technology, wildlife, food, fitness, chess and programming. Given an abstract of a text transcript, your task is to identify which domain it belongs to and label it accordingly.
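This is a standard multi-class text classification task. As one illustration of how such a classifier can work, here is a minimal multinomial Naive Bayes sketch using only the Python standard library. The tokenizer, the class name NaiveBayesClassifier and the add-one smoothing are our own assumptions for illustration, not the challenge's baseline or a required method.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts: list[str], tags: list[str]) -> NaiveBayesClassifier:
        self.doc_counts = Counter(tags)            # number of transcripts per tag
        self.word_counts = defaultdict(Counter)    # tag -> token frequencies
        for text, tag in zip(texts, tags):
            self.word_counts[tag].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {t: sum(c.values()) for t, c in self.word_counts.items()}
        self.n_docs = len(tags)
        return self

    def predict(self, text: str) -> str:
        # Tokens never seen in training carry no class information; skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_tag, best_score = None, -math.inf
        for tag, n in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n / self.n_docs)
            denom = self.total_words[tag] + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[tag][tok] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


if __name__ == "__main__":
    clf = NaiveBayesClassifier().fit(
        ["checkmate with the queen", "add flour and sugar", "solve the integral"],
        ["chess", "food", "math"],
    )
    print(clf.predict("the queen gives checkmate"))  # chess
```

Naive Bayes is chosen here only because it is short and dependency-free; on the real 79,376-sample training set, other models may well score better.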
Understand with code! Here is the getting started code for you.😄
The training dataset, train.csv, contains two columns: tag and text. The categories of the text are [math, news, tech, wildlife, food, fitness, chess, programming]. The training dataset comprises 79,376 text data points, each corresponding to a specific category.
The test dataset, test.csv, contains just a single text column. It comprises 19,844 data points for which tags have to be predicted.
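Before training, it can help to confirm that every tag in train.csv belongs to the eight listed categories and to see how the samples are spread across them. A minimal sketch, assuming the tags have already been read into a Python list; tag_distribution is a hypothetical helper name.

```python
from collections import Counter

# The eight categories listed in the problem statement.
CATEGORIES = ["math", "news", "tech", "wildlife", "food", "fitness", "chess", "programming"]


def tag_distribution(tags):
    """Count samples per category, failing on any tag outside CATEGORIES."""
    unknown = set(tags) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected tags: {sorted(unknown)}")
    counts = Counter(tags)
    # Report every category, including ones with zero samples.
    return {category: counts[category] for category in CATEGORIES}


if __name__ == "__main__":
    print(tag_distribution(["math", "chess", "math"]))
```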
The following files are available:
- train.csv (79,376 samples): This CSV file contains two headers. The first header is text, which contains the transcripts, and the second header is tag, which contains the label corresponding to that text transcript.
- test.csv (19,844 samples): The file that will be used for the actual leaderboard evaluation. It only contains the text transcripts for which the tags have to be predicted.
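Because train.csv has both columns and test.csv has only text, a small reader that looks columns up by header name can handle both files. This sketch uses Python's standard csv module; read_transcripts is a hypothetical helper, and the sample rows in the demo are made up.

```python
import csv
import io


def read_transcripts(f):
    """Read a challenge CSV from an open file object into (texts, tags).

    Columns are looked up by header name, so their order does not matter.
    For a file without a tag column (such as test.csv), tags is None.
    """
    reader = csv.DictReader(f)
    rows = list(reader)
    texts = [row["text"] for row in rows]
    has_tags = "tag" in (reader.fieldnames or [])
    tags = [row["tag"] for row in rows] if has_tags else None
    return texts, tags


if __name__ == "__main__":
    # In-memory stand-in for train.csv with made-up rows.
    sample = io.StringIO('text,tag\n"the knight forks the rook",chess\n"mix the batter",food\n')
    print(read_transcripts(sample))
    # (['the knight forks the rook', 'mix the batter'], ['chess', 'food'])
```

For the real files, a file object such as open("train.csv", newline="", encoding="utf-8") can be passed in; the UTF-8 encoding is an assumption, as the challenge text does not state it.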
- Prepare a CSV file containing only the header tag, where each row gives the predicted category (from the list of categories above) of the corresponding text transcript in test.csv. Name the file as
- A sample submission format is available as sample_submission.csv in the resources section.
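A minimal sketch of writing the submission with the standard csv module, assuming the predictions are already in the same order as the rows of test.csv. write_submission and its expected_rows check are hypothetical additions for illustration; the required output file name is not given here, so the caller supplies the file object.

```python
import csv
import io


def write_submission(predictions, f, expected_rows=None):
    """Write predictions as a CSV whose only header is "tag".

    Row i of the output must correspond to row i of test.csv, so predictions
    are written in the order given. expected_rows is an optional sanity check.
    """
    if expected_rows is not None and len(predictions) != expected_rows:
        raise ValueError(f"expected {expected_rows} predictions, got {len(predictions)}")
    writer = csv.writer(f)
    writer.writerow(["tag"])
    for tag in predictions:
        writer.writerow([tag])


if __name__ == "__main__":
    buf = io.StringIO()
    write_submission(["math", "chess"], buf)
    print(buf.getvalue().splitlines())  # ['tag', 'math', 'chess']
```

Passing expected_rows=19844 guards against submitting more or fewer rows than test.csv contains. When writing to disk, open the file with newline="" as the csv module documentation recommends.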
🖊 Evaluation Criteria
During evaluation, the F1 score will be used to measure the model's performance.
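For a single class, precision P is the fraction of predictions of that class that are correct, recall R is the fraction of true members of that class that are predicted as such, and F1 = 2PR / (P + R) is their harmonic mean. This page does not state how per-class scores are combined across the eight categories, so the sketch below assumes a macro average (the unweighted mean over classes); f1_scores is a hypothetical helper. If scikit-learn is available, sklearn.metrics.f1_score(y_true, y_pred, average="macro") computes the same macro average.

```python
def f1_scores(y_true, y_pred):
    """Return per-class F1 scores and their unweighted (macro) mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


if __name__ == "__main__":
    scores, macro = f1_scores(["math", "math", "chess", "food"],
                              ["math", "chess", "chess", "food"])
    print(round(macro, 4))  # 0.7778
```

In the example, math has P = 1 and R = 0.5, chess has P = 0.5 and R = 1 (both give F1 = 2/3), and food is perfect, so the macro F1 is (2/3 + 2/3 + 1) / 3 = 7/9.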
Baseline for TIRING TEXT Challenge