Round 1: Completed #educational Weight: 20.0

Tiring-Text

Words are more powerful than actions!

Felicity Threads Team, IIIT Hyderabad


🕵️ Introduction

"Words are more powerful than actions", said the great speaker Chichnas. Help him spread his wise work by segregating the words he said.

🤔 Problem Statement

For this challenge, your input will consist of multiple text transcripts covering various domains such as math, news, technology, wildlife, food, fitness, chess and programming. Given an abstract of a text transcript, your task is to identify which domain it belongs to and label it accordingly.
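One possible baseline for this kind of task, shown only as an illustration and not as the official starter code, is a multinomial Naive Bayes classifier over word counts. The sketch below uses only the Python standard library; the class name `NaiveBayesTextClassifier`, the `tokenize` helper, the smoothing value `alpha=1.0`, and the toy sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents seen per tag
        self.word_counts = defaultdict(Counter)  # token counts per tag
        self.vocab = set()

    def fit(self, texts, tags):
        for text, tag in zip(texts, tags):
            tokens = tokenize(text)
            self.class_counts[tag] += 1
            self.word_counts[tag].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, n_docs in self.class_counts.items():
            counts = self.word_counts[tag]
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Toy data invented for illustration; real training would use train.csv.
train_texts = [
    "the knight takes the bishop",
    "checkmate with rook and queen",
    "bake the bread with flour",
    "add sugar and butter to the cake",
]
train_tags = ["chess", "chess", "food", "food"]

model = NaiveBayesTextClassifier().fit(train_texts, train_tags)
predictions = model.predict(["queen and knight checkmate", "butter and flour cake"])
print(predictions)
```

A bag-of-words model like this ignores word order, which is often acceptable for topic-style labels; stronger models may of course do better on the real data.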

Understand with code! Here is the getting started code for you.😄

💾 Dataset

The training dataset train.csv contains two columns, text and tag. The categories of the text are [math, news, tech, wildlife, food, fitness, chess, programming]. The training dataset comprises 79,376 text data points, each corresponding to a specific category. The test dataset test.csv contains just a single text column. It comprises 19,844 data points for which tags have to be predicted.
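As a rough sketch of how the two-column layout described above might be read and sanity-checked, the snippet below parses a CSV with `text` and `tag` columns and counts the labels. The helper names `load_labeled_rows` and `tag_distribution` are hypothetical, the in-memory sample stands in for train.csv, and the UTF-8 encoding in the commented-out usage is an assumption.

```python
import csv
import io
from collections import Counter

# Category list as given in the dataset description.
CATEGORIES = ["math", "news", "tech", "wildlife", "food", "fitness", "chess", "programming"]


def load_labeled_rows(fileobj):
    """Read (text, tag) pairs from a CSV that has 'text' and 'tag' columns."""
    reader = csv.DictReader(fileobj)
    return [(row["text"], row["tag"]) for row in reader]


def tag_distribution(rows):
    """Count how many rows carry each tag."""
    return Counter(tag for _, tag in rows)


# Tiny in-memory stand-in for train.csv (rows invented for illustration).
sample = io.StringIO(
    "text,tag\n"
    "the knight takes the bishop,chess\n"
    '"whisk the eggs, then fold in flour",food\n'
    "checkmate in three moves,chess\n"
)
rows = load_labeled_rows(sample)
counts = tag_distribution(rows)
unknown = set(counts) - set(CATEGORIES)  # tags outside the documented list
print(counts)
print(unknown)

# With the real file (encoding is an assumption):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     rows = load_labeled_rows(f)
```

Checking the tag distribution early helps reveal class imbalance, which matters when the metric is F1.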

📁 Files

The following files are available in the resources section:

train.csv - (79,376 samples) This CSV file has two columns: text, which contains the transcripts, and tag, which contains the label for the corresponding transcript.
test.csv - (19,844 samples) This file is used for the actual evaluation and the leaderboard score. It contains only the text transcripts for which the tags have to be predicted.

🚀 Submission

Prepare a CSV file named submission.csv with a single header, tag, containing the predicted category (from the list of categories above) for each text transcript in test.csv.
A sample submission format is available at sample_submission.csv in the resources section.
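A minimal sketch of producing such a file might look like the following. It assumes one predicted tag per row of test.csv, in the same order, with no index column; sample_submission.csv remains the authoritative reference for the exact layout. The helper names `format_submission` and `write_submission` are hypothetical.

```python
import csv
import io

# Allowed labels, taken from the dataset description.
CATEGORIES = {"math", "news", "tech", "wildlife", "food", "fitness", "chess", "programming"}


def format_submission(predictions):
    """Return CSV text with a single 'tag' header and one prediction per row."""
    invalid = sorted(set(predictions) - CATEGORIES)
    if invalid:
        raise ValueError(f"unknown categories: {invalid}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tag"])
    writer.writerows([p] for p in predictions)
    return buffer.getvalue()


def write_submission(predictions, path="submission.csv"):
    """Write the formatted predictions to disk."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(format_submission(predictions))


# Example with three made-up predictions.
demo = format_submission(["math", "chess", "food"])
print(demo)
```

Validating labels before writing catches typos such as "technology" instead of "tech", which would otherwise silently count as wrong predictions.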

🖊 Evaluation Criteria

During evaluation, the F1 score will be used to measure the performance of the model, where

$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
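To make the formula concrete, here is a small sketch that computes F1 per category from true and predicted tags, then averages the per-category scores. The averaging scheme (macro) is an assumption, since the challenge text does not say how the score is aggregated across the eight categories; the function names and the toy labels are also invented for illustration.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Compute precision, recall and F1 separately for every label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed aggregation)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = ["math", "math", "chess", "food"]
y_pred = ["math", "chess", "chess", "food"]
scores = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(scores, overall)
```

In this toy case, math has precision 1 and recall 1/2, chess has precision 1/2 and recall 1, and food is perfect, so the per-class scores are 2/3, 2/3 and 1.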