AI Blitz #9: Completed #educational Weight: 15.0

AIcrowd

🌈 Welcome thread | 👥 Looking for teammates? | 🚀 Easy-2-Follow Code Notebooks

📝 Don't forget to participate in the Community Contribution Prize!

Introduction

In the previous puzzle, Emotional Detection, you performed a binary classification task. With this puzzle, we level up to multi-class classification. Your input dataset consists of text taken from research papers. You need to build a model that correctly assigns each text a label from 0 to 3.

To solve this challenge, you will use the concepts of LSTM and vectorization with TensorFlow.

Now, what is LSTM?!

💪 Getting Started

Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) that can learn from sequential data such as text. Unlike feedforward neural networks, an LSTM has feedback connections, which let it retain information from earlier in a text while it processes what comes next. Read more about the concept of LSTM over here.
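To make the feedback idea concrete, here is a minimal sketch of a single LSTM cell in plain Python, reduced to scalar inputs and states. The weights in `WEIGHTS` are hypothetical values picked only for illustration; a real model learns them, and a TensorFlow LSTM layer works on vectors rather than single numbers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps gate name -> (w_x, w_h, bias)."""

    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state (the "memory")
    h = o * math.tanh(c)               # updated hidden state (the output)
    return h, c


# Hypothetical fixed weights, for illustration only (normally learned).
WEIGHTS = {
    "forget": (0.5, 0.5, 0.0),
    "input": (0.5, 0.5, 0.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.5, 0.5, 0.0),
}


def run_lstm(sequence, w=WEIGHTS):
    """Feed a sequence through the cell, carrying h and c between steps."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs
```

Comparing `run_lstm([1.0, 0.0])` with `run_lstm([0.0, 0.0])` shows the effect of the feedback: the second input is the same in both runs, yet the second output differs because the cell still carries state from the first input.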

Word Vectorization is the second technique used in this challenge. Simply put, it converts words into numbers. Why? Because neural networks operate on numbers, and numeric representations of words support tasks such as word prediction and measuring word similarity and semantics. Know more about the concept here.

To solve this challenge, you need to convert the text into tokens and encode them using vectorization. After this, train a TensorFlow model with LSTM layers, then test it and submit the results to get your score.

AIcrowd's easy-to-use baseline has a breakdown of all the tools and code required to get started. Find the starter code-kit here.
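As a rough illustration of turning words into numbers, the sketch below builds a vocabulary from a few texts and maps each text to a fixed-length list of integer ids. The helper names (`tokenize`, `build_vocab`, `vectorize`), the reserved ids, and the default sizes are assumptions made for this example, not part of the challenge baseline.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, max_tokens=10000):
    """Map the most frequent tokens to ids starting at 2."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    # Most frequent first; ties broken alphabetically so the result is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [w for w, _ in ordered][: max_tokens - 2]
    return {w: i + 2 for i, w in enumerate(words)}


def vectorize(text, vocab, seq_len=8):
    """Encode a text as exactly seq_len ids, truncating or padding as needed."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:seq_len]
    return ids + [PAD] * (seq_len - len(ids))


vocab = build_vocab(["Deep learning for robots", "Learning to see"])
encoded = vectorize("Robots see deep space", vocab, seq_len=6)
```

Here "space" never appeared when the vocabulary was built, so it is encoded with the `UNK` id, and the remaining positions are filled with `PAD`. In a neural model these ids are typically passed to an embedding layer, which is where similarity between words gets learned.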

💾 Dataset

The dataset is fairly easy to understand, again! In any training/validation dataset, there are two columns: text and label. The text is the abstract of a research paper, and the label column represents the category that the research paper falls in.

text	label
Estimating 3D hand meshes from single RGB ...... Each technical component above meaningfully improves the accuracy in the ablation study.	2
The emergence of collective ...... classes and overlapping structures of data.	0

The label categories are as follows - Artificial Intelligence, Machine Learning, Robotics, Computer Vision.

📁 Files

The following files are available in the resources section:

train.csv - (31499 samples) This CSV file contains a text column with the abstract and a label column with the category of the research paper.
val.csv - (2699 samples) This CSV file contains a text column with the abstract and a label column with the category of the research paper.
test.csv - (10799 samples) This CSV file contains a text column with the abstract and a label column for the category of the research paper, which you fill in with your predictions. This file also serves as sample_submission.csv.
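A minimal sketch of reading a file with this layout using only the standard library. The two rows in `SAMPLE_CSV` are made up to mirror the `text` and `label` columns; they are not taken from the real dataset, and the sketch assumes the real files are comma-separated with quoted text fields.

```python
import csv
import io
from collections import Counter

# Made-up two-row sample in the same shape as train.csv (text, label).
SAMPLE_CSV = '''text,label
"Estimating 3D hand meshes from single RGB images, a made-up abstract.",2
"The emergence of collective behaviour, another made-up abstract.",0
'''


def load_rows(handle):
    """Return (texts, labels) from a CSV with `text` and `label` columns."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["text"])
        labels.append(int(row["label"]))
    return texts, labels


texts, labels = load_rows(io.StringIO(SAMPLE_CSV))
label_counts = Counter(labels)  # quick look at the class balance

# For the real file, something like:
# with open("train.csv", newline="", encoding="utf-8") as f:
#     texts, labels = load_rows(f)
```

Counting the labels right after loading is a cheap way to check whether the four classes are balanced before training.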

🚀 Submission

Creating a submission directory
Use test.csv and fill in the corresponding labels.
Save the filled test.csv in the submission directory under the name submission.csv.
Inside the submission directory, put the .ipynb notebook in which you trained the model and ran inference, saved as original_notebook.ipynb.

Overall, your submission directory should contain two files: submission.csv and original_notebook.ipynb.

Zip the submission directory!
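The steps above can be sketched with the standard library as follows. The function name `build_submission`, its parameters, and the default directory name `submission` are assumptions for illustration; the sketch also assumes the two files should sit at the top level of the zip, which the challenge page does not state explicitly.

```python
import csv
import shutil
from pathlib import Path


def build_submission(texts, predictions, notebook_path, out_dir="submission"):
    """Write submission.csv, copy the notebook, and zip the directory.

    Returns the path of the created zip archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Same columns as test.csv, with the label column filled in.
    with open(out / "submission.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerows(zip(texts, predictions))

    # The notebook must be stored under this exact name.
    shutil.copy(notebook_path, out / "original_notebook.ipynb")

    # Creates <out_dir>.zip next to the directory, with the files at its root.
    return shutil.make_archive(str(out), "zip", root_dir=str(out))


# Example call (paths are placeholders):
# build_submission(test_texts, predicted_labels, "my_notebook.ipynb")
```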

Make your first submission here 🚀 !!

🖊 Evaluation Criteria

During evaluation, the F1 score (weighted average) and the accuracy score will be used to measure the performance of the model, where

$F1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$

$Accuracy = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$
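For checking a model locally, both metrics can be computed in a few lines of plain Python. This is a sketch under one assumption: "weighted average" is taken to mean that each class's F1 is weighted by the number of true samples in that class, which is the usual convention. The exact evaluator code used by the platform is not shown on this page.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        total += f1 * n
    return total / len(y_true)


# Toy labels for illustration only.
y_true = [0, 0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 2, 3, 0]
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
```

Running both functions on the validation set gives a local estimate of the leaderboard metrics before submitting.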

🔗 Links

💪 Challenge Page: https://www.aicrowd.com/challenges/research-paper-classification
🗣️ Discussion Forum: https://www.aicrowd.com/challenges/research-paper-classification/discussion
🏆 Leaderboard: https://www.aicrowd.com/challenges/research-paper-classification/leaderboards

📱 Contact

Shubhamai