AIcrowd | Fake News Detection

Round 1: Completed Weight: 30.0


🕵️ Introduction

Today, we are producing more information than ever before, but not all of it is true. Some of it is malicious and harmful, which makes it harder to trust any piece of information we come across. On top of that, bad actors can now use language-modelling tools such as OpenAI's GPT-2 to generate fake news. Ever since its initial release, there has been discussion of how it could be misused to generate misleading news articles, automate the production of abusive or fake content for social media, and automate the creation of spam and phishing content.

How do we figure out what is true and what is fake? Can we do something about it?

This challenge tackles exactly that! Here, you differentiate real news from fake news generated by GPT-2. Given a dataset of various texts, can you predict whether each one is real or fake?

With fake news so rampant, trust in our institutions is starting to shake. This challenge contributes to efforts to tackle SDG 16 - Trust in (Government) institutions.

Understand with code! Here is some getting-started code for you. 😄

💾 Dataset

The dataset consists of around 387,000 texts, some sourced from various news articles on the web and the rest generated by OpenAI's GPT-2 language model.

The dataset is split into train, val and test sets such that each set has an equal split of the two classes.

📁 Files

The following files are available in the resources section:

train.csv - (232,003 samples) This CSV file contains 232,003 texts and their corresponding labels, i.e. whether each text is fake or real.
val.zip - (38,666 samples) This zipped CSV file contains 38,666 texts and their corresponding labels, i.e. whether each text is fake or real.
test.zip - (115,999 samples) This zipped CSV file contains 115,999 texts without their corresponding labels.
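
As a quick sanity check on the files above, the labelled rows can be read and the class balance counted. The sketch below is only an illustration: the column names text and label are assumptions (check the real header in train.csv), and a tiny in-memory CSV stands in for the actual file.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(handle):
    """Yield (text, label) pairs from an open CSV handle.

    The column names 'text' and 'label' are assumed, not confirmed
    by the challenge page; adjust them to match the real header.
    """
    for row in csv.DictReader(handle):
        yield row["text"], row["label"]


def label_counts(rows):
    """Count how many rows carry each label."""
    return Counter(label for _, label in rows)


# Tiny in-memory stand-in for train.csv (made-up rows, for illustration only).
sample = io.StringIO(
    "text,label\n"
    '"Markets rallied on Tuesday, analysts said.",real\n'
    '"The moon council voted to ban gravity.",fake\n'
)

counts = label_counts(read_labeled_rows(sample))
print(counts)
```

To run this against the real data, replace the in-memory sample with something like open("train.csv", newline="", encoding="utf-8"). If the split is balanced as described, the two counts should be close to equal.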

🚀 Submission

Prepare a CSV file with the header [label] and a predicted value of real or fake for each row, in the same order as the test set.
The file should be named submission.csv.
A sample submission format is available in sample_submission.csv in the resources section.
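
The submission format described above might be produced along these lines. This is a minimal sketch: write_submission is a hypothetical helper, not part of any AIcrowd tooling, and the predictions list is made up. Compare the result with sample_submission.csv before uploading.

```python
import csv
import io

ALLOWED_LABELS = {"real", "fake"}


def write_submission(predictions, handle):
    """Write a single-column CSV: a 'label' header, then one prediction per row.

    Predictions must already be in the same order as the test set.
    """
    predictions = list(predictions)
    # Validate everything first so a bad value never leaves a half-written file.
    for p in predictions:
        if p not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {p!r}")
    writer = csv.writer(handle)
    writer.writerow(["label"])
    for p in predictions:
        writer.writerow([p])


# Demonstration with an in-memory buffer and made-up predictions.
buffer = io.StringIO()
write_submission(["real", "fake", "fake"], buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

For an actual submission, pass a file handle instead of the buffer, for example open("submission.csv", "w", newline="", encoding="utf-8").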

Make your first submission here 🚀 !!

🖊 Evaluation Criteria

During evaluation, the F1 score is used, where

$F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
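
To make the metric concrete, here is a small from-scratch computation of F1 on made-up labels. Treating fake as the positive class is an assumption; the page does not say which class the evaluator treats as positive, or whether it averages over both classes.

```python
def f1_score(y_true, y_pred, positive="fake"):
    """Binary F1 for one positive class (assumed here to be 'fake')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["fake", "fake", "real", "real", "fake"]
y_pred = ["fake", "real", "real", "fake", "fake"]

score = f1_score(y_true, y_pred)
print(round(score, 4))  # precision = recall = 2/3, so F1 = 2/3
```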


References

OpenAI's GPT-2