Round 1: Completed #educational Weight: 20.0

AIcrowd

Introduction

The pandemic may have paused your travel ventures, so we decided to give you a world tour through languages! Machine translation is an application of NLP in which software automatically translates text from one language to another without human involvement.

Machine translation is a highly valued market, as tech firms want to make their content and services available to the masses. Startups such as AppTek, Cloudworks, LingoTek, and many more make full use of ML to deliver smooth multilingual services.

Following the NLP theme of AI Blitz⚡XII, we present Language Translation, a challenge in which AIcrowd invites participants to try their hand at machine translation using our very own invented language, “Crowd Talk”.

With its own vocabulary and grammar, Crowd Talk is a challenge waiting to be conquered, or rather, translated. Your model is expected to perform a straightforward word-by-word replacement, learning the translation through NLP techniques.

💪 Getting Started

The problem of machine translation dates back to the 1950s, when it was first posed, and it has been tackled with three major techniques:

Statistical Machine Translation

Statistical machine translation (SMT) relies on statistical models built by analyzing large amounts of parallel text in the two languages. From these models, it infers which words in the source language correspond to which words in the target language.
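To make the statistical idea concrete, here is a minimal sketch of one classic SMT building block, IBM Model 1, which learns word translation probabilities t(e | f) from sentence pairs using expectation-maximization. It is an illustration under simplifying assumptions (no NULL word, no reordering model, no language model), not the approach used by the starter kit, and the toy German-English pairs are only there for demonstration.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(e | f) with IBM Model 1 EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each target word's count among the source words
        # of its sentence, in proportion to the current t(e | f).
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    delta = t[(e, f)] / z
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def best_translation(t, f):
    """Return the most probable target word for source word f."""
    candidates = {e: p for (e, src), p in t.items() if src == f}
    return max(candidates, key=candidates.get)


# Toy parallel corpus (source = German, target = English), for illustration only.
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = ibm_model1(pairs)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(t, word))
```

Even though no sentence pair states that "das" means "the", the co-occurrence statistics across pairs let EM concentrate the probability mass on the right word.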

Rule-Based Machine Translation

Rule-based machine translation (RBMT) translates text by mapping the grammatical rules of the source language onto those of the target language, based on a thorough grammatical analysis of both.
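A rule-based system can be sketched as a bilingual dictionary plus hand-written grammar rules. The example below is a hypothetical toy: the Spanish-like lexicon, the part-of-speech tags, and the single reordering rule (source NOUN ADJ becomes target ADJ NOUN) are assumptions for illustration and say nothing about Crowd Talk's actual grammar.

```python
# Hypothetical bilingual lexicon: source word -> (target word, part of speech).
LEXICON = {
    "la": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "roja": ("red", "ADJ"),
    "es": ("is", "VERB"),
    "grande": ("big", "ADJ"),
}


def translate_rule_based(sentence):
    """Dictionary lookup followed by one reordering rule: NOUN ADJ -> ADJ NOUN."""
    # Unknown words are passed through unchanged and tagged UNK.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    out = []
    i = 0
    while i < len(tagged):
        is_noun_adj = (
            i + 1 < len(tagged)
            and tagged[i][1] == "NOUN"
            and tagged[i + 1][1] == "ADJ"
        )
        if is_noun_adj:
            # Apply the reordering rule: emit the adjective before the noun.
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(translate_rule_based("La casa roja es grande"))
```

The strength of this design is that every decision is explainable; the weakness is that each rule and dictionary entry must be written by hand.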

Hybrid Machine Translation

Hybrid machine translation combines SMT and RBMT, often with a translation memory, which can make it the strongest of the three approaches. It can be implemented in several ways, including multi-engine, statistical rule generation, multi-pass, and confidence-based approaches.
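As a rough illustration of the confidence-based variant, the sketch below prefers the statistical engine's word choice when its probability clears a threshold and otherwise falls back to a rule-based dictionary. The function name, the word-level granularity, and the 0.5 threshold are all hypothetical choices for this example; real hybrid systems usually compare whole-sentence hypotheses.

```python
def hybrid_translate(tokens, rules, stats, threshold=0.5):
    """Confidence-based word choice between a statistical and a rule-based engine.

    rules: dict source_word -> target_word (hand-written dictionary)
    stats: dict source_word -> {target_word: probability} (learned table)
    """
    output = []
    for word in tokens:
        candidates = stats.get(word, {})
        if candidates:
            best, prob = max(candidates.items(), key=lambda kv: kv[1])
            if prob >= threshold:
                output.append(best)  # statistical engine is confident enough
                continue
        # Otherwise fall back to the rule-based dictionary, then to copying the word.
        output.append(rules.get(word, word))
    return output


rules = {"casa": "house"}
stats = {"casa": {"home": 0.4, "house": 0.35}, "roja": {"red": 0.9}}
print(hybrid_translate(["casa", "roja", "xyz"], rules, stats))
```

Lowering the threshold shifts trust toward the statistical engine; raising it shifts trust toward the rules.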

To make things easier, we have provided a starter kit that takes a very simple approach: it creates a one-to-one mapping from each CrowdTalk word to an English word.
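One way such a one-to-one word mapping could be built is sketched below. It assumes (this is not confirmed by the challenge description) that a CrowdTalk sentence and its English translation align word by word in the same order, so pairs of unequal length are skipped. Each CrowdTalk word is then mapped to the English word it most often lines up with. This is a hypothetical sketch, not the starter kit's code, and the tokens "zor", "bli", and "kwa" are invented.

```python
from collections import Counter, defaultdict


def build_word_map(pairs):
    """pairs: iterable of (crowdtalk_sentence, english_sentence) strings."""
    counts = defaultdict(Counter)
    for ct, en in pairs:
        ct_words, en_words = ct.split(), en.split()
        if len(ct_words) != len(en_words):
            continue  # a position-by-position alignment is impossible here
        for c, e in zip(ct_words, en_words):
            counts[c][e] += 1
    # Keep the most frequently aligned English word for each CrowdTalk word.
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}


def translate(sentence, word_map):
    """Replace each known word; unknown words are copied through unchanged."""
    return " ".join(word_map.get(w, w) for w in sentence.split())


# Toy data with invented CrowdTalk-like tokens (hypothetical, not from train.csv).
toy_pairs = [("zor bli", "the cat"), ("zor kwa", "the dog"), ("kwa bli", "dog cat")]
word_map = build_word_map(toy_pairs)
print(translate("bli zor kwa", word_map))
```

If the alignment assumption does not hold for the real data, a statistical aligner such as IBM Model 1 is a natural next step.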

💾 Dataset

For this challenge, we have created a new language just for you. The dataset consists of English sentences and their corresponding translations into the new language, “CrowdTalk”.
The train.csv file contains sentences together with their translations, and your task is to translate the CrowdTalk sentences in test.csv into English. A sample submission file is also provided to show the submission format.

📁 Files

The following files are available in the resources section:

train.csv - A CSV file with a header row and three columns: id, English, and crowdtalk. Here, id is a unique identifier for the sentence, and English and crowdtalk hold the sentence in the respective language.
test.csv - A CSV file with a header row and two columns: id and crowdtalk. Here, id is the sentence id, and crowdtalk is the sentence you need to translate into English.
sample_submission.csv - A sample submission file. Your submission must follow this format to be evaluated.
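The files can be read with Python's standard csv module. The sketch below assumes the header names are exactly as described above (id, English, crowdtalk); adjust the keys if the real headers differ in capitalization. The helpers accept any iterable of lines, so they work both on an open file and on an in-memory list.

```python
import csv


def read_pairs(lines):
    """Return (crowdtalk, english) pairs from train.csv-style lines."""
    return [(row["crowdtalk"], row["English"]) for row in csv.DictReader(lines)]


def read_test(lines):
    """Return (id, crowdtalk) pairs from test.csv-style lines."""
    return [(row["id"], row["crowdtalk"]) for row in csv.DictReader(lines)]


# In-memory example with invented content.
print(read_pairs(["id,English,crowdtalk", "1,the cat,zor bli"]))
```

On the real data, pass an open file handle instead, for example `read_pairs(open("train.csv", newline="", encoding="utf-8"))`.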

🚀 Submission

Creating a submission directory
Use sample_submission.csv as a template for your submission file; its column headers should be "id" and "prediction".
Inside the submission directory, also place the .ipynb notebook you used to train the model and run inference, saved as original_notebook.ipynb.

Overall, this is what your submission directory should look like -

Zip the submission directory!
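The packaging steps can be scripted with the standard library. In this sketch the predictions file name (submission.csv), the directory name, and the choice to place the files at the root of the zip are assumptions; check sample_submission.csv and the submission page for the exact layout the evaluator expects.

```python
import csv
import shutil
from pathlib import Path


def make_submission(predictions, notebook_path, out_dir="submission"):
    """Write predictions and the notebook into out_dir, then zip it.

    predictions: iterable of (id, english_translation) tuples.
    Returns the path of the created .zip archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Assumed file name for the predictions; headers follow the challenge rules.
    with open(out / "submission.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        writer.writerows(predictions)
    shutil.copy(notebook_path, out / "original_notebook.ipynb")
    # Creates <out_dir>.zip next to the directory, with its files at the archive root.
    return shutil.make_archive(str(out), "zip", root_dir=str(out))
```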

Make your first submission here 🚀 !!

🖊 Evaluation Criteria

During evaluation, the BLEU score is used to measure the quality of the model's translations against the reference translations.
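To check predictions locally, a plain corpus-level BLEU can be computed as follows: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. This is a minimal, unsmoothed sketch with one reference per sentence and whitespace tokenization; the official evaluator may use a different implementation, tokenization, or scale (for example 0 to 100), so local scores are only indicative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"]))
```

Here the precisions are 5/6, 3/5, 2/4, and 1/3, and the lengths match, so the score is the fourth root of 1/12.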

📱 Contact

Aditya Jha
Shubhamai