🧩 Embedding Game Puzzle: match each embedding to its corresponding word.
🕵🏼♀️ What Is Embedding Game Puzzle About?
The first step in solving an AI puzzle often involves pre-processing the data, and for text that includes the integral step of embedding your input. What if you lose the key to these embeddings and cannot get back your textual data?
💪🏼 What You’ll Learn
In this puzzle, you will learn:
- What an embedding is
- How to group embeddings
📝 The Task
In natural language processing, word embedding is a term used to represent words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that the words that are closer in the vector space are expected to be similar in meaning.
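To make "closer in the vector space" concrete, here is a minimal sketch using cosine similarity, one common closeness measure. The 3-dimensional vectors below are made up for illustration and are not taken from the puzzle dataset.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings (assumption: real ones have many more dimensions).
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

# Words with related meanings should score higher than unrelated ones.
print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```

In these toy vectors "cat" and "dog" point in nearly the same direction, so their similarity is much higher than that of "cat" and "car".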
Embedding Game is a cool NLP puzzle where you have to figure out a way to recover the word corresponding to each embedding. You can learn a pattern that maps embeddings to words from the embeddings with corresponding words provided in the training set. Predict the correct word for each embedding in the test set.
👩🏽💻 Explore Dataset
The dataset contains roughly 500 samples across the training, placeholder vector, and test sets.
This dataset is based on the core idea that similar words have similar embeddings. It contains embeddings for a dictionary of words and asks you to map embeddings to words for a similar set of words.
Because embeddings can be very close for similar words, you do not submit a single exact word. Instead, you submit a set of 3 words for each embedding, and if any one of them matches the actual word, the prediction counts towards accuracy.
🗂 Dataset Files
The following files are available in the resources section:
- train_label.csv - (149 samples) This CSV file contains words with their corresponding embeddings.
- placeholder_vector.csv - (179 samples) This CSV file contains the embeddings for which the word needs to be predicted.
- testwords.txt - (179 samples) This text file contains the words that need to be matched with their corresponding embeddings in placeholder_vector.csv.
- sample_submission.csv - A CSV file with the sample for the submission format.
Creating a submission directory:
- Use sample_submission.csv to create your submission. The column headers should be "id" and "word". The id should correspond to the id of the embedding being predicted, and the word should hold the 3 words predicted for that embedding, separated by commas.
- Inside the submission directory, put the .ipynb notebook in which you trained the model and ran inference, saved as original_notebook.ipynb.
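As a rough sketch of the submission format described above, the snippet below writes an "id" and "word" column with 3 comma-separated words per row. The predictions dictionary and the helper name write_submission are hypothetical, and the exact quoting expected by the evaluator is an assumption, so compare the output against sample_submission.csv before submitting.

```python
import csv
import io

def write_submission(predictions, handle):
    """Write {embedding_id: [word1, word2, word3]} as CSV rows to a file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["id", "word"])
    for emb_id, words in predictions.items():
        if len(words) != 3:
            raise ValueError(f"id {emb_id}: expected 3 words, got {len(words)}")
        # The csv module quotes this field because it contains commas.
        writer.writerow([emb_id, ",".join(words)])

# Hypothetical predictions, for illustration only.
predictions = {
    0: ["apple", "pear", "plum"],
    1: ["river", "lake", "sea"],
}

buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write a real file, pass a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.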
🚀 Getting Started
The starter kit breaks down everything from downloading the dataset, loading the libraries, processing the data, creating, training, and testing the model.
Click here to access the basic starter kit. It gives in-depth instructions to:
- Download the necessary files
- Set up the AIcrowd CLI environment that will help you make a submission directly via a notebook
- Downloading dataset & importing libraries
- Preprocessing the dataset
- Creating the model
- Setting the model
- Training the model
- Submitting the result
- Uploading the results
Check out the starter kit here! 🚀
🔏 Evaluation Criteria
During evaluation, all 3 predicted words are compared with the ground truth; if any one of them matches the actual word, the prediction is counted as correct. We then use the Accuracy Score over all embeddings to measure the model's performance.
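The scoring rule can be sketched locally as below. This is not the official evaluator, only a minimal reading of the rule above; the function name top3_accuracy and the example data are hypothetical.

```python
def top3_accuracy(predicted, truth):
    """Fraction of ids whose true word appears among the predicted words.

    predicted: {embedding_id: [word1, word2, word3]}
    truth:     {embedding_id: actual_word}
    """
    if not truth:
        raise ValueError("truth must not be empty")
    hits = sum(
        1 for emb_id, word in truth.items()
        if word in predicted.get(emb_id, [])
    )
    return hits / len(truth)

# Hypothetical ground truth and predictions, for illustration only.
truth = {0: "apple", 1: "river", 2: "stone"}
predicted = {
    0: ["apple", "pear", "plum"],   # hit: first guess matches
    1: ["lake", "sea", "pond"],     # miss: none of the 3 match
    2: ["rock", "stone", "sand"],   # hit: second guess matches
}

score = top3_accuracy(predicted, truth)
print(f"accuracy: {score:.3f}")
```

A check like this on a held-out part of the training set gives a quick estimate before making a submission.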
🤫 Hint to get started
Use a model to create embeddings for all the words, and map these embeddings to the embeddings provided. Once you have this 1-1 mapping function, apply it to the embeddings of the remaining words and match them against the embeddings provided in the test data to recover the exact words.
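One possible way to read this hint is sketched below on synthetic data. It assumes the relationship between your own embeddings and the provided ones is roughly linear, which the puzzle does not state; the arrays own_train, given_train, and placeholder are random stand-ins for your model's embeddings of the training words, the embeddings in train_label.csv, and the vectors in placeholder_vector.csv.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the real puzzle data).
dim_own, dim_given = 8, 6
test_words = [f"test_{i}" for i in range(10)]
own_train = rng.normal(size=(40, dim_own))    # your embeddings of training words
own_test = rng.normal(size=(10, dim_own))     # your embeddings of candidate test words
hidden_map = rng.normal(size=(dim_own, dim_given))
given_train = own_train @ hidden_map          # "provided" training embeddings
placeholder = own_test @ hidden_map           # row i truly belongs to test_words[i]

# Step 1: fit a linear map W so that own_train @ W approximates given_train.
W, *_ = np.linalg.lstsq(own_train, given_train, rcond=None)

# Step 2: project every candidate word into the provided embedding space.
projected = own_test @ W

def normalize_rows(matrix):
    """Scale each row to unit length so dot products become cosine similarities."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

# Step 3: for each placeholder vector, keep the 3 most similar candidate words.
similarity = normalize_rows(placeholder) @ normalize_rows(projected).T
top3 = np.argsort(-similarity, axis=1)[:, :3]
predictions = {i: [test_words[j] for j in row] for i, row in enumerate(top3)}

print(predictions[0])
```

On real data the fit will not be exact, which is where submitting 3 candidate words per embedding helps; a non-linear model may be needed if a linear map fits the training pairs poorly.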
📚 Resource Circle
Here is a list of resources that will help you learn about the puzzle
- Learn More about embedding here.
- Check out this blog to understand the intuition behind word embedding creation.
👯♀️ Get Help From Community
Hop over to the AIcrowd Blitz Discord server to see ongoing discussions about this puzzle.
🙋♀️ Subscription Queries
This is one of the many free Blitz puzzles you can access forever. To access more puzzles from various domains from the Blitz Library and receive a special new puzzle in your inbox every 2 weeks, you can subscribe to AIcrowd Blitz here.
Solution for submission 171641
Embedding Game: a simple algorithm (0.54+)
Solution for submission 171366
Getting Started Notebook for Embedding Game Challenge