Round 1: 16 days left #fragrance

🕵️ Introduction

There are so many distinct odors in everything we see or interact with. Our reactions to different smells are almost always instant and instinctual, not cultivated. A particular smell can sometimes trigger a specific memory too. Still, most of us would not know how our brain categorizes different smells from different sensory inputs.


What happens when particles responsible for smell enter our nose?

Our noses have more than 400 types of olfactory receptors, expressed in 1 million+ olfactory sensory neurons that all sit on a small patch of tissue, the olfactory epithelium. The olfactory sensory neurons send signals to the olfactory bulb in the brain, and from there to further structures that interpret the smell.


We are turning this process digital!

What actually enters our noses are particles carrying the odorant molecules responsible for a smell. These molecules are the building blocks of all fragrances. For this challenge, we take these molecular compounds as input, parse them, and predict which of 100+ different fragrances each one contains.

[Figure: example molecules labeled with their odors (jasmin; ethereal,jasmin,aldehydic,fruity; green,herbal,powdery,grass; cacao,floral,honey) and one unlabeled molecule to predict 🤔❓🤔]


Understand with code! Here is some getting-started code for you. 😄

💾 Dataset

The dataset contains a description of each molecule (as a SMILES string) and the odors it possesses. The challenge is a multi-label classification problem: each molecule has multiple odors, written in the form of a sentence with a single , between each odor. The columns in the dataset are described below:

  • SMILES: Simplified molecular-input line-entry system (SMILES) is a specification in the form of a line notation for describing the structure of chemical species using short ASCII strings.

  • SENTENCE (target): A combination of the odors of the molecule. Each odor is separated by a , to form an (odor) sentence.
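
As a rough illustration of the two columns described above, the sketch below reads rows with a SMILES and a SENTENCE column and splits each sentence into its odors. The two sample rows are hypothetical and made up for illustration; they are not taken from the dataset, and the helper names parse_sentence and read_molecules are assumptions of this sketch.

```python
import csv
import io

# Hypothetical two-row sample in the SMILES / SENTENCE layout. The molecules
# and odor labels are invented for illustration, not taken from the dataset.
SAMPLE_CSV = '''SMILES,SENTENCE
CCO,"ethereal,fruity"
CC(=O)OCC,"fruity,sweet,green"
'''


def parse_sentence(sentence):
    """Split an odor sentence such as 'fruity,sweet' into a list of odors."""
    return [odor.strip() for odor in sentence.split(",") if odor.strip()]


def read_molecules(handle):
    """Return (smiles, odors) pairs from a CSV with SMILES and SENTENCE columns."""
    reader = csv.DictReader(handle)
    return [(row["SMILES"], parse_sentence(row["SENTENCE"])) for row in reader]


molecules = read_molecules(io.StringIO(SAMPLE_CSV))
for smiles, odors in molecules:
    print(smiles, odors)
```

To read the real training file, the same function could be given an open file handle instead of the in-memory sample.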

📁 Files

v0.1

The following files are available in the Resources section:

  • train.csv - (4316 molecules) : This CSV file contains the attributes describing the molecules along with their "Sentence".
  • test.csv - (1079 molecules) : The file used for the actual evaluation for the leaderboard score; it does not include the "Sentence" for the molecules.
  • vocabulary.txt : A file containing the list of all odors present in the dataset.
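
One common way to use a vocabulary file like this for a multi-label problem is to turn each odor sentence into a multi-hot vector. The sketch below is only an illustration: it assumes vocabulary.txt holds one odor per line (the layout is not stated here), uses a hypothetical five-odor vocabulary, and the function names are invented for this example.

```python
# Hypothetical vocabulary text, assuming one odor per line.
VOCABULARY_TEXT = """jasmin
floral
green
honey
fruity
"""


def build_index(vocabulary_text):
    """Map each odor to its position in the vocabulary."""
    odors = [line.strip() for line in vocabulary_text.splitlines() if line.strip()]
    return {odor: position for position, odor in enumerate(odors)}


def multi_hot(odors, index):
    """Encode a list of odors as a 0/1 vector over the vocabulary."""
    vector = [0] * len(index)
    for odor in odors:
        if odor in index:  # odors outside the vocabulary are ignored
            vector[index[odor]] = 1
    return vector


index = build_index(VOCABULARY_TEXT)
print(multi_hot(["green", "floral"], index))  # [0, 1, 1, 0, 0]
```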

🚀 Submission

  • Prepare a CSV file with the header columns SMILES and PREDICTIONS.

  • The SMILES column has to contain the SMILES values as mentioned in the test set

  • The PREDICTIONS column has to contain the top-5 predictions of your model separated by ; where the odors within each sentence are separated by ,
    For example, if the value of the PREDICTIONS column for a particular row is :
    coconut,cooling,watery;ambergris,plum,ripe;almond,gourmand,pungent;cognac,dry,medicinal;geranium,lactonic,medicinal
    Then, the top-5 predictions of your model are :

  • coconut,cooling,watery

  • ambergris,plum,ripe

  • almond,gourmand,pungent

  • cognac,dry,medicinal

  • geranium,lactonic,medicinal

    Note: If any of the sentences contain more than 3 words, then only the first 3 words will be considered for evaluation.

  • Sample submission format available at sample_submission.csv in the Resources section.
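
The format above can be sketched in a few lines of Python. This is a minimal illustration, not the official tooling: the function names are hypothetical, the SMILES value "CCO" is a placeholder, and truncating each sentence to 3 odors before writing is a choice made here to mirror the note above. Because each PREDICTIONS value contains commas, the sketch relies on the csv module to quote that field.

```python
import csv
import io

MAX_ODORS = 3  # only the first 3 odors of a sentence are evaluated
TOP_K = 5      # number of sentences expected per molecule


def format_predictions(sentences):
    """Join 5 odor sentences with ';' and the odors in each sentence with ','."""
    if len(sentences) != TOP_K:
        raise ValueError(f"expected {TOP_K} sentences, got {len(sentences)}")
    return ";".join(",".join(sentence[:MAX_ODORS]) for sentence in sentences)


def write_submission(rows, handle):
    """Write (smiles, sentences) pairs as a SMILES,PREDICTIONS CSV."""
    writer = csv.writer(handle)  # quotes fields that contain commas
    writer.writerow(["SMILES", "PREDICTIONS"])
    for smiles, sentences in rows:
        writer.writerow([smiles, format_predictions(sentences)])


example = [
    ["coconut", "cooling", "watery"],
    ["ambergris", "plum", "ripe"],
    ["almond", "gourmand", "pungent"],
    ["cognac", "dry", "medicinal"],
    ["geranium", "lactonic", "medicinal"],
]

buffer = io.StringIO()
write_submission([("CCO", example)], buffer)  # "CCO" is a placeholder SMILES
print(buffer.getvalue())
```

When writing to a real file, opening it with newline="" is the usual way to let the csv module control line endings.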

🖊 Evaluation Criteria

The evaluation of the submissions is done using the Jaccard Index / Tanimoto Similarity Score.
Descriptions of an odor can be heterogeneous, varying with personal experience, perfumer, and company, so it is hard to expect a unique and perfect description. We therefore evaluate the best-matching sentence among the proposed top-5 sentences.

For example, if for a single molecule, the ground truth is : floral, green, rose and the top-5 proposed sentences are :

  • rose, green, apricot
  • floral, muguet, jasmin
  • floral, rose, green
  • floral, green, melon
  • muguet, rose, woody

Then the Jaccard Index is computed for all the top-5 sentences in comparison to the ground truth, and the best score across all the 5 predictions is considered for the said molecule. The overall score is computed by taking the mean of the said score across all the molecules in the test set.
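
The scoring rule described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the official evaluator: the function names are hypothetical, each predicted sentence is cut to its first 3 odors as the submission note describes, and two empty sets are assumed to score 0. The Jaccard Index of two sets is the size of their intersection divided by the size of their union.

```python
def jaccard(first, second):
    """Jaccard Index of two odor collections: |A & B| / |A | B|."""
    a, b = set(first), set(second)
    union = a | b
    if not union:  # assumption: two empty sentences score 0
        return 0.0
    return len(a & b) / len(union)


def best_of_top5(truth, predictions):
    """Best Jaccard Index across the proposed sentences for one molecule."""
    return max(jaccard(truth, sentence[:3]) for sentence in predictions)


def mean_score(truths, all_predictions):
    """Mean of the per-molecule best scores across all molecules."""
    scores = [best_of_top5(t, p) for t, p in zip(truths, all_predictions)]
    return sum(scores) / len(scores)


truth = ["floral", "green", "rose"]
predictions = [
    ["rose", "green", "apricot"],   # 2 shared / 4 total = 0.5
    ["floral", "muguet", "jasmin"], # 1 shared / 5 total = 0.2
    ["floral", "rose", "green"],    # 3 shared / 3 total = 1.0
    ["floral", "green", "melon"],   # 2 shared / 4 total = 0.5
    ["muguet", "rose", "woody"],    # 1 shared / 5 total = 0.2
]
print(best_of_top5(truth, predictions))  # 1.0
```

For the example molecule, the third sentence contains the same three odors as the ground truth, so order does not matter and the molecule scores 1.0.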

📅 Rounds

The competition consists of 3 separate rounds.

  • Round-1 : September 8th, 2020 - October 6th, 2020
  • Round-2 : October 6th, 2020 - November 3rd, 2020
  • Round-3 : November 3rd, 2020 - December 15th, 2020

🏆 Prizes

The top 2 participants of Round-3 will each be awarded a cash prize:

  • 1st Prize : CHF 4,000
  • 2nd Prize : CHF 2,000

📱 Contact

📚 Acknowledgement

We have permission to use the olfactive descriptions and molecules from the "PMP database", authored by Mans Boelens and distributed by Leffingwell & Associates, for this challenge.
