2 Follower
0 Following
casvanboekholdt

Organization

Tilburg University

Location

NL

Badges

2
1
1



Challenges Entered

Self-driving RL on DeepRacer cars - From simulation to real world

Latest submissions

No submissions made in this challenge.

3D Seismic Image Interpretation by Machine Learning

Latest submissions

No submissions made in this challenge.

Predicting smell of molecular compounds

Latest submissions

graded 98203
graded 98202
graded 98183

5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?

Latest submissions

graded 98203
graded 98202
graded 98183
Participant Rating
spiglerg 0
contrebande 0
casvanboekholdt has not joined any teams yet...

Learning to Smell

Question following the townhall meeting

Over 3 years ago

Thank you for the response, @guillaumegodin.

I can see how the augmentation would work in practice. However, when I create a fingerprint embedding of, e.g., SMILES 1 and its augmented variant (SMILES 1, aug 1), the two embeddings are identical. How, then, does this replication add any value to the data? What kind of input representation preserves the difference between these augmented SMILES?

Best,
Cas

Question following the townhall meeting

Over 3 years ago

Dear @guillaumegodin,

I have a question regarding one of your statements in yesterday’s townhall meeting for the Learning to Smell challenge.

You mentioned that rearranging the SMILES can improve accuracy on tasks. I have been trying to find a way to use this, but have not yet been successful. I found your contribution to RDKit for this, which works fine, but I am now stuck on how to use the additional SMILES. Any fingerprint-type embedding will be the same for all of the generated SMILES, so the extra SMILES are of no use with fingerprint embeddings. I have tried several ways of representing SMILES without embeddings, such as char_to_int conversion with zero padding fed into LSTMs, but none of them predicts above chance level. My background is not in chemistry, so I am likely missing something quite obvious here due to my lack of domain knowledge.
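For reference, the char_to_int conversion with zero padding mentioned above can be sketched roughly as follows. This is a minimal illustration and not the code used in the challenge: the helper names `build_vocab` and `encode` and the three example strings are assumptions. "CCO" and "OCC" are two valid SMILES for the same molecule (ethanol), so the sketch shows that a character-level encoding, unlike a fingerprint, keeps differently ordered SMILES distinct.

```python
# Hypothetical helpers for illustration; not taken from the original post.

def build_vocab(smiles_list):
    """Map every character seen in the data to an integer; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(smiles, char_to_int, max_len):
    """Convert a SMILES string to a fixed-length list of ints, zero-padded on the right."""
    ids = [char_to_int[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# "CCO" and "OCC" both describe ethanol; "c1ccccc1" is benzene.
smiles = ["CCO", "OCC", "c1ccccc1"]
char_to_int = build_vocab(smiles)
max_len = max(len(s) for s in smiles)
encoded = [encode(s, char_to_int, max_len) for s in smiles]

for s, e in zip(smiles, encoded):
    print(s, e)
```

Note that this naive scheme splits multi-character atoms such as "Cl" or "Br" into separate tokens, so a real tokenizer would likely need to treat those as single symbols.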

Could you please point us in a direction of a type of input representation that can make use of these newly generated SMILES?

Thank you in advance.

Best,

Cas van Boekholdt

Explained by the Community | 200 CHF Cash Prize X 5

Over 3 years ago

Hi everyone,

I wrote a Google Colab tutorial/explainer on how to use vectors created with the SMILESVec package to train a fully connected neural network with TensorFlow Keras on the Learning to Smell dataset:

https://colab.research.google.com/drive/1cePlnWwWOsYxwqs8NWebVHFwRr624tNc?usp=sharing

Let me know if you have any suggestions or questions, always happy to help out!

Cheers,
Cas

Test labels

Over 3 years ago

You can evaluate your model either by making predictions on the test set and uploading them, or by splitting the labeled training set into a training set and a validation set.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
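A minimal sketch of the second option, using the `train_test_split` function linked above. The feature matrix and labels here are random placeholders standing in for the challenge data, and the 80/20 split and the `random_state` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 16 features and a binary label each.
# Replace these with your own molecule features and labels.
rng = np.random.default_rng(0)
X = rng.random((100, 16))
y = rng.integers(0, 2, size=100)

# Hold out 20% of the labeled data for validation; fix the seed for reproducibility.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_val.shape)
```

The model is then fitted on `X_train`, `y_train` and scored on `X_val`, `y_val`, which gives an estimate of performance without spending a submission.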

casvanboekholdt has not provided any information yet.
