AIcrowd | Predict Measurements for Nuclear Waste Canister

Round 1: Completed #regression

AIcrowd &

VITA Lab, EPFL

🕵️ Overview

Can you predict the evolution of the temperature around nuclear waste canisters using surrounding sensors and other data such as pressure and relative humidity?

This challenge aims to explore the potential of replacing computationally intensive methods like finite element methods with faster machine learning mechanisms for predicting measurements. The challenge involves working with measurements from approximately 1000 sensors around different canisters and evaluating the ability to predict the evolution of the temperature in the same tunnel configuration for unseen sensors at other locations.

Your task is to investigate and propose your own model to outperform your peers. You must understand any preprocessing and any architecture you use, as you will need to describe it in your code submission and explain it during the poster presentation.

The challenge ends on the 26th of May at 23:59. So do not postpone submitting your predictions.

💾 Dataset

You can find the dataset on Moodle CIVIL-226.

The data is split between the training data, which you can use to train your models, and the test data, which you will use to make your final predictions.

There are different kinds of CSV files. One contains the name of each sensor, the material composition around it, and its position in 3D space around the canister. Others contain the pressure and relative humidity values, which you may or may not want to use to improve your predictions. Finally, one CSV contains the temperature values at each point in time for each sensor, i.e. the targets. The temperature CSV file is only provided for the training sensors; it is this missing CSV that you must upload for the test sensors.

In the test data folder, you will find the example_of_submission CSV file. You are to upload a CSV file with this exact format, filled with your predictions.

As you will notice, some data may be missing or incorrectly reported. It is up to you to decide how to deal with such values, and we expect your decisions to be explained in your code and poster.
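
As an illustration only, here is a minimal sketch of one possible way to treat gaps: mark bad readings as missing, then interpolate linearly in time. The column names, the one-column-per-sensor layout and the `-999.0` sentinel are assumptions made for this example; check how the actual CSV files encode missing or faulty values before reusing it.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per time step, one column per sensor.
readings = pd.DataFrame(
    {
        "sensor_a": [20.0, np.nan, 22.0, 23.0],
        "sensor_b": [18.0, 18.5, -999.0, 19.5],  # -999.0 stands in for a bad reading
    }
)


def clean_readings(frame: pd.DataFrame, bad_value: float = -999.0) -> pd.DataFrame:
    """Mark sentinel values as missing, then fill gaps by linear interpolation."""
    marked = frame.replace(bad_value, np.nan)
    # limit_direction="both" also fills gaps at the start or end of a series.
    return marked.interpolate(method="linear", limit_direction="both")


cleaned = clean_readings(readings)
print(cleaned)
```

Interpolation is only one option; dropping the affected rows or sensors may be more appropriate, and whichever choice you make should be motivated in the notebook.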

📝 Code

Your code should be a notebook named train.ipynb. It should contain everything from the loading of the dataset to the predictions you make. The notebook should be well documented and organized. You can take inspiration from the notebooks given as exercises during the semester. Be aware that part of the grading will rely on the clarity of your notebook. You should motivate the decisions you made directly in the notebook.

Optional. You may also upload additional files and notebooks. In this case, you MUST submit a README file explaining clearly what each file/notebook is for and how to reproduce the results you have obtained. When submitting multiple files, upload them together as a zip.

Edit: Please also include the name of your team on AICrowd (either in the README or in the main train.ipynb notebook).

Your final code and poster should be submitted on Moodle by 30/05 23:59.

🚀 Submission

Please submit only as a TEAM: simply click on "create a team" at the top right of the challenge page.

For the challenge, you must submit your predictions here on AICrowd. The format of the predictions MUST be exactly the same as the example_of_submission.csv file.

You are allowed up to 5 daily submissions so manage your time and progress carefully.

To submit, please upload a .csv file.

Make sure you do not include any inputs from the other CSV files in your CSV file. It MUST be exactly like the example provided on Moodle.

WARNING: The predicted values must be float numbers.
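
The following sketch shows one way to reuse the layout of the example file and make sure every predicted value is written as a float. It assumes that the first column of the template is an index or time column and that all remaining columns hold predictions; the column names and the tiny template built here are hypothetical, so adapt them to the real example_of_submission.csv.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def fill_submission(template_path, predictions, output_path):
    """Copy the template layout and fill its value columns with float predictions."""
    template = pd.read_csv(template_path)
    value_columns = template.columns[1:]  # assumption: first column is an index/time column
    values = np.asarray(predictions, dtype=float)  # enforce float predictions
    if values.shape != (len(template), len(value_columns)):
        raise ValueError("predictions do not match the template shape")
    submission = pd.DataFrame(values, columns=value_columns)
    submission.insert(0, template.columns[0], template.iloc[:, 0])
    submission.to_csv(output_path, index=False)
    return submission


# Demonstration with a tiny, made-up template written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    template_path = Path(tmp) / "example_of_submission.csv"
    pd.DataFrame({"time": [0, 1], "sensor_x": [0, 0], "sensor_y": [0, 0]}).to_csv(
        template_path, index=False
    )
    output_path = Path(tmp) / "submission.csv"
    submission = fill_submission(template_path, [[21, 22], [23, 24]], output_path)
    reloaded = pd.read_csv(output_path)

print(reloaded.dtypes)
```

Reloading the written file and inspecting its columns and dtypes, as done at the end, is a cheap check to run before each upload.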

To send us your project, you will be able to upload your code and poster on Moodle through a separate link. Please always include the names and SCIPER numbers of all your teammates in the README file (or directly in the notebook) and on the poster.

🖊 Evaluation Criteria

For the Challenge:

The top 5 teams will get bonus points for their grade, proportional to their ranking on the leaderboard. The primary score of the challenge is the L1-score. The secondary score, used to break ties, is the L2-score.
However, if you make no submission, you will lose points for failing to submit an acceptable regression model, which is what we ask of you in this project.
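
This page does not define the two scores precisely. A common reading, assumed here, is that the L1-score is the mean absolute error and the L2-score is the mean squared error; the leaderboard may normalise them differently. Under that assumption, a local validation check could be sketched as follows (the function names are illustrative):

```python
import numpy as np


def l1_score(y_true, y_pred):
    """Mean absolute error between targets and predictions (assumed L1-score)."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(errors)))


def l2_score(y_true, y_pred):
    """Mean squared error between targets and predictions (assumed L2-score)."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(errors**2))


# Toy temperatures, for illustration only.
targets = [20.0, 22.0, 24.0]
predictions = [21.0, 22.0, 22.0]
print(l1_score(targets, predictions), l2_score(targets, predictions))
```

Lower is better for both quantities, so a model that improves them on held-out training sensors is more likely to move up the leaderboard.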

For the Code:

You will submit on Moodle a zip with just your notebook and your poster. Do not include the data, as you won't be able to upload your submission to Moodle if you do so.
Please make it tidy and add documentation when needed. Readability counts towards the grading. Your code should be able to reproduce (or come very close to) your best AICrowd submission.
Please make sure your script loads the submitted data with a relative path (e.g., load_the_csv('data/train_set.csv')), and not with an absolute path (e.g., load_the_csv('MyDrive/Users/alice/data_folder/train_set.csv')).
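
A small sketch of how relative paths can be kept in one place with `pathlib`. The folder name `data` and the file name come from the example above; the helper `data_path` is hypothetical.

```python
from pathlib import Path

# Relative to the folder the notebook is run from, not to a personal drive.
DATA_DIR = Path("data")


def data_path(filename: str) -> Path:
    """Build a path inside the relative data folder."""
    return DATA_DIR / filename


train_path = data_path("train_set.csv")
print(train_path, train_path.is_absolute())
```

Defining the data folder once makes it easy for a grader to point the notebook at their own copy of the dataset.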

For the Poster:

You will present your models, creative ideas and results in the form of a poster that you must submit with your final code by Tuesday 30/05 23:59. Please note that the poster replaces the written report that you may have often produced in previous courses and that would not teach you much.
The poster must briefly explain what your code does, and the main ideas and implementations you used to solve the task. Please add the names and SCIPER numbers of your teammates in the README, as well as your AICrowd team name and the ID of your best submission.

Prizes:

Best results: the team winning the leaderboard will get a prize and will present their approach to the class on 01/06.
Best poster: the team with the clearest and nicest poster will get a prize and will present their approach to the class on 01/06.
Optional: Most original approach.

🔗 Resources

Poster requirements:

Think of it as mid-way between a report (structure) and a collage of slides, where you can have both bullet points and few full sentences of explanation.

Key Components:

Title: Your project title, teammates
Predicting: Briefly explain the motivation for your topic, what you built, and the results. It’s easier to think of this as a quick summary of the inputs and outputs. (3 sentences max)
Data: Exactly where did your data come from and what does it contain? (ie. What are in the rows and columns? Are examples labeled with ground truth?, etc…) (1-2 sentences max)
Features: How many features have you selected and which features are the raw input data vs. features you have derived? Why are they appropriate for this task? (2-3 sentences max)
Models: Exactly which model(s) are you using or are worth showing? Write out the basic math formulas if applicable and clearly note any modifications or additions. If you have more than one model, make subsections for each. (3-4 sentences max)
Results: Make a compact table of results. Each row should be a different model. The columns should be the training accuracy and the test accuracy. List how many samples are in each of the training and testing data sets. Obviously, these sets should be different. (1-2 sentences max + 1 table max)
Discussion: This is where you share your thoughts about your project. (Hopefully you have a few interesting interpretations!) Briefly summarize what happened. Briefly explain whether or not you expected your results. If your results were good, explain why. If they were not good, explain why. (5 sentences max)
Future: If you had more time to work on this or add a creative idea, what would you do first? (2-3 sentences max)
References: Papers you read to create your model or succeed in the project

Source: http://cs229.stanford.edu/projects.html

Examples of posters from ML conferences: https://web.archive.org/web/20201128110223/https://postersession.ai///#

Extra Guidelines:

Methodology:

Your choice of method needs to be well motivated and you need to show evidence that your work has an effect.

The simplest way to do so is to start with a simple model as a baseline, evaluate it, find a way to improve it, evaluate again and repeat. Explain the process that leads to your various improvements, evaluate the results carefully and present evidence using plots and tables. When comparing two models, make sure you tuned the hyper-parameters for both models beforehand. Comparing untuned / ill-defined models is not meaningful.
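
The baseline-then-improve loop described above could look like the sketch below. It uses synthetic data generated on the spot (not the challenge data) and two deliberately simple models, a constant mean predictor and a least-squares line, both evaluated on the same held-out split.

```python
import numpy as np

# Synthetic data for illustration only: a noisy linear relation.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=200)

# Hold out the last 50 points for validation.
x_train, x_val = x[:150], x[150:]
y_train, y_val = y[:150], y[150:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Baseline: always predict the mean of the training targets.
baseline_pred = np.full_like(y_val, y_train.mean())

# Improvement: least-squares fit of a straight line.
design = np.column_stack([x_train, np.ones_like(x_train)])
slope, intercept = np.linalg.lstsq(design, y_train, rcond=None)[0]
linear_pred = slope * x_val + intercept

results = {
    "mean baseline": mean_absolute_error(y_val, baseline_pred),
    "linear fit": mean_absolute_error(y_val, linear_pred),
}
for name, mae in results.items():
    print(f"{name}: validation MAE = {mae:.3f}")
```

Collecting the scores in one dictionary makes it straightforward to turn them into the results table expected on the poster.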

Source: https://github.com/epfml/ML_course/blob/master/projects/Project_Guidelines.pdf

Code:

README (if applicable): The README should contain the full instructions on how to run your code, how to reproduce your obtained results, and give an overview of the architecture of your code (what are the different files and what they contain). You will also need to specify which libraries should be installed.

Modularisation: Avoid copy-pasting of code as much as possible. Define re-usable functions instead.

Documentation: Clear variable and function names are even better than comments. Indent your code properly. Use the Python docstring convention to explain what a function does. Write multiple short functions with explicit names rather than one 200-line run function. The more readable your code is, the more likely you are to be understood and given points.
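
As an example of a short, explicitly named function with a docstring, consider the sketch below. The function itself is hypothetical, and the `Args`/`Returns` layout is one common docstring style among several; any consistent convention is fine.

```python
import math
from typing import Sequence


def sensor_distance(position_a: Sequence[float], position_b: Sequence[float]) -> float:
    """Return the Euclidean distance between two sensor positions.

    Args:
        position_a: Coordinates (x, y, z) of the first sensor.
        position_b: Coordinates (x, y, z) of the second sensor.

    Returns:
        The straight-line distance, in the same unit as the coordinates.
    """
    return math.dist(position_a, position_b)


print(sensor_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # 5.0
```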

Useful resources:

Libraries:

General purpose:
- NumPy: https://numpy.org/
- Pandas: https://pandas.pydata.org/
ML:
- PyTorch: https://pytorch.org/
- PyTorch Lightning: https://www.pytorchlightning.ai/
- scikit-learn: https://scikit-learn.org/stable/
Visualization:
- matplotlib: https://matplotlib.org/
- seaborn: https://seaborn.pydata.org/

Code and collaboration:

Code editors / IDEs:
- JupyterLab (for notebooks)
- Visual Studio Code: https://code.visualstudio.com/
  - Use the Python extension (https://code.visualstudio.com/docs/python/data-science-tutorial)
  - Supports notebooks too
Collaboration:
- Deepnote (real-time collaboration, like Google Docs but for notebooks): https://deepnote.com/
Free GPUs:
- Google Colab: https://colab.research.google.com/

Experiment logging

If you want to log and visualize experiments, we recommend using TensorBoard, which keeps track of metrics such as the loss and accuracy.

For more information on how to use TensorBoard with PyTorch, check out the documentation.
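
A minimal logging sketch with PyTorch's `SummaryWriter`, assuming both `torch` and `tensorboard` are installed. The log folder, the tag name and the loss values are placeholders; in a real run you would log the loss computed at each epoch.

```python
import tempfile
from pathlib import Path

from torch.utils.tensorboard import SummaryWriter

# Placeholder log folder; any writable directory works.
log_dir = Path(tempfile.mkdtemp()) / "runs" / "baseline"
writer = SummaryWriter(log_dir=str(log_dir))

# Placeholder loss values standing in for a real training loop.
for epoch, loss in enumerate([1.0, 0.6, 0.4]):
    writer.add_scalar("loss/train", loss, global_step=epoch)

writer.close()  # flush everything to disk

event_files = list(log_dir.glob("events.out.tfevents.*"))
print(f"wrote {len(event_files)} event file(s) to {log_dir}")
```

The curves can then be viewed by running `tensorboard --logdir` on the log folder and opening the address it prints.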

Google Colab for GPUs

If you are in need of GPUs, you can run your notebook in Colab.

To use a GPU on Colab, make sure to switch to a GPU runtime (Runtime -> Change runtime type -> GPU).

To use GPUs with PyTorch, you will first need to move your model and data to the GPU. See this tutorial for more information: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#training-on-gpu
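
In short, moving a model and its inputs to the GPU can be sketched as below. The tiny linear model and the random inputs are placeholders; the code falls back to the CPU when no GPU is available, so it also runs outside Colab.

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model: 3 input features, 1 predicted value.
model = nn.Linear(3, 1).to(device)

# Inputs must live on the same device as the model.
inputs = torch.randn(8, 3).to(device)

with torch.no_grad():
    outputs = model(inputs)

print(outputs.shape, outputs.device)
```

Forgetting to move either the model or a batch typically raises a device mismatch error, so it helps to define `device` once and reuse it everywhere.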