OpenFood Nutrition Table Challenge
Extracting structured nutritional data from food packaging images.
In the OpenFood project, the labels of all foods for sale in Swiss supermarkets are being scanned. A database of food packaging images has been prepared, and the nutritional information now needs to be extracted from these images into a CSV file.
Nutritional data is presented in tables on food packaging. At a minimum, a nutritional table contains:
- Nutrient
- Amount per 100 g
Additional columns may optionally be present:
- Amount per serving
- Percentage of recommended daily intake
- Other optional columns
The nutritional data needs to be extracted into a CSV file which will then be loaded into the OpenFood database.
Nutrients Master List
Each nutrient will be one of the items in the list below, expressed in the standard unit of measure indicated. Nutrient names may appear in upper, lower or sentence case, and in:
- German (de)
- French (fr)
- Italian (it)
- English (en)
Note that a single label may present the information in several languages. The challenge submission must refer to each nutrient by the integer given in the nutrient_id column of the list below.
nutrient_id | Nutrient by language | Unit |
---|---|---|
1 | de=>Energie en=>energy fr=>énergie it=>energia | kJ |
2 | de=>Energie (kcal) en=>energy (kcal) fr=>énergie (kcal) it=>energia (kcal) | kcal |
3 | de=>Eiweiss en=>protein fr=>protéines it=>proteine | g |
4 | de=>Fett en=>fat fr=>graisses it=>grassi | g |
5 | de=>Kohlenhydrate en=>carbohydrates fr=>glucides it=>carboidrati | g |
6 | de=>Zucker en=>sugars fr=>sucres it=>zuccheri | g |
7 | de=>Salz en=>salt fr=>sel it=>sale | g |
8 | de=>Ballaststoffe en=>fibre fr=>fibres alimentaires it=>fibre | g |
9 | de=>Gesättigte Fettsäuren en=>saturated fat fr=>graisses saturées it=>grassi saturi | g |
10 | de=>Vitamin C en=>Vitamin C fr=>Vitamine C it=>Vitamina C | mg |
11 | de=>Vitamin B2 (Riboflavin) en=>Vitamin B2 (Riboflavin) fr=>Vitamine B2 (Riboflavine) it=>Vitamina B2 (Riboflavina) | mg |
12 | de=>Natrium en=>Sodium fr=>Sodium it=>Sodio | g |
13 | de=>Selen en=>Selenium fr=>Sélénium it=>Selenio | µg |
14 | de=>Vitamin E en=>Vitamin E fr=>Vitamine E it=>Vitamina E | mg |
15 | de=>Kalzium en=>Calcium fr=>Calcium it=>Calcio | mg |
16 | de=>Magnesium en=>Magnesium fr=>Magnésium it=>Magnesio | mg |
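Because the same nutrient can appear under any of these names and in any letter case, a solution needs some lookup from label text to nutrient_id. The following is a minimal sketch in POSIX sh, written as a hypothetical function nutrient_id that matches on substrings. It only lowercases ASCII letters (accented capitals such as "É" are left unchanged), and the order of the patterns matters: "graisses saturées" also contains "graisses", and "Selen" also contains "sel". Unsaturated fats, which are not in the master list, are reported as unknown.

```sh
#!/bin/sh
# Hypothetical helper: map a nutrient label (de, fr, it or en, any letter
# case) to its nutrient_id from the master list.
# Prints the id and returns 0; prints nothing and returns 1 if unknown.
nutrient_id() {
  # Lowercase ASCII letters only; accented letters are left as they are,
  # so the patterns below rely on unaccented substrings where possible.
  label=$(printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z')
  case $label in
    # Order matters: specific names are tested before the general ones.
    *unsatur*|*insatur*|*ungesättigt*)                  return 1 ;;  # not in the list
    *kcal*)                                              echo 2 ;;
    *nerg*)                                              echo 1 ;;
    *satur*|*gesättigt*)                                 echo 9 ;;
    *fett*|*fat*|*grais*|*grass*)                        echo 4 ;;
    *eiweiss*|*eiweiß*|*protein*|*protéin*)              echo 3 ;;
    *kohlenhydrat*|*carbohydrat*|*glucid*|*carboidrat*)  echo 5 ;;
    *zucker*|*sugar*|*sucre*|*zuccher*)                  echo 6 ;;
    *selen*|*sélén*)                                     echo 13 ;;  # before salt ("sel")
    *salz*|*salt*|*sel*|*sale*)                          echo 7 ;;
    *ballaststoff*|*fibre*|*fiber*)                      echo 8 ;;
    *natrium*|*sodium*|*sodio*)                          echo 12 ;;
    *"vitamin c"*|*"vitamine c"*|*"vitamina c"*)         echo 10 ;;
    *riboflavin*|*"vitamin b2"*|*"vitamine b2"*|*"vitamina b2"*) echo 11 ;;
    *"vitamin e"*|*"vitamine e"*|*"vitamina e"*)         echo 14 ;;
    *kalzium*|*calcium*|*calcio*)                        echo 15 ;;
    *magnesi*|*magnési*)                                 echo 16 ;;
    *)                                                   return 1 ;;
  esac
}
```

For example, nutrient_id 'GRAISSES SATURÉES' prints 9 and nutrient_id 'Sel' prints 7. Substring matching keeps the sketch short but will misfire on labels outside the master list, so a real solution would likely combine it with the table layout detected on the label.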
Processing
A set of images will be provided, together with a corresponding CSV file containing the manually extracted product information. The submitted code must process the images and write the extracted values into a single CSV file called product_nutrients.csv.
Participants will submit both the results and the model. For final scoring, the submitted model will be run against an additional set of images.
Submission
- Submissions will be run by crowdAI in a Docker container. Details may be found in the Resources section.
- The /project folder will hold the submission scripts, including the install and run scripts described below.
- Participants may optionally submit an installation script, install.sh. When executed in the container, it will download and install any necessary code and libraries.
- If an external API is used, it must be indicated on the submission page. A list of permitted APIs may be found in the Resources section.
- Participants must include a script called run.sh, which is executed against a folder of images in the /project folder. It must produce a single product_nutrients.csv file as output, also in the /project folder.
- Participant code must search for each nutrient type and record its value in the CSV file. If a nutrient is not specified on the package, the field is filled with ‘-1.0’.
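One way to organise run.sh is in two steps: find every image, then write 16 rows per product (one per nutrient_id), with every value defaulting to -1.0. The sketch below shows only that skeleton. The function names build_csv and emit_default_rows are hypothetical, it assumes images follow the image-<product_id>.jpg naming used in source_images.tar, and the actual extraction step is left as a placeholder comment.

```sh
#!/bin/sh
# Skeleton a run.sh could follow (hypothetical; extraction not implemented).

# Print the 16 CSV rows for one product, every value set to -1.0,
# the code used for "nutrient not specified on the package".
emit_default_rows() {
  nid=1
  while [ "$nid" -le 16 ]; do
    printf '%s,%s,-1.0,-1.0,-1.0\n' "$1" "$nid"
    nid=$((nid + 1))
  done
}

# Walk an image folder ($1) and write one CSV ($2) covering all products.
build_csv() {
  image_dir=$1
  out=$2
  # Column names as used in the examples of this challenge description.
  echo "product_id,nutrition_id,per_hundred,per_portion,percent" > "$out"
  for img in "$image_dir"/image-*.jpg; do
    [ -e "$img" ] || continue        # the glob matched no files
    name=$(basename "$img" .jpg)     # e.g. image-2324
    product_id=${name#image-}        # e.g. 2324
    # Placeholder: OCR and table parsing would replace the defaults
    # for the nutrients actually found on this label.
    emit_default_rows "$product_id" >> "$out"
  done
}
```

In run.sh, a call such as build_csv /project/images /project/product_nutrients.csv would then write the required file; the images subfolder name is an assumption.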
Examples
Listed below are examples of raw nutritional data images and the data extracted from them. The data must be submitted as a single CSV file covering all products; it is shown here as a table for illustration only.
Product 7207
This product is an example of a label with only a single column of data. In this case the values go into the per_hundred column, as shown below.
product_id | nutrition_id | per_hundred | per_portion | percent |
---|---|---|---|---|
7207 | 1 | 1180.0 | -1.0 | -1.0 |
7207 | 2 | 282.0 | -1.0 | -1.0 |
7207 | 3 | 14.0 | -1.0 | -1.0 |
7207 | 4 | 25.0 | -1.0 | -1.0 |
7207 | 5 | 1.0 | -1.0 | -1.0 |
7207 | 7 | 1.7 | -1.0 | -1.0 |
7207 | 6 | -1.0 | -1.0 | -1.0 |
7207 | 8 | -1.0 | -1.0 | -1.0 |
7207 | 9 | -1.0 | -1.0 | -1.0 |
7207 | 10 | -1.0 | -1.0 | -1.0 |
7207 | 11 | -1.0 | -1.0 | -1.0 |
7207 | 12 | -1.0 | -1.0 | -1.0 |
7207 | 13 | -1.0 | -1.0 | -1.0 |
7207 | 14 | -1.0 | -1.0 | -1.0 |
7207 | 15 | -1.0 | -1.0 | -1.0 |
7207 | 16 | -1.0 | -1.0 | -1.0 |
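The column placement rule can be captured by a small helper that takes the numbers read from one table row in label order. In this hypothetical sketch, to_row assumes the label columns appear as per 100 g, then per portion, then % of daily intake; absent columns become -1.0 and any further column is ignored.

```sh
#!/bin/sh
# Hypothetical helper: build one CSV row from the numbers read on a label.
# Usage: to_row PRODUCT_ID NUTRIENT_ID [PER_100G [PER_PORTION [PERCENT [...]]]]
# Assumes label column order: per 100 g, per portion, % daily intake.
to_row() {
  product_id=$1 nid=$2
  shift 2
  # ${n:--1.0} substitutes -1.0 when that column was not found;
  # arguments beyond the third (extra label columns) are dropped.
  printf '%s,%s,%s,%s,%s\n' "$product_id" "$nid" \
    "${1:--1.0}" "${2:--1.0}" "${3:--1.0}"
}
```

For example, to_row 7207 1 1180.0 prints 7207,1,1180.0,-1.0,-1.0, matching the first row of the table above.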
Product 7276
This product has a serving size (45 g) as well as a percentage of recommended daily intake. These values go into the per_portion and percent columns in the example below. Sometimes there is an additional column of data, which may be ignored for the purposes of this challenge.
product_id | nutrition_id | per_hundred | per_portion | percent |
---|---|---|---|---|
7276 | 1 | 1720.0 | 774.0 | 9.0 |
7276 | 2 | 410.0 | 185.0 | 9.0 |
7276 | 3 | 7.2 | 3.24 | 6.0 |
7276 | 4 | 19.8 | 8.91 | 12.0 |
7276 | 5 | 49.2 | 22.14 | 8.0 |
7276 | 6 | 14.7 | 6.62 | 7.0 |
7276 | 7 | 1.05 | 0.47 | 7.0 |
7276 | 9 | 10.8 | 4.86 | 24.0 |
7276 | 8 | -1.0 | -1.0 | -1.0 |
7276 | 10 | -1.0 | -1.0 | -1.0 |
7276 | 11 | -1.0 | -1.0 | -1.0 |
7276 | 12 | -1.0 | -1.0 | -1.0 |
7276 | 13 | -1.0 | -1.0 | -1.0 |
7276 | 14 | -1.0 | -1.0 | -1.0 |
7276 | 15 | -1.0 | -1.0 | -1.0 |
7276 | 16 | -1.0 | -1.0 | -1.0 |
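For product 7276 the per_portion values follow from the 45 g serving size: per_portion ≈ per_hundred × 45 / 100 (for example 1720.0 × 0.45 = 774.0 and 7.2 × 0.45 = 3.24). This relation can serve as a cheap cross-check on OCR output. The sketch below is a hypothetical helper, check_portion, that reads CSV rows (with a header line) on standard input and flags rows that break the relation. The 2% tolerance is an assumption chosen to absorb label rounding, since 410.0 × 0.45 = 184.5 is printed as 185.0.

```sh
#!/bin/sh
# Hypothetical sanity check: print rows whose per_portion value is not close
# to per_hundred * portion_g / 100. Usage: check_portion PORTION_G < file.csv
check_portion() {
  awk -F, -v g="$1" '
    # Skip the header line and rows where per_portion is absent (-1.0).
    NR > 1 && $4 >= 0 {
      expected = $3 * g / 100
      diff = $4 - expected
      if (diff < 0) diff = -diff
      # Tolerance: 2% of the expected value plus 0.01 for tiny amounts.
      if (diff > 0.02 * expected + 0.01) print "suspicious: " $0
    }'
}
```

A misread decimal point, such as 32.4 instead of 3.24 for protein, would be reported, while the rounded kcal value is accepted.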
Evaluation criteria
- Score: the percentage of correct fields per product, averaged across all products.
- A submission must have a grade of 80% or above to be eligible for prizes.
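The evaluation rule can be made concrete with a small scorer. This is a sketch only, with a hypothetical function name score. It assumes that the "fields" are the three value columns (per_hundred, per_portion, percent), that a field is correct when it equals the reference value numerically, and that a row missing from the submission counts as entirely wrong; the actual crowdAI grader may differ on any of these points.

```sh
#!/bin/sh
# Hypothetical scorer. Usage: score SUBMISSION.csv REFERENCE.csv
# Prints the mean, over reference products, of the percentage of correct fields.
score() {
  awk -F, '
    FNR == 1 { next }                                   # skip each header line
    NR == FNR { got[$1 "," $2] = $3 "," $4 "," $5; next }  # first file: submission
    {                                                   # second file: reference
      key = $1 "," $2
      found = (key in got)                              # test before reading got[key]
      if (found) split(got[key], s, ",")
      for (i = 3; i <= 5; i++) {
        total[$1]++
        if (found && s[i - 2] + 0 == $i + 0) correct[$1]++
      }
    }
    END {
      for (p in total) { sum += correct[p] / total[p]; n++ }
      if (n > 0) printf "%.1f\n", 100 * sum / n
    }' "$1" "$2"
}
```

Checking membership with (key in got) before reading got[key] matters in awk, because merely referencing a missing array element creates it.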
Resources
Docker Container
The Jupyter Notebook Scientific Python Stack Docker container will be the development and evaluation environment for this challenge. The challenge dataset comprises the source image archive source_images.tar and the pre-populated answer file product_nutrients.csv.
source_images.tar
The image archive contains 500 images, each with the product id embedded in its file name, e.g.:
...
image-2324.jpg
image-2325.jpg
image-2326.jpg
image-2327.jpg
...
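Since the product id is part of each file name, the ids in the archive can be listed without unpacking it. The hypothetical helper below assumes every image follows the image-<product_id>.jpg pattern shown above and silently skips any other archive member.

```sh
#!/bin/sh
# Hypothetical helper: print the product ids found in an image archive,
# one per line, in numeric order. Usage: product_ids source_images.tar
product_ids() {
  # List member names, keep only image-<digits>.jpg and print the digits.
  tar -tf "$1" | sed -n 's/^.*image-\([0-9][0-9]*\)\.jpg$/\1/p' | sort -n
}
```

Piping the output through wc -l gives a quick check that all 500 images are present.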
Installation of software
Participants may use any FOSS (free and open source software) resources to produce the solution; any such software must be installed by a shell script called install.sh. For example:
```sh
#!/bin/sh
set -e  # abort at the first failing command
apt-get update -y
apt-get install -y curl
```
External APIs
The following APIs may be used in this project. If they are used, the solution must work with the free tier.
- Google Vision
- IBM Watson
- Microsoft Vision API
- Clarifai
If you wish to use another API please contact us.
Prizes
The author of the highest-ranked submission scoring above 80% will be invited to the crowdAI winner’s symposium at EPFL in Switzerland on January 30/31, 2017. This symposium is part of the Applied Machine Learning Days, to which the winner will have full access. An educational award will be given to the participant with either the most insightful submission posts or the best tutorial; the crowdAI team will pick its recipient, who will also be invited to the symposium. Expenses for travel and accommodation are covered by crowdAI.
In addition, a prize of CHF 2,000 (~USD 2,000) will be awarded to the highest-ranked submission scoring above 80%.