Round 1: Completed

OpenFood Nutrition Table Challenge

Extracting structured nutritional data from food packaging images.

crowdAI

In the OpenFood project, labels are being scanned in Swiss supermarkets for all foods available for sale. A database of images of food packages has been prepared, and now nutritional information needs to be extracted from the images into a CSV file.

Nutritional data is presented in tables on food packaging, and nutritional tables contain at a minimum:

Nutrient
Units per 100g

Additional columns may optionally be presented:

Units per serving size
Percentage of recommended daily intake
Other optional columns

The nutritional data needs to be extracted into a CSV file which will then be loaded into the OpenFood database.

Nutrients Master List

Each nutrient will be one of the items in the list below and must be reported in the standard unit of measure indicated. The nutritional information may be in upper, lower or sentence case, and presented in:

German (de)
French (fr)
Italian (it)
English (en)

Note that information may be presented in multiple languages. The challenge submission should refer to the nutrients via an integer field as indicated in the nutrient_id field in the list below.

nutrient_id	Nutrient by language	Unit
1	de=>Energie en=>energy fr=>énergie it=>energia	kJ
2	de=>Energie (kCal) en=>energy (kCal) fr=>énergie (kCal) it=>energia (kCal)	kCal
3	de=>Eiweiss en=>protein fr=>protéines it=>proteine	g
4	de=>Fett en=>fat fr=>graisses it=>grassi	g
5	de=>Kohlenhydrate en=>carbohydrates fr=>glucides it=>carboidrati	g
6	de=>Zucker en=>sugars fr=>sucres it=>zuccheri	g
7	de=>Salz en=>salt fr=>sel it=>sale	g
8	de=>Ballaststoffe en=>fibre fr=>fibres alimentaires it=>fibre	g
9	de=>Gesättigte Fettsäuren en=>saturated fat fr=>graisses saturées it=>grassi saturi	g
10	de=>Vitamin C en=>Vitamin C fr=>Vitamine C it=>Vitamina C	mg
11	de=>Vitamin B2 (Riboflavin) en=>Vitamin B2 (Riboflavin) fr=>Vitamine B2 (Riboflavine) it=>Vitamina B2 (Riboflavin)	mg
12	de=>Natrium en=>Sodium fr=>Sodium it=>Sodio	g
13	de=>Selen en=>Selenium fr=>Selenium it=>Selenio	µg
14	de=>Vitamin E en=>Vitamin E fr=>Vitamine E it=>Vitamina E	mg
15	de=>Kalzium en=>Calcium fr=>Calcium it=>calcio	mg
16	de=>Magnesium en=>Magnesium fr=>Magnesium it=>Magnesio	mg
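
As an illustration of how label text might be mapped to a nutrient_id, here is a minimal sketch. It assumes the nutrient names have already been read from the image as text (for example by OCR), and it covers only a subset of the master list. The names NUTRIENT_NAMES, normalize and nutrient_id_for are hypothetical and not part of the challenge. Distinguishing energy in kJ (id 1) from kCal (id 2) would need the unit as well and is left out.

```python
import unicodedata

# Subset of the master list above: nutrient_id -> names in de, en, fr, it.
NUTRIENT_NAMES = {
    1: ["Energie", "energy", "énergie", "energia"],
    3: ["Eiweiss", "protein", "protéines", "proteine"],
    4: ["Fett", "fat", "graisses", "grassi"],
    7: ["Salz", "salt", "sel", "sale"],
    9: ["Gesättigte Fettsäuren", "saturated fat",
        "graisses saturées", "grassi saturi"],
}


def normalize(text):
    """Case-fold and strip accents so 'ÉNERGIE' and 'energie' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.strip()


# Normalized name -> nutrient_id, built once from the table above.
LOOKUP = {
    normalize(name): nutrient_id
    for nutrient_id, names in NUTRIENT_NAMES.items()
    for name in names
}


def nutrient_id_for(label):
    """Return the nutrient_id for a label string, or None if it is unknown."""
    return LOOKUP.get(normalize(label))
```

Normalizing both the master list and the label text handles the upper, lower and sentence case variants mentioned above, and it tolerates missing accents in the recognized text.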

Processing

A set of images will be provided, with a corresponding CSV file containing all product information which has been manually extracted. The submitted code will process the images and write the values into a single CSV file called product_nutrients.csv.

Participants will submit both the results and the model. A set of additional images will be processed against the model for final scoring of the submission.

Submission

Submissions will be run by crowdAI in a Docker container. Details may be found in the Resources section.
The /project folder will hold the submission scripts, including the install and run scripts (see below).
Participants may optionally submit an installation script called install.sh. When executed in the container, it will download and install any necessary code and libraries.
If an external API is used it must be indicated on the submission page. A list of APIs may be found in the Resources section.
Participants must include a script called run.sh which is executed against a folder of images in the /project folder. It will produce a single product_nutrients.csv file as output, also in the /project folder.
Participant code must search for each nutrient type and record the value in the CSV file. If a nutrient is not specified on the package, the field is completed with ‘-1.0’.

Examples

Listed below are examples of raw nutritional data images and the data extracted from them. The data should be submitted as a single CSV file for all products, but is presented here as tables for illustration.

Product 7207

This product is an example of a label where only a single column of data is available. In this case the data falls into the per_hundred column, as in the example below.

product_id	nutrition_id	per_hundred	per_portion	percent
7207	1	1180.0	-1.0	-1.0
7207	2	282.0	-1.0	-1.0
7207	3	14.0	-1.0	-1.0
7207	4	25.0	-1.0	-1.0
7207	5	1.0	-1.0	-1.0
7207	7	1.7	-1.0	-1.0
7207	6	-1.0	-1.0	-1.0
7207	8	-1.0	-1.0	-1.0
7207	9	-1.0	-1.0	-1.0
7207	10	-1.0	-1.0	-1.0
7207	11	-1.0	-1.0	-1.0
7207	12	-1.0	-1.0	-1.0
7207	13	-1.0	-1.0	-1.0
7207	14	-1.0	-1.0	-1.0
7207	15	-1.0	-1.0	-1.0
7207	16	-1.0	-1.0	-1.0
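
The rows above could be produced along the following lines. This is a minimal sketch, not the organisers' reference code: it assumes a comma-delimited file with a header row and rows ordered by nutrient id, none of which is specified here. The helper names rows_for_product and write_csv are hypothetical.

```python
import csv
import io

NUTRIENT_IDS = range(1, 17)  # ids 1..16 from the nutrients master list
MISSING = -1.0               # placeholder for values not found on the label
COLUMNS = ["product_id", "nutrition_id", "per_hundred", "per_portion", "percent"]


def rows_for_product(product_id, extracted):
    """Build one row per nutrient id, filling absent fields with -1.0.

    extracted maps nutrient_id -> {column name: value} for whatever
    was actually read from the label.
    """
    rows = []
    for nutrient_id in NUTRIENT_IDS:
        values = extracted.get(nutrient_id, {})
        fields = [float(values.get(col, MISSING)) for col in COLUMNS[2:]]
        rows.append([product_id, nutrient_id] + fields)
    return rows


def write_csv(stream, rows):
    """Write the header (assumed) and all rows to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Values read from the label of product 7207 (single per-100g column).
extracted_7207 = {
    1: {"per_hundred": 1180.0},
    2: {"per_hundred": 282.0},
    3: {"per_hundred": 14.0},
    4: {"per_hundred": 25.0},
    5: {"per_hundred": 1.0},
    7: {"per_hundred": 1.7},
}

buffer = io.StringIO()
write_csv(buffer, rows_for_product(7207, extracted_7207))
```

In a real run the stream would be the product_nutrients.csv file opened for writing, with the rows of all products written into that one file.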

Product 7276

This product has a serving size (45g) as well as a percentage of recommended daily intake. These go in the per_portion and percent columns in the example below. Sometimes there is an additional column of data, which may be ignored for the purposes of this challenge.

product_id	nutrition_id	per_hundred	per_portion	percent
7276	1	1720.0	774.0	9.0
7276	2	410.0	185.0	9.0
7276	3	7.2	3.24	6.0
7276	4	19.8	8.91	12.0
7276	5	49.2	22.14	8.0
7276	6	14.7	6.62	7.0
7276	7	1.05	0.47	7.0
7276	9	10.8	4.86	24.0
7276	8	-1.0	-1.0	-1.0
7276	10	-1.0	-1.0	-1.0
7276	11	-1.0	-1.0	-1.0
7276	12	-1.0	-1.0	-1.0
7276	13	-1.0	-1.0	-1.0
7276	14	-1.0	-1.0	-1.0
7276	15	-1.0	-1.0	-1.0
7276	16	-1.0	-1.0	-1.0
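
In this example the per_portion values match the per_hundred values scaled to the 45g serving (for instance 1720 × 45 / 100 = 774). Assuming that relationship holds on a given label, which is not guaranteed, it could serve as a sanity check on extracted values. The sketch below is illustrative only; the function name portion_consistent and the 1% tolerance (to absorb rounding on the label) are assumptions.

```python
import math


def portion_consistent(per_hundred, per_portion, serving_g, rel_tol=0.01):
    """Check that per_portion is close to per_hundred scaled to the serving size."""
    expected = per_hundred * serving_g / 100.0
    return math.isclose(expected, per_portion, rel_tol=rel_tol)


# (per_hundred, per_portion) pairs for product 7276, serving size 45 g.
ROWS_7276 = [
    (1720.0, 774.0), (410.0, 185.0), (7.2, 3.24), (19.8, 8.91),
    (49.2, 22.14), (14.7, 6.62), (1.05, 0.47), (10.8, 4.86),
]

all_consistent = all(portion_consistent(h, p, 45) for h, p in ROWS_7276)
```

A failed check does not prove an extraction error, but it may flag a misread digit or a value placed in the wrong column.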

Evaluation criteria

Percentage of correct fields per product, averaged across all products.
A submission must have a grade of 80% or above to be eligible for prizes.
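
The metric can be sketched as follows. This is only one interpretation: whether values are compared exactly or with a tolerance, and how missing rows are treated, are not specified here. The sketch assumes a small absolute tolerance and counts any field absent from the submission as incorrect. The names product_score and challenge_score are hypothetical.

```python
import math

FIELDS = ("per_hundred", "per_portion", "percent")


def product_score(truth, predicted, tol=1e-6):
    """Fraction of fields that match for one product.

    truth and predicted map nutrient_id -> {field name: value}.
    """
    total = 0
    correct = 0
    for nutrient_id, true_fields in truth.items():
        predicted_fields = predicted.get(nutrient_id, {})
        for field in FIELDS:
            total += 1
            expected = true_fields.get(field, -1.0)
            got = predicted_fields.get(field)
            if got is not None and math.isclose(got, expected, abs_tol=tol):
                correct += 1
    return correct / total if total else 0.0


def challenge_score(truth_by_product, predicted_by_product):
    """Average the per-product fractions and express the result as a percentage."""
    scores = [
        product_score(truth, predicted_by_product.get(product_id, {}))
        for product_id, truth in truth_by_product.items()
    ]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0
```

Under this reading, a product with 5 of 6 fields correct and another with all fields correct would average to roughly 91.7%.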

Resources

Docker Container

The Jupyter Notebook Scientific Python Stack Docker container will be the development and evaluation environment for this challenge. The challenge dataset comprises the source images (source_images.tar) and the pre-populated answer file (product_nutrients.csv).

source_images.tar

The images archive will contain 500 images with the product id embedded in the name, e.g.:

 .
 image-2324.jpg
 image-2325.jpg
 image-2326.jpg
 image-2327.jpg
 .
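
Recovering the product id from a file name might look like the sketch below. It assumes every image follows the image-<id>.jpg pattern shown above; the function names product_id_from_path and index_images are hypothetical.

```python
import re
from pathlib import Path

# Matches names such as image-2324.jpg and captures the numeric product id.
IMAGE_NAME = re.compile(r"^image-(\d+)\.jpg$", re.IGNORECASE)


def product_id_from_path(path):
    """Return the product id embedded in an image file name, or None."""
    match = IMAGE_NAME.match(Path(path).name)
    return int(match.group(1)) if match else None


def index_images(folder):
    """Map product_id -> image path for every matching file in a folder."""
    index = {}
    for path in sorted(Path(folder).iterdir()):
        product_id = product_id_from_path(path)
        if product_id is not None:
            index[product_id] = path
    return index
```

Files that do not match the pattern are skipped rather than raising an error, so stray files in the image folder would not stop a run.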

Installation of software

Participants may use any FOSS (free and open source software) resources to produce the solution. Any such software must be installed using a bash shell script called install.sh.

For example:

 #!/bin/sh
 apt-get update -y
 apt-get install curl -y

External APIs

The following APIs may be used in this project. If they are used, the solution must work with the free tier.

Google Vision
IBM Watson
Microsoft Vision API
Clarifai

If you wish to use another API, please contact us.

Prizes

The author of the most highly ranked submission above 80% will be invited to the crowdAI winner’s symposium at EPFL in Switzerland on January 30/31, 2017. This symposium is part of the Applied Machine Learning Days, to which the winner will have full access. The educational award is given to the participant with either the most insightful submission posts or the best tutorial. The crowdAI team will pick the recipient of this award, who will also be invited to the symposium. Expenses for travel and accommodation are covered by crowdAI.

In addition, there is a CHF 2,000 (~ USD 2,000) prize for the most highly ranked submission above 80%.