2019: Completed 2020: Completed

EPFL Machine Learning Higgs

Spot the Boson


For detailed instructions, see the Project 1 PDF description available on the ML course website.

File descriptions

train.csv - Training set of 250000 events. The file starts with the ID column, then the label column (the y you have to predict), and finally 30 feature columns.
test.csv - Test set of around 568238 events. Same layout as train.csv, except the label is missing.
sample-submission.csv - A sample submission file in the correct format. The sample submission always predicts -1, that is, ‘background’.
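
As an illustration of the constant baseline used by the sample submission, here is a minimal sketch that writes one -1 ('background') prediction per test event. The header names `Id` and `Prediction`, the helper `write_submission`, and the event IDs are assumptions made for the example; check the real sample-submission.csv for the exact format.

```python
import csv
import io


def write_submission(handle, ids, predictions):
    """Write one (id, prediction) pair per line, preceded by a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])  # assumed header names
    for event_id, prediction in zip(ids, predictions):
        writer.writerow([event_id, prediction])


# Made-up event IDs, for demonstration only.
test_ids = [350000, 350001, 350002]

# Constant baseline: predict -1 ('background') for every event.
baseline = [-1] * len(test_ids)

# An in-memory buffer stands in for a real file opened with open(..., "w", newline="").
buffer = io.StringIO()
write_submission(buffer, test_ids, baseline)
submission_text = buffer.getvalue()
print(submission_text)
```

Any real submission would replace `baseline` with the output of a trained classifier, keeping the same two-column layout.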

A zip file containing all three files above can be downloaded from the resource section.
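
Once the archive is unpacked, the column layout described for train.csv (ID, then label, then features) can be parsed roughly as follows. This is a sketch under assumptions: the helper `load_events`, the header names, the label codes `s`/`b`, and all values in the demo rows are invented for illustration, and only three feature columns are shown instead of 30.

```python
import csv
import io


def load_events(handle):
    """Split rows into ids, raw labels, and float feature vectors.

    Assumes one header row and the column order ID, label, features.
    """
    reader = csv.reader(handle)
    header = next(reader)
    ids, labels, features = [], [], []
    for row in reader:
        ids.append(int(row[0]))
        labels.append(row[1])  # kept as the raw string found in the file
        features.append([float(value) for value in row[2:]])
    return header[2:], ids, labels, features


# Tiny made-up stand-in for train.csv (hypothetical names and values).
sample = io.StringIO(
    "Id,Prediction,feat_1,feat_2,PRI_jet_num\n"
    "100000,s,138.47,51.655,2\n"
    "100001,b,-999.0,68.768,1\n"
)
names, ids, labels, X = load_events(sample)
print(names, ids, labels, X)
```

For the real file, pass `open("train.csv", newline="")` instead of the in-memory buffer. With 250000 rows a NumPy-based loader would be faster, but the parsing logic stays the same.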

For detailed information on the semantics of the features, labels, and weights, see the technical documentation from the LAL website on the task. Note that for the EPFL course we use a simpler evaluation metric (classification error) instead of the one described there.
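
To make the metric concrete, classification error is the fraction of events whose predicted label differs from the true one. The sketch below assumes labels encoded as -1 for background (as in the sample submission) and +1 for signal; the +1 encoding and the function name `classification_error` are assumptions for the example.

```python
def classification_error(y_true, y_pred):
    """Return the fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)


# Made-up labels: -1 = background, +1 = signal (assumed encoding).
y_true = [1, -1, -1, 1]
y_pred = [1, -1, 1, 1]

error = classification_error(y_true, y_pred)
accuracy = 1.0 - error
print(error, accuracy)
```

Here one of the four predictions is wrong, so the error is 0.25 and the corresponding accuracy is 0.75.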

Some details to get started:

  • all variables are floating point, except PRI_jet_num which is integer
  • variables prefixed with PRI (for PRImitives) are “raw” quantities about the bunch collision as measured by the detector.
  • variables prefixed with DER (for DERived) are quantities computed from the primitive features, which were selected by the physicists of ATLAS.
  • it can happen that for some entries some variables are meaningless or cannot be computed; in this case, their value is −999.0, which is outside the normal range of all variables.
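
Because −999.0 marks a meaningless or uncomputable value rather than a measurement, it is usually worth handling before fitting a model. The sketch below counts the share of flagged entries per column and replaces them with the mean of the valid entries in that column. Mean imputation is only one possible choice, not something prescribed by the task, and the helper names and demo values are invented for the example.

```python
MISSING = -999.0  # sentinel value described above


def missing_fraction(rows):
    """Per-column share of entries equal to the sentinel."""
    n_cols = len(rows[0])
    return [sum(r[j] == MISSING for r in rows) / len(rows) for j in range(n_cols)]


def impute_with_column_mean(rows):
    """Replace sentinel entries with the mean of the valid values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        valid = [r[j] for r in rows if r[j] != MISSING]
        # Fall back to 0.0 if a column has no valid entry at all.
        means.append(sum(valid) / len(valid) if valid else 0.0)
    return [
        [means[j] if r[j] == MISSING else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Made-up feature rows with two columns.
rows = [
    [1.0, -999.0],
    [3.0, 4.0],
    [-999.0, 8.0],
]
fractions = missing_fraction(rows)
imputed = impute_with_column_mean(rows)
print(fractions)
print(imputed)
```

Leaving the sentinel in place would drag column means and any linear fit far outside the normal range of the variables, which is why some explicit treatment is generally preferable.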


Leaderboard

01 no_you_hold_my_beer 0.990
02 hold_my_beer 0.980
03 149D89 0.849
04 Nasonex 0.846
05 titanophallus 0.843