
EPFL Machine Learning Higgs

Spot the Boson


For detailed instructions, see the Project 1 PDF description available on the ML course website.

File descriptions

train.csv - Training set of 250000 events. The file starts with the ID column, then the label column (the y you have to predict), and finally 30 feature columns.
test.csv - Test set of around 568238 events. Same layout as train.csv, except that the label is missing.
sample-submission.csv - A sample submission file in the correct format. The sample submission always predicts -1, that is, ‘background’.
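
As a rough illustration of the layout described above (ID column, label column, then feature columns), here is a minimal sketch that splits rows by column position using only the Python standard library. The header names, label values, and numbers in the in-memory sample are made up for the example, and it uses 3 feature columns instead of 30; the function name `read_events` is hypothetical as well.

```python
import csv
import io


def read_events(handle, has_label=True):
    """Split each CSV row into ID, label, and float features by position."""
    reader = csv.reader(handle)
    header = next(reader)  # assumption: the first row is a header
    first_feature = 2 if has_label else 1
    ids, labels, features = [], [], []
    for row in reader:
        ids.append(row[0])
        # Labels are kept as raw strings; no encoding is assumed here.
        labels.append(row[1] if has_label else None)
        features.append([float(value) for value in row[first_feature:]])
    return header, ids, labels, features


# Made-up sample: 3 feature columns stand in for the real 30.
sample = io.StringIO(
    "Id,Prediction,f0,f1,f2\n"
    "100000,s,138.47,51.655,2\n"
    "100001,b,-999.0,68.768,1\n"
)
header, ids, labels, features = read_events(sample)
```

Passing `has_label=False` assumes the label column is absent altogether. If test.csv instead keeps the column with a placeholder value, read it with `has_label=True` and ignore the returned labels.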

A zip file containing all three files above can be downloaded from the resources section.

For detailed information on the semantics of the features, labels, and weights, see the technical documentation from the LAL website on the task. Note that for the EPFL course we use a simpler evaluation metric instead: classification error.
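
Classification error is taken here to mean the fraction of events whose predicted label differs from the true label. The sketch below computes it under that assumption; the exact scoring code used by the platform is not given in this description. The label -1 for background follows the sample submission, while +1 for signal is an assumption made for the example.

```python
def classification_error(y_true, y_pred):
    """Return the fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute the error of an empty set")
    wrong = sum(1 for truth, guess in zip(y_true, y_pred) if truth != guess)
    return wrong / len(y_true)


# Made-up labels: +1 (assumed signal) and -1 (background).
y_true = [1, -1, -1, 1]
always_background = [-1, -1, -1, -1]
error = classification_error(y_true, always_background)  # 2 of 4 wrong
```

A predictor that always answers -1, like the sample submission, has an error equal to the share of non-background events in the set it is scored on.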

Some details to get started:

  • all variables are floating point, except PRI_jet_num which is integer
  • variables prefixed with PRI (for PRImitives) are “raw” quantities about the bunch collision as measured by the detector.
  • variables prefixed with DER (for DERived) are quantities computed from the primitive features, which were selected by the physicists of ATLAS.
  • it can happen that for some entries some variables are meaningless or cannot be computed; in this case, their value is -999.0, which is outside the normal range of all variables.
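
Because -999.0 is a placeholder rather than a measurement, feeding it straight into a model can distort means and scales. One possible way to handle it, sketched below, is to compute each column's mean over the valid entries only and substitute that mean for the placeholder. Mean imputation is just one option and is not prescribed by the task; the helper names and the sample values are made up for illustration.

```python
MISSING = -999.0  # placeholder value described in the list above


def column_means_ignoring_missing(rows):
    """Mean of each column, computed over entries that are not MISSING."""
    means = []
    for j in range(len(rows[0])):
        valid = [row[j] for row in rows if row[j] != MISSING]
        # Fall back to 0.0 when a column has no valid entry at all.
        means.append(sum(valid) / len(valid) if valid else 0.0)
    return means


def impute_missing(rows, fill_values):
    """Return a copy of rows with MISSING replaced by the column's fill value."""
    return [
        [fill_values[j] if value == MISSING else value
         for j, value in enumerate(row)]
        for row in rows
    ]


# Made-up feature rows with two columns.
rows = [[120.0, 2.0], [-999.0, 4.0], [80.0, -999.0]]
means = column_means_ignoring_missing(rows)
clean = impute_missing(rows, means)
```

When imputing test.csv, it is usually safer to reuse the fill values computed on train.csv so that both sets are transformed in the same way.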

Leaderboard

01 0.342
02 EPFL 0.554
03 MLakes 0.658
03 victorkras2008 0.658
03 akshatcx 0.658