## 🕵️ Introduction

We have said it before and we will say it again: 'With great power comes great responsibility.' We do have the power to do good for the world, so let us be responsible and put that power to use.

This time, we pick up our weapons against `cancer`.

Given information about different `risk` factors for a woman, `predict` as accurately as possible the `presence` or `absence` of `cervical cancer`.

Understand with code!

## 💾 Dataset

This dataset contains indicators and risk factors for predicting whether a woman will get `cervical cancer`. There are `15` attributes in total. The first `14` are features covering demographic data such as `age`, `lifestyle`, and `medical history`. The last attribute, `Biopsy`, is the target variable; its value is `0` for `Healthy` and `1` for `Cancer`. The attributes are:

• Age [ in years ]
• Number of sexual partners
• First sexual intercourse [ age in years ]
• Number of pregnancies
• Smoking [ yes or no ]
• Smoking [ in years ]
• Hormonal contraceptives [ yes or no ]
• Hormonal contraceptives [ in years ]
• Intrauterine device (IUD) [ yes or no ]
• Number of years with an intrauterine device (IUD)
• Has patient ever had a sexually transmitted disease (STD) [ yes or no ]
• Number of STD diagnoses
• Time since first STD diagnosis
• Time since last STD diagnosis
• Biopsy result, the target outcome [ `0` for `Healthy` or `1` for `Cancer` ]

## 📁 Files

The following files are available in the `resources` section:

• `train.csv` - (`686` samples) This CSV file contains the attributes describing the risk factors along with the biopsy result for each sample.
• `test.csv` - (`172` samples) This file is used for the actual evaluation and the leaderboard score; it does not include the biopsy result.
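
As a minimal sketch of how these files might be read, the snippet below splits each row into features and the `Biopsy` label using only the standard library. The helper name `load_rows` and the column headers `Age` and `Smokes` in the in-memory sample are hypothetical; only `Biopsy` is named in this description, so check the real headers in `train.csv`.

```python
import csv
import io


def load_rows(handle, target="Biopsy"):
    """Read CSV rows and split them into feature dicts and integer labels."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        # test.csv has no target column, so labels stays empty in that case.
        if target in row:
            labels.append(int(float(row.pop(target))))
        features.append(row)
    return features, labels


# Tiny in-memory stand-in for train.csv; "Age" and "Smokes" are made-up headers.
sample = io.StringIO("Age,Smokes,Biopsy\n34,0,0\n52,1,1\n")
features, labels = load_rows(sample)
print(features)
print(labels)
```

With the real file, the same helper could be called as `load_rows(open("train.csv", newline=""))`. Feature values are left as strings here, so any numeric conversion or handling of missing values is up to you.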

## 🚀 Submission

• Prepare a CSV file named `submission.csv` with the header `Biopsy` and one predicted value, `0` or `1`, per row.
• A sample submission format is available in `sample_submission.csv`.
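
The format above can be produced with a few lines of standard-library Python. This is a sketch that assumes the file has a single `Biopsy` column and no index column; compare against `sample_submission.csv` before submitting. The helper name `write_submission` and the placeholder predictions are illustrative only.

```python
import csv
import os
import tempfile


def write_submission(predictions, path="submission.csv"):
    """Write 0/1 predictions, one per row, under a single `Biopsy` header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Biopsy"])
        for value in predictions:
            # Reject anything that is not a hard 0/1 label.
            if value not in (0, 1):
                raise ValueError(f"expected 0 or 1, got {value!r}")
            writer.writerow([int(value)])


# Placeholder predictions written to a temporary folder for demonstration;
# a real run would pass one prediction per row of test.csv and use the default path.
out_path = os.path.join(tempfile.gettempdir(), "submission.csv")
write_submission([0, 1, 0, 0], out_path)
```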

## 🖊 Evaluation Criteria

During evaluation, F1 score and Log Loss will be used to measure the performance of the model, where

$F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
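
To check a model locally before submitting, both metrics can be computed by hand. The sketch below implements the F1 formula above, treating `1` (`Cancer`) as the positive class. Log Loss is not defined in this description, so `log_loss` here is the standard binary cross-entropy over predicted probabilities, with a clipping constant `eps` chosen as an assumption; the leaderboard may compute it differently.

```python
import math


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))
print(log_loss(y_true, [0.9, 0.2, 0.4, 0.8, 0.6]))
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is F1.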

## 📚 References

• Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

• Source: Kelwin Fernandes (kafc at inesctec dot pt) - INESC TEC & FEUP, Porto, Portugal. Jaime S. Cardoso - INESC TEC & FEUP, Porto, Portugal. Jessica Fernandes - Universidad Central de Venezuela, Caracas, Venezuela.
