
Novartis Pharma Data Science Hackathon



Introduction to the Problem

TafMek has a new brand director in Spain. Since joining the team, she has spent considerable time understanding the market and meeting key customers and team members.

She is preparing for a 2020 strategy think-tank meeting next week with the sales and marketing team, where she wants to introduce a data-driven approach to sales and marketing planning. From past experience, she knows that presenting data-driven insights will help drive the discussion.

First, she sits down to jot down the key points about the brand and market, based on her understanding from the past 2 months:


Brand and market:

  • TafMek is seeing strong growth, and the franchise relies on TafMek’s performance for overall growth and to fund new launches.
  • However, new competition in the targeted therapy segment (like TafMek) is coming in with strong marketing and sales investments.
  • There is strong brand equity with doctors, but the penetration of immunotherapy drugs is a key threat to brand growth.

Execution & implementation:

  • Sales and marketing initiatives have been implemented per plan in 2019. New channels have also been implemented (Rep Triggered Emails etc.), and the team feels they are having a positive impact.
  • The doctor list is revised every year.
  • Call frequency is higher than planned.

Outlook for next year:

  • Sales targets for next year are aggressive. The team is upbeat and looking forward to next year.
  • Budgets are, as always, under stretch.
  • There are lots of new ideas in the pipeline, but they need to be prioritized to manage budgets.

Key questions in mind for data driven insights

  • How many different market archetypes are there?
  • What is the Return on Investment (RoI) for the different promotional channels being used?
  • Does the RoI differ across areas, accounts, or customers?
  • Are we adequately spending what we have? Is there a way to maximize the return?

How can we empower our team members to leverage data easily, through a tool, for more informed decision making?

Tasks and Evaluation Metric

| # | Task | Description | Evaluation | Weightage | Timing | Note |
|---|------|-------------|------------|-----------|--------|------|
| 1 | Download the data | Exponentially decaying score based on the timestamp of the first submission made by the team | Speed. First one gets 1, then decay in increments of 0.1 | 10% | ASAP | |
| 2 | Primary Key | Identify the primary keys in the data | % of files correctly identified | 10% | Any Time | |
| 3 | Market Archetypes | Identify the different market archetypes using clustering | Silhouette score of the market archetypes using the descriptor variables, transformed to the 0 to 1 range | 30% | Any Time | Use the template to give the final hospital ID, market archetype and the descriptor variables in the market archetypes |
| 4 | Predict Sales | Predict Sales for 2019 Jan - 2019 March | 1-MAPE, floored at 0 | 35% | Any Time | |
| 5 | Optimization | Using a given budget for 6 months (Euro 830k), how to generate maximum sales in Euros | Heuristic evaluation, post final submission with suggested strategies | 15% | At the end of the hackathon | Optional: Please provide a write-up on your strategy to achieve the recommended optimization number. No promotion can change by more than 25% |
| 6 | Visualization | Provide easy to understand but compelling visualizations of the data and predictions | Jury to select the winning entries | 0% | Submission at 16:30 on September 19th | This is a separate competition to the prediction challenge |
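
To make the scoring for Tasks 3 and 4 concrete, here is a minimal sketch of the two metrics. It is not the official evaluator: skipping rows where the actual value is 0 and the linear mapping of the silhouette score from [-1, 1] to [0, 1] are both assumptions, and the function names are hypothetical.

```python
def one_minus_mape(actual, predicted):
    """Return 1 - MAPE, floored at 0.

    Assumption: rows whose actual value is 0 are skipped, because the
    percentage error is undefined for them.
    """
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to score")
    mape = sum(errors) / len(errors)
    return max(0.0, 1.0 - mape)


def rescale_silhouette(score):
    """Map a silhouette score from [-1, 1] to [0, 1].

    Assumption: the transformation is linear; the task list does not say how.
    """
    return (score + 1.0) / 2.0
```

Under these assumptions, a forecast that is off by 10% on every account scores 0.9, and any forecast with a MAPE above 100% scores 0.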

Description of the Data

Note: the data download process still needs to be updated.

Sales – The file contains Hospital/Account level sales (units) data for the brands Tafinlar and Mekinist (TafMek) by SKU for each month starting Feb’17. You might encounter negative values for a few records; these are returns coming from the hospitals/accounts. You will need to combine the sales of both brands and normalize them using the normal dosing regimen (per patient per month). Please see the dosage information for TafMek and convert the Tafinlar and Mekinist sales units into 30-day equivalents for 1 patient at the suggested dosage.
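
The unit conversion described above could be sketched as follows. This is only an illustration: the SKU names, doses per unit, and daily doses are hypothetical placeholders that should be replaced with the values from the dosage information, and the row layout is assumed. Returns (negative units) are netted against sales rather than dropped, which is also an assumption.

```python
from collections import defaultdict

# Hypothetical placeholders; take the real values from the dosage information.
DOSES_PER_UNIT = {"TAF_SKU_A": 120, "MEK_SKU_A": 30}   # capsules/tablets in one sold unit
DAILY_DOSES = {"TAF_SKU_A": 4, "MEK_SKU_A": 1}         # capsules/tablets per patient per day
BRAND_OF_SKU = {"TAF_SKU_A": "Tafinlar", "MEK_SKU_A": "Mekinist"}


def patient_months(sku, units_sold, days=30):
    """Convert sold units of one SKU into 30-day equivalents for 1 patient."""
    return units_sold * DOSES_PER_UNIT[sku] / (DAILY_DOSES[sku] * days)


def monthly_patient_equivalents(rows):
    """Aggregate (account, month, sku, units_sold) rows per account, month and brand.

    Negative units (returns) reduce the total for that account and month.
    """
    totals = defaultdict(float)
    for account, month, sku, units_sold in rows:
        key = (account, month, BRAND_OF_SKU[sku])
        totals[key] += patient_months(sku, units_sold)
    return dict(totals)
```

How the Tafinlar and Mekinist equivalents are then combined into a single TafMek figure is left open here, since it depends on the dosage information provided with the data.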


Promotions – The different promotional channels that the TafMek brand team uses to reach customers. The promotional channels used are:

  • F2F (Face to Face) Calls – Sales/Medical Representatives (Reps) detailing the brand to Health Care Professionals (HCPs). The data provided also includes calls for a few other brands. Filter on the Detailed Products column to get only TafMek calls.
  • Emails – there are 2 broad categories of emails:
    • Rep Triggered Emails (RTE) – Emails sent by the Reps to the HCPs
    • Mass Emails – Email campaigns are carried out during international conferences and major events. They are sent together to a group of HCPs (usually by 3rd party agencies on behalf of Novartis) rather than being sent by a specific Rep to a specific HCP (RTE)
  • Events – Events are congresses or meetings sponsored by Novartis to which one or more HCPs are invited. Events are categorized as Local, National or International (EU or US) events
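
The Detailed Products filter mentioned for F2F calls might look like the sketch below. Only the column name comes from the description above; the brand spellings, the `HCP ID` column, and the sample rows are hypothetical, and the sketch assumes the column holds free-text product names.

```python
import csv
import io


def tafmek_calls(rows, column="Detailed Products",
                 brands=("tafmek", "tafinlar", "mekinist")):
    """Yield only the rows whose Detailed Products field mentions a TafMek brand.

    Matching is case-insensitive; rows with a missing or empty field are skipped.
    """
    for row in rows:
        detailed = (row.get(column) or "").lower()
        if any(brand in detailed for brand in brands):
            yield row


# Hypothetical sample standing in for the calls file.
sample = io.StringIO(
    "HCP ID,Detailed Products\n"
    "1,Tafinlar; Mekinist\n"
    "2,OtherBrand\n"
    "3,TAFMEK\n"
)
kept = list(tafmek_calls(csv.DictReader(sample)))
print([row["HCP ID"] for row in kept])
```

It is worth checking the distinct values in the real column first, since the actual spelling of the brand names may differ from the ones assumed here.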

Targeting – Targeting files contain various segmentation information used by the brand team to prioritize promotional efforts. In the files you will find:

  • Attitudinal Segmentation – Strategic HCP segments identified by the Brand team
  • Hospital Potential – Segmentation of Hospitals/Accounts based on potential
  • Potential by Account – Contains information on Hospital's Population, Incidence and Prevalence rates for Melanoma
  • National Targeting – A questionnaire filled in every month by a small set of HCPs/Reps, with questions such as the total number of patients on TafMek and the number of new patients on TafMek that the HCP has
  • Tiering – HCP level call plans for each Rep and HCP Tiering for 6 different call plan cycles

Others – This is a set of files containing varied information. You will need this information at different phases, from stitching together a final dataset to refining your analysis and final outputs.

  • Centers with Diagnostic Tools contains the names of Hospitals/Accounts that have the facility to carry out a special test to detect the BRAF gene mutation (the primary requirement for prescribing TafMek to a patient)
  • Value Transfer NVS Congress and Competitors contains the list of events which were carried out by Novartis and its competitors
  • Centres en COMBI contains the list of HCPs along with the Hospital/Account name and the number of patients that they have enrolled in the COMBI AD clinical trial
  • Event to Event Type mapping file should be used to identify Local, National etc. events
  • Hospital Name to CNH ID mapping file contains 2 versions of the mapping for Hospital Names to IDs. You will need this to integrate all the datasets to arrive at final Hospital/Account level analysis dataset
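
One possible way to use the Hospital Name to CNH ID mapping when integrating the datasets is sketched below. The normalization rules (lower-casing, stripping accents, collapsing whitespace), the precedence between the two mapping versions, and the example names and IDs are all assumptions for illustration.

```python
import unicodedata


def normalize_name(name):
    """Lower-case, strip accents and collapse whitespace so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def build_lookup(*mappings):
    """Merge several {hospital name: CNH ID} mappings; earlier mappings win on conflict."""
    lookup = {}
    for mapping in mappings:
        for name, cnh_id in mapping.items():
            lookup.setdefault(normalize_name(name), cnh_id)
    return lookup


def attach_ids(names, lookup):
    """Return (matched, unmatched): matched maps each input name to its CNH ID."""
    matched, unmatched = {}, []
    for name in names:
        cnh_id = lookup.get(normalize_name(name))
        if cnh_id is None:
            unmatched.append(name)
        else:
            matched[name] = cnh_id
    return matched, unmatched


# Hypothetical example: two versions of the mapping with different spellings.
version_1 = {"Hospital Clínic": "CNH001"}
version_2 = {"HOSPITAL CLINIC": "CNH001", "Hospital  La Paz": "CNH002"}
lookup = build_lookup(version_1, version_2)
matched, unmatched = attach_ids(["hospital clinic", "Hospital La Paz", "Unknown"], lookup)
```

Reviewing the `unmatched` list by hand is a simple way to spot hospital names that need an extra normalization rule before the final account-level dataset is built.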

The Cost Assumptions file contains the per-unit cost of promotions and the average monthly cost of TafMek for 1 patient.
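
For the optimization task, per-unit costs can be combined with planned promotion volumes to check a plan against the Euro 830k six-month budget and the 25% change limit. The sketch below is a rough illustration: it assumes the 25% limit applies to each channel's volume relative to a baseline plan, and the channel names and unit costs in the usage note are hypothetical, not values from the Cost Assumptions file.

```python
def clamp_plan(baseline, proposed, max_change=0.25):
    """Clamp each channel's proposed volume to within +/- max_change of its baseline.

    Channels missing from `proposed` keep their baseline volume.
    """
    clamped = {}
    for channel, base in baseline.items():
        low, high = base * (1 - max_change), base * (1 + max_change)
        clamped[channel] = min(max(proposed.get(channel, base), low), high)
    return clamped


def plan_cost(volumes, unit_cost):
    """Total cost of a plan: volume times per-unit cost, summed over channels."""
    return sum(volumes[channel] * unit_cost[channel] for channel in volumes)


def within_budget(volumes, unit_cost, budget=830_000):
    """True if the plan's total cost does not exceed the budget (in Euros)."""
    return plan_cost(volumes, unit_cost) <= budget
```

For example, with a baseline of 100 F2F calls and 40 RTEs, a proposal of 200 calls and 20 RTEs would be clamped to 125 calls and 30 RTEs before its cost is compared with the budget.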

Submission Instructions

NOTE : You will first need to agree to the rules of the competition by clicking on the Participate button on the challenge page.

  • Download the evaluation template from : here
  • Unzip the file to get 4 different .csv files.
  • Update the values in these CSV files based on your approach
  • When you are ready to submit, zip the CSV files into a single ZIP file such that all the csv files are at the root level of the zip file
  • Upload the ZIP file here: https://www.aicrowd.com/challenges/novartis-pharma-data-science-hackathon/submissions/new
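
The packaging step above can be scripted. This is a minimal sketch using Python's standard `zipfile` module; the function name and the file names in the usage note are hypothetical. Passing `arcname` as the bare file name is what keeps every CSV at the root of the archive, whatever folder it lives in on disk.

```python
import os
import zipfile


def zip_at_root(csv_paths, zip_path):
    """Write each CSV into zip_path under its bare file name, with no directories."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in csv_paths:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

A call such as `zip_at_root(["out/task2.csv", "out/task3.csv"], "submission.zip")` would store the files as `task2.csv` and `task3.csv`. Two input files with the same base name would collide in the archive, so the names should be unique.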