📹 Live Q&A with Challenge Organisers: Join the Townhall on Sept 22!

It's Monday afternoon in a busy office building. The air conditioning ramps up, lights flick on in meeting rooms, and laptops charge at every desk. Energy use naturally rises and falls throughout the day, but sometimes the grid calls for a change: reduce power for the next hour, or even draw more to stabilise supply. The building adapts as thermostats shift, non-essential equipment pauses, and batteries discharge.

From the outside, nothing looks different, but behind the scenes, energy use has flexed by reducing, shifting, or even increasing load to support the grid. To trade in this flexibility market, energy providers need a reliable way to detect not just when demand response happens but also how much energy was actually flexed.

💡 Enter FlexTrack Challenge 💡

This challenge calls on participants to build machine learning models that can identify demand response events and measure their impact by distinguishing normal energy use from intentional shifts.

🤔 Don't know how to start? Click here to make your first submission with ease.

👬 Solving a challenge is more fun with friends. Find your teammate here.

💬 Introduction

Demand response is a strategy used to reduce or shift power consumption in response to the needs of the electrical grid.

Demand response can take the form of a requested decrease in power consumption to reduce peak demand, or a requested increase in power consumption to maintain grid stability or reduce voltages in the network.

HVAC systems, solar inverters, and batteries can all exhibit 'flexibility' to provide a demand response by adjusting their usage. For example, HVAC systems can temporarily reduce their load by adjusting thermostats or by cycling the compressor. Load can be shifted to a different time of day, to be made up earlier or later (e.g. using batteries). In some circumstances, load can alternatively be shed with temporary loss of service or production.

In some jurisdictions, buildings can sell their ability to flex load to grid operators or aggregators. To trade in the flexibility market, buyers and sellers must be able to detect when load is being flexed and be able to measure how much load was shifted/shed.

To learn more about what demand flexibility trials look like in buildings, read this case study.

💻 What is FlexTrack?

When building owners and facility managers activate demand response mode, grid operators and aggregators must be able to determine and verify:

  1. if a demand response event was activated, and
  2. how much demand response capacity was dispatched. Refer to Figure 3 for how demand response capacity is determined.

Demand response events can deliver two outcomes: either a decrease or an increase in net power over the period during which the mode was activated. These outcomes are indicated by Demand Response Flags, where -1 and +1 represent a decrease and an increase in net power, respectively, compared with normal (baseline) energy consumption. A baseline (non-event) period is indicated by a Demand Response Flag of zero.

Example of demand response events

Figure 1. A building can adjust its load by decreasing (a), increasing (b) or shifting (c). In load shifting, the Demand Response Flag begins with +1 followed by -1. The decrease in building power consumption is shaded in orange, and the increase is shaded in blue. The difference in building power consumption is the Demand Response Capacity when a demand response event is activated.
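To make the flag and capacity conventions above concrete, here is a minimal worked sketch in Python. The column names, the 0.5 kW dead-band, and reporting capacity as a positive magnitude are illustrative assumptions, not the official data specification.

```python
import pandas as pd

# Toy 15-minute series: metered building power and a known baseline (kW).
df = pd.DataFrame({
    "building_power_kw": [50, 50, 42, 41, 58, 57, 50],
    "baseline_kw":       [50, 50, 50, 50, 50, 50, 50],
})

# Hypothetical convention: flag -1 when power drops below the baseline,
# +1 when it rises above it, 0 otherwise (0.5 kW dead-band to ignore noise).
diff = df["building_power_kw"] - df["baseline_kw"]
df["dr_flag"] = 0
df.loc[diff < -0.5, "dr_flag"] = -1
df.loc[diff > 0.5, "dr_flag"] = 1

# Capacity is the change relative to the baseline, and is non-zero only
# while a demand response event is active (flag != 0) -- see Figure 3.
df["dr_capacity_kw"] = diff.abs().where(df["dr_flag"] != 0, 0.0)
print(df)
```

In this toy series, the middle rows show a decrease event followed by an increase event, loosely mirroring panels (a) and (b) of Figure 1.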

This challenge focuses on identifying and estimating demand response activity. Participants are to develop a machine learning model that back-casts (from historic building time-series data) to:

  • determine when demand response events were activated and for how long,
  • determine how much energy was increased or decreased (over the event duration), compared with normal consumption, as a result of activating demand response mode.

Participants will use ground truth time-series data with known observed demand response events (identified in the form of demand response flags) to learn site consumption behaviour both (i) when demand response mode is not active and (ii) when demand response mode is activated.

The baseline refers to the normal or business-as-usual pattern of power consumption. It can act as a reference point to compare the difference in consumption as a result of activating the demand response mode. However, a site’s baseline is not constant and varies based on external factors such as weather, seasonal changes and operational schedules, to name a few. Algorithms developed for learning site behaviour should be scalable and transferable to other sites.
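As one possible starting point for learning a baseline, the sketch below regresses building power on weather and calendar features using only time steps where the demand response flag is zero, then treats the gap between metered and predicted power during flagged periods as a rough estimate of the flexed load. The file name, column names and model choice are assumptions for illustration, not a prescribed method.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Assumed file and column names; adjust to match the released dataset.
df = pd.read_csv("Site_A.csv", parse_dates=["Timestamp"])
df["hour"] = df["Timestamp"].dt.hour
df["dayofweek"] = df["Timestamp"].dt.dayofweek
features = ["Dry_Bulb_Temperature", "Global_Horizontal_Radiation", "hour", "dayofweek"]

# Fit the baseline only on business-as-usual periods (flag == 0).
normal = df[df["Demand_Response_Flag"] == 0]
model = GradientBoostingRegressor().fit(normal[features], normal["Building_Power"])

# Predicted baseline for every time step; the residual during flagged
# periods is one crude estimate of how much load was flexed.
df["baseline_kw"] = model.predict(df[features])
df["delta_kw"] = df["Building_Power"] - df["baseline_kw"]
```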

📅 Timeline

Website & Registration Launch: 19th August 2025
Data Release: 19th August 2025
Warm Up Phase Start: 19th August 2025, 23:59 UTC
Competition Phase Start: 15th September 2025, 23:59 UTC
Solution Documentation Deadline: 19th October 2025, 23:59 UTC
Challenge Completion: 19th October 2025, 23:59 UTC
Winner Announcement: 15th November 2025   

πŸ† Prizes

The total prize pool is $20,000 AUD. The cash prizes and travel grants are distributed to four winners.

Leaderboard Prize

πŸ† 1st Place: $5,000 AUD
πŸ₯ˆ 2nd Place: $3,000 AUD
πŸ₯‰ 3rd Place: 2 Γ— $1,000 AUD

Travel Grant

The travel grant is awarded to the four winners for presenting their solutions in person at the 2025 IEEE International Conference on Energy Technologies for Future Grids.

πŸ† 1st Place: $5,000 AUD
πŸ₯ˆ 2nd Place: $3,000 AUD
πŸ₯‰ 3rd Place: 2 Γ— $1,000 AUD

Where a winning Team consists of more than one Participant, the prize will be split equally among the Participants of that Team. Please refer to the Challenge Rules for more information.

🧠 Tasks

The competition has two tasks representing two phases that build upon each other. Both tasks will use the same training datasets. Participants may join at any point in the Challenge. 

Task 1: Classification (Warm-Up Phase)

The Classification Task, representing the Warm-Up Phase, is a time-series classification task to predict the demand response flag of a building site. Participants will have access to Warm-Up Phase datasets for training and testing their algorithms. The ground truth for the public test set in the Warm-Up Phase will be provided in the Competition Phase.

This round gives participants an opportunity to familiarise themselves with the competition tasks and to raise issues concerning the problem statement, source code, dataset quality and documentation, to be addressed by the organisers. The submission results will be displayed on the Public Leaderboard, showing the best score across submission attempts. Rankings in this phase are not taken into account during the Competition Phase.

Please see below, under Evaluation Criteria, for the submission format.

Task 2: Regression (Competition Phase)

The Regression Task, representing the Competition Phase, is a regression task to predict the demand response capacity of a building site.

This task will use the same training sets as in the Warm-Up Phase, with one additional variable: the demand response capacity. Participants must evaluate their model on the public test sets and the private test set.

Please see below, under Evaluation Criteria, for the submission format.

The submission results on the public test sets will be displayed on a Public Leaderboard so participants can see how they rank against each other. Submission scores on the private test set will be used for the final ranking and will be kept private and only visible to the challenge organisers.

📊 Dataset Description

The dataset makes use of digital twins of commercial office buildings at the University of Wollongong in Australia to generate synthetic power consumption for three locations representing different climate zones. The ground truth dataset includes information that energy aggregators have access to, namely the dry bulb temperature, global horizontal radiation, building power, demand response flag and demand response capacity. The datasets have a time resolution of 15 min. The algorithms developed by participants should be designed to be context-aware and transferable to different climate zones and buildings.

The variables to predict at the same time resolution are:

1. Demand response flag in Warm-up Phase
2. Demand response capacity in Competition Phase
 

Illustration of synthetic dataset generation

Figure 2. The illustration depicts the process of generating the synthetic datasets from the digital twin. A Demand Response (DR) Flag is given as an input to request a demand response event, and the building's power consumption is an output. The room or thermal zone includes internal heat gains from sources such as building occupants, IT equipment and lights. The HVAC system changes its power consumption by adjusting its zone temperature setpoint based on the DR Flag.

| Site | Availability | Dry Bulb Temperature (°C) | Global Horizontal Radiation (W/m²) | Building Power (kW) | Demand Response Flag (unitless) | Demand Response Capacity (kW) |
| --- | --- | --- | --- | --- | --- | --- |
| Site_A | Warm-up Phase and Competition Phase | ✅ | ✅ | ✅ | ✅ | ✅ |
| Site_B | Warm-up Phase and Competition Phase | ✅ | ✅ | ✅ | ✅ | ✅ |
| Site_C | Warm-up Phase and Competition Phase | ✅ | ✅ | ✅ | ✅ | ✅ |
| Site_D | Warm-up Phase and Competition Phase | ✅ | ✅ | ✅ | ❌ Warm-up Phase / ✅ Competition Phase | ❌ |
| Site_E | Competition Phase | ✅ | ✅ | ✅ | ❌ | ❌ |
| Site_F | Competition Phase | ✅ | ✅ | ✅ | ❌ | ❌ |
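As a quick sanity check on the availability above, each site file can be loaded and inspected for which target columns it actually contains; the file naming pattern and column names here are assumptions for illustration.

```python
import pandas as pd

sites = ["Site_A", "Site_B", "Site_C", "Site_D", "Site_E", "Site_F"]
targets = ["Demand_Response_Flag", "Demand_Response_Capacity_kW"]

for site in sites:
    try:
        df = pd.read_csv(f"{site}.csv", parse_dates=["Timestamp"])  # assumed file name
    except FileNotFoundError:
        print(f"{site}: not included in this phase's release")
        continue
    present = [c for c in targets if c in df.columns]
    print(f"{site}: {len(df)} rows at 15-min resolution, targets present: {present or 'none'}")
```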

💯 Evaluation Criteria

Warm-Up Phase: Classification Task 

Ground truth

The ground truth is a 3D array of:

  • Site, as provided in the test set, and
  • Timestamp, as provided in the test set, and
  • Demand_Response_Flag

Submission format

You will provide one file: Predicted demand response flag as a CSV file, consisting of:

  • A 3D array (Site, Timestamp, Demand Response Flag)
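A minimal way to assemble this file, assuming the test set is distributed as a CSV of (Site, Timestamp) rows and that the header names below match the sample submission (check the starter kit for the authoritative format):

```python
import pandas as pd

# Hypothetical file and column names -- follow the starter kit's sample submission.
test = pd.read_csv("warmup_test_set.csv", parse_dates=["Timestamp"])

# Placeholder prediction: mark every interval as baseline (flag 0).
submission = test[["Site", "Timestamp"]].copy()
submission["Demand_Response_Flag"] = 0

submission.to_csv("submission_classification.csv", index=False)
```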

Metric

The Classification Task will be evaluated and ranked using the Geometric Mean Score:

G-mean = √(TPR × TNR)

where:

  • TPR is the True Positive Rate, defined as TPR = TP / (TP + FN)
  • TNR is the True Negative Rate, defined as TNR = TN / (TN + FP)

with TP as true positive, TN as true negative, FP as false positive and FN as false negative.

The F1-score will also be shown on the public leaderboard as a secondary metric for the Classification Phase. 
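The score can be reproduced from a confusion matrix as sketched below. How the three flag values (-1, 0, +1) are reduced to event/no-event classes for scoring, and which averaging the leaderboard F1 uses, are not stated here, so treat both as assumptions and this as a sanity check rather than the official scorer.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([0, 0, -1, -1, 1, 0, 1, 0])
y_pred = np.array([0, -1, -1, 0, 1, 0, 1, 1])

# Illustrative binarisation: any non-zero flag counts as an "event".
t = (y_true != 0).astype(int)
p = (y_pred != 0).astype(int)

tn, fp, fn, tp = confusion_matrix(t, p, labels=[0, 1]).ravel()
tpr = tp / (tp + fn)          # True Positive Rate
tnr = tn / (tn + fp)          # True Negative Rate
g_mean = np.sqrt(tpr * tnr)   # Geometric Mean Score

print(f"TPR={tpr:.3f}  TNR={tnr:.3f}  G-mean={g_mean:.3f}")
print("Macro F1 on the raw flags:", f1_score(y_true, y_pred, average="macro"))
```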

Competition Phase: Regression Task

Ground truth

The ground truth is a 3D array of:

  • Site, as provided in the test set, and
  • Timestamp, as provided in the test set, and
  • Demand_Response_Capacity_kW

Explanation of Demand Response Capacity

Figure 3. A building site will only have a demand response capacity when the demand response flag is non-zero. In Case A, where the Demand Response (DR) Flag is zero, the demand response capacity is zero regardless of the difference between the building power and the counterfactual baseline. In Case B, where the DR Flag is non-zero, a demand response event has occurred and the site therefore exhibits a demand response capacity.

Submission format

You will provide one file: Predicted demand response capacity as a CSV file, consisting of:

  • A 4D array (Site, Timestamp, Demand Response Flag, Demand Response Capacity)
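Assembling the regression submission follows the same pattern as the classification one, now with four columns; the file and column names remain assumptions to be checked against the sample submission.

```python
import pandas as pd

# Hypothetical file name -- follow the starter kit's sample submission.
test = pd.read_csv("competition_test_set.csv", parse_dates=["Timestamp"])

submission = test[["Site", "Timestamp"]].copy()
submission["Demand_Response_Flag"] = 0        # predicted flag (-1, 0 or +1)
submission["Demand_Response_Capacity"] = 0.0  # predicted capacity in kW

submission.to_csv("submission_regression.csv", index=False)
```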

Metric

The Regression Task will be evaluated and ranked using the normalised mean absolute error (NMAE), normalised by the average building power of the test site. It reflects the prediction error as a percentage relative to the average power consumption of the building site. Building power values of zero are not factored into that average. This will be the primary metric.

NMAE = 100% × [ (1/N) Σ_i |ŷ_i − y_i| ] / P_avg

where:

  • ŷ_i: predicted demand response capacity at time i
  • y_i: true (actual) demand response capacity at time i
  • P_avg: average building power of the tested site (excluding zero values)
  • N: number of evaluated time steps

The normalised root mean squared error (NRMSE) will be calculated as a secondary metric and shown on the leaderboard. The RMSE will be normalised by the range (minimum to maximum) of the building power of the test site. Building power values of zero are ignored.

NRMSE = 100% × √[ (1/N) Σ_i (ŷ_i − y_i)² ] / (P_max − P_min)

where:

  • P_max, P_min: maximum and minimum values of the building power of the tested site.
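The two regression metrics can be approximated as below; exactly how zero-power time steps are excluded and whether the normalisers are computed per site are organiser details this sketch only assumes.

```python
import numpy as np

def nmae(y_true, y_pred, building_power):
    """Mean absolute error normalised by the site's average building power, in %."""
    nonzero_power = building_power[building_power != 0]  # zero-power steps excluded
    return 100 * np.mean(np.abs(y_pred - y_true)) / nonzero_power.mean()

def nrmse(y_true, y_pred, building_power):
    """Root mean squared error normalised by the site's building-power range, in %."""
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return 100 * rmse / (building_power.max() - building_power.min())

# Toy example with four 15-minute intervals
y_true = np.array([0.0, 5.0, 4.0, 0.0])    # true DR capacity (kW)
y_pred = np.array([0.0, 4.0, 5.0, 1.0])    # predicted DR capacity (kW)
power = np.array([40.0, 35.0, 36.0, 42.0]) # building power (kW)
print(f"NMAE = {nmae(y_true, y_pred, power):.2f}%  NRMSE = {nrmse(y_true, y_pred, power):.2f}%")
```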


📘 Submission & Participation

A Team can be made up of one or more Participants. Each Participant may only be a member of one Team. Any Participant found to be part of more than one Team will be disqualified, and all associated Teams and Participants in those Teams will also be disqualified.

Participants can upload ten (10) submission entries per day in CSV format. Each submission must adhere strictly to the prescribed format to ensure accurate leaderboard evaluations.

To maintain fairness and the integrity of the results, the following challenge rules will apply:

  1. Code and Model Validation: The top submissions that qualify for final consideration will undergo rigorous scrutiny to validate the code and models, ensuring that only the provided dataset has been used and that the code can replicate the scores. Participants must submit their code via a private GitLab/GitHub repository and grant access to the competition organisers. More information will be provided later.
  2. Dataset Limitations: Participants must exclusively train their models using the dataset provided by the organisers. The use of external datasets is strictly prohibited.
  3. Solution Documentation: Winners must document their methodology thoroughly. This includes a mandatory detailed solution report in the provided format (to be released during the competition), fostering transparency and contributing to the community's collective knowledge. For consistency, a template will be provided later.

📘 Solution Documentation

⚠️ Important Requirement

All eligible cash prize winners must submit complete solution documentation and contribute to the summary manuscript at the end of the challenge.
Failure to do so will result in ineligibility for receiving cash prizes.

At the organiser's discretion, honourable mentions may be included for academically interesting approaches, such as those using exceptionally little computing or minimal domain knowledge. Honourable mentions will be invited to contribute a shorter section to the paper and have their names included in-line.

Please follow this format for solution documentation: Overleaf Template or Downloadable template (ZIP)

πŸ›οΈ Conference Information

A special session will be held at the 2025 IEEE International Conference on Energy Technologies for Future Grids. The winners of the competition will be invited to present their work at the conference. While it is optional for participants to submit their work as a conference paper, we encourage all participants to attend. Attendance at the conference will be at your own cost, except for the winners, who will receive the Travel Grant.

Deadline for the conference paper: 30th September 2025, 23:45 AEST (UTC + 10)

Template

Please follow the instructions and use the template provided on the website here.

Submission

Please use this link to submit your conference paper.

As the Organisers of the FlexTrack Challenge are not part of the Conference Committee, the paper submissions are subject to the conference review process, and we cannot guarantee their acceptance.

Conference details


🔑 Starter Kit

Make your first submission easily using the starter kit


🤝 Acknowledgements and Sponsors

This competition is part of the project, NSW Digital Infrastructure for Energy Flexibility, funded by the NSW Government in association with CSIRO, under the Net Zero Plan Stage 1: 2020-2030.

This project is also funded by the Reliable Affordable Clean Energy for 2030 (RACE for 2030) Cooperative Research Centre.
