
Challenges Entered
Build an LLM agent for five real-world games
Latest submissions
Detecting Energy Flexibility in Buildings
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 300328 |
| graded | 300319 |
| graded | 300318 |
Improve RAG with Real-World Benchmarks
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 266335 |
| graded | 266193 |
| graded | 266168 |
A benchmark for image-based food recognition
Latest submissions
Classify images of snake species from around the world
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 10024 |
| graded | 6995 |
| failed | 6992 |
5 Problems 15 Days. Can you solve it all?
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 74031 |
| graded | 73938 |
| graded | 73831 |
Disentanglement: from simulation to real-world
Latest submissions
A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 73807 |
| graded | 73198 |
| graded | 72790 |
Testing RAG Systems with Limited Web Pages
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 266167 |
| failed | 266122 |
| failed | 265909 |
Evaluating RAG Systems With Mock KGs and APIs
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 266168 |
| failed | 266069 |
| graded | 266068 |
Enhance RAG systems With Multiple Web Sources & Mock API
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 266335 |
| graded | 266193 |
| failed | 266166 |
Create Videos with Spatially Aligned Stereo Audio
Latest submissions
| Participant | Rating |
|---|---|
| alchemi01 | 235 |
| yusuf_dogu | 0 |

| Participant | Rating |
|---|---|
| alchemi01 | 235 |
Flextrack Challenge 2025
📹 Townhall Recording & Q&A with Challenge Organisers | How to use digital twin data to predict demand response capacity
2 months ago: I think "back-cast" refers to the fact that these datasets were collected in the past, so we are now casting predictions for past events.
The test dataset v0.2 contains Site D, Site E, and Site F. My take is that all three sites are in the private test set, not only Site F.
Issue related to submission
3 months ago: Hi Tuan, you can check your submissions here: AIcrowd | Flextrack Challenge 2025 | Submissions
Meta Comprehensive RAG Benchmark: KDD Cup 2-9d1937
Did somebody try deploy three llama3-8b on four T4?
Over 1 year ago: There are 4 T4 cards, but I see that the total GPU memory is < 60 GB.
Did somebody try deploy three llama3-8b on four T4?
Over 1 year ago: How large is each llama3-8b model? If it is 16 GB each, it's possible.
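The estimate in this reply can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: it assumes roughly 8 billion parameters per model stored as 16-bit weights (which gives the 16 GB figure mentioned above) and takes the 60 GB quoted in the thread as an upper bound on total GPU memory. The constant and function names are hypothetical.

```python
# Rough check: do three 16-bit copies of an ~8B-parameter model fit in the
# total GPU memory quoted in the thread? All constants are assumptions.

PARAMS_PER_MODEL = 8e9   # assumption: about 8 billion parameters per model
BYTES_PER_PARAM = 2      # assumption: fp16/bf16 weights, 2 bytes each
NUM_MODELS = 3           # three models, as asked in the thread title
TOTAL_GPU_GB = 60        # upper bound from the thread ("< 60GB" across 4 T4s)


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


per_model_gb = weights_gb(PARAMS_PER_MODEL, BYTES_PER_PARAM)
total_weights_gb = NUM_MODELS * per_model_gb
headroom_gb = TOTAL_GPU_GB - total_weights_gb

print(f"per model: {per_model_gb:.1f} GB")
print(f"three models: {total_weights_gb:.1f} GB")
print(f"headroom under {TOTAL_GPU_GB} GB: {headroom_gb:.1f} GB")
```

This counts weights only. The KV cache, activations, and framework overhead also need memory, and a 16 GB model does not fit on a single 16 GB T4 with any room to spare, so in practice each model would likely have to be sharded across cards or quantized.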
Whether the task test phase can link to the Internet
Over 1 year ago: Internet access is available when building the Docker image, but not when submissions are run.
The CRAG-Mock-API should move to the Meta Comphrehensive RAG Benchmark starter kit project?
Over 1 year ago: The Dockerfile can be customized. It defines the environment used to run the code in GitLab.
The CRAG-Mock-API should move to the Meta Comphrehensive RAG Benchmark starter kit project?
Over 1 year ago: Look at crag_mock_api/apiwrapper/pycragapi.py to see how to use its functions.
The openai interface cannot be used during evaluation? Why?
Over 1 year ago: You need to pass an API key. There is also no internet access when the submission runs.
Added tag with "submission-v" prefix but no evaluation issue created
Over 1 year ago: Got this issue again.
When will the submission limit be reset?
Over 1 year ago: In the Submission tab of the main challenge page, we can see how many submissions are left this week.
Meta KDD Cup 24 - CRAG - Retrieval Summarization
Failed to communicate with the grader
Over 1 year ago: I had the same issue. Is it fixed now? @aicrowd_team
Couldn't instantiate the backend tokenizer
Over 1 year ago: Have you tried installing the transformers package?
About Test Set Leakage in Round 1
Over 1 year ago: @aicrowd_team I suggest that the Round 2 test set be truly private and not share any similarity or distribution with the data in Round 1.
Why am I not eligible to participate?
Over 1 year ago: I tried again and it still failed for the same reason.
Final Results & Next Steps
About 1 month ago: Thank you for the update. I'm curious to see which models are causal.