
Challenges Entered
Build an LLM agent for five real-world games
Latest submissions
| Status | Submission ID |
|---|---|
| failed | 309572 |
| graded | 309571 |
| graded | 309569 |
Multi-Agent Dynamics & Mixed-Motive Cooperation
Global Chess Challenge 2025
Submissions stuck at "Compiling model for Neuron"
2 months ago
I got Llama 3.1 8B working with the following configuration:
HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64
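In case it helps others debug, these env vars map roughly onto vLLM's own engine and sampling arguments. A minimal sketch of the equivalent plain vLLM Python call (the evaluator's actual wrapper may differ, and the HF model id here is an assumption):

```python
from vllm import LLM, SamplingParams

# Rough equivalent of the env-var config above (generic vLLM API;
# the Neuron evaluator may wire these up differently).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed HF repo
    max_model_len=512,             # VLLM_MAX_MODEL_LEN
    max_num_batched_tokens=512,    # VLLM_MAX_NUM_BATCHED_TOKENS
    max_num_seqs=1,                # VLLM_MAX_NUM_SEQS
    dtype="bfloat16",              # VLLM_DTYPE
    enforce_eager=True,            # VLLM_ENFORCE_EAGER
)
params = SamplingParams(max_tokens=64)  # VLLM_INFERENCE_MAX_TOKENS
outputs = llm.generate(["<your chess prompt here>"], params)
```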
Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)
2 months ago
Hi AIcrowd team — thanks for running the Global Chess Challenge.
I’m seeing what looks like a recent regression for Qwen3 (neuron.model_type=qwen3) on the Neuron/vLLM backend: at higher evaluator concurrency the model often produces garbled output, hits max tokens, and fails to reliably emit <uci_move>...</uci_move>, which causes immediate resignations and extremely high ACPL.
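For context, the failure I'm counting is simply "no parseable move tag". The check on my side is essentially the following (my own validation sketch, not the evaluator's code):

```python
import re

# A UCI move is square-square plus an optional promotion piece, e.g. e2e4, e7e8q.
MOVE_RE = re.compile(r"<uci_move>\s*([a-h][1-8][a-h][1-8][qrbn]?)\s*</uci_move>")

def extract_move(text: str) -> str | None:
    """Return the UCI move if the response contains a well-formed tag."""
    m = MOVE_RE.search(text)
    return m.group(1) if m else None  # None => treated as a failed move
```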
What changed (evidence from evaluation-state logs)
Looking at the config_snapshot field from GET /submissions/<id>/evaluation-state:
- Submission 305873 (Dec 23):
  - config_snapshot.concurrency = 1
  - finish_reason=stop (100%), reasonable completion lengths, <uci_move> present reliably
  - Overall ACPL ≈ 119
- Recent submissions (Dec 24) now show config_snapshot.concurrency = 4. Example 305972:
  - finish_reason=length ~100%, completion tokens always hit the cap, <uci_move> rate ~25%
  - Outputs often look corrupted/garbled (binary-ish text), leading to resignations and ACPL ≈ 864+
  - Same prompt settings as 305873 (vllm.max_model_len=512, dtype=bfloat16, enforce_eager=true, max_tokens=64)
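For reproducibility, this is roughly how I'm pulling those snapshots (a sketch: only the endpoint path comes from the API above; the host and auth scheme are placeholders):

```python
import requests

def get_concurrency(submission_id: int, token: str) -> int:
    # Placeholder host -- substitute the actual AIcrowd API base URL.
    url = f"https://<aicrowd-api-host>/submissions/{submission_id}/evaluation-state"
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()["config_snapshot"]["concurrency"]
```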
I also tried explicitly requesting --num-games 1 / --concurrency 1 in aicrowd submit-model, but the resulting evaluation logs still show concurrency=4 (and num_games=4), suggesting these are being overridden by the evaluator (e.g. submission 305974).
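In case it helps correlate, this is the kind of concurrency-4 request pattern I can run against a local vLLM OpenAI-compatible endpoint to compare behaviour (base URL, model id, and prompt below are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

# Point at a locally served vLLM instance (placeholder URL/model).
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def one_request(prompt: str):
    resp = await client.completions.create(
        model="Qwen/Qwen3-8B",  # assumed model id
        prompt=prompt,
        max_tokens=64,
    )
    choice = resp.choices[0]
    return choice.finish_reason, choice.text

async def main():
    # Fire 4 requests at once, mirroring the evaluator's new concurrency.
    prompt = "You are playing chess. Reply with your move in <uci_move> tags."
    results = await asyncio.gather(*[one_request(prompt) for _ in range(4)])
    for finish_reason, text in results:
        print(finish_reason, repr(text[:80]))

asyncio.run(main())
```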
Questions
- Did the evaluator concurrency change recently from 1 → 4 for this challenge?
- If so, is there a recommended configuration (vLLM/Neuron flags or supported submission fields) to keep Qwen3 stable at concurrency=4?
- Is this a known Neuron/vLLM issue/regression for Qwen3 under concurrent request load?
I’m happy to provide additional req/resp snippets (showing finish_reason=length + corrupted outputs) if that helps debugging. If this should be handled privately instead of on the forum, let me know and I can share details via DM/support.
Thanks again for the challenge and for any guidance here.
Submissions stuck at "Compiling model for Neuron"
2 months ago
Would it be possible to allow uploading pre-compiled models?
Submissions stuck at "Compiling model for Neuron"
3 months ago
I have the same issue with Llama 3.1 8B. I can confirm that it compiles and evaluates on an AWS trn1.2xlarge instance but gets stuck here. Perhaps it’s due to a config mismatch?
Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)
2 months ago
Yep, same issue here. Local runs look fine, but some Qwen3 submissions on the Neuron/vLLM evaluator produce garbled/unreadable output, often hit finish_reason=length, and then miss <uci_move>. If you are comfortable sharing your submission id, it may help the organizers correlate when they have time.