artist


Challenges Entered

Latest submissions

graded 308499
graded 307909
graded 307675

Build an LLM agent for five real-world games

Latest submissions

failed 309572
graded 309571
graded 309569

Multi-Agent Dynamics & Mixed-Motive Cooperation

Latest submissions

No submissions made in this challenge.
Participant Rating
  • AGI Global Chess Challenge 2025

Global Chess Challenge 2025

Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)

2 months ago

Yep, same issue here. Local runs look fine, but some Qwen3 submissions on the Neuron/vLLM evaluator produce garbled/unreadable output, often hit finish_reason=length, and then miss <uci_move>. If you are comfortable sharing your submission id, it may help the organizers correlate when they have time.

Submissions stuck at "Compiling model for Neuron"

2 months ago

@whoamananand @jyotish

I got Llama 3.1 8B working with the following configuration:

HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64
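
For anyone adapting these values, here is a minimal sanity-check sketch. It assumes that VLLM_MAX_MODEL_LEN is the total context window (prompt plus generated tokens) and that VLLM_INFERENCE_MAX_TOKENS is the per-request generation cap. The helper names `parse_env` and `prompt_budget` are hypothetical and not part of any AIcrowd tooling.

```python
# Settings copied from the post above; the helpers below are illustrative only.
CONFIG_TEXT = """\
HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64
"""


def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def prompt_budget(config):
    """Tokens left for the prompt once the generation cap is reserved.

    Assumption: the context window is shared by prompt and completion.
    """
    return int(config["VLLM_MAX_MODEL_LEN"]) - int(config["VLLM_INFERENCE_MAX_TOKENS"])


config = parse_env(CONFIG_TEXT)
print(prompt_budget(config))  # 448
```

Under those assumptions, a prompt longer than 448 tokens would not leave room for the full 64-token completion.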

Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)

2 months ago

Hi AIcrowd team — thanks for running the Global Chess Challenge.

I’m seeing what looks like a recent regression for Qwen3 (neuron.model_type=qwen3) on the Neuron/vLLM backend: at higher evaluator concurrency the model often produces garbled output, hits max tokens, and fails to reliably emit <uci_move>...</uci_move>, which causes immediate resignations and extremely high ACPL.

What changed (evidence from evaluation-state logs)

Looking at the config_snapshot field from GET /submissions/<id>/evaluation-state:

  • Submission 305873 (Dec 23): config_snapshot.concurrency = 1

    • finish_reason=stop (100%), reasonable completion lengths, <uci_move> present reliably
    • Overall ACPL ≈ 119
  • Recent submissions (Dec 24) now show config_snapshot.concurrency = 4

    • Example 305972: config_snapshot.concurrency = 4
    • finish_reason=length ~100%, completion tokens always hit the cap, <uci_move> rate ~25%
    • Outputs often look corrupted/garbled (binary-ish text), leading to resignations and ACPL ≈ 864+
    • This submission used the same prompt settings as 305873 (vllm.max_model_len=512, dtype=bfloat16, enforce_eager=true, max_tokens=64).
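
For reference, the finish_reason and <uci_move> rates above can be computed along these lines. This is only a sketch: the record layout (a `finish_reason` string and a `text` string per response) is an assumption about how one might store responses locally, the sample records are made up for illustration, and the regex assumes the tag wraps a plain UCI move such as e2e4 or e7e8q.

```python
import re
from collections import Counter

# Assumption: a valid answer wraps one UCI move in <uci_move>...</uci_move>.
UCI_TAG = re.compile(r"<uci_move>\s*([a-h][1-8][a-h][1-8][qrbn]?)\s*</uci_move>")


def summarize(records):
    """Return the finish_reason distribution and the share of responses with a move tag."""
    total = len(records)
    if total == 0:
        return {"finish_reason": {}, "uci_move_rate": 0.0}
    reasons = Counter(record["finish_reason"] for record in records)
    with_move = sum(1 for record in records if UCI_TAG.search(record["text"]))
    return {
        "finish_reason": {reason: count / total for reason, count in reasons.items()},
        "uci_move_rate": with_move / total,
    }


# Made-up sample records, not real evaluator output.
sample = [
    {"finish_reason": "length", "text": "<uci_move>e2e4</uci_move> \x00\x00 ..."},
    {"finish_reason": "length", "text": "\x00\x1f garbled bytes"},
    {"finish_reason": "length", "text": "<uci_move>e2"},
    {"finish_reason": "length", "text": "!!!!!!!!"},
]
print(summarize(sample))
```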

I also tried explicitly requesting --num-games 1 / --concurrency 1 in aicrowd submit-model (e.g. submission 305974), but the resulting evaluation logs still show concurrency=4 (and num_games=4), which suggests the evaluator overrides these values.
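
A minimal sketch of how the requested values can be compared with what the evaluator reports. It works on an already-downloaded evaluation-state JSON body; the payload shape is an assumption (only config_snapshot.concurrency is confirmed by the logs, and placing num_games in the same object is a guess), and the function name `overridden_settings` is hypothetical.

```python
import json


def overridden_settings(requested, evaluation_state):
    """Map each requested key to (requested, effective) where the two differ.

    Assumption: effective values live under the "config_snapshot" object.
    """
    snapshot = evaluation_state.get("config_snapshot", {})
    return {
        key: (value, snapshot.get(key))
        for key, value in requested.items()
        if snapshot.get(key) != value
    }


# Illustrative payload only, modelled on the values reported in this post.
payload = json.loads('{"config_snapshot": {"concurrency": 4, "num_games": 4}}')
requested = {"concurrency": 1, "num_games": 1}
print(overridden_settings(requested, payload))
```

An empty result would mean the evaluator honoured the requested values.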

Questions

  1. Did the evaluator concurrency change recently from 1 → 4 for this challenge?
  2. If so, is there a recommended configuration (vLLM/Neuron flags or supported submission fields) to keep Qwen3 stable at concurrency=4?
  3. Is this a known Neuron/vLLM issue/regression for Qwen3 under concurrent request load?

I’m happy to provide additional req/resp snippets (showing finish_reason=length + corrupted outputs) if that helps debugging. If this should be handled privately instead of on the forum, let me know and I can share details via DM/support.

Thanks again for the challenge and for any guidance here.

Submissions stuck at "Compiling model for Neuron"

2 months ago

Would it be possible to allow uploading pre-compiled models?

Submissions stuck at "Compiling model for Neuron"

3 months ago

I have the same issue with Llama 3.1 8B. I can confirm that it compiles and evaluates on an AWS trn1.2xlarge instance, but the submission gets stuck at this step. Perhaps it’s due to a config mismatch?
