AIcrowd | artist | Participants

Activity

Apr

May

Jun

Jul

Aug

Sep

Oct

Nov

Dec

Jan

Feb

Mar

Apr

Mon

Wed

Fri

Challenge Categories

Challenges Entered

Completed

Global Chess Challenge 2025

AIcrowd

AGI House

Train LLMs to Play Chess

Latest submissions

See All

graded	308499	Sun, 25 Jan 2026 06:25:39
graded	307909	Sun, 18 Jan 2026 03:10:15
graded	307675	Thu, 15 Jan 2026 02:08:24

Completed

Orak Game Agent Challenge 2025

Krafton AI

Build an LLM agent for five real-world games

Latest submissions

See All

failed	309572	Mon, 9 Feb 2026 01:27:41
graded	309571	Sun, 8 Feb 2026 22:37:54
graded	309569	Sun, 8 Feb 2026 21:35:39

Completed

MeltingPot Challenge 2023

AIcrowd

Cooperative AI Foundation

Multi-Agent Dynamics & Mixed-Motive Cooperation

Latest submissions

No submissions made in this challenge.

Participant	Rating

Participant	Rating

AGI Global Chess Challenge 2025
View

Global Chess Challenge 2025

Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)

4 months ago

Yep, same issue here. Local runs look fine, but some Qwen3 submissions on the Neuron/vLLM evaluator produce garbled/unreadable output, often hit finish_reason=length, and then miss <uci_move>. If you are comfortable sharing your submission id, it may help the organizers correlate when they have time.

Submissions stuck at "Compiling model for Neuron"

4 months ago

@whoamananand @jyotish

I got Llama 3.1 8B working with the following configuration:

HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64

Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)

4 months ago

Hi AIcrowd team — thanks for running the Global Chess Challenge.

I’m seeing what looks like a recent regression for Qwen3 (neuron.model_type=qwen3) on the Neuron/vLLM backend: at higher evaluator concurrency the model often produces garbled output, hits max tokens, and fails to reliably emit <uci_move>...</uci_move>, which causes immediate resignations and extremely high ACPL.

What changed (evidence from evaluation-state logs)

Looking at the config_snapshot field from GET /submissions/<id>/evaluation-state:

Submission 305873 (Dec 23): config_snapshot.concurrency = 1
- finish_reason=stop (100%), reasonable completion lengths, <uci_move> present reliably
- Overall ACPL ≈ 119
Recent submissions (Dec 24) now show config_snapshot.concurrency = 4
- Example 305972: config_snapshot.concurrency = 4
- finish_reason=length ~100%, completion tokens always hit the cap, <uci_move> rate ~25%
- Outputs often look corrupted/garbled (binary-ish text), leading to resignations and ACPL ≈ 864+
- This submission used the same prompt settings as 305873 (vllm.max_model_len=512, dtype=bfloat16, enforce_eager=true, max_tokens=64).

I also tried explicitly requesting --num-games 1 / --concurrency 1 in aicrowd submit-model, but the resulting evaluation logs still show concurrency=4 (and num_games=4), suggesting these are being overridden by the evaluator (e.g. submission 305974).

Questions

Did the evaluator concurrency change recently from 1 → 4 for this challenge?
If so, is there a recommended configuration (vLLM/Neuron flags or supported submission fields) to keep Qwen3 stable at concurrency=4?
Is this a known Neuron/vLLM issue/regression for Qwen3 under concurrent request load?

I’m happy to provide additional req/resp snippets (showing finish_reason=length + corrupted outputs) if that helps debugging. If this should be handled privately instead of on the forum, let me know and I can share details via DM/support.

Thanks again for the challenge and for any guidance here.

Submissions stuck at "Compiling model for Neuron"

4 months ago

Would it be possible to allow uploading pre-compiled models?

Submissions stuck at "Compiling model for Neuron"

4 months ago

I have the same issue with Llama 3.1 8B. I can confirm that it compiles and evaluates on an AWS trn1.2xlarge instance but gets stuck here. Perhaps it’s due to a config mismatch?

artist has not provided any information yet.

Notebooks

Create Notebook

Filters

Private

Notebooks

Create Notebook

Filters

Private