Engineer

Lucas Pickup

Microsoft Azure

@lupickup

Confirmed that Azure's 80% benchmark score was caused by running an older vLLM version, now fixed.

Coverage Patterns

How media typically covers Lucas Pickup

Article Types

analysis: 1

Media Angles

technical: 1

Narrative Framing

trend: 1

Associated AI Models

gpt-oss-120b: 1

Articles

Quoted In

Directly quoted in these articles

Open Weight LLMs Exhibit Inconsistent Performance Across Providers

Simon Willison · analysis · cautious · Aug 18, 2025

The same open weight LLM (gpt-oss-120b) scores dramatically differently across hosted providers (36.7% to 93.3% on AIME) because of differences in serving framework, quantization, and configuration or software version.

“Confirmed that Azure's 80% benchmark score was caused by running an older vLLM version, now fixed.”

Open Source AI · LLMs · AI Infrastructure · Foundation Models