Author of "Taalas HC1: Absurdly Fast, Per-User Inference at 17,000 Tokens/Second" in Kaitchup
How this author, Benjamin Marie, typically writes:
The Taalas HC1 chip achieves 17,000 tokens/second per-user inference on Llama 3.1 8B by hardwiring the model into silicon, making it roughly 8x faster and 13x cheaper than Cerebras.
DeepSeek's release of R1, with RLVR/GRPO training and MIT licensing, catalyzed mainstream adoption of reasoning models and of efficient inference techniques such as FP4, positioning these as defining trends for 2026.
“Author of "2026 Predictions: Much Faster Inference, Pre-Training with RL, and FP4 Everywhere" on Substack”