His FlashAttention-2 paper is cited as a point of comparison for Triton implementations of the FlashAttention backward pass.
How media typically covers Tri Dao
Research or work cited
Thinking Machines Lab research demonstrates that nondeterminism in LLM inference is not primarily caused by floating-point arithmetic combined with GPU concurrency, as is commonly believed, and that identifying its true root cause requires a deeper investigation.
“Author of the FlashAttention-2 paper referenced in discussion of algorithm implementations”
Mira Murati's lab finds that the common "concurrency + floating point" hypothesis does not fully explain LLM inference nondeterminism, which points to a deeper underlying cause.
“His FlashAttention-2 paper is referenced in comparison to Triton implementations of FlashAttention backward”
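The "concurrency + floating point" hypothesis rests on floating-point addition being non-associative. The minimal Python sketch below is illustrative only and is not taken from the cited research. It shows both halves of the argument: regrouping the same operands changes the result, yet a reduction that always runs in the same order returns a bit-identical value every time. Non-associativity therefore produces run-to-run variation only when something changes the order of operations. The helper `ordered_sum` is a hypothetical name chosen for this example.

```python
import random

def ordered_sum(xs):
    """Hypothetical helper: sum floats strictly left to right, so the reduction order never changes."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Regrouping the same three operands gives two different answers:
# 0.1 is far below the spacing of doubles near 1e20, so it is absorbed.
left_grouped = (0.1 + 1e20) - 1e20   # 0.0
right_grouped = 0.1 + (1e20 - 1e20)  # 0.1

# With a fixed reduction order, repeating the same sum is bit-identical.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
results = {ordered_sum(values) for _ in range(100)}

print(left_grouped, right_grouped)  # 0.0 0.1
print(len(results))                 # 1
```

Running the same sum a hundred times yields a single distinct value. That is why floating-point rounding by itself does not account for differing outputs across identical inference requests.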