Contributed to DA3-Streaming for ultra-long video sequence inference with sliding-window streaming.
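The sliding-window streaming idea can be sketched minimally: a long frame sequence is processed in overlapping fixed-size windows so memory stays bounded regardless of video length, with each frame emitted exactly once. The function names, window/stride values, and merge rule below are illustrative assumptions, not DA3-Streaming's actual implementation.

```python
# Hypothetical sketch of sliding-window streaming inference over an
# ultra-long frame sequence. Windows overlap (window > stride) so that
# consecutive chunks share context frames; only the frames not already
# emitted are kept from each new window.

def sliding_windows(num_frames, window=32, stride=24):
    """Yield (start, end) index pairs covering all frames with overlap."""
    start = 0
    while True:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += stride

def stream_infer(frames, model, window=32, stride=24):
    """Run `model` on each overlapping window of `frames`; from every
    window keep only the frames not covered by earlier windows, so each
    frame gets exactly one prediction."""
    outputs = []
    for start, end in sliding_windows(len(frames), window, stride):
        preds = model(frames[start:end])  # per-frame predictions
        outputs.extend(preds[len(outputs) - start:])
    return outputs
```

Overlap between windows is what lets a real streaming model align each new chunk with the previous one; here the merge rule simply drops the already-emitted prefix of each window.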
How media typically covers Kai Deng
Referenced in coverage
ByteDance released Depth Anything 3, a unified transformer-based model that predicts spatially consistent geometry from arbitrary visual inputs. Using a single depth-ray representation, it handles monocular depth estimation, multi-view depth estimation, pose estimation, and 3D Gaussian generation.