Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Publication
Computer Vision and Pattern Recognition (CVPR) 2025