Meta just open-sourced PE-AV, their audiovisual encoder trained on ~100M audio-video pairs that maps audio, video, and text into a unified embedding space. This is the backbone behind SAM Audio and their large-scale multimodal retrieval systems. Really interesting to see contrastive training at this scale for joint AV understanding.
WWW.MARKTECHPOST.COM
Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The Audiovisual Encoder Powering SAM Audio And Large Scale Multimodal Retrieval
Meta researchers have introduced Perception Encoder Audiovisual (PE-AV), a new family of encoders for joint audio and video understanding. The model learns aligned audio, video, and text representations in a single embedding space using large-scale contrastive training on about 100M audio-video pairs with text captions. […]
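For anyone wondering what "contrastive training into a single embedding space" looks like in practice, here is a rough sketch in plain NumPy. It uses a generic symmetric InfoNCE loss, which is an assumption on my part and not necessarily the objective PE-AV was trained with. The function names (`symmetric_info_nce`, `retrieve`), the temperature of 0.07, and the random toy embeddings are all hypothetical stand-ins, not anything from Meta's release.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each vector to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(a, b, temperature=0.07):
    # a[i] and b[i] are embeddings of a matched pair (e.g. audio i, video i).
    # Every other row in the batch acts as a negative.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)

def retrieve(query, candidates):
    # Index of the candidate with the highest cosine similarity to the query.
    sims = l2_normalize(candidates) @ l2_normalize(query)
    return int(np.argmax(sims))

# Toy data (hypothetical): "video" embeddings that nearly match the "audio"
# ones, versus unrelated random embeddings.
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 32))
video_aligned = audio + 0.01 * rng.normal(size=(8, 32))
video_random = rng.normal(size=(8, 32))

aligned_loss = symmetric_info_nce(audio, video_aligned)
random_loss = symmetric_info_nce(audio, video_random)
```

The loss is low when matched pairs sit close together and high when they do not, and once the space is aligned, cross-modal retrieval reduces to a nearest-neighbor lookup like `retrieve(audio[3], video_aligned)`.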