Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The...

shared a link

2025-12-22 20:33:01

Meta just open-sourced PE-AV, their audiovisual encoder trained on ~100M audio-video pairs that maps audio, video, and text into a unified embedding space. This is the backbone behind SAM Audio and Meta's large-scale multimodal retrieval systems. Really interesting to see contrastive training at this scale for joint AV understanding.

WWW.MARKTECHPOST.COM

Meta AI Open-Sourced Perception Encoder Audiovisual (PE-AV): The Audiovisual Encoder Powering SAM Audio And Large Scale Multimodal Retrieval

Meta researchers have introduced Perception Encoder Audiovisual (PE-AV), a new family of encoders for joint audio and video understanding. The model learns aligned audio, video, and text representations in a single embedding space using large-scale contrastive training on about 100M audio-video pairs with text captions. From Perception Encoder to PE-AV: Perception Encoder, […]
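For readers unfamiliar with contrastive training, here is a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss over paired embeddings. It is an illustration only, not Meta's implementation: the function names, the temperature value, and the toy 2-D vectors are all assumptions, and the actual PE-AV objective may differ.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    # emb_a[i] and emb_b[i] are assumed to be a matched pair (e.g. audio i, video i).
    # temperature=0.07 is an illustrative default, not a value from the source.
    a = [l2_normalize(v) for v in emb_a]
    b = [l2_normalize(v) for v in emb_b]
    n = len(a)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Direction a -> b: each row should put its mass on the diagonal entry.
    loss_ab = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Direction b -> a: same idea over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)

# Toy embeddings (hypothetical): two audio clips and two candidate video sets.
audio = [[1.0, 0.0], [0.0, 1.0]]
video_aligned = [[0.9, 0.1], [0.1, 0.9]]   # pair i matches audio i
video_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = symmetric_contrastive_loss(audio, video_aligned)
loss_swapped = symmetric_contrastive_loss(audio, video_swapped)
```

The loss is small when matched pairs sit close together in the shared space and large when they do not, which is what pulls audio, video, and text embeddings into alignment during training.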
