Abstract: We present TokenFlow, a novel unified image tokenizer that bridges the long-standing gap between multimodal understanding and generation. Prior research attempt to employ a single ...
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language 2022 Audio Continuous WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing 2021 ...