MultiZoo and MultiBench: A Toolkit for Multimodal Deep Learning
Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov; Published in the Journal of Machine Learning Research, 2023, Volume 24(234), Pages 1-7.
Abstract
Learning multimodal representations requires integrating information from multiple heterogeneous sources of data. To expedite progress in understudied modalities and tasks while ensuring real-world robustness, we have developed MultiZoo, a publicly accessible toolkit offering standardized implementations of over 20 core multimodal algorithms. We have also introduced MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these resources provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we further devised a methodology for assessing (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way toward a deeper understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Both MultiZoo and MultiBench are publicly available, regularly updated, and open to community contributions.
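To make the kind of standardized pipeline described above concrete, here is a minimal, self-contained sketch (not MultiZoo's actual API; all function names here, such as `fuse` and `accuracy`, are hypothetical) of one common pattern the abstract alludes to: fusing two modalities by feature concatenation, fitting a simple classifier, and probing modality robustness by zeroing out one modality at evaluation time.

```python
# Hypothetical illustration only: a late-fusion pipeline on synthetic
# two-modality data, plus a modality-robustness probe. It does NOT use
# MultiZoo/MultiBench code; it sketches the evaluation pattern generically.
import random

def fuse(x_a, x_b):
    """Late fusion by feature concatenation."""
    return x_a + x_b  # list concatenation: [a1, ...] + [b1, ...]

def fit_centroids(fused, labels):
    """Compute the mean fused feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(fused, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign the class of the nearest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def accuracy(centroids, fused, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(fused, labels))
    return hits / len(labels)

# Synthetic data: modality A has 6 features, modality B has 2;
# class 0 clusters near 0, class 1 clusters near 1.
random.seed(0)
labels = [i % 2 for i in range(200)]
mod_a = [[y + random.gauss(0, 0.1) for _ in range(6)] for y in labels]
mod_b = [[y + random.gauss(0, 0.1) for _ in range(2)] for y in labels]

fused = [fuse(a, b) for a, b in zip(mod_a, mod_b)]
centroids = fit_centroids(fused, labels)
clean_acc = accuracy(centroids, fused, labels)

# Modality-robustness probe: drop modality B (zero imputation) at test time
# and measure how much accuracy degrades.
dropped = [fuse(a, [0.0] * len(b)) for a, b in zip(mod_a, mod_b)]
robust_acc = accuracy(centroids, dropped, labels)
print(f"clean accuracy: {clean_acc:.2f}, modality-B dropped: {robust_acc:.2f}")
```

In this toy setup the dominant modality A carries most of the signal, so accuracy degrades only mildly when modality B is removed; a real robustness evaluation would sweep such perturbations across modalities and noise levels, as the benchmark's methodology describes.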