
Multimodal learning and cross-modal intelligence have emerged as key research directions in artificial intelligence and big data. They aim to overcome the limitations of single-modality information processing through collaborative modeling and deep understanding of heterogeneous data sources, including text, images, speech, video, and sensor data.
This forum focuses on frontier topics such as multimodal representation learning and cross-modal reasoning and generation, and examines the theoretical advances, core technologies, and representative applications of multimodal intelligence in the era of large-scale foundation models.
The forum aims to promote in-depth exchange between academia and industry and to foster collaboration in this rapidly evolving research area.