ICAIBD 2026

Call for Tracks

Track 2: Multimodal Learning and Cross-Modal Intelligence | 多模态学习与跨模态智能

Organizers | 组织者信息

Chair: Zhong Cao, Associate Professor, School of Electronic and Communication Engineering, Guangzhou University, China
曹忠，副教授，广州大学电子与通信工程学院
Co-Chair: Zheng Zhou, Associate Professor, School of Electronic and Communication Engineering, Guangzhou University, China
周正，副教授，广州大学电子与通信工程学院

Abstract | 论坛简介

Multimodal learning and cross-modal intelligence have emerged as key research directions in artificial intelligence and big data. They aim to overcome the limitations of single-modality information processing by enabling collaborative modeling and deep understanding of heterogeneous data sources, including text, images, speech, video, and sensor data. This forum focuses on cutting-edge topics such as multimodal representation learning and cross-modal reasoning and generation, and explores theoretical advances, core technologies, and representative applications of multimodal intelligence in the era of large-scale foundation models. Its goal is to promote in-depth exchange between academia and industry and to foster collaboration in this rapidly evolving research area.

多模态学习与跨模态智能是当前人工智能与大数据领域的重要研究方向，旨在突破单一模态信息处理的局限，实现文本、图像、语音、视频及传感数据等多源信息的协同建模与深度理解。本分论坛聚焦多模态表示学习、跨模态推理与生成等前沿问题，探讨大模型背景下多模态智能的理论进展、关键技术与典型应用，促进学术界与产业界在该领域的深入交流与合作。

Topics | 主题范围

Multimodal Representation Learning and Fusion Methods | 多模态表示学习与融合方法
Cross-Modal Understanding, Retrieval, and Reasoning | 跨模态理解、检索与推理
Multimodal Generative Models and Foundation Model Technologies | 多模态生成模型与大模型技术
Applications and Challenges of Multimodal Intelligence in Real-World Scenarios | 多模态智能在实际场景中的应用与挑战