
Deploying AI systems at scale across distributed and decentralized environments has become increasingly common for both foundation model training and inference.
As the scale and complexity of these systems continue to grow, inefficient inter-node communication, frequent hardware and software failures,
and high computational and storage demands have become major factors limiting the development and application of AI technologies.
To address these challenges, this session invites submissions and discussions focusing on communication optimization, fault-tolerant design,
model compression, and other approaches that improve the efficiency of AI systems.