Recent years have witnessed transformative changes brought about by artificial intelligence (AI) techniques with billions of parameters for the realization of high accuracy, creating high demand for advanced AI chips that can solve these AI tasks efficiently and powerfully. Rapid progress has been made in the field of advanced chips recently, such as the development of photonic computing, the advancement of quantum processors, and the boost of biomimetic chips. Design tactics for advanced chips can be conducted with elaborate consideration of materials, algorithms, models, architectures, and so on. Though a few reviews present the development of chips from their unique aspects, reviews from the viewpoint of the latest designs of advanced AI chips are few. Here, the newest developments in the field of advanced chips are systematically reviewed. First, background and mechanisms are summarized, and subsequently the most important considerations for co-design of software and hardware are illustrated. Next, strategies to obtain advanced AI chips with excellent performance are summed up by taking the important information processing steps into consideration, after which a design philosophy for future advanced chips is proposed. Finally, some perspectives are put forward.
Robotic computing systems play an important role in enabling intelligent robotic tasks through intelligent algorithms and supporting hardware. In recent years, the evolution of robotic algorithms indicates a roadmap from traditional robotics to hierarchical and end-to-end models. This algorithmic advancement poses a critical challenge in achieving balanced system-wide performance. Therefore, algorithm-hardware co-design has emerged as the primary methodology, which analyzes algorithm behaviors on hardware to identify common computational properties. These properties can motivate algorithm optimization to reduce computational complexity, as well as hardware innovation from architecture to circuit for high performance and high energy efficiency. We then review recent works on robotic and embodied AI algorithms and computing hardware to demonstrate this algorithm-hardware co-design methodology. Finally, we discuss future research opportunities by answering two questions: (1) how to adapt computing platforms to the rapid evolution of embodied AI algorithms, and (2) how to transform the potential of emerging hardware innovations into end-to-end inference improvements.
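The co-design analysis described above often begins with a roofline-style estimate of arithmetic intensity to decide whether an algorithm is compute- or memory-bound on a given device. A minimal illustrative sketch follows; the function names and device figures are hypothetical assumptions, not numbers from the paper.

```python
# Roofline-style classification: is a kernel compute- or memory-bound?
# All names and device numbers here are illustrative assumptions.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def bound_kind(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Classify a kernel against a device's roofline.

    The machine balance point is peak_flops / peak_bandwidth (FLOP/byte);
    kernels whose intensity falls below it are memory-bound,
    those above it are compute-bound.
    """
    balance = peak_flops / peak_bandwidth
    if arithmetic_intensity(flops, bytes_moved) >= balance:
        return "compute-bound"
    return "memory-bound"

# Example: an n x n fp32 matrix-vector product does ~2*n^2 FLOPs while
# moving ~4*n^2 bytes, giving ~0.5 FLOP/byte -- far below a typical
# accelerator balance point, hence memory-bound.
n = 4096
print(bound_kind(2 * n * n, 4 * n * n,
                 peak_flops=100e12, peak_bandwidth=1e12))  # → memory-bound
```

Such an estimate tells the co-designer whether to attack the algorithm (reduce data movement, fuse operators) or the hardware (raise bandwidth, add on-chip memory).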
In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical roles of distributed computing strategies, memory management enhancements, and improved computational efficiency. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, and provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, offering insights into the challenges and potential avenues for future development and deployment.
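The scale of the computational demand described above can be made concrete with the widely used ~6ND back-of-the-envelope rule for dense-transformer training FLOPs. The sketch below is illustrative only; the model size, token count, and accelerator throughput are assumed figures, not data from the paper.

```python
# Back-of-the-envelope training-cost estimate using the common 6*N*D
# approximation. Model/token/hardware numbers are illustrative assumptions.

def training_flops(n_params, n_tokens):
    """Approximate total training FLOPs: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

def gpu_days(total_flops, peak_flops_per_gpu, utilization=0.4):
    """Wall-clock GPU-days at a given peak throughput and utilization."""
    seconds = total_flops / (peak_flops_per_gpu * utilization)
    return seconds / 86400

# A hypothetical 7e9-parameter model trained on 2e12 tokens needs
# ~8.4e22 FLOPs; at 300 TFLOP/s per accelerator and 40% utilization
# that is thousands of GPU-days -- hence distributed training systems.
flops = training_flops(7e9, 2e12)
print(f"{flops:.2e} FLOPs, {gpu_days(flops, 300e12):.0f} GPU-days")
```

Even this crude estimate shows why single-accelerator training is infeasible at current model scales and why the distributed scheduling and memory-management strategies surveyed here matter.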
Funding: supported by the Hong Kong Polytechnic University (1-WZ1Y, 1-W34U, 4-YWER).
Funding: supported in part by NSFC under Grant 62422407, in part by RGC under Grant 26204424, and in part by ACCESS (AI Chip Center for Emerging Smart Systems), sponsored by the InnoHK initiative of the Innovation and Technology Commission of the Hong Kong Special Administrative Region Government.
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61925208, U22A2028, 62302483, 62222214, 62341411, 62102399, and 62372436, the Chinese Academy of Sciences (CAS) Project for Young Scientists in Basic Research under Grant No. YSBR-029, the Youth Innovation Promotion Association of CAS, and the Xplore Prize.