Abstract: In the 9 December 2024 issue of Nature [1], a team of Google engineers reported breakthrough results using “Willow”, their latest quantum computing chip (Fig. 1). By meeting a milestone “below threshold” reduction in the rate of errors that plague superconducting circuit-based quantum computing systems (Fig. 2), the work moves the field another step towards its promised supercharged applications, albeit likely still many years away. Areas expected to benefit from quantum computing include, among others, drug discovery, materials science, finance, cybersecurity, and machine learning.
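To make the “below threshold” milestone concrete, the sketch below evaluates the textbook surface-code scaling relation, in which the logical error rate shrinks roughly as (p/p_th)^((d+1)/2) once the physical error rate p falls below the threshold p_th. The prefactor and rates here are illustrative assumptions, not figures from the Willow paper.

```c
#include <stdio.h>
#include <math.h>

/* Illustrative surface-code scaling: below the threshold p_th, the
 * logical error rate shrinks exponentially as the code distance d grows.
 * A, p, and p_th are assumed illustrative values, not Willow's numbers. */
int main(void) {
    const double A = 0.1;      /* fitting prefactor (assumed) */
    const double p = 0.001;    /* physical error rate (assumed) */
    const double p_th = 0.01;  /* threshold error rate (assumed) */
    for (int d = 3; d <= 7; d += 2) {
        double eps_L = A * pow(p / p_th, (d + 1) / 2.0);
        printf("distance %d: logical error rate ~ %.2e\n", d, eps_L);
    }
    return 0;
}
```

Stepping the code distance d from 3 to 7 shows the exponential error suppression that operating below threshold buys.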
Funding: Supported in part by NSFC under Grant 62422407, in part by RGC under Grant 26204424, and in part by ACCESS (AI Chip Center for Emerging Smart Systems), sponsored by the InnoHK initiative of the Innovation and Technology Commission of the Hong Kong Special Administrative Region Government.
Abstract: Robotic computing systems play an important role in enabling intelligent robotic tasks through intelligent algorithms and supporting hardware. In recent years, the evolution of robotic algorithms has traced a roadmap from traditional robotics to hierarchical and end-to-end models. This algorithmic advancement poses a critical challenge in achieving balanced system-wide performance. Therefore, algorithm-hardware co-design has emerged as the primary methodology: it analyzes algorithm behaviors on hardware to identify common computational properties. These properties can motivate algorithm optimization to reduce computational complexity, and hardware innovation from architecture to circuit for high performance and high energy efficiency. We then review recent works on robotic and embodied AI algorithms and computing hardware to demonstrate this algorithm-hardware co-design methodology. Finally, we discuss future research opportunities by answering two questions: (1) how to adapt computing platforms to the rapid evolution of embodied AI algorithms, and (2) how to transform the potential of emerging hardware innovations into end-to-end inference improvements.
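One standard way to “analyze algorithm behaviors on hardware” is the roofline model, which bounds a kernel's attainable throughput by either peak compute or memory bandwidth times arithmetic intensity. The sketch below uses hypothetical accelerator numbers to show how such an analysis flags memory-bound kernels as co-design targets.

```c
#include <stdio.h>

/* Roofline sketch: attainable throughput is capped either by peak compute
 * or by memory bandwidth times the kernel's arithmetic intensity.
 * The accelerator numbers below are hypothetical. */
static double attainable_gflops(double intensity_flop_per_byte) {
    const double peak_gflops = 100.0;  /* peak compute (assumed) */
    const double bw_gbps = 50.0;       /* DRAM bandwidth in GB/s (assumed) */
    double mem_bound = bw_gbps * intensity_flop_per_byte;
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}

int main(void) {
    /* e.g., an elementwise op, a small conv, and a large matmul */
    double intensities[] = {0.25, 2.0, 16.0};  /* FLOPs per byte (assumed) */
    for (int i = 0; i < 3; i++)
        printf("intensity %5.2f FLOP/B -> %.1f GFLOP/s attainable\n",
               intensities[i], attainable_gflops(intensities[i]));
    return 0;
}
```

Kernels well below the compute roof are bandwidth-limited, which is exactly where algorithm optimization (reducing data movement) and hardware innovation (raising effective bandwidth) pay off together.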
Funding: Supported in part by the National Key Research and Development Program of China (Grant No. 2019YFB2204300), in part by the National Natural Science Foundation of China (Grant Nos. 62334008 and 62274154), and in part by the Key Program of the National Natural Science Foundation of China (Grant No. 62134004).
Abstract: This paper describes a 2D/3D vision chip with integrated sensing and processing capabilities. The 2D/3D vision chip architecture includes a 2D/3D image sensor and a programmable visual processor. In this architecture, we design a novel on-chip processing flow with die-to-die image transmission and low-latency fixed-point image processing. The vision chip achieves real-time end-to-end processing of convolutional neural networks (CNNs) and conventional image processing algorithms. Furthermore, an end-to-end 2D/3D vision system is built to demonstrate the capabilities of the vision chip. The vision system achieves real-time applications in 2D and 3D scenes, such as human face detection (processing delay 10.2 ms) and depth map reconstruction (processing delay 4.1 ms). The frame rate of image acquisition, image processing, and result display exceeds 30 fps.
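As a quick plausibility check on the reported latencies, the snippet below compares the two processing delays against the roughly 33.3 ms per-frame budget that a 30 fps pipeline allows; the delays are the paper's reported values, while the check itself is ours.

```c
#include <stdio.h>

/* Check the reported per-frame processing delays against the ~33.3 ms
 * budget that a 30 fps pipeline allows per frame. */
int main(void) {
    const double budget_ms = 1000.0 / 30.0;  /* 30 fps frame budget */
    const double face_ms = 10.2;             /* face detection (reported) */
    const double depth_ms = 4.1;             /* depth reconstruction (reported) */
    printf("budget per frame: %.1f ms\n", budget_ms);
    printf("face detection:   %.1f ms (%s)\n", face_ms,
           face_ms < budget_ms ? "fits" : "too slow");
    printf("depth map:        %.1f ms (%s)\n", depth_ms,
           depth_ms < budget_ms ? "fits" : "too slow");
    return 0;
}
```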
Abstract: This paper introduces a new datapath architecture for reconfigurable processors. The proposed datapath is based on a Network-on-Chip (NoC) approach and facilitates tight coupling of all functional units. Reconfigurable functional elements can be dynamically allocated for application-specific optimizations, enabling polymorphic computing. Using a modified network simulator, the performance of several NoC topologies and parameter settings is investigated with standard benchmark programs, including fine-grained and coarse-grained computations. Simulation results highlight the flexibility and scalability of the proposed polymorphic NoC processor for a wide range of application domains.
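To illustrate the kind of topology comparison such a simulator performs, the sketch below enumerates all source-destination pairs to compute average hop counts for k x k mesh and torus NoCs under uniform random traffic. This is a generic first-order comparison, not the paper's modified simulator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Average hop count of a k x k mesh vs. torus under uniform random
 * traffic, by brute-force enumeration of all src/dst pairs. */
static int dist_1d(int a, int b, int k, int torus) {
    int d = abs(a - b);
    return (torus && k - d < d) ? k - d : d;  /* torus may wrap around */
}

static double avg_hops(int k, int torus) {
    long total = 0, pairs = 0;
    for (int sx = 0; sx < k; sx++) for (int sy = 0; sy < k; sy++)
        for (int dx = 0; dx < k; dx++) for (int dy = 0; dy < k; dy++) {
            if (sx == dx && sy == dy) continue;  /* skip self-traffic */
            total += dist_1d(sx, dx, k, torus) + dist_1d(sy, dy, k, torus);
            pairs++;
        }
    return (double)total / pairs;
}

int main(void) {
    for (int k = 4; k <= 8; k += 2)
        printf("%dx%d: mesh %.2f hops, torus %.2f hops\n",
               k, k, avg_hops(k, 0), avg_hops(k, 1));
    return 0;
}
```

Lower average hop count translates directly into lower zero-load latency, which is one axis on which such topology studies compare candidates.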
Funding: This project is supported by the Provincial Natural Science Foundation of Heilongjiang (No. A9809).
Abstract: A method for determining tool-chip contact length in orthogonal metal machining is presented theoretically. Using computer simulation based on analyses of the elasto-plastic deformation in the deformation zone with the Lagrangian finite element method, the accumulated representative length of the lower layer and the tool-chip contact length of the chip contacting the tool rake are calculated; experimental studies are also carried out with 0.2 percent carbon steel. The tool-chip contact lengths obtained from computer simulation are shown to agree well with the measured values.
Abstract: A new program structure for single-chip computer systems, based on a multitasking mechanism, is developed. The specific method for realizing the new structure is discussed, and an application example is provided.
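A minimal sketch of one common multitasking program structure for single-chip systems is shown below: a table of short, non-blocking tasks polled by a round-robin super-loop. The task names and tick-based periods are hypothetical, since the abstract does not spell out the paper's structure.

```c
#include <stdio.h>

/* Cooperative multitasking super-loop: each task is a short, non-blocking
 * function run every `period` loop ticks. Task names/periods are assumed. */
typedef struct {
    void (*run)(void);   /* non-blocking task body */
    unsigned period;     /* run every N loop ticks (assumed scheme) */
    unsigned countdown;  /* ticks remaining until next run */
} task_t;

static void read_sensor(void)    { puts("read sensor"); }
static void update_display(void) { puts("update display"); }

static task_t tasks[] = {
    { read_sensor,    2, 2 },
    { update_display, 5, 5 },
};

int main(void) {
    for (unsigned tick = 0; tick < 10; tick++) {  /* would loop forever on a real MCU */
        for (unsigned i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
            if (--tasks[i].countdown == 0) {
                tasks[i].countdown = tasks[i].period;
                tasks[i].run();
            }
        }
    }
    return 0;
}
```

Because every task returns quickly instead of blocking, the single CPU interleaves all tasks without a preemptive RTOS, which is the essence of a multitasking mechanism on a resource-constrained single-chip computer.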
Funding: This project is supported by the National Hi-tech Research and Development Program of China (863 Program, No. 2002AA404250) and the National Natural Science Foundation of China (No. 50575093).
Abstract: To cool a computer chip efficiently with minimal noise, a single-phase water-cooling radiator for computer chips, driven by a piezoelectric pump with two parallel-connected chambers, is developed. The structure and working principle of the radiator are described, and the material, processing method, and design principles of the whole radiator are explained. Finite element analysis (FEA) software, ANSYS, is used to simulate the heat distribution in the radiator, and the test equipment for the water-cooling radiator is listed. Experimental tests clarify the influence of the flow rate inside the cooling system and of the fan on chip cooling. Comparison experiments prove this water-cooling radiator more efficient than a current air-cooling radiator: when cooling a heater that simulates a computer chip working at different power levels, the water-cooling radiator reaches lower steady temperatures in less time than the air-cooling radiator.
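A radiator comparison of this kind is often summarized by thermal resistance, R_th = (T_steady − T_ambient) / P: the lower the R_th, the better the cooling. The sketch below computes it for made-up example temperatures, not the paper's measurements.

```c
#include <stdio.h>

/* Thermal-resistance sketch: R_th = (T_steady - T_ambient) / P.
 * All temperatures and power below are assumed examples, not the
 * paper's measured data. */
int main(void) {
    const double t_amb   = 25.0;  /* ambient, deg C (assumed) */
    const double power_w = 60.0;  /* simulated chip power, W (assumed) */
    const double t_water = 45.0;  /* steady temp with water cooling (assumed) */
    const double t_air   = 65.0;  /* steady temp with air cooling (assumed) */
    printf("water: R_th = %.3f K/W\n", (t_water - t_amb) / power_w);
    printf("air:   R_th = %.3f K/W\n", (t_air   - t_amb) / power_w);
    return 0;
}
```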
Funding: This work received funding support from the National Natural Science Foundation of China (52172205).
Abstract: Fully integrating a computing-in-memory chip as an edge learning device remains challenging. In recent work published in Science, a fully integrated chip based on neuromorphic memristors was developed for edge learning, implementing artificial neural networks with the functionality of synapses, dendrites, and somas. A crossbar-array memristor chip facilitated edge learning, encompassing the hardware realization, the learning algorithm, and a cycle-parallel sign- and threshold-based learning (STELLAR) scheme. Motion-control and demonstration platforms were implemented to improve the edge learning ability of adapting to new scenarios.
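As a rough illustration of what a sign- and threshold-based update can look like, the sketch below applies a fixed-magnitude weight step in the direction of the error's sign, and only when the error magnitude exceeds a threshold. This is a hedged, generic reading of the name, not the paper's circuit-level STELLAR scheme.

```c
#include <stdio.h>
#include <math.h>

/* Generic sign- and threshold-based weight update: a fixed step in the
 * direction of the error's sign, applied only when |error| exceeds a
 * threshold (small errors trigger no device write). Step and threshold
 * are assumed; this is NOT the paper's exact STELLAR scheme. */
static double update(double w, double error) {
    const double step = 0.01;       /* fixed update magnitude (assumed) */
    const double threshold = 0.05;  /* skip sub-threshold errors (assumed) */
    if (fabs(error) <= threshold) return w;      /* below threshold: no write */
    return w - step * (error > 0 ? 1.0 : -1.0);  /* sign-based step */
}

int main(void) {
    double w = 0.5;
    double errors[] = {0.2, -0.03, -0.4, 0.01};
    for (int i = 0; i < 4; i++) {
        w = update(w, errors[i]);
        printf("after error %+.2f: w = %.2f\n", errors[i], w);
    }
    return 0;
}
```

Sign- and threshold-based rules like this are attractive for memristor hardware because they need only fixed-amplitude programming pulses and suppress writes for small errors, reducing device wear and peripheral-circuit complexity.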
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61925208, U22A2028, 62302483, 62222214, 62341411, 62102399, and 62372436, the Chinese Academy of Sciences (CAS) Project for Young Scientists in Basic Research under Grant No. YSBR-029, the Youth Innovation Promotion Association of CAS, and the Xplore Prize.
Abstract: In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical role of distributed computing strategies, memory management enhancements, and computational efficiency improvements. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, and offers insights into the challenges and potential avenues for future development and deployment.
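To see why heterogeneous, distributed systems become essential, the sketch below applies the common mixed-precision Adam rule of thumb of roughly 16 bytes of optimizer-and-weight state per parameter (fp16 weights and gradients, fp32 master weights, and two fp32 Adam moments), ignoring activations and workspace; even this lower bound quickly exceeds a single accelerator's memory.

```c
#include <stdio.h>

/* Rough training-memory estimate for mixed-precision Adam, using the
 * common ~16 bytes/parameter rule of thumb (fp16 weight 2 + fp16 grad 2
 * + fp32 master weight 4 + two fp32 Adam moments 4 + 4). Activations
 * and workspace are ignored; model sizes are examples. */
int main(void) {
    const double bytes_per_param = 2 + 2 + 4 + 4 + 4;  /* = 16 */
    double params_b[] = {7, 70, 175};  /* billions of parameters (examples) */
    for (int i = 0; i < 3; i++) {
        double gib = params_b[i] * 1e9 * bytes_per_param
                   / (1024.0 * 1024.0 * 1024.0);
        printf("%.0fB params: ~%.0f GiB of weight+optimizer state\n",
               params_b[i], gib);
    }
    return 0;
}
```

Even a 7B-parameter model needs on the order of 100 GiB of state under this estimate, which is why techniques such as sharded optimizer states and model parallelism across many accelerators are central to the systems surveyed here.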