Funding: The Natural Science Foundation of Jiangsu Province (BK2001205).
Abstract: This paper analyzes the main elements of the NS network simulator, gives a detailed view of dataflow management in a link, a node, and an agent, respectively, and introduces the information recorded in its trace file. Based on an analysis of how different packets are transported and handled in NS, a dataflow state machine is proposed together with the events that trigger its state transitions, and a dataflow analyzer is designed and implemented according to it. As the state machine runs, the analyzer tallies the total transport flux of a specified dataflow and produces a general fluctuation diagram. Finally, a concrete example is used to test its performance.
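The flux statistic described in this abstract can be illustrated with a minimal sketch. It is not the paper's analyzer or its state machine, whose states the abstract does not list. It assumes the common ns-2 wired trace layout of 12 whitespace-separated fields (event, time, from node, to node, packet type, size, flags, flow ID, source, destination, sequence number, packet ID). The function name `flow_bytes` and the sample lines are hypothetical.

```python
def flow_bytes(trace_lines, flow_id, node):
    """Sum the bytes of one flow that were received ("r" events) at a given node."""
    total = 0
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip lines that do not match the assumed layout
        event, _time, _src, dst, _ptype, size, _flags, fid = fields[:8]
        if event == "r" and dst == str(node) and fid == str(flow_id):
            total += int(size)
    return total


# Made-up trace lines in the assumed layout, for illustration only.
sample = [
    "+ 0.10 0 1 tcp 1000 ------- 1 0.0 2.0 0 0",
    "r 0.12 0 1 tcp 1000 ------- 1 0.0 2.0 0 0",
    "r 0.15 1 2 tcp 1000 ------- 1 0.0 2.0 0 0",
    "r 0.16 1 2 cbr 500 ------- 2 3.0 2.1 0 1",
    "d 0.20 1 2 tcp 1000 ------- 1 0.0 2.0 1 2",
]
total = flow_bytes(sample, flow_id=1, node=2)
```

Grouping the same sum into time bins would give the raw data for a fluctuation diagram.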
Abstract: The Sub Farm Interface (SFI) is the event builder of the ATLAS (A Toroidal LHC ApparatuS) Dataflow System. It receives event fragments from the Read Out System, builds full events, and sends the complete events to the Event Filter for high-level event selection. This paper describes the implementation of the SFI, discusses some issues in SFI optimization, and introduces the monitoring service inside the SFI.
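The event-building step (collect fragments, emit a full event once every source has reported) can be sketched generically as follows. This is an illustration under stated assumptions, not the SFI implementation: the class `EventBuilder`, the source names `ros0` and `ros1`, and the payloads are all hypothetical.

```python
from collections import defaultdict


class EventBuilder:
    """Collects fragments per event ID and returns the full event once complete."""

    def __init__(self, sources):
        self.sources = frozenset(sources)   # every source that must contribute
        self.pending = defaultdict(dict)    # event_id -> {source: payload}

    def add_fragment(self, event_id, source, payload):
        fragments = self.pending[event_id]
        fragments[source] = payload
        if fragments.keys() >= self.sources:
            # All sources have reported: hand the full event downstream.
            del self.pending[event_id]
            return [fragments[s] for s in sorted(self.sources)]
        return None  # event still incomplete


builder = EventBuilder(sources=["ros0", "ros1"])
first = builder.add_fragment(7, "ros0", b"aa")   # incomplete, returns None
full = builder.add_fragment(7, "ros1", b"bb")    # complete event
```

A real builder would also need timeouts for fragments that never arrive; that is omitted here.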
Funding: Supported by NSFC (Grant Nos. 61702493 and 51707191); the Science and Technology Planning Project of Guangdong Province (Grant No. 2018B030338001); Shenzhen S&T Funding (Grant No. KQJSCX20170731163915914); the Basic Research Program (Nos. JCYJ20170818164527303 and JCYJ20180507182619669); and the SIAT Innovation Program for Excellent Young Researchers (Grant No. 2017001).
Abstract: Driven by the continuous scaling of nanoscale semiconductor technologies, the past years have witnessed the progressive advancement of machine learning techniques and applications. Recently, dedicated machine learning accelerators, especially for neural networks, have attracted the research interest of computer architects and VLSI designers. State-of-the-art accelerators increase performance by deploying a huge number of processing elements, but still face the issue of degraded resource utilization across hybrid and non-standard algorithmic kernels. In this work, we exploit the properties of important neural network kernels for both perception and control to propose a reconfigurable dataflow processor, which adjusts the data flow patterns, the functionality of the processing elements, and the on-chip storage according to the network kernels. In contrast to state-of-the-art fine-grained dataflow techniques, the proposed coarse-grained dataflow reconfiguration approach enables extensive sharing of computing and storage resources. Three hybrid networks for MobileNet, deep reinforcement learning, and sequence classification are constructed and analyzed with customized instruction sets and a toolchain. A test chip has been designed and fabricated in UMC 65 nm CMOS technology, with a measured power consumption of 7.51 mW at 100 MHz on a die size of 1.8 × 1.8 mm^2.
Funding: Co-supported by the Excellent Young Scientists Fund of the National Natural Science Foundation of China (82022073); Major Scientific and Technological R&D Projects in Jiangxi Province (20203ABC28W018); the National Natural Science Foundation of China (82274110); and the Fundamental Research Funds for the Central Universities (2022-JYB-JBZR-018, 2022-JYB-JBZR-019).
Abstract: This study reports an original end-to-end dataflow engineering framework for the quality transfer principle, intended to overcome the quality challenges in real-world honey manufacturing. First, 650 pivotal data points of physical and chemical quality attributes from 65 batches of honey intermediates were characterized through multiple sensors; the attributes included rheological properties, acidity, moisture, and sugars. Furthermore, a hypersensitized TAS1R2@AuNPs/SPCE biosensor was developed to identify the biological quality attributes of honey. Strong affinities between the honey intermediates and the TAS1R2 receptor were found (KD < 1×10^(−8) M), and the abnormal batches B2, B23, and C23 were diagnosed by the TAS1R2@AuNPs/SPCE biosensor and a multivariable algorithm. Finally, the end-to-end dataflow containing physical, chemical, and biological critical quality attributes was successfully established to interpret the quality transfer principle of honey manufacturing. It revealed that the front-end refining process was relatively unstable and that the back-end refining process had a negligible influence on the quality of honey manufacturing. This framework embraces quality management, quality transfer, and biosensor information, and will contribute to discovering the quality transfer principle in industrial innovation for intelligent manufacturing.
Funding: Supported by the Project of the State Grid Corporation of China in 2020, "Integration Technology Research and Prototype Development for High End Controller Chip", under Grant No. 5700-202041264A-0-0-00.
Abstract: The dataflow architecture, which is characterized by a lack of a redundant unified control logic, has been shown to have an advantage over the control-flow architecture as it improves computational performance and power efficiency, especially for applications used in high-performance computing (HPC). Importantly, the high computational efficiency of systems using the dataflow architecture is achieved by allowing program kernels to be activated simultaneously. Therefore, a proper acknowledgment mechanism is required to distinguish the data that logically belongs to different contexts. Possible solutions include the tagged-token matching mechanism, in which the data is sent before acknowledgments are received but retried after rejection, or a handshake mechanism, in which the data is only sent after acknowledgments are received. However, these mechanisms are characterized by both inefficient data transfer and increased area cost. Good performance of the dataflow architecture depends on the efficiency of data transfer. In order to optimize the efficiency of data transfer in existing dataflow architectures with a minimal increase in area and power cost, we propose a Look-Ahead Acknowledgment (LAA) mechanism. LAA accelerates the execution flow by speculatively acknowledging ahead without penalties. Our simulation analysis based on a handshake mechanism shows that LAA increases the average utilization of computational units by 23.9%, reduces the average execution time by 17.4%, and increases the average power efficiency of dataflow processors by 22.4%. Crucially, our approach increases the area and power consumption of the on-chip logic by less than 0.9%. In conclusion, the evaluation results suggest that Look-Ahead Acknowledgment is an effective improvement for data transfer in existing dataflow architectures.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61572191); the Natural Science Foundation of Hunan Province (Nos. 2022JJ30398, 2022JJ40277, and 2022JJ40278); and the Scientific Research Fund of Hunan Provincial Education Department (No. 17A130).
Abstract: Edge computing can alleviate the problem of insufficient computational resources in user equipment, improve the network processing environment, and improve the user experience. Edge computing is well known as a promising method for the development of the Internet of Things (IoT). However, with the development of smart terminals, the edge server needs much more time to schedule the high-intensity upstream dataflow from terminals than to schedule the downstream dataflow. In this paper, we study the scheduling strategy for upstream dataflows in edge computing networks and introduce a three-tier edge computing network architecture. We propose a Time-Slicing Self-Adaptive Scheduling (TSAS) algorithm based on a hierarchical queue, which can reduce the queuing delay of the dataflow, improve the timeliness of dataflow processing, and achieve efficient and reasonable dataflow scheduling. The experimental results show that the TSAS algorithm can reduce latency, minimize energy consumption, and increase system throughput.
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2015AA01A301, the National Natural Science Foundation of China under Grant No. 61332009, the National HeGaoJi Project of China under Grant No. 2013ZX0102-8001-001-001, and the Beijing Municipal Science and Technology Commission under Grant Nos. Z15010101009 and Z151100003615006.
Abstract: Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, a large amount of data is frequently transferred among processing elements through the network-on-chip (NoC). Thus the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architecture, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitive to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfers; thus it can deliver data with multiple destinations in a single transfer. Moreover, the router adopts an output buffer to maximize throughput and adopts non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61872038).
Abstract: The pervasiveness of the smart Internet of Things (IoT) enables many electric sensors and devices to be connected and generates a large amount of dataflow. Compared with traditional big data, streaming dataflow faces characteristic challenges, such as high speed, strong variability, rough continuity, and demanding timeliness, which pose severe tests for its efficient management. In this paper, we provide an overall review of IoT dataflow management. We first analyze the key challenges facing IoT dataflow and give an initial overview of the related techniques in dataflow management, spanning dataflow sensing, mining, control, security, privacy protection, etc. Then, we illustrate and compare representative tools and platforms for IoT dataflow management. In addition, promising application scenarios, such as smart cities, smart transportation, and smart manufacturing, are elaborated, which will provide significant guidance for further research. The management of IoT dataflow is also an important area, which merits in-depth discussion and further study.
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200501, the National Natural Science Foundation of China under Grant Nos. 61332009 and 61521092, the Open Project Program of State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A04, and the Beijing Municipal Science and Technology Commission under Grant No. Z15010101009.
Abstract: Double buffering is an effective mechanism to hide the latency of data transfers between on-chip and off-chip memory. However, in dataflow architecture, the swapping of two buffers during the execution of many tiles decreases the performance because of repetitive filling and draining of the dataflow accelerator. In this work, we propose a non-stop double buffering mechanism for dataflow architecture. The proposed non-stop mechanism assigns tiles to the processing element array without stopping the execution of processing elements, by optimizing the control logic in dataflow architecture. Moreover, we propose a work-flow program to cooperate with the non-stop double buffering mechanism. After optimizing both the control logic and the work-flow program, the filling and draining of the array needs to be done only once across the execution of all tiles belonging to the same dataflow graph. Experimental results show that the proposed double buffering mechanism for dataflow architecture achieves a 16.2% average efficiency improvement over that without the optimization.
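The basic double-buffering idea that this abstract builds on (fetch the next tile while the current one is being processed) can be shown with a small software analogy. It is a generic sketch, not the paper's non-stop hardware mechanism. The functions `load_tile`, `compute`, and `run_double_buffered` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def load_tile(i):
    """Stand-in for an off-chip to on-chip transfer of tile i."""
    return [i] * 4


def compute(tile):
    """Stand-in for the work done on one tile."""
    return sum(tile)


def run_double_buffered(num_tiles):
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_tile, 0)      # fill the first buffer
        for i in range(num_tiles):
            current = pending.result()             # buffer that is ready to use
            if i + 1 < num_tiles:
                # Start loading the next tile into the other buffer
                # while the current tile is being computed.
                pending = loader.submit(load_tile, i + 1)
            results.append(compute(current))
    return results


totals = run_double_buffered(3)
```

The loop overlaps each load with the previous tile's computation, which is the latency hiding the abstract refers to. The per-tile fill and drain cost of an accelerator array is not modeled here.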
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200501, the National Natural Science Foundation of China under Grant Nos. 61332009 and 61521092, the Open Project Program of State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A04, the Beijing Municipal Science and Technology Commission under Grant No. Z15010101009, the Open Project Program of State Key Laboratory of Computer Architecture under Grant No. CARCH201503, the China Scholarship Council, and the Beijing Advanced Innovation Center for Imaging Technology.
Abstract: With the coming of the exascale supercomputing era, power efficiency has become the most important obstacle to building an exascale system. Dataflow architecture has a native advantage in achieving high power efficiency for scientific applications. However, state-of-the-art dataflow architectures fail to exploit high parallelism for loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes loop iterations flow through the processing element (PE) array of a dataflow accelerator. This method consists of two techniques: architecture-assisted hardware iteration and instruction-assisted software iteration. In the hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of the computing kernel and laying a good foundation for pipelined execution. In the software iteration execution model, additional loop instructions are introduced to solve the iteration dependency problem. With these two techniques, the average number of instructions ready to execute per cycle is increased to keep the floating-point units busy. Simulation results show that our proposed method outperforms the static and dynamic loop execution models in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of these two techniques is acceptable.
Funding: This work is supported by the National Natural Science Foundation of China under Grant No. 61373025.
Abstract: Built on a data-driven communication pattern rather than a location-driven one, named data networking (NDN) offers better support for network-layer dataflow. However, application developers have to handle complex tasks, such as data segmentation, packet verification, and flow control, due to the lack of proper transport-layer protocols over the network layer. In this study, we design a dataflow-oriented programming interface to provide transport strategies for NDN, which greatly improves the efficiency of application development. This interface presents two application data unit (ADU) retrieval strategies according to different data publishing patterns, in which it adopts an adaptive ADU pipelining algorithm to control the dataflow based on the current network status and data generation rate. The interface also offers network measurement strategies to monitor an abundance of critical metrics influencing application performance. We verify the functionality and performance of our interface by implementing a video streaming application spanning 11 time zones over the worldwide NDN testbed. Our experiments show that the interface can efficiently support the development of high-performance, dataflow-driven NDN applications.