Abstract: To calculate the short-circuit current of a DC emergency power grid under different fault types, and to support the selection of switching device ratings and the corresponding protective measures, a short-circuit current calculation method for shipboard DC emergency power grids based on a battery Run-time equivalent model is proposed. Compared with the traditional Thevenin equivalent model and the PNGV (the Partnership for a New Generation of Vehicles) equivalent model, the proposed method accounts for multiple factors such as battery capacity fade, temperature, cycle count, storage duration, current rate, and self-heating, and accurately models the battery's fault equivalent circuit. Finally, simulations and experiments on an actual DC emergency power grid verify the accuracy of the proposed method: compared with the Thevenin and PNGV equivalent models, the Run-time equivalent model yields smaller short-circuit current calculation errors under different faults.
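The simpler first-order equivalent circuit that the paper uses as a baseline can be sketched as follows. This is a minimal illustration, not the paper's Run-time model: the parameter names and values (voc, rs, rp, cp) are assumptions, and the full Run-time model would additionally make them functions of capacity fade, temperature, cycle count, and so on.

```python
def short_circuit_current(voc, rs, rp, cp, t_end=0.5, dt=1e-4):
    """Forward-Euler simulation of a terminal short on a first-order
    battery equivalent circuit: open-circuit voltage voc in series
    with ohmic resistance rs and one polarization RC pair (rp, cp).
    Returns the current trace i(t)."""
    vp = 0.0            # polarization voltage across the RC pair
    trace = []
    for _ in range(int(t_end / dt)):
        i = (voc - vp) / rs              # terminal voltage forced to 0
        vp += dt * (i / cp - vp / (rp * cp))
        trace.append(i)
    return trace

# Illustrative parameters: the peak current voc/rs decays toward voc/(rs+rp)
trace = short_circuit_current(voc=12.0, rs=0.05, rp=0.02, cp=1.0)
```

The decaying current profile is exactly what the fault calculation needs: the peak stresses the switching devices, while the settled value determines protection thresholds.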
Funding: This work is supported by the National Natural Science Foundation of China under Grant Nos. 60873112 and 61028004, and the National Science Foundation of the USA under Grant No. CNS-1126688.
Abstract: 3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, which is further complicated by the stringent energy requirements of NoC systems. This paper thus presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which minimizes the energy consumed by data communications among processor cores, while reducing the fragmentation effect on incoming applications to be mapped and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions to lower communication energy and enable minimal routing. Fragmented tiles left by system fragmentation are gleaned for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm, and the results are compared against an optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms the TB and TL methods with significant energy savings (more than 10%), much reduced average response time, and improved system utilization.
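The communication-energy objective such a mapper minimizes can be illustrated with a toy cost function. This is a sketch under the common bit-energy model; the hop weights and the 3-D Manhattan distance metric are assumptions, not ERIM's exact formulation.

```python
def comm_energy(task_graph, mapping, e_hop=1.0, e_tsv=0.5):
    """Communication energy of a task-to-tile mapping on a 3-D mesh:
    for each communicating task pair, data volume times the weighted
    hop count (planar mesh hops vs. vertical TSV hops)."""
    total = 0.0
    for (src, dst), volume in task_graph.items():
        x1, y1, z1 = mapping[src]
        x2, y2, z2 = mapping[dst]
        planar = abs(x1 - x2) + abs(y1 - y2)
        vertical = abs(z1 - z2)
        total += volume * (planar * e_hop + vertical * e_tsv)
    return total
```

Packing an application into a compact cuboid region shrinks every hop count in this sum, which is why mapping to cuboid tile regions lowers communication energy.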
Funding: Supported by the National Natural Science Foundation of China (Nos. 61572516 and 61503213).
Abstract: The run-time security guarantee is a hot topic in current cyberspace security research, especially on embedded terminals such as smart hardware and wearable and mobile devices. Typically, these devices use universal hardware and software to connect with public networks via the Internet, and are therefore exposed to security threats from Trojans and other malware. As a result, the security of sensitive personal data is threatened and economic interests in the industry are compromised. To address these run-time security problems efficiently, a TrustEnclave-based secure architecture is first proposed, in which the trusted execution environment is constructed by hardware isolation technology. The prototype system is then implemented on real TrustZone-enabled hardware devices. Finally, both analytical and experimental evaluations are provided; the experimental results demonstrate the effectiveness and feasibility of the proposed security scheme.
Funding: Supported by the Integrated Science-Technology Innovation Plan of Shaanxi Province, China (No. 2015KTZDGY06-03).
Abstract: Cyber-physical systems (CPSs) incorporate computation, communication, and physical processes. The deep coupling and continuous interaction between such processes lead to a significant increase in the complexity of designing and implementing CPSs. Consequently, developing CPSs from scratch is inefficient, whereas developing them with the aid of CPS run-time supporting platforms can be far more efficient. In recent years, much research has been actively conducted on CPS run-time supporting platforms; however, few surveys of these platforms exist. In this paper, we analyze and evaluate existing CPS run-time supporting platforms by first classifying them into three categories from the viewpoint of software architecture: component-based, service-based, and agent-based platforms. Then, for each type, we detail its design philosophy, key technical problems, and corresponding solutions with specific use cases. Subsequently, we compare existing platforms from two aspects: construction approaches for CPS tasks and support for non-functional properties. Finally, we outline several important future research issues.
Funding: The authors would like to thank the reviewers for their feedback and suggestions. This work is partly supported by the Singapore Ministry of Education Academic Research Fund Tier 1 (R-263-000-655-133) and the National Natural Science Foundation of China (NSFC) (Grant No. 61173032).
Abstract: Real-time multimedia applications are increasingly mapped onto modern embedded systems based on multiprocessor systems-on-chip (MPSoC). Tasks of these applications need to be mapped onto the MPSoC resources efficiently in order to satisfy their performance constraints. Exhaustively exploring all possible mappings, i.e., task-to-resource combinations, may take days or weeks. Additionally, such exploration is performed at design time, which cannot handle dynamism in applications and resource status. A run-time mapping technique can cater for the dynamism but cannot guarantee strict timing deadlines because of the large computations involved at run-time. Thus, an approach is required that performs the compute-intensive exploration at design time and uses the explored results at run-time. This paper presents a solution in this direction: communication-aware design space exploration (CADSE) techniques are proposed to explore different mapping options that can be selected at run-time subject to the desired performance and the available MPSoC resources. Experiments show that the proposed exploration techniques are faster than exhaustive exploration and provide almost the same quality of results.
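The run-time half of such a design-time/run-time split can be sketched as a simple feasibility filter over the offline-explored mappings. The record fields (`tiles`, `latency`, `energy`) and the selection objective are illustrative assumptions, not CADSE's actual data model.

```python
def select_mapping(explored, free_tiles, deadline):
    """Run-time step of a design-time/run-time split: among the
    mappings explored offline, keep those that fit the currently
    free tiles and meet the timing constraint, then pick the one
    with the lowest energy. Returns None if nothing is feasible."""
    feasible = [m for m in explored
                if m["tiles"] <= free_tiles and m["latency"] <= deadline]
    return min(feasible, key=lambda m: m["energy"]) if feasible else None

# Hypothetical design-time exploration results for one application
explored = [
    {"tiles": 4, "latency": 8.0, "energy": 3.0},
    {"tiles": 2, "latency": 9.5, "energy": 5.0},
    {"tiles": 6, "latency": 6.0, "energy": 2.0},
]
```

Because the expensive search happened offline, this run-time step is a linear scan over a short list, cheap enough to meet strict deadlines.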
Funding: The National Natural Science Foundation of China (Nos. 61834005, 61772417, 61602377, 61802304 and 61874087) and the International Science and Technology Cooperation Program of Shaanxi, China (No. 2018KW-006).
Abstract: Graphics processors have received increasing attention with the growing demand for gaming, video streaming, and many other applications. During graphics rendering with OpenGL, the host CPU needs run-time attributes to move on to the next rendering procedure, and these attributes cover almost all the functional units of the graphics pipeline. Current methods either suffer from memory capacity issues when holding the variables or require a huge number of data parsing paths, which can cause congestion on the interface between the graphics processor and the host CPU. This paper draws on the operating principle of a commuter bus and proposes a bus-like data feedback mechanism (BFM) that traverses all the pipeline stages, collects the run-time status data or execution errors of graphics rendering, and sends them back to the host CPU. BFM works in parallel with the graphics rendering logic. This method completes the data feedback task easily with only a 0.6% increase in resource utilization and no negative impact on performance, and it achieves a 1.3x speedup compared with a traditional approach.
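The commuter-bus analogy can be sketched as a token that stops at every pipeline stage in order and picks up whatever records each stage has posted. The stage names and the mailbox/record format here are hypothetical; the real BFM is a hardware mechanism, not software.

```python
def bus_feedback_round(stages):
    """One round trip of the feedback 'bus': visit each pipeline
    stage in order and collect its posted run-time status or error
    records, so the host CPU receives one consolidated packet
    instead of polling every stage over the main interface."""
    packet = []
    for name, mailbox in stages:
        while mailbox:                 # pick up everything waiting
            packet.append((name, mailbox.pop(0)))
    return packet

# Hypothetical pipeline stages with pending status records
stages = [("vertex", ["done"]), ("raster", []), ("fragment", ["err:oob"])]
```

The point of the design is that this collection runs in parallel with rendering, so the host CPU pays for one packet per round trip rather than one transaction per stage.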
Abstract: Partially reconfigurable FPGAs (Field Programmable Gate Arrays) allow tasks to be placed and removed dynamically at run-time. One of the challenging problems is the placement of modules on the reconfigurable resources. Several module placement techniques have been introduced in the literature to solve the temporal placement problem. This paper presents a temporal placement approach that manages the resources of a reconfigurable device. Specifically, the authors' contribution is a new temporal placement algorithm that aims to minimize the communication cost between modules. Results show an important improvement in communication cost compared with other approaches.
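One greedy step of communication-aware placement can be sketched as follows: put the incoming module at the free site that minimizes total weighted wire length to the already-placed modules it talks to. The candidate-site model and the Manhattan metric are assumptions for illustration, not the paper's algorithm.

```python
def place_module(free_sites, placed, neighbors):
    """Greedy communication-aware placement step: choose the free
    (x, y) site minimizing total Manhattan wire length, weighted by
    traffic, to the already-placed communicating modules.
    `placed` maps module name -> (x, y); `neighbors` maps a placed
    module name -> traffic weight with the incoming module."""
    def cost(site):
        x, y = site
        return sum(w * (abs(x - placed[m][0]) + abs(y - placed[m][1]))
                   for m, w in neighbors.items())
    return min(free_sites, key=cost)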
Funding: Supported by the National Key Research and Development Program of China (No. 2018YFB1003605), the Foundations of CARCH (No. CARCH201704), the National Natural Science Foundation of China (No. 61472312), the Foundations of Shaanxi Province and the Xi'an Science and Technology Plan (Nos. B018230008 and BD34017020001), and the Foundations of Xidian University (No. JBZ171002).
Abstract: Despite the rapid development of mobile and embedded hardware, directly executing computation-expensive and storage-intensive deep learning algorithms on the local side of these devices remains constrained for sensory data analysis. In this paper, we first summarize the layer compression techniques for state-of-the-art deep learning models in three categories: weight factorization and pruning, convolution decomposition, and special layer architecture design. For each category, we quantify the storage and computation these techniques make tunable and discuss their practical challenges and possible improvements. Then, we implement Android projects using TensorFlow Mobile to test these 10 compression methods and compare their practical performance in terms of accuracy, parameter size, intermediate feature size, computation, processing latency, and energy consumption. To further discuss their advantages and bottlenecks, we test their performance on four standard recognition tasks on six resource-constrained Android smartphones. Finally, we survey two types of run-time Neural Network (NN) compression techniques that are orthogonal to the layer compression techniques: run-time resource management and cost optimization with special NN architectures.
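The storage saving from weight factorization, one of the surveyed categories, is easy to quantify: factoring an m x n dense weight matrix through rank k replaces m*n parameters with k*(m+n), which pays off whenever k < m*n/(m+n). A minimal sketch of that arithmetic:

```python
def factorization_savings(m, n, k):
    """Parameter counts before/after low-rank factorization of an
    m x n weight matrix into the product (m x k)(k x n), plus the
    compression ratio. Worthwhile only when k < m*n / (m+n)."""
    original = m * n
    factored = k * (m + n)
    return original, factored, original / factored
```

For a 1024 x 1024 fully connected layer factored at rank 64, the parameter count drops by a factor of 8; the accuracy cost of that rank choice is exactly the trade-off the paper measures on-device.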
Funding: Supported by the National Basic Research Program of China (Grant No. 2009CB320805), the National Natural Science Foundation of China (Grant No. 60603084), and the National High-Tech Research & Development Program of China (Grant No. 2006AA01Z331).
Abstract: In large-scale distributed simulation, thousands of objects keep moving and interacting in a virtual environment, which produces a mass of messages. High Level Architecture (HLA) is the prevailing standard for modeling and simulation. It specifies two publish-subscribe mechanisms for message filtering: class-based and value-based. However, these two mechanisms can only judge whether a message is relevant to a subscriber or not. Lacking the ability to evaluate the degree of relevance, all relevant messages are delivered with the same priority even when congestion occurs, which significantly limits the scalability and performance of distributed simulation. Aiming to solve the relevance evaluation problem, speed up message filtering, and filter out more unnecessary messages, a new relevance evaluation mechanism, Layer of Interest (LoI), is proposed in this paper. LoI defines a relevance classifier based on the impact of spatial distance on received attributes and attribute values. An adaptive publish-subscribe scheme is built on the basis of LoI; this scheme can discard most irrelevant messages directly. The Run-Time Infrastructure (RTI) can also apply congestion control by reducing the frequency of sending or receiving object messages based on each object's LoI. The experimental results verify the efficiency of message filtering and RTI congestion control.
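A distance-based relevance classifier of this kind can be sketched as a threshold ladder; the number of layers and the threshold values below are illustrative assumptions, not the paper's calibration.

```python
def layer_of_interest(distance, thresholds=(10.0, 50.0, 200.0)):
    """Map the sender-subscriber distance to an interest layer:
    layer 0 (closest) receives full attribute updates, higher layers
    progressively fewer, and None means the message is irrelevant
    to this subscriber and can be dropped before delivery."""
    for layer, limit in enumerate(thresholds):
        if distance <= limit:
            return layer
    return None
```

Under congestion, the RTI can then throttle each object's update frequency in proportion to its layer, rather than treating every relevant message as equally urgent.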
Funding: Supported in part by the Chinese National Science and Technology Support Program (Grant No. 2012BAH35F01).
Abstract: Congestion caused by data explosion may degrade system performance or even cause fatal errors in large-scale simulation systems. This paper analyzes the causes of congestion using queuing theory and proposes a novel congestion control approach for an RTI with a hybrid architecture, including the framework, sampling policy, queue length prediction, and congestion control algorithm. The framework improves the utilization of federate resources and the performance of HLA simulation systems by dynamically distributing loads over the RTIG and LRCs (Local RTI Components). Finally, experimental results show that the proposed approach is a flexible, simple, and efficient way to control congestion in an RTI with a hybrid architecture.
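The queuing-theory view of congestion can be made concrete with the textbook M/M/1 model; this is a deliberate simplification, since the paper's predictor and sampling policy are more elaborate than a single steady-state formula.

```python
def mm1_queue_length(arrival_rate, service_rate):
    """Steady-state mean number in an M/M/1 system, L = rho/(1-rho)
    with rho = arrival/service; L diverges as rho -> 1, which is the
    congestion onset the controller must avoid."""
    rho = arrival_rate / service_rate
    return float("inf") if rho >= 1.0 else rho / (1.0 - rho)

def throttled_rate(arrival_rate, service_rate, max_len):
    """Cap the send rate so the predicted queue stays below max_len,
    by inverting L = rho/(1-rho) into rho = L/(1+L)."""
    if mm1_queue_length(arrival_rate, service_rate) <= max_len:
        return arrival_rate
    return service_rate * max_len / (1.0 + max_len)
```

The same logic, driven by a predicted rather than closed-form queue length, is what lets the RTI throttle federates before the queues overflow instead of after.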