张量转置(tensor transposition)作为基础张量运算原语,广泛应用于信号处理、科学计算以及深度学习等各种领域,在张量数据密集型应用及高性能计算中具有重要作用。随着能效指标在高性能计算系统中的重要性日益凸显,基于数字信号处理器(d...张量转置(tensor transposition)作为基础张量运算原语,广泛应用于信号处理、科学计算以及深度学习等各种领域,在张量数据密集型应用及高性能计算中具有重要作用。随着能效指标在高性能计算系统中的重要性日益凸显,基于数字信号处理器(digital signal processors,DSPs)的加速器已被集成至通用计算系统。然而,传统面向多核CPU和GPU的张量转置库因架构差异无法充分适配DSP架构。一方面,DSP架构的向量化计算潜力尚未得到充分挖掘;另一方面,其复杂的片上存储体系与多层次共享内存结构为张量并行程序设计带来了显著挑战。针对国产多核DSP的架构特点,提出ftmTT算法,并设计实现了一个面向多核DSP架构的通用张量转置库。ftmTT算法通过设计适配DSP架构的高效内存访问模式充分挖掘其并行化和向量化潜力,其核心创新包括:1)采用分块策略将高维张量转置转化为多核DSP平台所提供的矩阵转置内核操作;2)提出基于DMA点对点传输的张量数据块访存合并方案来降低数据搬运开销;3)通过双缓冲设计异步重叠转置计算与DMA传输实现计算通信隐藏,最终面向多核DSP实现高性能并行张量转置。在国产多核DSP平台FT-M7032的实验表明,ftmTT张量转置算法取得了最高达理论带宽75.96%的性能,达到FT-M7032平台STREAM带宽99.23%的性能。展开更多
Naturally degradable capsule provides a platform for sustained fragrance release.However,practical challenges such as low encapsulation efficiency and difficulty in sustained release are still limited in using fragran...Naturally degradable capsule provides a platform for sustained fragrance release.However,practical challenges such as low encapsulation efficiency and difficulty in sustained release are still limited in using fragranceloaded capsules.In this work,the natural materials sodium alginate and gelatine are dissolved and act as the aqueous phase,lavender is dissolved in caprylic/capric triglyceride(GTCC)as the oil phase,and SiO_(2) nanoparticles with neutralwettability as a solid emulsifier to form O/W Pickering emulsions simultaneously.Finally,multi-core capsules are prepared using the drop injection method with emulsions as templates.The results show that the capsules have been successfully prepared with a spherical morphology and multi-core structure,and the encapsulation rate of multi-core capsules can reach up to 99.6%.In addition,the multi-core capsules possess desirable sustained release performance,the cumulative sustained release rate of fragrance at 25℃over 49 days is only 32.5%.It is attributed to the significant protection of multi-core structure,Pickering emulsion nanoparticle membranes,and hydrogel network shell for encapsulated fragrance.This study is designed to deliver a new strategy for using sustained-release technology with fragrance in food,cosmetics,textiles,and other fields.展开更多
Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with gene...Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data parallel code with SDKs is a very tedious and error-prone approach. We believe that it is desirable to possess a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP, including 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on PreeScale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.展开更多
Multi-core architectures are widely used to in time-to-market and power consumption of the chips enhance the microprocessor performance within a limited increase Toward the application of high-density data signal pro...Multi-core architectures are widely used to in time-to-market and power consumption of the chips enhance the microprocessor performance within a limited increase Toward the application of high-density data signal processing, this paper presents a novel heterogeneous multi-core architecture digital signal processor (DSP), YHFT-QDSP, with one RISC CPU core and 4 VLIW DSP cores. By three kinds of interconnection, YHFT-QDSP provides high efficiency message communication for inner-chip RISC core and DSP cores, inner-chip and inter-chip DSP cores. A parallel programming platform is specifically developed for the heterogeneous nmlti-core architecture of YHFT-QDSP. This parallel programming environment provides a parallel support library and a friendly interface between high level application softwares and multi- core DSP. The 130 nm CMOS custom chip design results benchmarks show that the interconnection structure of in a high speed and moderate power design. The results of typical YHFT-QDSP is much better than other related structures and achieves better speedup when using the interconnection facilities in combing methods. YHFT-QDSP has been signed off and manufactured presently. The future applications of the multi-core chip could be found in 3G wireless base station, high performance radar, industrial applications, and so on.展开更多
Frequent data exchange among all kinds of memories has become an inevitable phenomenon in the process of modern embeddedsoftware design. In order to improve the ability of the embedded system data's throughput and co...Frequent data exchange among all kinds of memories has become an inevitable phenomenon in the process of modern embeddedsoftware design. In order to improve the ability of the embedded system data's throughput and computation, most embeddeddevices introduce Enhanced Direct Memory Access (EDMA) data transfer technology. TMS320C6678 is a multi-core DSPproduced by Texas Instruments (TI). There are ten EDMA transmission controllers in the chip for configuration and datatransmissions are allowed to be performed between any two pieces of storage at the same time. This paper expounds the workingmechanism of EDMA based on multi-core DSP TMS320C6678. At the same time, multiple data sets are provided and thebottleneck of limiting data throughout is analyzed and solved.展开更多
Si P微系统是一种高度集成化的系统,其内部可能集成1个或多个DSP、NOR Flash和DDR存储器、AI加速芯片等,有些复杂的微系统还集成了FPGA芯片。由于内部集成了多个微组件,芯片之间相互连接,传统的测试单一微组件的方法并不适用于微系统的...Si P微系统是一种高度集成化的系统,其内部可能集成1个或多个DSP、NOR Flash和DDR存储器、AI加速芯片等,有些复杂的微系统还集成了FPGA芯片。由于内部集成了多个微组件,芯片之间相互连接,传统的测试单一微组件的方法并不适用于微系统的测试。提出了一套DSP微组件测试方法,该系统包括1块专门的测试板、可调试的电脑测试环境和JTAG通信。与单一的DSP裸芯测试相比,它可以快速稳定地实现DSP微组件的性能测试,满足大批量生产测试的需求。展开更多
A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blo...A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blood cells(FNRBCs) in the peripheral blood of a gravida were rapidly and effectively enriched and separated by the mo- dified multi-core magnetic composite particles in an external magnetic field. The obtained FNRBCs were used for the identification of the fetal sex by means of fluorescence in situ hybridization(FISH) technique. The results demonstrate that the multi-core magnetic composite particles meet the requirements for the enrichment and speration of FNRBCs with a low concentration and the accuracy of detetion for the diagnosis of fetal sex reached to 95%. Moreover, the obtained FNRBCs were applied to the non-invasive diagnosis of Down syndrome and chromosome 3p21 was de- tected. The above facts indicate that the novel multi-core magnetic composite particles-based method is simple, relia- ble and cost-effective and has opened up vast vistas for the potential application in clinic non-invasive prenatal diag- nosis.展开更多
文摘张量转置(tensor transposition)作为基础张量运算原语,广泛应用于信号处理、科学计算以及深度学习等各种领域,在张量数据密集型应用及高性能计算中具有重要作用。随着能效指标在高性能计算系统中的重要性日益凸显,基于数字信号处理器(digital signal processors,DSPs)的加速器已被集成至通用计算系统。然而,传统面向多核CPU和GPU的张量转置库因架构差异无法充分适配DSP架构。一方面,DSP架构的向量化计算潜力尚未得到充分挖掘;另一方面,其复杂的片上存储体系与多层次共享内存结构为张量并行程序设计带来了显著挑战。针对国产多核DSP的架构特点,提出ftmTT算法,并设计实现了一个面向多核DSP架构的通用张量转置库。ftmTT算法通过设计适配DSP架构的高效内存访问模式充分挖掘其并行化和向量化潜力,其核心创新包括:1)采用分块策略将高维张量转置转化为多核DSP平台所提供的矩阵转置内核操作;2)提出基于DMA点对点传输的张量数据块访存合并方案来降低数据搬运开销;3)通过双缓冲设计异步重叠转置计算与DMA传输实现计算通信隐藏,最终面向多核DSP实现高性能并行张量转置。在国产多核DSP平台FT-M7032的实验表明,ftmTT张量转置算法取得了最高达理论带宽75.96%的性能,达到FT-M7032平台STREAM带宽99.23%的性能。
文摘Naturally degradable capsule provides a platform for sustained fragrance release.However,practical challenges such as low encapsulation efficiency and difficulty in sustained release are still limited in using fragranceloaded capsules.In this work,the natural materials sodium alginate and gelatine are dissolved and act as the aqueous phase,lavender is dissolved in caprylic/capric triglyceride(GTCC)as the oil phase,and SiO_(2) nanoparticles with neutralwettability as a solid emulsifier to form O/W Pickering emulsions simultaneously.Finally,multi-core capsules are prepared using the drop injection method with emulsions as templates.The results show that the capsules have been successfully prepared with a spherical morphology and multi-core structure,and the encapsulation rate of multi-core capsules can reach up to 99.6%.In addition,the multi-core capsules possess desirable sustained release performance,the cumulative sustained release rate of fragrance at 25℃over 49 days is only 32.5%.It is attributed to the significant protection of multi-core structure,Pickering emulsion nanoparticle membranes,and hydrogel network shell for encapsulated fragrance.This study is designed to deliver a new strategy for using sustained-release technology with fragrance in food,cosmetics,textiles,and other fields.
基金supported by the National High Technology Research and Development 863 Program of China under Grant No.2012AA010901the National Natural Science Foundation of China under Grant No.61103021
文摘Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data parallel code with SDKs is a very tedious and error-prone approach. We believe that it is desirable to possess a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP, including 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on PreeScale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.
基金supported by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant No.2009ZX01034-001-001-006the National High Technology Research and Development 863 Program of China under Grant No.2007AA01Z108the Program for Changjiang Scholars and Innovative Research Team in Universities of China under Grant No.IRT0614.
文摘Multi-core architectures are widely used to in time-to-market and power consumption of the chips enhance the microprocessor performance within a limited increase Toward the application of high-density data signal processing, this paper presents a novel heterogeneous multi-core architecture digital signal processor (DSP), YHFT-QDSP, with one RISC CPU core and 4 VLIW DSP cores. By three kinds of interconnection, YHFT-QDSP provides high efficiency message communication for inner-chip RISC core and DSP cores, inner-chip and inter-chip DSP cores. A parallel programming platform is specifically developed for the heterogeneous nmlti-core architecture of YHFT-QDSP. This parallel programming environment provides a parallel support library and a friendly interface between high level application softwares and multi- core DSP. The 130 nm CMOS custom chip design results benchmarks show that the interconnection structure of in a high speed and moderate power design. The results of typical YHFT-QDSP is much better than other related structures and achieves better speedup when using the interconnection facilities in combing methods. YHFT-QDSP has been signed off and manufactured presently. The future applications of the multi-core chip could be found in 3G wireless base station, high performance radar, industrial applications, and so on.
文摘Frequent data exchange among all kinds of memories has become an inevitable phenomenon in the process of modern embeddedsoftware design. In order to improve the ability of the embedded system data's throughput and computation, most embeddeddevices introduce Enhanced Direct Memory Access (EDMA) data transfer technology. TMS320C6678 is a multi-core DSPproduced by Texas Instruments (TI). There are ten EDMA transmission controllers in the chip for configuration and datatransmissions are allowed to be performed between any two pieces of storage at the same time. This paper expounds the workingmechanism of EDMA based on multi-core DSP TMS320C6678. At the same time, multiple data sets are provided and thebottleneck of limiting data throughout is analyzed and solved.
文摘A novel kind of multi-core magnetic composite particles, the surfaces of which were respectively mo- dified with goat-anti-mouse IgG and antitransferrin receptor(anti-CD71), was prepared. The fetal nucleated red blood cells(FNRBCs) in the peripheral blood of a gravida were rapidly and effectively enriched and separated by the mo- dified multi-core magnetic composite particles in an external magnetic field. The obtained FNRBCs were used for the identification of the fetal sex by means of fluorescence in situ hybridization(FISH) technique. The results demonstrate that the multi-core magnetic composite particles meet the requirements for the enrichment and speration of FNRBCs with a low concentration and the accuracy of detetion for the diagnosis of fetal sex reached to 95%. Moreover, the obtained FNRBCs were applied to the non-invasive diagnosis of Down syndrome and chromosome 3p21 was de- tected. The above facts indicate that the novel multi-core magnetic composite particles-based method is simple, relia- ble and cost-effective and has opened up vast vistas for the potential application in clinic non-invasive prenatal diag- nosis.