Funding: Supported in part by the National Key Research and Development Program of China under Grant No. 2018YFB1801503; the National Natural Science Foundation of China under Grants No. 61931006, No. 82071940, No. 62101111, No. U20A20212, No. 61921002, and No. U1930127; the Fundamental Research Funds for the Central Universities under Grants No. ZYGX2020ZB011 and No. ZYGX2019J013; and the Medico-Engineering Cooperation Funds from University of Electronic Science and Technology of China under Grants No. ZYGX2021YGLH205 and No. ZYGX2021YGLH216.
Abstract: Microwave-induced thermoacoustic imaging (TAI) combines the high contrast of microwave imaging with the high resolution of ultrasound imaging (UI), and it has been explored in applications such as the early detection of breast tumors and cerebrovascular diseases. However, the microwave generator used in traditional TAI is bulky and expensive, and the single-element scanning mechanism yields low temporal resolution, so traditional TAI struggles to meet clinical needs. This paper reviews the evolution from single-element scanning to portable and array-based TAI, covering the miniaturized microwave generator, the handheld antenna, multi-channel data acquisition, and UI/TAI dual-modality imaging, analyzes the related application scenarios, and discusses future trends of the technology. The review helps TAI researchers follow the technology's development and future trends, and it deepens clinicians' understanding of TAI so that they can put forward more application requirements.
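Abstract: Join is one of the most time-consuming and most frequently used operations in data query processing, so speeding it up is of great importance. Array many-core processors are an important class of many-core processors; their strong parallel capability can be used to accelerate parallel computation. Based on the architecture of array many-core processors, an efficient multi-level partitioned hash join algorithm is designed and optimized. The algorithm greatly reduces the number of main-memory accesses through a multi-level partitioning strategy and effectively eliminates the impact of data skew through a partition-rearrangement method, achieving high performance. Experiments on a prototype system of the heterogeneous fused array many-core processor DFMC (Deeply-Fused Many Core) show that the multi-level partitioned hash join algorithm on DFMC delivers 8.0 times the performance of the fastest join algorithm on a coupled CPU-GPU architecture, indicating that array many-core processors have an advantage in accelerating data query applications.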
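To make the idea of a multi-level partitioned hash join concrete, the following is a minimal single-threaded sketch in Python. It is not the paper's DFMC implementation: the function names, the tuple row layout, and the `bits_per_level` parameter are all assumptions made for illustration, and the partition-rearrangement step that the abstract credits with removing data skew is not modeled. The sketch only shows that both relations are split by successive groups of hash bits, so that each final build table is small, and that matching partitions are then joined independently.

```python
from collections import defaultdict


def partition(rows, key, bits, shift):
    """Split rows into up to 2**bits buckets using `bits` bits of the
    key's hash, starting at bit position `shift` (hypothetical helper)."""
    mask = (1 << bits) - 1
    buckets = defaultdict(list)
    for row in rows:
        buckets[(hash(row[key]) >> shift) & mask].append(row)
    return buckets


def hash_join(build, probe, key):
    """Plain in-memory hash join: build a table on one side, probe with the other."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [(b, p) for p in probe for b in table.get(p[key], [])]


def partitioned_hash_join(r, s, key, bits_per_level=(2, 2)):
    """Partition both inputs level by level, then join matching partitions.

    bits_per_level is an illustrative assumption: (2, 2) means two
    levels with four buckets each.
    """
    def recurse(r_part, s_part, level, shift):
        if level == len(bits_per_level):
            return hash_join(r_part, s_part, key)
        bits = bits_per_level[level]
        r_buckets = partition(r_part, key, bits, shift)
        s_buckets = partition(s_part, key, bits, shift)
        out = []
        # Rows with equal keys always land in the same bucket, so only
        # buckets present on both sides can produce join results.
        for bucket_id, r_rows in r_buckets.items():
            if bucket_id in s_buckets:
                out.extend(recurse(r_rows, s_buckets[bucket_id],
                                   level + 1, shift + bits))
        return out

    return recurse(r, s, 0, 0)
```

Because each pair of matching partitions is joined without reference to any other pair, the per-partition joins are the natural unit of work to distribute across cores; how the paper maps them onto DFMC is not stated in the abstract.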
Funding: Supported by the National Natural Science Foundation of China (No. 61432018).
Abstract: 3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, its vast computing time prevents 3D RTM-TTI from being widely used. This study addresses the problem by providing parallel solutions for 3D RTM-TTI on multicore and many-core platforms. After data parallelism and memory optimization, the hot-spot function of 3D RTM-TTI gains a 35.99× speedup on two Intel Xeon CPUs, an 89.75× speedup on one Intel Xeon Phi, and an 89.92× speedup on one NVIDIA K20 GPU, all compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is a stencil, the approaches also benefit a wide range of stencil-based applications.
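As a rough illustration of what a stencil computation pattern looks like, the sketch below advances a 1D wave field by one time step with a three-point finite-difference stencil. It is a deliberately simplified stand-in, not the 3D RTM-TTI kernel from the paper: the function name, the fixed zero boundaries, and the parameter `c2` (assumed here to be the squared velocity multiplied by dt²/dx²) are all illustrative assumptions.

```python
def stencil_step(u_prev, u_curr, c2):
    """Advance a 1D wave field by one time step.

    u_prev, u_curr: field values at the previous and current time steps.
    c2: assumed to be (velocity * dt / dx) ** 2, a hypothetical parameter.
    Boundary points are held at zero for simplicity.
    """
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        # Three-point stencil: each output depends only on a point
        # and its two immediate neighbours at the current time step.
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next
```

Every interior point is updated from a small fixed neighbourhood and no iteration of the loop depends on another, which is why kernels of this shape lend themselves to the data parallelism and memory optimization mentioned in the abstract.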
Abstract: Ring chromosome formation is thought to result mainly from breakage and reunion at breakpoints on the long and short arms of a chromosome. This fusion event can produce terminal arm inversions, deletions, and duplications that determine the resulting phenotype. [1] Ring chromosome 13 is relatively uncommon, with an estimated incidence of 1/58,000 live births.
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB0201300).
Abstract: The Distributed Shared Memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processor-memory gap, and it inevitably exposes Non-Uniform Memory Access (NUMA) to shared-memory parallel applications. Failure to adapt to the NUMA effect can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. However, traditional approaches such as first-touch and memory policy suffer from false page-sharing, fragmentation, or poor ease of use. In this paper, we propose a partitioned shared-memory approach that allows multithreaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach helps applications adapt to NUMA with only minor code changes and improves the performance of typical multithreaded scientific applications by up to 4.3 times as the number of cores in use increases.