Journal Articles
623 articles found
1. Efficient rock joint detection from large-scale 3D point clouds using vectorization and parallel computing approaches
Authors: Yunfeng Ge, Zihao Li, Huiming Tang, Qian Chen, Zhongxu Wen. Geoscience Frontiers, 2025, No. 5, pp. 1-15.
The application of three-dimensional (3D) point cloud parametric analyses on exposed rock surfaces, enabled by Light Detection and Ranging (LiDAR) technology, has gained significant popularity due to its efficiency and the high quality of data it provides. However, as research extends to more regional and complex geological challenges, the demand for algorithms that are both robust and highly efficient in processing large datasets continues to grow. This study proposes an advanced rock joint identification algorithm leveraging artificial neural networks (ANNs) and incorporating the parallel computing and vectorization techniques of high-performance computing. The algorithm uses point cloud attributes, specifically point normals and point curvatures, as input parameters for ANNs, which classify data into rock joints and non-rock joints. Individual rock joints are then extracted using density-based spatial clustering of applications with noise (DBSCAN), and principal component analysis (PCA) is employed to calculate their orientations. By fully utilizing the computational power of parallel computing and vectorization, the algorithm increases running speed by 3-4 times, enabling the processing of large-scale datasets within seconds. This maximizes computational efficiency while maintaining high accuracy (compared with manual measurement, the deviation of the automatic measurement is within 2°), making it an effective solution for large-scale rock joint detection challenges. © 2025 China University of Geosciences (Beijing) and Peking University.
Keywords: rock joints; point clouds; artificial neural network; high-performance computing; parallel computing; vectorization
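A minimal sketch of the pipeline this abstract describes (ANN classification of per-point features, DBSCAN clustering of joint points, PCA orientation estimation), run on synthetic data; MLPClassifier stands in for the paper's ANN, and the labels, feature layout, and every parameter value (eps, min_samples, layer sizes) are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
points = rng.uniform(size=(1000, 3))        # 3D coordinates of the cloud
features = rng.normal(size=(1000, 4))       # per-point (nx, ny, nz, curvature)
labels = (features[:, 3] > 0).astype(int)   # hypothetical joint / non-joint labels

# Step 1: an ANN classifies points into rock-joint vs non-joint.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
is_joint = clf.fit(features, labels).predict(features).astype(bool)

# Step 2: DBSCAN groups joint points into individual joints.
joint_pts = points[is_joint]
cluster_ids = DBSCAN(eps=0.05, min_samples=10).fit_predict(joint_pts)

# Step 3: PCA per cluster; the smallest-variance axis is the joint-plane normal.
for cid in sorted(set(cluster_ids) - {-1}):
    patch = joint_pts[cluster_ids == cid]
    normal = PCA(n_components=3).fit(patch).components_[-1]
    dip = np.degrees(np.arccos(abs(normal[2])))  # one common orientation convention
    print(f"joint {cid}: dip {dip:.1f} deg")
```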
2. Zuchongzhi-3 Sets New Benchmark with 105-Qubit Superconducting Quantum Processor
Authors: LIU Danxu, GE Shuyun, WU Yuyang. Bulletin of the Chinese Academy of Sciences, 2025, No. 1, pp. 55-56.
A team of researchers from the University of Science and Technology of China (USTC) of the Chinese Academy of Sciences (CAS) and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3, a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.
Keywords: quantum circuit sampling; superconducting quantum computing prototype; Zuchongzhi; superconducting quantum processor; qubits; couplers
3. SW-DDFT: Parallel Optimization of the Dynamical Density Functional Theory Algorithm Based on Sunway Bluelight II Supercomputer
Authors: Xiaoguang Lv, Tao Liu, Han Qin, Ying Guo, Jingshan Pan, Dawei Zhao, Xiaoming Wu, Meihong Yang. Computers, Materials & Continua, 2025, No. 7, pp. 1417-1436.
The Dynamical Density Functional Theory (DDFT) algorithm, derived by combining classical Density Functional Theory (DFT) with the fundamental Smoluchowski dynamical equation, describes how inhomogeneous fluid density distributions evolve over time. The Sunway Bluelight II supercomputer, a new generation of China's domestically developed supercomputers, possesses powerful computational capabilities, so porting and optimizing industrial software on this platform holds significant importance. Based on the Sunway Bluelight II supercomputer and the unique hardware architecture of its SW39000 processor, this work proposes three acceleration strategies to enhance the computational efficiency and performance of DDFT: direct parallel optimization, local-memory-constrained optimization for CPEs, and multi-core-group collaboration and communication optimization. The method combines the characteristics of the program's algorithm with the hardware architecture, optimizing storage and transmission structures to achieve a closer integration of software and hardware. This paper is the first to present Sunway-Dynamical Density Functional Theory (SW-DDFT). Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single core group compared with the original DDFT implementation; with six core groups (384 CPEs in total), the maximum speedup reaches 28.64 times and parallel efficiency reaches 71%, demonstrating excellent acceleration performance.
Keywords: Sunway supercomputer; high-performance computing; dynamical density functional theory; parallel optimization
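A quick cross-check of the reported figures: with a single-core-group speedup of $S_1 = 6.67$ and a six-core-group speedup of $S_6 = 28.64$, the parallel efficiency across core groups is

$$E = \frac{S_6}{6\,S_1} = \frac{28.64}{6 \times 6.67} \approx 0.72,$$

consistent with the stated 71% up to rounding.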
4. Analog Optical Computing for Artificial Intelligence (cited 9 times)
Authors: Jiamin Wu, Xing Lin, Yuchen Guo, Junwei Liu, Lu Fang, Shuming Jiao, Qionghai Dai. Engineering (SCIE, EI), 2022, No. 3, pp. 133-145.
The rapid development of artificial intelligence (AI) facilitates various applications in all areas but also poses great challenges for its hardware implementation in terms of speed and energy because of the explosive growth of data. Optical computing provides a distinctive perspective for addressing this bottleneck by harnessing the unique properties of photons, including broad bandwidth, low latency, and high energy efficiency. In this review, we introduce the latest developments in optical computing for different AI models, including feedforward neural networks, reservoir computing, and spiking neural networks (SNNs). Recent progress in integrated photonic devices, combined with the rise of AI, provides a great opportunity for the renaissance of optical computing in practical applications, an effort that requires multidisciplinary work from a broad community. This review provides an overview of state-of-the-art accomplishments in recent years, discusses the availability of current technologies, and points out the remaining challenges in different aspects of pushing the frontier. We anticipate that the era of large-scale integrated photonics processors will soon arrive for practical AI applications in the form of hybrid optoelectronic frameworks.
Keywords: artificial intelligence; optical computing; optoelectronic framework; neural network; neuromorphic computing; reservoir computing; photonics processor
5. MatDEM: fast matrix computing of the discrete element method (cited 7 times)
Authors: Chun Liu, Hui Liu, Hongyong Zhang. Earthquake Research Advances (CSCD), 2021, No. 3, pp. 1-7.
The discrete element method (DEM) can effectively simulate the discontinuity, inhomogeneity, and large deformation and failure of rock and soil. Based on innovative matrix computing for the discrete element method, the high-performance discrete element software MatDEM can handle millions of elements on one computer, enabling discrete element simulation at the engineering scale. It supports heat calculation and multi-field and fluid-solid coupling numerical simulations. Furthermore, the software integrates pre-processing, a solver, post-processing, and powerful secondary development, allowing new discrete element software to be built on it. This paper introduces the basic principles of the DEM, the implementation and development of the MatDEM software, and its applications. The software and sample source code are available online (http://matdem.com).
Keywords: discrete element method; high performance; MatDEM; matrix computing
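A hedged illustration of what "matrix computing" for the DEM can look like: all pairwise overlaps and contact forces evaluated with vectorized NumPy array operations instead of per-pair loops. The linear contact law, stiffness value, and random geometry here are assumptions for illustration; MatDEM's actual formulation differs.

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.uniform(size=(200, 3))                  # particle centers
rad = np.full(200, 0.05)                          # particle radii (assumed)
k = 1.0e4                                         # contact stiffness (assumed)

diff = pos[:, None, :] - pos[None, :, :]          # all center-to-center vectors
dist = np.linalg.norm(diff, axis=-1)
overlap = np.clip(rad[:, None] + rad[None, :] - dist, 0.0, None)
np.fill_diagonal(overlap, 0.0)                    # no self-contact

with np.errstate(invalid="ignore", divide="ignore"):
    unit = np.nan_to_num(diff / dist[..., None])  # unit contact normals, 0 on diagonal

forces = (k * overlap[..., None] * unit).sum(axis=1)  # net contact force per particle
print(forces.shape)                               # (200, 3)
```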
6. Optimization Task Scheduling Using Cooperation Search Algorithm for Heterogeneous Cloud Computing Systems (cited 2 times)
Authors: Ahmed Y. Hamed, M. Kh. Elnahary, Faisal S. Alsubaei, Hamdy H. El-Sayed. Computers, Materials & Continua (SCIE, EI), 2023, No. 1, pp. 2133-2148.
Cloud computing has taken over the high-performance distributed computing area, and it currently provides on-demand services and resource pooling over the web. As a result of constantly changing user service demand, the task scheduling problem has emerged as a critical analytical topic in cloud computing. The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions. Task assignments and schedules substantially influence system operation in a heterogeneous multiprocessor system, and different choices inside a heuristic-based task scheduling method result in different makespans on a heterogeneous computing system. As a result, an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan. This research introduces a novel efficient task scheduling method for cloud computing systems based on the cooperation search algorithm to tackle the heterogeneous cloud task assignment and scheduling problem. The basic idea of this method is to use the advantages of meta-heuristic algorithms to obtain the optimal solution. We assess the algorithm's performance by running it through three scenarios with varying numbers of tasks. The findings demonstrate that the suggested technique beats the existing methods New Genetic Algorithm (NGA), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Gravitational Search Algorithm (GSA), and Hybrid Heuristic and Genetic (HHG) by 7.9%, 2.1%, 8.8%, 7.7%, and 3.4%, respectively, in terms of makespan.
Keywords: heterogeneous processors; cooperation search algorithm; task scheduling; cloud computing
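A minimal sketch of the makespan objective that metaheuristics such as the cooperation search algorithm minimize, under a simple load-sum model; the ETC (expected time to compute) values below are made up, and the precedence constraints the paper enforces are omitted for brevity:

```python
import numpy as np

def makespan(etc, assignment):
    """etc[i, j]: execution time of task i on (heterogeneous) processor j.
    assignment[i]: processor chosen for task i.
    Returns the finish time of the busiest processor."""
    loads = np.zeros(etc.shape[1])
    for task, proc in enumerate(assignment):
        loads[proc] += etc[task, proc]
    return loads.max()

etc = np.array([[3.0, 5.0],   # toy instance: 4 tasks, 2 processors
                [2.0, 1.0],
                [4.0, 6.0],
                [1.0, 2.0]])
print(makespan(etc, [0, 1, 0, 1]))  # 7.0: proc 0 runs 3+4, proc 1 runs 1+2
```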
7. Design and implementation of near-memory computing array architecture based on shared buffer (cited 2 times)
Authors: SHAN Rui, GAO Xu, FENG Yani, HUI Chao, CUI Xinyue, CHAI Miaomiao. High Technology Letters (EI, CAS), 2022, No. 4, pp. 345-353.
Deep learning algorithms have been widely used in computer vision, natural language processing, and other fields. However, as deep learning models keep growing in scale, the requirements for storage and computing performance get higher and higher, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high power consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. This paper proposes a near-memory computing array architecture based on a shared buffer to improve system performance. It supports instructions with store-compute integration, reducing data movement between the processor and main memory, and further improves processing speed through data reuse. The proposed architecture is verified and tested through a parallel implementation of a convolutional neural network (CNN). Experimental results show that, at a frequency of 110 MHz, the speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture performing parallel calculations on a field-programmable gate array (FPGA), and the processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor without near-memory computing support.
Keywords: near-memory computing; shared buffer; reconfigurable array processor; convolutional neural network (CNN)
8. Challenges and reflections on exascale computing
Author: Yang Xuejun. Engineering Sciences (EI), 2014, No. 3, pp. 17-22.
This paper introduces the development of exascale (10^18 operations per second) computing. Though exascale computing is a hot research direction worldwide, it faces many challenges in the areas of the memory wall, communication wall, reliability wall, power wall, and the scalability of parallel computing. Thoughts and strategies for addressing these challenges are proposed.
Keywords: exascale computing; central processing unit (CPU); storage wall; heterogeneous processor
9. Task Scheduling Optimization in Cloud Computing by Rao Algorithm
Authors: A. Younes, M. Kh. Elnahary, Monagi H. Alkinani, Hamdy H. El-Sayed. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 4339-4356.
Cloud computing currently dominates the space of high-performance distributed computing, providing resource pooling and on-demand services through the web. Task scheduling has therefore become a very important research area in cloud computing, since user service demand changes dynamically. The main purpose of task scheduling is to assign tasks to available processors so as to produce the shortest possible schedule without violating precedence restrictions. In heterogeneous multiprocessor systems, task assignments and schedules have a significant impact on system operation: within heuristic-based task scheduling, different choices lead to different task execution times (makespans) on a heterogeneous computing system. A good scheduling algorithm should therefore set the priority of every subtask efficiently, based on the resources required, to reduce the makespan. In this paper, we propose a new efficient task scheduling algorithm for cloud computing systems based on the Rao algorithm to solve the heterogeneous multiprocessor scheduling problem. The basic idea is to exploit the advantages of heuristic-based algorithms to reduce the search space and the time needed to reach the best solution. We evaluate the algorithm's performance by applying it to three examples with different numbers of tasks and processors. The experimental results show that the proposed approach outperforms others in finding optimal solutions in terms of task execution time.
Keywords: heterogeneous processors; Rao algorithm; heuristic algorithms; task scheduling; multiprocessing; cloud computing
10. Evaluation of the Application Benefit of Meteorological High Performance Computing Resources
Authors: Min Wei, Bin Wang. Journal of Geoscience and Environment Protection, 2017, No. 7, pp. 153-160.
Meteorological high-performance computing resources are the support platform on which numerical models for weather forecasting and climate prediction run. A scientific and objective method for evaluating the application of these resources can both provide a reference for optimizing resources in service and supply a quantitative basis for future resource construction and planning. This paper introduces the concepts of the utility value B and the index compliance rate E of a meteorological high-performance computing system, and presents the evaluation process, evaluation indices, and calculation method for the application benefits of high-performance computing resources.
Keywords: high-performance computing resources; resource application; benefit evaluation; benefit value
11. API Development Increases Access to Shared Computing Resources at Boston University
Authors: George Jones, Amanda E. Wakefield, Jeff Triplett, Kojo Idrissa, James Goebel, Dima Kozakov, Sandor Vajda. Journal of Software Engineering and Applications, 2022, No. 6, pp. 197-207.
Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must be able to access computing resources. One way to provide such resources is through High-Performance Computing (HPC) centers. Many academic research institutions offer their own HPC centers but struggle to make the computing resources easily accessible and user-friendly. Here we present SHABU, a RESTful Web API framework that enables S & E communities to access resources from Boston University's Shared Computing Center (SCC). The SHABU requirements are derived from the use cases described in this work.
Keywords: API framework; open source; high-performance computing; software architecture; science and engineering
12. Optimization Techniques for GPU-Based Parallel Programming Models in High-Performance Computing
Authors: Shuntao Tang, Wei Chen. 信息工程期刊(中英文版) (Journal of Information Engineering, Chinese-English Edition), 2024, No. 1, pp. 7-11.
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway toward unparalleled processing power and efficiency in HPC environments. The paper contributes to the academic discourse on GPU optimization and provides actionable insights for developers, fostering advancements in computational sciences and technology.
Keywords: optimization techniques; GPU-based parallel programming models; high-performance computing
13. Quantum Machine Learning for Noisy Intermediate-Scale Quantum Processors (cited 1 time)
Authors: 石金晶, 肖子萌, 王雯萱, 张师超, 李学龙. 计算机学报 (Chinese Journal of Computers; PKU Core), 2025, No. 3, pp. 602-631.
The combination of quantum computing and artificial intelligence may have disruptive effects in enhancing model expressiveness and in accelerating and optimizing machine learning, and it is expected to break through problems facing AI such as poor interpretability and the difficulty of reaching optimal solutions; quantum artificial intelligence has become a research frontier of major interest at home and abroad. Quantum machine learning is a key research topic within quantum artificial intelligence: it combines the fundamental theory of quantum computing with the principles of machine learning to realize machine learning tasks with quantum speedup. With the rapid development of quantum computing software and hardware, the learning advantages of noisy intermediate-scale quantum (NISQ) processors have been demonstrated, and researchers worldwide have proposed a series of quantum machine learning methods to explore innovative applications in which quantum computing advances AI. However, current quantum machine learning is still limited to algorithm-level optimization and lacks a system-level theoretical architecture, and many scientific problems remain to be solved. Starting from the characterization of quantum machine learning systems, this paper establishes a hierarchical model of such systems, summarizes quantum machine learning schemes for various tasks, and analyzes the possible "quantum advantages" of quantum machine learning, such as speeding up classical algorithms. Following the hierarchy of a quantum machine learning system, existing methods are then surveyed at three levels, the principle layer, the computation layer, and the application layer, with a systematic analysis and discussion of the key problems and their solutions. Finally, in light of the current stage of development of quantum artificial intelligence, the paper analyzes the scientific problems and challenges facing quantum machine learning and offers an in-depth outlook on future trends in the field.
Keywords: quantum computing; quantum artificial intelligence; quantum machine learning; quantum algorithms; noisy intermediate-scale quantum processors
14. A Survey of Security Research on Processor Data Prefetchers
Authors: 刘畅, 黄祺霖, 刘煜川, 林世鸿, 秦中元, 陈立全, 吕勇强. 电子与信息学报 (Journal of Electronics & Information Technology; PKU Core), 2025, No. 9, pp. 3038-3056.
Data prefetchers are important microarchitectural components that modern processors use to improve performance. However, because systematic security evaluation was lacking at design time, prefetchers in mainstream commercial processors have in recent years been revealed to harbor serious security risks and have been exploited in side-channel attacks against browsers, operating systems, and trusted execution environments. Facing this new class of microarchitectural attacks, processor security research urgently needs to answer several key questions: how to systematically analyze attack methods, gain a comprehensive understanding of the potential risks of prefetchers, and quantitatively assess prefetcher security, so as to design more secure data prefetchers. To address these questions, this paper systematically surveys known prefetcher designs in commercial processors and the related side-channel attacks. By extracting memory access patterns, it builds behavioral models for seven kinds of prefetchers and, on that basis, attack models for twenty side-channel attacks, systematically cataloguing the trigger conditions and leaked information of each attack and analyzing other possible attack methods. Building on this, the paper proposes a security evaluation framework comprising 3 dimensions and 24 metrics that provides a comprehensive, quantitative assessment of data prefetcher security. Finally, the paper discusses defense strategies, design approaches for secure prefetchers, and future research directions. As the first survey focused on the security of data prefetchers in commercial processors, this paper helps readers understand the security challenges that data prefetchers face and promotes the construction of a quantitative security evaluation framework, thereby offering guidance for designing more secure data prefetchers.
Keywords: computer architecture; processors; data prefetchers; microarchitecture security; side-channel attacks
15. Near-Data Processing Architecture Design for Data-Intensive Applications (cited 1 time)
Authors: 谢洋, 李晨, 陈小文. 计算机工程与科学 (Computer Engineering & Science; PKU Core), 2025, No. 5, pp. 797-810.
In the big data era, multi-core processors running data-intensive applications face challenges such as low data locality, high memory access latency, and low core computing efficiency. Near-data processing has significant potential for reducing memory access latency and improving core computing efficiency. This paper designs a near-data processing architecture with loosely coupled computation and memory access (LcNDP), deployed at both the shared-cache side and the memory side of a multi-core processor. On one hand, by offloading the cores' memory access tasks, it overlaps core computation with memory access and hides memory access overhead; on the other hand, its near-data computing units handle streaming computations, reducing the cores' computational load and memory traffic. Experimental results show that, compared with a conventional multi-core architecture, LcNDP reduces average latency by 43%, and by 23% compared with a conventional near-data-processing multi-core architecture.
Keywords: near-data; data-intensive applications; computer architecture; multi-core processors
16. A Large-Scale Exact Diagonalization Method for the New-Generation Tianhe Supercomputing System
Authors: 李彪, 刘杰, 王庆林. 计算机研究与发展 (Journal of Computer Research and Development; PKU Core), 2025, No. 6, pp. 1347-1362.
Exact diagonalization is a numerical method widely used in quantum physics, condensed matter physics, and related fields, and it is the most direct numerical approach for obtaining the ground state of a quantum system. Starting solely from the symmetry of the Hamiltonian matrix, and using a matrix-free method, a hierarchical communication model, and a data-level parallel algorithm adapted to the MT-3000 processor, this paper proposes a heterogeneous parallel algorithm for multiplying extremely large sparse Hamiltonian matrices by vectors on the new-generation Tianhe supercomputing system, enabling large-scale exact diagonalization of the one-dimensional Hubbard model. The proposed parallel algorithm was tested on the new-generation Tianhe system: at a matrix dimension of 140 billion, the strong-scaling efficiency of 8192 processes relative to 256 processes is 55.27%, and when weak-scaled to a matrix dimension of 730 billion, the weak-scaling efficiency of 13,740 processes relative to 64 processes remains above 51.25%.
Keywords: exact diagonalization; Hubbard model; heterogeneous parallel computing; MT-3000 processor; quantum many-body systems
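A toy illustration of the matrix-free idea this abstract relies on: the Hamiltonian is never stored, only its action on a vector, here via SciPy's LinearOperator. For brevity this uses a single-particle tight-binding chain rather than the paper's many-body Hubbard Hamiltonian, and the chain length and hopping amplitude are arbitrary assumed values.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

n, t = 64, 1.0  # chain length and hopping amplitude (assumed)

def matvec(v):
    """Apply H v for H = -t * (left hop + right hop), open boundaries."""
    w = np.zeros_like(v)
    w[:-1] -= t * v[1:]
    w[1:] -= t * v[:-1]
    return w

# The Lanczos-type solver only ever asks for H @ v, so H is never materialized.
H = LinearOperator((n, n), matvec=matvec, dtype=float)
e0 = eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]
print(e0, -2 * t * np.cos(np.pi / (n + 1)))  # numerical vs exact ground energy
```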
17. High-Performance Processor Design Based on Dynamic Timing Margin Compression
Authors: 连子涵, 何卫锋. 计算机工程与科学 (Computer Engineering & Science; PKU Core), 2025, No. 2, pp. 219-227.
Conventional synchronous circuit design sets the operating frequency according to the critical path found by static timing analysis, but the critical path is not excited in every cycle, so a dynamic timing margin exists between the critical path and the actually excited paths. This paper proposes a high-performance processor design method based on instruction-level timing margin compression, which aims to maximize the compression of dynamic timing margin for a performance gain. A timing analysis platform is built to obtain instruction timing automatically; a timing encoding strategy is designed that conveys timing information to the hardware through the instruction encoding without additional hardware overhead; and timing decoding and arbitration circuits are designed in hardware to adjust the clock period according to each instruction's timing code, thereby achieving instruction-level dynamic timing margin compression. The method is verified by simulation on a superscalar processor based on the RISC-V instruction set; the results show that, compared with the conventional design method, it delivers a performance improvement of up to 31%.
Keywords: timing margin; high performance; processors; RISC-V
18. Design of an Energy-Efficient CNN Accelerator
Authors: 喇超, 李淼, 张峰, 张翠婷. 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology; PKU Core), 2025, No. 9, pp. 2520-2531.
Convolutional neural networks (CNNs) are widely used in image classification, object detection and recognition, natural language understanding, and other fields. As the complexity and scale of CNNs keep growing, hardware deployment faces great challenges, especially under the low-power, low-latency requirements of embedded applications, where most existing platforms suffer from high power consumption and complex control. Targeting accelerator energy efficiency, this paper analyzes the key factors that determine system energy efficiency and, taking reduced computation precision and lowered system frequency as the main starting points, studies a unified whole-network quantization method at extremely low bit widths and designs an energy-efficient CNN accelerator, MSNAP. Built on lightweight compute units with 1-bit weights and 4-bit activations, the accelerator uses a 128x128 spatially parallel acceleration array; thanks to the high spatial parallelism, the whole system runs at a low frequency. A weight-stationary, feature-map-broadcast dataflow effectively reduces the number of data movements of weights and feature maps, lowering power consumption and raising the system's energy efficiency. Verified through a 22 nm tape-out, the accelerator reaches a peak performance of 10.54 TOPS at 20 MHz and an energy efficiency of 64.317 TOPS/W; on a CIFAR-10 classification network, its energy efficiency is 5 times that of accelerators of the same type. A deployed YOLO object detection network reaches a detection rate of 60 FPS, fully meeting embedded application requirements.
Keywords: accelerators; convolutional neural network (CNN); lightweight neuron compute unit (NCU); MSNAP; branch convolution quantization (BCQ)
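Incidentally, the reported figures imply an average power of roughly 10.54 TOPS / 64.317 TOPS/W, about 0.16 W. Below is a hedged sketch of the bit widths the accelerator uses (1-bit weights, 4-bit activations) with generic sign and uniform quantizers; it is not the paper's BCQ scheme.

```python
import numpy as np

def binarize_weights(w):
    """1-bit weights: sign, rescaled by the mean magnitude of the tensor."""
    return np.abs(w).mean() * np.sign(w)

def quantize_activations(x, bits=4):
    """Uniform unsigned quantization of post-ReLU activations to `bits` bits."""
    x = np.clip(x, 0.0, None)
    levels = 2**bits - 1
    scale = x.max() / levels if x.max() > 0 else 1.0
    return np.round(x / scale) * scale

w = np.random.default_rng(0).normal(size=6)
x = np.random.default_rng(1).uniform(size=6)
print(binarize_weights(w))      # one shared scale, signs of the original weights
print(quantize_activations(x))  # 16 levels between 0 and max(x)
```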
19. Research on Fast GNSS Signal Acquisition Algorithms Based on Bit Operations
Authors: 任烨, 吕心力, 田永和, 杨少东, 徐杰. 信息技术 (Information Technology), 2025, No. 9, pp. 100-108.
To speed up the acquisition of satellite navigation signals on DSP chips, this paper designs a fast acquisition algorithm based on bit operations. First, the flow and modules of the algorithm are introduced, and a bit-parallel method is studied to parallelize the code-phase search in signal acquisition. Second, FFT-based parallel frequency search is used to parallelize the Doppler frequency search. The algorithm thus achieves double parallelism in the pseudorandom-code acquisition search, improving acquisition efficiency while preserving acquisition sensitivity. Finally, experiments show that, on a single channel, the algorithm's average acquisition time for the B1I signal of one BeiDou satellite is 1.7 s, a 19% improvement over the 2.1 s of the parallel code-phase search acquisition algorithm.
Keywords: satellite navigation systems; fast acquisition; digital signal processors; parallel computing; bit operations
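A minimal sketch of the FFT-based parallel code-phase search this abstract builds on: one frequency-domain product evaluates the correlation against every circular shift of the ranging code at once. The 2046-chip length matches the B1I code, but the code bits, the noise level, and the omission of Doppler search and of the paper's DSP bit-level parallelism are simplifying assumptions.

```python
import numpy as np

def code_phase_search(signal, code):
    """Circular correlation via corr = IFFT(FFT(signal) * conj(FFT(code)));
    the index of the peak magnitude is the estimated code phase."""
    corr = np.fft.ifft(np.fft.fft(signal) * np.conj(np.fft.fft(code)))
    return np.abs(corr)

rng = np.random.default_rng(1)
code = rng.choice([-1.0, 1.0], size=2046)               # stand-in ranging code
signal = np.roll(code, 417) + 0.5 * rng.normal(size=2046)
print(int(np.argmax(code_phase_search(signal, code))))  # -> 417
```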
20. Comprehensive review of innovative construction technology of high-core rockfill dam: Nuozhadu project
Authors: Biao Liu, Zongliang Zhang, Lei Yan, Chao Liu, Mingxin Wu, Yuzhen Yu, Daren Zhang. River, 2025, No. 2, pp. 135-148.
Underpinned by the ultrahigh-core rockfill dam at the Nuozhadu Hydropower Station, comprehensive studies and engineering practices have been conducted to address several critical challenges: coordination of seepage deformation in dam materials, prevention and control of high-water-pressure seepage failure, static and dynamic deformation control, and construction quality monitoring. Advanced technologies have been developed for modifying impermeable soil materials and utilizing soft rocks. Constitutive models and high-performance fine computational methods for dam materials have been improved, along with innovative seismic safety measures. Additionally, a "Digital Dam" and an information system for monitoring construction quality were implemented. These efforts ensured the successful construction of the Nuozhadu Dam, making it the tallest dam in China and the third tallest in the world upon completion. This achievement increased the height of core dams in China by 100 m and established a design and safety evaluation framework for ultrahigh-core rockfill dams exceeding 300 m in height. Furthermore, current safety monitoring results indicate that the Nuozhadu Dam is safe and controllable.
Keywords: digital dam; high-performance computational methods; Nuozhadu hydropower station; seismic safety; ultrahigh-core rockfill dam