Journal Articles
29 articles found
Communication delay-aware cooperative adaptive cruise control with dynamic network topologies: A convergence of communication and control (Cited by 1)
Authors: Jihong Liu, Yiqing Zhou, Ling Liu. Digital Communications and Networks, 2025, No. 1, pp. 191-199.
Wireless communication-enabled Cooperative Adaptive Cruise Control (CACC) is expected to improve the safety and traffic capacity of vehicle platoons. Existing CACC considers a conventional communication delay with fixed Vehicular Communication Network (VCN) topologies. However, when the network is under attack, the communication delay may be much higher, and the stability of the system may not be guaranteed. This paper proposes a novel communication Delay-Aware CACC with Dynamic Network Topologies (DADNT). The main idea is that, for various communication delays, in order to maximize the traffic capacity while guaranteeing stability and minimizing the following error, the CACC should dynamically adjust the VCN topology to achieve the minimum inter-vehicle spacing. To this end, a multi-objective optimization problem is formulated, and a 3-step Divide-And-Conquer sub-optimal solution (3DAC) is proposed. Simulation results show that with 3DAC, the proposed DADNT can reduce the inter-vehicle spacing by 5%, 10%, and 14%, respectively, compared with traditional CACC with fixed one-vehicle, two-vehicle, and three-vehicle look-ahead network topologies, thereby improving traffic efficiency.
Keywords: Communication delay; Cooperative adaptive cruise control; Network topology; String stability
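The abstract's core idea, choosing the look-ahead topology that yields the smallest stable inter-vehicle spacing for a given delay, can be sketched as follows. The spacing model here (a time headway that grows with communication delay and shrinks with look-ahead depth) is an illustrative assumption, not the paper's actual stability condition or 3DAC procedure.

```python
def min_spacing(delay_s, speed_mps, topologies):
    """Pick the look-ahead depth k giving the smallest stable spacing."""
    best = None
    for k in topologies:                     # k = number of look-ahead vehicles
        headway = 0.3 + delay_s / k          # assumed stability headway (seconds)
        spacing = speed_mps * headway        # resulting inter-vehicle gap (meters)
        if best is None or spacing < best[1]:
            best = (k, spacing)
    return best

# At 20 m/s with a 0.2 s delay, deeper look-ahead shrinks the gap
print(min_spacing(0.2, 20.0, [1, 2, 3]))
```

Under this toy model the three-vehicle look-ahead topology wins, mirroring the abstract's 5%/10%/14% spacing-reduction ordering.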
Quantum Circuit Implementation and Resource Evaluation of Ballet‑p/k Under Grover’s Attack
Authors: HONG Rui-Peng, ZHANG Lei, PANG Chen-Xu, LI Guo-Yuan, DING Ding, WANG Jian-Xin. Journal of Cryptologic Research, 2025, No. 5, pp. 1178-1194.
The advent of Grover's algorithm presents a significant threat to classical block cipher security, spurring research into post-quantum secure cipher design. This study engineers quantum circuit implementations for three versions of the Ballet family of block ciphers. Ballet-p/k includes a modular-addition operation uncommon in lightweight block ciphers; a quantum ripple-carry adder is implemented at both the "32+32" and "64+64" scales to support it. The qubit count, quantum gate count, and quantum circuit depth of the three versions of the Ballet algorithm are then systematically evaluated under the quantum computing model, and key-recovery attack circuits based on Grover's algorithm are constructed against each version. The comprehensive analysis shows that Ballet-128/128 fails to meet NIST Level 1 security, while the Ballet-128/256 and Ballet-256/256 designs attain Level 3 when resource accounting is restricted to the Clifford and T gate set.
Keywords: Grover's algorithm; Quantum circuit; Ballet family block ciphers; Quantum ripple-carry adder
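The Grover key-recovery threat the abstract quantifies comes from the square-root speedup over brute force: roughly (π/4)·√(2^k) oracle iterations for a k-bit key. A minimal sketch of that iteration count (the paper's concrete circuit costs are not reproduced here):

```python
import math

def grover_iterations(key_bits):
    # Optimal Grover iteration count is about (pi/4) * sqrt(N), N = 2^key_bits
    return math.floor(math.pi / 4 * math.sqrt(2 ** key_bits))

# Doubling the key length squares the search space, so a 256-bit key
# still leaves ~128 bits of effective security against Grover search.
for k in (128, 256):
    print(k, grover_iterations(k))
```

This is why resource evaluations compare total attack cost (iterations times per-iteration gate count and depth) against the NIST security levels.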
Multi-Attribute and Multi-Point Cooperative Handover Strategy for LEO Satellite Communication Systems
Authors: Li Hongguang, Liu Yaoqi, Shi Jinglin, Zhou Yiqing, Qian Manli. China Communications, 2026, No. 1, pp. 154-165.
LEO satellite communication systems are characterized by high-speed, periodic movement, so user-link handovers occur frequently, seriously impacting user terminal applications and system capacity. To address this issue, we propose a handover strategy for LEO satellite user terminals based on multi-attribute and multi-point (MAMP) cooperation. First, a satellite-user-time matrix is established using the satellite constellation coverage and handover model. Then, combining visible time and signal quality, the user access matrix and satellite load matrix are extracted to determine the weight equation of the handover strategy with channel reservation. System modeling simulations show that the algorithm improves the handover success rate by 2.5%, the call access success rate by 3.2%, the load balancing degree by 20%, and the robustness by two orders of magnitude.
Keywords: Handover; LEO satellite; Load balancing; Multi-attribute
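A multi-attribute handover decision of the kind the abstract describes can be sketched as a weighted score over normalized attributes per candidate satellite. The weights and the three attributes below (visible time, signal quality, load) are illustrative assumptions, not the paper's calibrated weight equation.

```python
import numpy as np

def handover_score(visible_time, signal_quality, load, w=(0.4, 0.4, 0.2)):
    """Return the index of the best candidate satellite."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng else np.ones_like(x)
    # Longer visibility and better signal help; higher load counts against.
    score = (w[0] * norm(visible_time)
             + w[1] * norm(signal_quality)
             + w[2] * (1.0 - norm(load)))
    return int(np.argmax(score))

# Three candidates: satellite 1 is visible longest and least loaded
best = handover_score([120, 300, 90], [-95, -100, -90], [0.7, 0.2, 0.9])
```

Channel reservation would then be applied on the selected satellite before the handover is executed.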
Design space exploration of neural network accelerator based on transfer learning
Authors: WU Yuzhang, ZHI Tian, SONG Xinkai, LI Xi. High Technology Letters, 2023, No. 4, pp. 416-426.
With the increasing demand for computational power in artificial intelligence (AI) algorithms, dedicated accelerators have become a necessity. However, the complexity of hardware architectures, the vast design search space, and the complex tasks of accelerators pose significant challenges; traditional search methods can become prohibitively slow as the search space continues to expand. A design space exploration (DSE) method based on transfer learning is proposed, which reduces the time spent on repeated training and uses multi-task models for different tasks on the same processor. The proposed method accurately predicts the latency and energy consumption associated with neural network accelerator design parameters, enabling faster identification of optimal outcomes than traditional methods, and requires less training time than other DSE methods based on a multilayer perceptron (MLP). Comparative experiments demonstrate that the proposed method improves the efficiency of DSE without compromising the accuracy of the results.
Keywords: Design space exploration (DSE); Transfer learning; Neural network accelerator; Multi-task learning
Improving scalability of sequential task flow models with cache-friendly parallel dependency tracking
Authors: Xiran Gao, Li Chen, Xiaobing Feng. CCF Transactions on High Performance Computing, 2026, No. 1, pp. 1-14.
The Sequential Task Flow (STF) model guides task parallelism by dynamically analyzing data dependencies at runtime, making it well suited to dynamic and irregular parallelism. However, it introduces additional dependency-tracking overhead. As task granularity becomes increasingly fine-grained or hardware parallelism increases, the traditional Centralized TDG Building (CB) algorithm progressively becomes a performance bottleneck. The Parallel TDG Building algorithm with Helpers (PBH), which leverages hardware message-passing mechanisms, has achieved significant speedups on the SW26010 platform, but its intensive sub-microsecond irregular synchronizations make it difficult to scale on cache-coherent multicore platforms. This paper proposes Cache-friendly PBH (CPBH), a parallel dependency-tracking algorithm optimized for cache-coherent architectures. CPBH introduces a locality-aware lock-free batch synchronization mechanism that reduces atomic-operation contention and improves data access locality. Additionally, it employs an asynchronous execution strategy that overlaps dependency tracking and task graph execution using dynamic reference counting. Experiments on three cache-coherent multicore platforms using 10 HPC benchmarks demonstrate that CPBH achieves an average speedup exceeding 1.4x over CB and over 1.2x over DDAST under fine-grained scenarios.
Keywords: High performance computing; Cache-coherent platform; Sequential task flow model; Cache-friendly parallel dependency tracking algorithm
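The centralized dependency tracking (CB) that STF runtimes start from can be sketched in a few lines: each task depends on the last writer of every datum it touches, and a writer additionally waits for earlier readers of that datum. This is a generic sequential-task-flow baseline for illustration, not CPBH itself.

```python
def build_tdg(tasks):
    """Build a task dependency graph from (reads, writes) access lists.

    tasks: sequence of (reads, writes) tuples in program order.
    Returns {task_id: sorted list of task_ids it depends on}.
    """
    last_writer, readers, deps = {}, {}, {}
    for tid, (reads, writes) in enumerate(tasks):
        d = set()
        for x in reads:
            if x in last_writer:               # read-after-write
                d.add(last_writer[x])
            readers.setdefault(x, []).append(tid)
        for x in writes:
            if x in last_writer:               # write-after-write
                d.add(last_writer[x])
            d.update(t for t in readers.get(x, []) if t != tid)  # write-after-read
            last_writer[x] = tid
            readers[x] = []
        deps[tid] = sorted(d)
    return deps

# t0 writes a; t1 reads a; t2 writes a  ->  t1 after t0, t2 after t0 and t1
deps = build_tdg([((), ("a",)), (("a",), ()), ((), ("a",))])
```

CPBH's contribution is doing exactly this bookkeeping in parallel with batched, cache-friendly synchronization instead of one centralized pass.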
Computing over Space: Status, Challenges, and Opportunities (Cited by 1)
Authors: Yaoqi Liu, Yinhe Han, Hongxin Li, Shuhao Gu, Jibing Qiu, Ting Li. Engineering, 2025, No. 11, pp. 20-25.
1. Introduction: The rapid expansion of satellite constellations in recent years has resulted in the generation of massive amounts of data. This surge in data, coupled with diverse application scenarios, underscores the escalating demand for high-performance computing over space. Computing over space entails the deployment of computational resources on platforms such as satellites to process large-scale data under constraints such as high radiation exposure, restricted power consumption, and minimized weight.
Keywords: Satellite constellations; Computational resource deployment; Data processing; Space computing; Radiation exposure; High performance computing; Power consumption
Joint jammer selection and power optimization in covert communications against a warden with uncertain locations (Cited by 1)
Authors: Zhijun Han, Yiqing Zhou, Yu Zhang, Tong-Xing Zheng, Ling Liu, Jinglin Shi. Digital Communications and Networks, 2025, No. 4, pp. 1113-1123.
In covert communications, joint jammer selection and power optimization are important for improving performance. However, existing schemes usually assume a warden with a known location and perfect Channel State Information (CSI), which is difficult to achieve in practice. To be more practical, it is important to investigate covert communications against a warden with uncertain location and imperfect CSI, which makes it difficult for legitimate transceivers to estimate the warden's detection probability. First, the uncertainty caused by the unknown warden location must be removed, so the Optimal Detection Position (OPTDP) of the warden, which provides the best detection performance (i.e., the worst case for covert communication), is derived. Then, to avoid the impractical assumption of perfect CSI, the covert throughput is maximized using only channel distribution information. Given this OPTDP-based worst case, the jammer selection, jamming power, transmission power, and transmission rate are jointly optimized to maximize the covert throughput (OPTDP-JP). To solve this coupled problem, a Heuristic algorithm based on the Maximum Distance Ratio (H-MAXDR) is proposed to provide a sub-optimal solution. First, based on the analysis of the covert throughput, the node with the maximum distance ratio (i.e., the ratio of the distances from the jammer to the receiver and from the jammer to the warden) is selected as the friendly jammer (MAXDR). Then, the optimal transmission and jamming power are derived, followed by the optimal transmission rate obtained via the bisection method. Numerical and simulation results show that although the warden's location is unknown, by assuming the warden is at its OPTDP, the proposed OPTDP-JP can always satisfy the covertness constraint. In addition, with an uncertain warden and imperfect CSI, the covert throughput provided by OPTDP-JP is 80% higher than that of existing schemes when the covertness constraint is 0.9, showing the effectiveness of OPTDP-JP.
Keywords: Covert communications; Uncertain warden; Jammer selection; Power optimization; Throughput maximization
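The MAXDR selection rule stated in the abstract is simple to sketch: among candidate jammers, pick the one maximizing the ratio of its distance to the receiver over its distance to the (assumed OPTDP-located) warden, so jamming degrades the warden's detection more than the legitimate link. The coordinates and candidates below are illustrative.

```python
import math

def select_jammer(candidates, receiver, warden):
    """MAXDR rule: maximize d(jammer, receiver) / d(jammer, warden)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return max(candidates, key=lambda j: dist(j, receiver) / dist(j, warden))

# Warden assumed at its optimal detection position (OPTDP); the candidate
# near the warden and far from the receiver is the better friendly jammer.
jammer = select_jammer([(0, 5), (8, 1)], receiver=(0, 0), warden=(10, 0))
```

The power and rate optimization that follows in H-MAXDR (closed-form powers, bisection for the rate) is not reproduced here.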
Inertial Navigation System Based Synthetic Aperture Radar RFID Localization for AGV
Authors: Wu Jie, Zhou Yiqing, Liu Ling, Shi Jinglin. China Communications, 2025, No. 12, pp. 15-29.
Synthetic aperture radar (SAR) radio frequency identification (RFID) localization is widely used for automated guided vehicles (AGVs) in the industrial Internet of Things (IIoT). However, AGV speeds are limited by the phase difference (PD) between two neighboring readers. This paper proposes an inertial navigation system (INS)-based SAR RFID localization method (ISRL) for AGVs that move nonlinearly. To relax the speed limitation, a new phase-unwrapping method based on the similarity of PDs (PU-SPD) is proposed to resolve the PD ambiguity that arises when the AGV speed exceeds 60 km/h. For localization, the Gauss-Newton (GN) algorithm is employed, and an initial value estimation scheme based on variable substitution (IVE-VS) is proposed to improve its positioning accuracy and convergence rate; ISRL is thus a combination of IVE-VS and GN. Moreover, the Cramer-Rao lower bound (CRLB) and the speed limitation are derived. Simulation results show that ISRL converges after two iterations and achieves a positioning accuracy of 7.50 cm at a phase noise level σ = 0.18, which is 35% better than hyperbolic unbiased estimation localization (HyUnb).
Keywords: First-order Taylor expansion; Phase ambiguity; Phase unwrapping; Radio frequency identification; Synthetic aperture radar
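Phase unwrapping of reader phase differences can be illustrated with the classic slow-variation assumption: consecutive true PDs differ by less than π, so any larger jump is a 2π alias. This uses NumPy's generic `unwrap` as a stand-in for the paper's PU-SPD method, which instead exploits the similarity of PDs to handle high AGV speeds.

```python
import numpy as np

def unwrap_pd(phase_diffs):
    # Adds +/- 2*pi whenever consecutive samples jump by more than pi,
    # assuming the underlying phase difference varies slowly between readings.
    return np.unwrap(np.asarray(phase_diffs))

wrapped = [0.1, 2.9, -3.0, -0.2]   # the 2.9 -> -3.0 jump is a 2*pi alias
unwrapped = unwrap_pd(wrapped)
```

Once PDs are unwrapped, they feed the Gauss-Newton position solve described in the abstract.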
Shadow tomography of quantum states with prediction
Authors: Jiyu JIANG, Zongqi WAN, Tongyang LI, Meiyue SHAO, Jialin ZHANG. Frontiers of Computer Science, 2025, No. 7, pp. 131-142.
The shadow tomography problem introduced by [1] is an important problem in quantum computing. Given an unknown n-qubit quantum state ρ, the goal is to estimate tr(F1ρ), ..., tr(FMρ) using as few copies of ρ as possible, within an additive error ε, where F1, ..., FM are known two-outcome measurements. In this paper, we consider the shadow tomography problem with a potentially inaccurate prediction σ of the true state ρ. This corresponds to practical cases where we possess prior knowledge of the unknown state. For example, in quantum verification or calibration, we may know the quantum state that the quantum device is expected to generate, while the state it actually generates may deviate from it. We introduce an algorithm whose sample complexity in the generic case, even if the prediction is arbitrarily bad, matches that of the best algorithm without prediction [2]; at the same time, as the prediction quality improves, the sample complexity reduces smoothly to Õ(n log²M/ε³) when the trace distance between the prediction and the unknown state is O(ε). Furthermore, we conduct numerical experiments to validate our theoretical analysis. The experiments simulate noisy quantum circuits reflecting realistic scenarios in quantum verification or calibration. Notably, our algorithm outperforms the previous work without prediction in most settings.
Keywords: Shadow tomography; Online learning; Quantum state learning; FTRL; Quantum machine learning
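The quantities shadow tomography approximates from few copies of the state are straightforward to compute when the density matrix is known exactly. A sketch (using NumPy, with toy single-qubit operators) of the target expectation values tr(F_i ρ):

```python
import numpy as np

def expectations(rho, measurements):
    # tr(F_i @ rho) for each two-outcome measurement operator F_i;
    # the real part is taken because numerical traces can carry tiny
    # imaginary round-off even for Hermitian operators.
    return [float(np.real(np.trace(F @ rho))) for F in measurements]

rho = np.array([[1.0, 0.0], [0.0, 0.0]])   # density matrix of |0><0|
F = np.array([[0.0, 0.0], [0.0, 1.0]])     # projector onto |1>
vals = expectations(rho, [F])
```

The point of the paper is estimating all M such values to within ε without full tomography, using far fewer copies of ρ than M independent estimates would need.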
Optimization of the ParILUT-GPU algorithm
Authors: Shaofeng Yang, Zhi Li, Yunting Wang, Xin He, Guangming Tan. CCF Transactions on High Performance Computing, 2026, No. 2, pp. 196-209.
We have optimized the parallel threshold ILU algorithm (ParILUT) for GPUs. The optimizations target three building blocks: candidate search and ILU residual computation, adding and removing elements, and threshold selection. First, we fuse candidate search and ILU residual computation by modifying the ParILUT algorithm and extending the register-aware SpGEMM algorithm to compute it, and we develop a GPU bin-search algorithm so that the register-aware SpGEMM algorithm performs better in ParILUT. Second, we adopt a warp-row-parallel approach, instead of a thread-row-parallel one, to add elements to the new L and U and to remove elements from the candidates, using efficient GPU instructions to locate element positions. Third, we propose a balanced classification tree for threshold selection that balances the buckets' data when a large number of elements share the same value. Finally, we evaluate the performance of each optimization and of the whole ParILUT, and verify the correctness of the optimized ParILUT. The results indicate that the optimized ParILUT achieves an average speedup of 4.03x over the original version, and the speedup increases with the amount of fill-in.
Keywords: Incomplete factorization preconditioners; ParILUT; Parallel threshold ILU; GPU
A Parallel Discrete Event Simulation Engine for the Low-Earth-Orbit Satellite Constellation Networks
Authors: Su Hailong, Liu Yaoqi, Zhou Yiqing, Shi Jinglin, Li Hongguang, Qian Manli. China Communications, 2024, No. 8, pp. 264-275.
Low-Earth-Orbit satellite constellation networks (LEO-SCN) can provide low-cost, large-scale, flexible-coverage wireless communication services. LEO-SCN are characterized by high dynamics and large topological sizes, so protocol development and application testing are challenging to carry out in a natural environment; simulation platforms are a more effective means of technology demonstration. Currently available simulators have limited functionality and simulation scale, and a full-featured simulator is still needed. In this paper, we apply the parallel discrete-event simulation technique to LEO-SCN to support large-scale, complex system simulation at the packet level. To solve the problem that single-process programs cannot cope with complex simulations containing numerous entities, we propose a parallel mechanism and the synchronization algorithms LP-NM and LP-YAWNS. In experiments, we use ns-3 to verify the speedup ratio and efficiency of these algorithms. The results show that the proposed mechanism can provide parallel simulation engine support for LEO-SCN.
Keywords: Constellation; Low Earth orbit satellite; ns-3; Null-message; Parallel discrete-event simulation
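A conservative synchronization window of the YAWNS family can be sketched as follows: in each round, every logical process (LP) may safely execute all events with timestamps below the global minimum next-event time plus the lookahead, because no LP can generate an event earlier than that bound. This is the generic window computation, not the paper's LP-YAWNS or LP-NM variants.

```python
def yawns_window(lps, lookahead):
    """One conservative synchronization round over per-LP event timestamp lists.

    Returns (bound, safe_events_per_lp, remaining_events_per_lp).
    """
    # No LP can create an event earlier than its own minimum plus the lookahead.
    bound = min(min(lp) for lp in lps if lp) + lookahead
    safe = [sorted(t for t in lp if t < bound) for lp in lps]
    remaining = [[t for t in lp if t >= bound] for lp in lps]
    return bound, safe, remaining

# Two LPs; with lookahead 2.0 the window bound is 1.0 + 2.0 = 3.0
bound, safe, rest = yawns_window([[1.0, 4.0], [2.5, 3.0]], lookahead=2.0)
```

Larger lookahead (e.g., minimum propagation delay between satellites) widens the window and reduces synchronization rounds, which is where the speedup comes from.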
Quantum algorithm for Mastermind game
Author: Xiaoming Sun. Science China Physics, Mechanics & Astronomy, 2026, No. 1, pp. 298-299.
A novel quantum algorithm for the Mastermind game was proposed recently by a research team from Sun Yat-sen University to highlight the power of quantum computing. Mastermind is a popular code-breaking game between a codemaker and a codebreaker. In the commercial version, the codemaker selects a secret sequence of four colored pegs (positions) from six possible colors.
Keywords: Sun Yat-sen University; Quantum algorithm; Quantum computing; Mastermind game
SYCL-MLU: unifying SIMT and SIMD in heterogeneous programming
Authors: Runyu Zhou, Yijin Li, Jiacheng Zhao, Ziyang Wang, En Shao, Ziyan Xie, Huimin Cui. CCF Transactions on High Performance Computing, 2026, No. 1, pp. 94-106.
With the rapid advancement of artificial intelligence and high-performance computing, heterogeneous computing platforms have evolved to encompass increasingly diverse architectures. While SYCL, an open standard for heterogeneous programming, has gained widespread adoption, its mainstream implementations (such as DPC++ and AdaptiveCpp) primarily target SIMT-architecture devices like GPUs, presenting substantial challenges when adapting to specialized accelerators such as the Cambricon MLU, which employs a fundamentally different SIMD execution model. This cross-programming-model extension encounters two critical challenges: (1) bridging the programming abstraction gap between SIMT's thread-level parallelism and SIMD's data-level parallelism; and (2) harmonizing SYCL's unified memory model with device-specific memory architectures. This paper proposes a novel cross-programming-model SYCL extension methodology to achieve full SYCL support for SIMD architectures, demonstrated through a comprehensive implementation for the Cambricon MLU platform. Our approach introduces MLU-specific vector programming interfaces while maintaining compatibility with the SYCL standard, enabling seamless integration of SIMD-based accelerators into the SYCL ecosystem. To validate our methodology, we integrated the extended SYCL-MLU implementation into PaddlePaddle's CINN compiler, achieving a geometric mean performance improvement of 9.14% across representative neural networks, including ResNet, YOLOv3, and BERT. This research significantly broadens the application scope of SYCL in heterogeneous programming and provides a systematic methodology for extending SYCL to other SIMD-based hardware platforms.
Keywords: High performance computing; Heterogeneous programming; SYCL; MLU; CINN; PaddlePaddle
FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS
Authors: Yueyuan Zhou, ZiYi Ren, En Shao, Lixian Ma, Qiang Hu, Leping Wang, Guangming Tan. CCF Transactions on High Performance Computing, 2024, No. 1, pp. 17-31.
Despite advancements in computer hardware, the performance of GROMACS simulations has not exhibited significant improvement, primarily due to inefficient utilization of substantial hardware resources. Resource utilization can be enhanced through effective scheduling when running multiple simulations concurrently on a single computing node, particularly benefiting the frequently employed small-scale system simulations. Previous research focused on co-running multiple GROMACS simulations via time-slice technology. However, this approach introduced notable context-switching overhead and predominantly concentrated on optimizing GPU resource utilization, while neglecting the collaborative scheduling of heterogeneous CPU and GPU devices. Nowadays, various GPU vendors have introduced hardware partitioning technologies for spatial resource allocation, complementing traditional time-sharing techniques. Moreover, GROMACS operates as a heterogeneous computing application, alternating computations between CPU and GPU devices; notably, GPU utilization sometimes accounts for as little as 35%. Consequently, a comprehensive approach involving coordinated scheduling of both the GPU and CPU is imperative. To leverage the potential of hardware partitioning technologies in alignment with GROMACS' runtime characteristics, we propose FILL: a resource scheduling system designed for co-running multiple GROMACS jobs. FILL employs space-partitioning technology to effectively allocate hardware resources and facilitates collaborative scheduling of CPU and GPU resources, ensuring precise and deterministic allocation of GROMACS job resources. The scheduling aims to improve system throughput while considering the turnaround time of simulations. Implemented on servers equipped with NVIDIA and AMD GPUs, FILL has showcased noteworthy advancements in system throughput. On NVIDIA GPU servers, FILL achieved an improvement of up to 167% compared to the baseline approach and a boost of 27,928% compared to state-of-the-art alternatives. Similarly, on AMD GPU servers, FILL demonstrated enhancements of 459% and 24% over the baseline and state-of-the-art methods, respectively. These results validate the effectiveness of FILL in optimizing system throughput for multiple GROMACS simulations.
Keywords: Resource scheduling; Co-running; GROMACS; System throughput
An interpretable DeePMD‑kit performance model for emerging supercomputers
Authors: Xiangyu Meng, Xun Wang, Mingzhen Li, Guangming Tan, Weile Jia. CCF Transactions on High Performance Computing, 2025, No. 2, pp. 155-168.
The deep potential (DP) scheme has increased the temporal and spatial scales of simulation while maintaining the ab initio accuracy of molecular dynamics. DeePMD-kit is an outstanding application that implements the DP scheme efficiently. However, current performance models cannot accurately measure the resource utilization of DeePMD-kit operators or predict the execution time. We introduce DP-perf, an interpretable performance model for DeePMD-kit. DP-perf accurately measures the resource utilization of individual DeePMD-kit operators, the communication pattern, and the overall application by exploiting physical system properties and machine configurations. It can be easily applied to mainstream supercomputers, including Tianhe-3F, the new Sunway, Fugaku, and Summit. With DP-perf, users can select the optimal machine and decide the corresponding configuration for various purposes (e.g., lower cost, less time) without real runs. Evaluation on four top supercomputers shows that DP-perf fits the overall execution time with a low mean absolute percentage error of 5.7%/8.1%/14.3%/13.1% on Tianhe-3F/new Sunway/Fugaku/Summit. In the prediction scenario, DP-perf predicts the total execution time with a mean absolute percentage error of less than 20%.
Keywords: Performance model; Molecular dynamics; Deep potential; Performance optimization
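The fit metric quoted for DP-perf, mean absolute percentage error, is simple to compute; the runtimes below are illustrative numbers, not measurements from the paper.

```python
def mape(actual, predicted):
    # Mean absolute percentage error over paired (measured, predicted) values,
    # expressed as a percentage; assumes all actual values are nonzero.
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Two runs, each predicted within 10% of the measured time -> MAPE of 10.0
print(mape([10.0, 20.0], [11.0, 18.0]))
```

A MAPE under 20%, as reported for DP-perf's prediction scenario, means predicted runtimes are on average within a fifth of the measured ones.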
FASS-Pruner: customizing a fine-grained CNN accelerator-aware pruning framework via intra-filter splitting and inter-filter shuffling
Authors: Xiaohui Wei, Xinyang Zheng, Chenyang Wang, Guangli Li, Hengshan Yue. CCF Transactions on High Performance Computing, 2023, No. 3, pp. 292-303.
Nowadays, with the increasing depth of CNNs, computation and storage requirements for weights expand significantly, preventing wide deployment in resource-constrained application scenarios such as embedded systems. To improve the efficiency of deep CNN inference, researchers have explored weight-pruning techniques on CNN accelerators (e.g., systolic arrays) to avoid storing and computing unimportant weights. However, these attempts either incur expensive extra hardware costs to encode/decode the irregular sparse weight pattern on accelerators or bring limited performance improvement due to structured pruning's modest compression ratio. To address this challenge, this paper proposes FASS-Pruner, a fine-grained accelerator-aware pruning framework via intra-filter splitting and inter-filter shuffling: (1) considering the round-by-round execution behavior of CNN accelerators, FASS-Pruner splits filters into multiple rounds to perform column-wise weight pruning; (2) leveraging the calculation independence across filters on CNN accelerators, FASS-Pruner shuffles the filters to prune unimportant row-wise weights. Combining the sparse pattern of the pruned CNN with the dataflow of the systolic array, we modify the systolic-array-based accelerator to execute the pruned sparse CNN with better performance and lower energy consumption. By condensing the pruned sparse weights in systolic arrays, FASS-Pruner achieves a comparable pruning ratio while preserving the original dataflow of CNN accelerators, thereby achieving significant performance and energy savings.
Keywords: CNN accelerator; Model pruning; Hardware-software co-design
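Column-wise magnitude pruning, one ingredient of the intra-filter splitting idea, can be sketched as zeroing the columns with the smallest L1 norms so the remaining layout stays friendly to a systolic array. This simplified stand-in ignores the round-splitting, inter-filter shuffling, and retraining steps of the actual framework.

```python
import numpy as np

def prune_columns(weights, keep_ratio=0.5):
    """Zero the weight-matrix columns with the smallest L1 norms."""
    w = np.array(weights, dtype=float)
    norms = np.abs(w).sum(axis=0)                    # per-column L1 norms
    keep = max(1, int(round(keep_ratio * w.shape[1])))
    pruned = np.argsort(norms)[: w.shape[1] - keep]  # weakest columns first
    w[:, pruned] = 0.0
    return w

# Column 1 has the smallest L1 norm, so it is the one zeroed out
w = prune_columns([[1.0, 0.1, 2.0], [1.0, 0.2, 2.0]], keep_ratio=0.5)
```

Because whole columns vanish, the surviving weights can be condensed without the per-element index metadata that unstructured sparsity would require.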
From organoids to organoids-on-a-chip: Current applications and challenges in biomedical research (Cited by 3)
Authors: Kailun Liu, Xiaowei Chen, Zhen Fan, Fei Ren, Jing Liu, Baoyang Hu. Chinese Medical Journal, 2025, No. 7, pp. 792-807.
The high failure rates in clinical drug development based on animal models highlight the urgent need for more representative human models in biomedical research. In response to this demand, organoids and organ chips have been integrated for greater physiological relevance and dynamic, controlled experimental conditions. This innovative platform, the organoids-on-a-chip technology, shows great promise in disease modeling, drug discovery, and personalized medicine, attracting interest from researchers, clinicians, regulatory authorities, and industry stakeholders. This review traces the evolution from organoids to organoids-on-a-chip, driven by the necessity for advanced biological models. We summarize the applications of organoids-on-a-chip in simulating physiological and pathological phenotypes and in therapeutic evaluation, highlighting how integrating technologies from organ chips, such as microfluidic systems, mechanical stimulation, and sensor integration, optimizes organoid cell types, spatial structure, and physiological functions, thereby expanding their biomedical applications. We conclude by addressing the current challenges in the development of organoids-on-a-chip and offering insights into its prospects. The advancement of organoids-on-a-chip is poised to enhance fidelity, standardization, and scalability. Furthermore, the integration of cutting-edge technologies and interdisciplinary collaborations will be crucial for the progression of organoids-on-a-chip technology.
Keywords: Organoids-on-a-chip; Organoids; Organ-on-a-chip; Drug testing; Disease modeling
Grover’s search finds new applications in continuous optimization and spectral analysis
Author: Xiaoming Sun. Science China Physics, Mechanics & Astronomy, 2025, No. 6, pp. 212-213.
A novel quantum search algorithm tailored for continuous optimization and spectral problems was proposed recently by a research team from the University of Electronic Science and Technology of China to broaden the frontiers of quantum computation and enrich its application landscape. Quantum computing has traditionally excelled at tackling discrete search challenges, but many important applications, from large-scale optimization to advanced physics simulations, necessitate searching through continuous domains. These continuous search problems involve uncountably infinite solution spaces and bring about computational complexities far beyond those faced in conventional discrete settings. The work, titled "Fixed-Point Quantum Continuous Search Algorithm with Optimal Query Complexity", takes on the core challenge of performing search tasks in domains that may be uncountably infinite, offering theoretical and practical insights into achieving quantum speedups in such settings [1].
Keywords: advanced physics simulations discrete search challenges quantum computation optimization spectral problems spectral analysis fixed point algorithm quantum search algorithm continuous optimization
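The discrete Grover search that this work generalizes to continuous domains can be illustrated with a small classical statevector simulation. The sketch below is a plain-NumPy illustration of the standard discrete baseline only (the marked index and problem size are arbitrary choices for the demo), not the fixed-point continuous algorithm described in the paper: it amplifies one marked item out of N = 16 using about (π/4)·√N oracle calls, the quadratic speedup referenced above.

```python
import numpy as np

def grover_demo(n_qubits=4, marked=11):
    # Discrete Grover search over N = 2**n_qubits items, simulated
    # classically with a dense statevector.
    N = 2 ** n_qubits
    state = np.full(N, 1 / np.sqrt(N))             # uniform superposition
    oracle = np.ones(N)
    oracle[marked] = -1                            # phase-flip the marked item
    iters = int(np.floor(np.pi / 4 * np.sqrt(N)))  # ~ (pi/4) * sqrt(N) rounds
    for _ in range(iters):
        state *= oracle                            # oracle call
        mean = state.mean()
        state = 2 * mean - state                   # diffusion: inversion about the mean
    return iters, float(state[marked] ** 2)        # success probability

iters, p = grover_demo()
print(iters, round(p, 3))  # prints: 3 0.961
```

Only 3 oracle calls are needed here, versus an expected 8 classical queries for 16 items; the continuous setting studied in the paper must achieve an analogous speedup over an uncountably infinite search space.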
Assessing and Understanding Creativity in Large Language Models
19
Authors: Yunpu Zhao, Rui Zhang, Wenyi Li, Ling Li (+1), Machine Intelligence Research, 2025, Issue 3, pp. 417-436 (20 pages)
In the field of natural language processing, the rapid development of large language models (LLMs) has attracted increasing attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity remain inadequate. Assessing LLM creativity requires accounting for differences from humans and measuring along multiple dimensions while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria: fluency, flexibility, originality, and elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. The study also presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We found that the creativity of LLMs falls short primarily in originality, while excelling in elaboration, and that the prompts and role-play settings of the model significantly influence creativity. The experimental results further indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridge artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.
Keywords: Large language models (LLMs) creativity assessment prompt engineering cognitive psychology divergent thinking
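Two of the four TTCT-derived criteria the study adapts, fluency and flexibility, reduce to simple counts once responses are labeled. The sketch below is a minimal illustration of that scoring idea under invented labels; the task, answers, and category map are hypothetical examples, and the paper's actual LLM-based evaluator is far more elaborate.

```python
from typing import Dict, List

def fluency_and_flexibility(responses: List[str],
                            categories: Dict[str, str]) -> Dict[str, int]:
    # Fluency: how many distinct ideas the respondent produced.
    # Flexibility: how many distinct conceptual categories those ideas span.
    # `categories` maps each idea to a hand-labeled category (hypothetical labels).
    distinct = set(responses)
    spanned = {categories[r] for r in distinct if r in categories}
    return {"fluency": len(distinct), "flexibility": len(spanned)}

# Hypothetical answers to "unusual uses for a brick", a classic divergent-thinking task.
answers = ["paperweight", "doorstop", "garden border", "paperweight", "weapon"]
labels = {"paperweight": "office", "doorstop": "household",
          "garden border": "outdoor", "weapon": "other"}
print(fluency_and_flexibility(answers, labels))
# prints: {'fluency': 4, 'flexibility': 4}
```

Originality and elaboration, by contrast, require judging novelty and detail rather than counting, which is where the paper's LLM-based evaluation method comes in.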
Deterministic streaming algorithms for non-monotone submodular maximization
20
Authors: Xiaoming SUN, Jialin ZHANG, Shuo ZHANG, Frontiers of Computer Science, 2025, Issue 6, pp. 103-114 (12 pages)
Submodular maximization is a significant area of interest in combinatorial optimization, with a wide range of real-world applications. In recent years, streaming algorithms for submodular maximization have gained attention, allowing real-time processing of large data sets by examining each piece of data only once. However, most current state-of-the-art algorithms apply only to monotone submodular maximization, and significant gaps remain in the approximation ratios between monotone and non-monotone objective functions. In this paper, we propose a streaming algorithm framework for non-monotone submodular maximization and use this framework to design deterministic streaming algorithms for the d-knapsack constraint and the knapsack constraint. Our 1-pass streaming algorithm for the d-knapsack constraint achieves a (1/(4(d+1)) - ε)-approximation ratio, using O(B log(B)/ε) memory and O(log(B)/ε) query time per element, where B = min(n, b) is the maximum number of elements that the knapsack can store. As a special case of the d-knapsack constraint, we obtain a 1-pass streaming algorithm with a (1/8 - ε)-approximation ratio for the knapsack constraint. To our knowledge, there is currently no streaming algorithm for this constraint when the objective function is non-monotone, even when d = 1. In addition, we propose a multi-pass streaming algorithm with a (1/6 - ε)-approximation that stores O(B) elements.
Keywords: submodular maximization streaming algorithms cardinality constraint knapsack constraint
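The single-pass discipline the abstract describes, inspecting each element once and deciding immediately whether to keep it, is easiest to see in the classical threshold rule for the monotone, cardinality-constrained case. The sketch below shows that simpler baseline with a coverage objective; the fixed threshold, the example stream, and the objective are illustrative assumptions, not the paper's non-monotone d-knapsack algorithm.

```python
def stream_max_coverage(stream, k, threshold):
    # One-pass thresholding for monotone submodular maximization under a
    # cardinality constraint: keep an element iff its marginal gain meets
    # a fixed threshold and the budget k is not yet exhausted. Each element
    # of the stream is a set, and the objective is coverage (union size).
    chosen, covered = [], set()
    for elem in stream:
        gain = len(elem - covered)          # marginal gain of adding elem
        if gain >= threshold and len(chosen) < k:
            chosen.append(elem)
            covered |= elem
    return chosen, covered

stream = [{1, 2, 3}, {3, 4}, {1, 2}, {5, 6, 7}, {8}]
chosen, covered = stream_max_coverage(stream, k=3, threshold=2)
print(len(chosen), sorted(covered))  # prints: 2 [1, 2, 3, 5, 6, 7]
```

Memory here is O(k) elements plus the covered set, and each element costs one marginal-gain query; the paper's contribution is obtaining comparable one-pass guarantees when the objective is non-monotone and the constraint is a (d-)knapsack rather than a cardinality bound.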