Funding: Supported by the National Natural Science Foundation of China (No. 22373112 to Ji Qi; No. 22373111 and 21921004 to Minghui Yang) and GH-fund A (No. 202107011790).
Abstract: In this study, we investigate the efficacy of a hybrid parallel algorithm for accelerating the evaluation of two-electron repulsion integrals (ERI) and Fock matrix generation on the Hygon C86/DCU (deep computing unit) heterogeneous computing platform. Multiple hybrid parallel schemes are assessed using a range of model systems, including those with up to 1200 atoms and 10000 basis functions. Our results show that, during Hartree-Fock (HF) calculations, a single DCU achieves a 33.6-fold speedup over 32 C86 CPU cores. Compared with the efficiency of the Wuhan Electronic Structure Package on Intel X86 and NVIDIA A100 computing platforms, the Hygon platform is cost-effective, showing great potential for quantum chemistry calculations and other high-performance scientific computing.
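For readers unfamiliar with the workload, the following is a minimal sketch of the textbook closed-shell Fock build that such ERI-driven algorithms accelerate. It is not the authors' hybrid parallel scheme: the function name `build_fock`, the dense four-index `eri` list in chemists' notation, and the use of the total (spin-summed) density matrix are all assumptions made for illustration.

```python
def build_fock(h_core, eri, density):
    """Closed-shell Fock matrix F = H + J - K/2 (illustrative, unoptimized).

    h_core  : n x n core Hamiltonian (nested lists)
    eri     : n x n x n x n integrals, eri[m][v][l][s] = (mv|ls)
    density : n x n total (spin-summed) density matrix
    All names and the dense storage layout are hypothetical.
    """
    n = len(h_core)
    fock = [row[:] for row in h_core]  # start from a copy of the core Hamiltonian
    for m in range(n):
        for v in range(n):
            for l in range(n):
                for s in range(n):
                    coulomb = eri[m][v][l][s]   # (mv|ls) contributes to J
                    exchange = eri[m][l][v][s]  # (ml|vs) contributes to K
                    fock[m][v] += density[l][s] * (coulomb - 0.5 * exchange)
    return fock


# One-basis-function toy case: F = -1.0 + 2.0 * (0.5 - 0.25)
toy_fock = build_fock([[-1.0]], [[[[0.5]]]], [[2.0]])
```

The four nested loops make the quartic scaling in the number of basis functions explicit, which is why systems with thousands of basis functions call for accelerator offloading.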
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2022D01B 187).
Abstract: Federated learning (FL) is a distributed machine learning paradigm for edge cloud computing. FL can facilitate data-driven decision-making in tactical scenarios, effectively addressing both data volume and infrastructure challenges in edge environments. However, the diversity of clients in edge cloud computing presents significant challenges for FL. Personalized federated learning (pFL) has received considerable attention in recent years. One example of pFL involves exploiting both global and local information in the local model. Current pFL algorithms suffer from limitations such as slow convergence, catastrophic forgetting, and poor performance on complex tasks, and still fall well short of centralized learning. To achieve high pFL performance, we propose FedCLCC: Federated Contrastive Learning and Conditional Computing. The core of FedCLCC is the use of contrastive learning and conditional computing. Contrastive learning measures the similarity of feature representations to adjust the local model. Conditional computing separates the global and local information and feeds each to its corresponding head for global and local handling. Our comprehensive experiments demonstrate that FedCLCC outperforms other state-of-the-art FL algorithms.
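The abstract does not state the loss FedCLCC uses, so the following is only a generic sketch of how a contrastive term can compare feature representations by similarity. The function names, the cosine-similarity measure, the single positive/negative pair, and the `temperature` default are all assumptions for illustration, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero feature vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(local_repr, positive_repr, negative_repr, temperature=0.5):
    """Generic one-positive, one-negative contrastive loss (hypothetical form).

    The loss falls as local_repr moves toward positive_repr and away
    from negative_repr. Which representations play these roles in
    FedCLCC is not specified in the abstract.
    """
    pos = math.exp(cosine_similarity(local_repr, positive_repr) / temperature)
    neg = math.exp(cosine_similarity(local_repr, negative_repr) / temperature)
    return -math.log(pos / (pos + neg))


# A representation aligned with the positive gives a lower loss
# than one aligned with the negative.
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
misaligned = contrastive_loss([0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

In a training loop, a term of this kind would typically be added to the supervised loss with a weighting coefficient, so that the similarity signal adjusts the local model without replacing the task objective.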
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0119001), the National Natural Science Foundation of China (No. 61834005), the Shaanxi Province Key R&D Plan (No. 2022GY-027, 2021GY-029), and the Key Scientific Research Project of Shaanxi Department of Education (No. 22JY060).
Abstract: Graph computing has become pervasive in many applications due to its capacity to represent complex relationships among different objects in the big data era. However, general-purpose architectures are computationally inefficient for graph algorithms, while dedicated architectures provide high efficiency but lack flexibility. To address these challenges, this paper proposes ParaGraph, a reduced instruction set computing-five (RISC-V)-based software-hardware co-designed graph computing accelerator that can process graph algorithms in parallel, and also establishes a performance evaluation model to assess the efficiency of co-acceleration. ParaGraph handles parallel processing of typical graph algorithms on the hardware side, while performing overall functional control on the software side with custom-designed instructions. ParaGraph is verified on the XCVU440 field-programmable gate array (FPGA) board with E203, a RISC-V processor. Compared with current mainstream graph computing accelerators, ParaGraph consumes 7.94% less block RAM (BRAM) resources than ThunderGP. Its power consumption is reduced by 86.90%, 24.90%, and 76.38% compared with ThunderGP, HitGraph, and GraphS, respectively. The power efficiency of the connected components (CC) and degree centrality (DC) algorithms is improved by an average of 6.50 times over ThunderGP, 2.51 times over HitGraph, and 3.99 times over GraphS. The software-hardware co-design acceleration performance indicators H/W.Cap for CC and DC are 13.02 and 14.02, respectively.
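As a point of reference for the two benchmark algorithms named above, here is a minimal software sketch of degree centrality and connected components on an undirected graph. It says nothing about how ParaGraph implements them in hardware. The function names, the edge-list input, and the normalization of degree by `num_vertices - 1` are assumptions, and the graph is assumed to be simple with at least two vertices.

```python
from collections import deque


def degree_centrality(num_vertices, edges):
    """Degree of each vertex divided by (num_vertices - 1)."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = num_vertices - 1  # assumes num_vertices >= 2
    return [d / scale for d in degree]


def connected_components(num_vertices, edges):
    """Label each vertex with a component id using breadth-first search."""
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    labels = [-1] * num_vertices  # -1 marks an unvisited vertex
    component = 0
    for start in range(num_vertices):
        if labels[start] != -1:
            continue
        labels[start] = component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if labels[w] == -1:
                    labels[w] = component
                    queue.append(w)
        component += 1
    return labels


toy_edges = [(0, 1), (1, 2), (3, 4)]
toy_labels = connected_components(5, toy_edges)
toy_centrality = degree_centrality(5, toy_edges)
```

Both routines are dominated by irregular, data-dependent memory accesses over the edge list, which is the access pattern that makes such algorithms a poor fit for general-purpose architectures.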
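Abstract: To address the shortcoming that most research on carbon capture and storage (CCS) investment decisions focuses on a single firm, this work takes the perspective of coal-fired power plants and describes a "duopoly" in which two investors compete in the market. It also accounts for the dual uncertainty of carbon price and technological innovation, treats carbon allowances and government subsidies as incentive policies to encourage investment, and builds a real-options valuation model for CCS retrofit investment. Using backward induction, the investment value and investment threshold are derived for both the monopoly and the duopoly case. The results show that preemptive investment leads to short-sighted behavior by investors; increases in parameters such as carbon price volatility, the carbon capture rate, and the magnitude of technological innovation delay investment, whereas increases in government subsidies and in the probability of technological innovation accelerate it.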