Abstract: The paper describes an efficient direct method for solving the equation Ax = b, where A is a sparse matrix, on the Intel® Xeon Phi™ coprocessor. The main challenges on such a system are engaging all available threads (about 240) and reducing OpenMP* synchronization overhead, which is very expensive for hundreds of threads. The method decomposes A into a product of lower-triangular, diagonal, and upper-triangular matrices and then solves the three resulting subsystems. The main idea is based on the hybrid parallel algorithm used in the Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters [1]. Our implementation uses a static scheduling algorithm during the factorization step to reduce OpenMP synchronization overhead, and a three-level parallelization approach to engage all available threads effectively. We demonstrate that our implementation performs up to 100 times better on the factorization step and up to 65 times better in overall performance on the 240 threads of the Intel® Xeon Phi™ coprocessor.
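The three-subsystem solve described above can be illustrated on a small dense matrix. This is a minimal serial sketch, assuming a dense input whose pivots are all nonzero (so no pivoting is needed); the helper names `ldu_factor` and `ldu_solve` are hypothetical, and the sketch does not reproduce the sparse storage, static scheduling, or three-level parallelism of the paper's Xeon Phi implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def ldu_factor(a: Matrix) -> Tuple[Matrix, Vector, Matrix]:
    """Factor A = L * D * U without pivoting (hypothetical helper for illustration).

    L is unit lower-triangular, D is diagonal (returned as a vector of pivots),
    and U is unit upper-triangular. Assumes every pivot is nonzero.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy, reduced to D*U by Gaussian elimination
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = w[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        for i in range(k + 1, n):
            factor = w[i][k] / pivot
            lower[i][k] = factor  # the elimination multiplier is stored in L
            for j in range(k, n):
                w[i][j] -= factor * w[k][j]
    diag = [w[k][k] for k in range(n)]
    # Divide each row of D*U by its pivot to obtain unit upper-triangular U.
    upper = [[w[i][j] / diag[i] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, diag, upper


def ldu_solve(lower: Matrix, diag: Vector, upper: Matrix, b: Vector) -> Vector:
    """Solve A x = b as three subsystems: L y = b, D z = y, U x = z."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution (L has a unit diagonal)
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    z = [y[i] / diag[i] for i in range(n)]  # diagonal solve
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution (U has a unit diagonal)
        x[i] = z[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))
    return x
```

For example, `ldu_solve(*ldu_factor(A), b)` factors and solves in one call. Keeping the factorization separate from the solve mirrors the usual direct-solver split: the expensive factorization is done once and can be reused for many right-hand sides.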
Abstract: This paper describes a method for calculating the Schur complement of a sparse positive definite matrix A. The main idea is to represent A as an elimination tree, using a reordering algorithm such as METIS, and to place the rows/columns for which the Schur complement is needed into the top node of the elimination tree. Any problem with a degenerate part of the initial matrix can be resolved with iterative refinement. The proposed approach is close to the “multifrontal” method implemented by Iain Duff and others in the 1980s. The Schur complement computations described in this paper are available in the Intel® Math Kernel Library (Intel® MKL). We present the algorithm for Schur complement computations, experiments demonstrating a negligible increase in the number of elements in the factored matrix, and a comparison with existing alternatives.
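Why placing the Schur rows/columns in the top node works can be seen from block elimination: if A = [[A11, A12], [A21, A22]] with the Schur variables ordered last, eliminating the leading block leaves S = A22 - A21 * inv(A11) * A12 as the updated trailing block. The dense sketch below performs exactly that partial elimination, assuming a symmetric positive definite input so that no pivoting is needed. The function name is hypothetical, and the sketch omits the METIS reordering, the elimination tree, and the multifrontal data structures the abstract describes.

```python
from typing import List

Matrix = List[List[float]]


def schur_complement_by_elimination(a: Matrix, k: int) -> Matrix:
    """Return S = A22 - A21 * inv(A11) * A12, where A11 is the leading k x k block.

    Runs k steps of Gaussian elimination on a copy of A; the updated trailing
    (n-k) x (n-k) block is then exactly the Schur complement. Assumes A is
    symmetric positive definite, so every pivot is positive and no pivoting is needed.
    """
    n = len(a)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    w = [row[:] for row in a]  # working copy; indices k..n-1 are the Schur variables
    for p in range(k):  # eliminate only the non-Schur variables
        pivot = w[p][p]
        for i in range(p + 1, n):
            factor = w[i][p] / pivot
            for j in range(p, n):
                w[i][j] -= factor * w[p][j]
    # The trailing block has received every update from the eliminated variables.
    return [row[k:] for row in w[k:]]
```

Applied to A = [[4, 2, 1], [2, 5, 3], [1, 3, 6]] with k = 1, the sketch returns [[4.0, 2.5], [2.5, 5.75]], which matches A22 - A21 * A12 / 4 computed by hand.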
Abstract: With the growing importance of cloud services worldwide, cloud infrastructure and platform management has become critical for cloud service providers. This paper proposes a novel architecture for an intelligent server management framework. In this framework, the communication layer is based on the Extensible Messaging and Presence Protocol (XMPP), which was developed for instant messaging and has proven highly mature and, owing to its extensibility and efficiency, well suited to mobile and large-scale deployments. The proposed architecture can simplify server management and increase flexibility and scalability when managing hundreds of thousands of servers in the cloud era.
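As an illustration of the extensibility the abstract credits to XMPP, the sketch below wraps a management command in a standard `<message/>` stanza and carries the command in a child element in its own namespace. The namespace `urn:example:server-mgmt`, the `<command/>` element, the JIDs, and the action names are all hypothetical and are not taken from the paper; a production design might instead use IQ stanzas, for example XEP-0050 (Ad-Hoc Commands).

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical extension namespace for server-management payloads (not from the paper).
MGMT_NS = "urn:example:server-mgmt"
ET.register_namespace("mgmt", MGMT_NS)


def build_command_stanza(sender: str, target: str, action: str, stanza_id: str) -> str:
    """Serialize a hypothetical management command as an XMPP <message/> stanza.

    Stanzas are fragments of an XMPP stream, so the stream's default
    jabber:client namespace is not repeated on the <message/> element here.
    """
    msg = ET.Element("message", {"from": sender, "to": target, "type": "normal", "id": stanza_id})
    # Human-readable fallback for ordinary clients.
    ET.SubElement(msg, "body").text = f"command: {action}"
    # Machine-readable payload in its own namespace; recipients that do not
    # understand this namespace are expected to ignore the element.
    ET.SubElement(msg, f"{{{MGMT_NS}}}command", {"action": action})
    return ET.tostring(msg, encoding="unicode")


def parse_command_stanza(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (target JID, action), with action None if no <command/> child is present."""
    msg = ET.fromstring(xml_text)
    cmd = msg.find(f"{{{MGMT_NS}}}command")
    return msg.get("to"), (cmd.get("action") if cmd is not None else None)
```

Keeping the command in a separate namespace means the core protocol and existing servers need no changes; only the management agents have to understand the extension.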
Funding: Supported by the European Union Horizon 2020 research and innovation program under the CPSoSAware project (grant no. 871738) and by Science Foundation Ireland, grant no. 12/RC/2289-P2, Insight Centre for Data Analytics.
Abstract: Artificial intelligence (AI) algorithms achieve outstanding results in many application domains, such as computer vision and natural language processing. The performance of AI models is the outcome of complex and costly model architecture design and training processes. Hence, it is paramount for model owners to protect their AI models from piracy: model cloning, illegitimate distribution, and use. Intellectual property (IP) protection mechanisms have been applied to AI models, and in particular to deep neural networks, to verify model ownership. This study surveys state-of-the-art AI model ownership protection techniques and reports their pros and cons. The majority of previous works focus on watermarking, while more advanced methods such as fingerprinting and attestation are promising but not yet explored in depth. The study concludes by discussing possible research directions in the area.
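Abstract: Facing the challenges of rising computing-power complexity and intensifying demand for customization amid the new industrial revolution, open-source hardware is becoming an important means of breaking the limitations of closed architectures and strengthening independent, controllable technological capability. This paper focuses on open-source instruction set architectures, represented by RISC-V (Reduced Instruction Set Computer Five), and systematically reviews their ecosystem advantages and industrial value. It also compares the characteristics of major open-source projects in China and abroad in terms of design openness, system flexibility, and collaborative innovation mechanisms. An analysis over time reveals a development trend in which open-source hardware is progressing from low-level architectural innovation toward heterogeneous integration and expansion into new application scenarios. The study shows that open-source hardware has broad application prospects in key fields such as intelligent manufacturing, edge computing, and immersive terminals, where it can effectively improve the utilization efficiency of computing power and reduce development difficulty and system cost. Open-source hardware is driving a shift in chip design from a closed model to a shared one, providing new support for the intelligent upgrading of industry and for technology security strategies.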