Abstract: In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, has ushered in a new era of efficiency and speed. This trend is particularly noteworthy in image processing, which has seen significant advancements. This parallel computing project explored parallel image processing, focusing on the grayscale conversion of color images. Our approach integrated OpenMP into our parallelization framework to execute a critical image processing task: grayscale conversion. By distributing the workload across multiple threads with OpenMP, we improved the overall performance of the conversion process. The primary objectives of the project were to optimize computation time and improve overall efficiency in this task. Using OpenMP for concurrent processing across multiple cores significantly reduced execution times through effective distribution of work among the cores. The speedup values measured for various image sizes highlight the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency as the number of cores increases, underscoring the importance of a carefully optimized parallelization strategy that accounts for load balancing and communication overhead. Despite these challenges, the scalability and efficiency achieved with parallel image processing demonstrate OpenMP's effectiveness in accelerating image manipulation tasks.
Abstract: Real-time capabilities and computational efficiency are provided by parallel image processing utilizing OpenMP. However, race conditions can affect the accuracy and reliability of the outcomes. This paper highlights the importance of addressing race conditions in parallel image processing, specifically focusing on color inverse filtering using OpenMP. We considered three solutions to race conditions, each with distinct characteristics: #pragma omp atomic protects individual memory operations for fine-grained control; #pragma omp critical protects entire code blocks for exclusive access; and #pragma omp parallel sections with a reduction clause safely aggregates values across threads. Our findings show that the produced images were unaffected by the race conditions. However, resolving the race conditions makes the code significantly faster, especially when it is executed on multiple cores.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2023YFB2806502), the National Natural Science Foundation of China (Grant No. 62425504), and the Knowledge Innovation Program of Wuhan-Basic Research (Grant No. 2023010201010049).
Abstract: Optical neural networks have emerged as feasible alternatives to their electronic counterparts, offering significant benefits such as low power consumption, low latency, and high parallelism. However, the realization of ultra-compact nonlinear deep neural networks and multi-thread processing remain crucial challenges for optical computing. We present a monolithically integrated all-optical nonlinear diffractive deep neural network (AON-D²NN) chip for the first time. The all-optical nonlinear activation function is implemented using germanium microstructures, which provide low loss and are compatible with the standard silicon photonics fabrication process. Assisted by the germanium activation function, the classification accuracy is improved by 9.1% for four-class classification tasks. In addition, the chip's reconfigurability enables multi-task learning in situ via an innovative cross-training algorithm, yielding two task-specific inference results with accuracies of 95% and 96%, respectively. Furthermore, leveraging the wavelength-dependent response of the chip, a multi-thread nonlinear optical neural network is implemented for the first time, capable of handling two different tasks in parallel. The proposed AON-D²NN contains three hidden layers with a footprint of only 0.73 mm². It achieves ultra-low latency (172 ps), paving the path toward high-performance optical neural networks.
Abstract: Obtaining the spatial distribution of electromagnetic field strength is one of the key tasks of electromagnetic spectrum management, and improving its computational performance to keep pace with a rapidly changing electromagnetic environment is of great significance. OpenMP (Open Multi-Processing) is a simple and fast way to improve computational efficiency, helping to fully utilize multi-core CPU resources. This paper proposes an OpenMP-based parallel method for computing the spatial distribution of electromagnetic field strength. By carefully analyzing the computation process and designing a corresponding parallel scheme, the resulting algorithm is well suited to multi-core CPU processing and achieves a high degree of parallelism. Extensive experimental results show that the parallel algorithm markedly improves computational efficiency and offers high scalability.
Abstract: The time-dependent density functional-based tight-binding (TD-DFTB) method is implemented on multi-core and graphics processing unit (GPU) systems for excited-state calculations of large systems with hundreds or thousands of atoms. Sparse matrices and OpenMP multithreading are used to build the Hamiltonian matrix. The diagonalization of the ground-state eigenvalue problem is performed on GPUs in double precision. The GPU-based acceleration fully preserves all properties, and a considerable total speedup of 8.73 is achieved. A Krylov-space-based algorithm with OpenMP parallelism on the CPU is used to find the lowest eigenvalue and eigenvector of the large TDDFT matrix, which greatly reduces the iterations taken and the time spent on the excited-state eigenvalue problem. The Krylov solver with GPU acceleration of the matrix-vector product converges quickly to the final result, and a notable speedup of 206 times is observed for a system of 812 atoms. Calculations on a series of small and large systems show that the fast TD-DFTB code obtains reasonable results at a much lower computational cost than first-principles CIS and full TDDFT calculations.
Abstract: The use of multi-core processors will become a trend in safety-critical systems. For safe execution of multi-threaded code, automatic code generation from formal specification is a desirable method. Signal, a synchronous language dedicated to the functional description of safety-critical systems, provides sound semantics for deterministic concurrency. Although sequential code generation for Signal has been implemented in the Polychrony compiler, deterministic multi-threaded code generation strategies are still far from mature. Moreover, existing code generation methods depend on a particular multi-threading library, which limits cross-platform execution. OpenMP is an application program interface (API) standard for parallel programming, supported by several mainstream compilers on different platforms. This paper presents a methodology for translating Signal programs to OpenMP-based multi-threaded C code. First, an intermediate representation of the core syntax of Signal using synchronous guarded actions is defined. Then, according to the compositional semantics of Signal equations, the Signal program is synthesized into a dependency graph (DG). After parallel tasks are extracted from the dependency graph, the Signal program can finally be translated into OpenMP-based C code that can be executed on multiple platforms.