To study and optimize the performance of general-purpose computing on graphics processing units (GPGPU) based on single instruction multiple threads (SIMT) processors for neural network applications, this work contributes a self-developed SIMT processor named Pomelo and an associated assembly program. The parallel mechanism of the SIMT computing mode and the self-developed Pomelo processor are briefly introduced. A common convolutional neural network (CNN) is built to verify the compatibility and functionality of the Pomelo processor. A CNN computing flow with task-level and hardware-level optimization is adopted on the Pomelo processor. A specific algorithm for organizing a Z-shaped memory structure is developed, which aims to reduce memory access in mass-data computing tasks. With the combined adaptation and optimization strategy above, the experimental results demonstrate that reducing memory access in the SIMT computing mode plays a crucial role in improving performance. A 6.52-times performance improvement is achieved in the case of 4 processing elements.
In this paper, we present a graphics processing unit (GPU)-based implementation of a weighted bit-reliability based (wBRB) decoder for non-binary LDPC (NB-LDPC) codes. To achieve coalesced memory accesses, an efficient data structure for the wBRB algorithm is proposed. Based on the Single-Instruction Multiple-Threads (SIMT) programming model, a novel mapping strategy with high intra-frame parallelism is presented to improve the latency and throughput performance. Moreover, by using Single-Instruction Multiple-Data (SIMD) intrinsics, four 8-bit message elements are packed into a 32-bit unit and processed simultaneously. Experimental results show that the proposed wBRB decoder provides a good tradeoff between error performance and throughput for codes with relatively large column degrees or high rates.
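The packing idea in the abstract above (four 8-bit message elements carried in one 32-bit unit and processed together) can be illustrated with a small host-side sketch in C++. This is not the authors' decoder: the helper names `pack4`, `unpack4`, and `add4` are hypothetical, and wrap-around addition is assumed only as an example operation, since the abstract does not say which SIMD intrinsics or arithmetic the wBRB decoder uses. On a GPU, a per-byte intrinsic such as CUDA's `__vadd4` plays a comparable role to `add4`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack four 8-bit lanes into one 32-bit word
// (lane 0 goes into the lowest byte).
inline std::uint32_t pack4(const std::array<std::uint8_t, 4>& v) {
    return static_cast<std::uint32_t>(v[0]) |
           (static_cast<std::uint32_t>(v[1]) << 8) |
           (static_cast<std::uint32_t>(v[2]) << 16) |
           (static_cast<std::uint32_t>(v[3]) << 24);
}

// Hypothetical helper: split a 32-bit word back into its four 8-bit lanes.
inline std::array<std::uint8_t, 4> unpack4(std::uint32_t w) {
    return {static_cast<std::uint8_t>(w & 0xFFu),
            static_cast<std::uint8_t>((w >> 8) & 0xFFu),
            static_cast<std::uint8_t>((w >> 16) & 0xFFu),
            static_cast<std::uint8_t>((w >> 24) & 0xFFu)};
}

// Lane-wise wrap-around (mod 256) addition of four packed bytes.
// Assumed example operation; the source does not specify the arithmetic.
inline std::uint32_t add4(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t low7 = 0x7F7F7F7Fu;  // low 7 bits of every lane
    const std::uint32_t high = 0x80808080u;  // top bit of every lane
    // Adding only the low 7 bits keeps any carry inside its own lane.
    const std::uint32_t sum = (a & low7) + (b & low7);
    // Fold the top bits in with XOR so no carry leaks into the next lane.
    return sum ^ ((a ^ b) & high);
}
```

A call such as `unpack4(add4(pack4(a), pack4(b)))` processes all four lanes with a handful of 32-bit operations instead of four separate byte additions, which is the kind of saving the packing is meant to provide.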
Belief propagation (BP) decoding outputs soft information and can be naturally used in iterative receivers. BP list (BPL) decoding provides error-correction performance comparable to that of successive cancellation list (SCL) decoding. In this paper, we first introduce an enhanced code construction scheme for BPL decoding to improve its error-correction capability. Then, a GPU-based BPL decoder that adopts the new code construction is presented. Finally, the proposed BPL decoder is tested on NVIDIA RTX3070 and GTX1060. Experimental results show that the presented BPL decoder with an early termination criterion achieves above 1 Gbps throughput on RTX3070 for the (1024, 512) code with 32 lists under good channel conditions.
Single isocenter multiple target stereotactic radiosurgery (SIMT-SRS) has potentially emerged as a new pillar in radio-immune combination therapy for the management of brain metastasis. Accuracy and efficiency are pushed to a higher level in the era of linear accelerator-based SIMT-SRS. This short review focuses only on patient selection, image preparation, patient simulation, electronic portal imaging device (EPID) QA, and the patient treatment process in SIMT-SRS treatment. Image-relevant recommendations and guidelines are presented, and contrast application, acquisition efficiency, and alignment accuracy of CT and MRI images are explored. With guidance, SIMT-SRS can be implemented with high precision and efficiency. 1 mm or 0.5 mm and non-uniform PTV margin expansion for all targets would become possible. This will enhance the cancer-killing effect in radio-immune combination therapy. General routine daily, monthly, and annual linear accelerator image quality assurances are excluded.
Funding (Pomelo SIMT processor study): Scientific Research Program funded by the Shaanxi Provincial Education Department (20JY058).
Funding (wBRB NB-LDPC decoder study): National Natural Science Foundation of China (91438116).
Funding (GPU-based BPL decoder study): Fundamental Research Funds for the Central Universities (FRF-TP20-062A1); Guangdong Basic and Applied Basic Research Foundation (2021A1515110070).