High-performance computing (HPC) based on clusters of multicores is one of the main research lines in parallel programming. It is important to study the impact of the shared-memory, message-passing, and hybrid programming paradigms on these architectures in order to exploit their power efficiently. The Smith-Waterman algorithm for the local alignment of DNA sequences, which establishes the degree of similarity between two sequences, is used as a case study. In this paper, the Smith-Waterman algorithm is parallelized by means of a pipeline scheme, owing to the data dependencies inherent to the problem, using each of the communication/synchronization models mentioned above, and a comparative analysis is then carried out. Finally, experimental results are presented, as well as future research lines.
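The data dependencies mentioned in this abstract can be illustrated with a minimal sketch. Each cell of the Smith-Waterman score matrix depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are mutually independent, which is what makes pipelined or wavefront execution possible. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function names are illustrative assumptions, not taken from the paper, and the code is sequential: it only shows the traversal order a parallel version could exploit.

```python
def _cell(h, a, b, i, j, match, mismatch, gap):
    """Score of cell (i, j) from its three already-computed neighbours."""
    diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
    return max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling the matrix row by row."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = _cell(h, a, b, i, j, match, mismatch, gap)
            best = max(best, h[i][j])
    return best


def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Same score, filling the matrix one anti-diagonal (i + j = d) at a time.

    All cells of one anti-diagonal are independent of each other, so the
    inner loop is the part a parallel implementation could distribute.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):
        i_lo = max(1, d - (cols - 1))
        i_hi = min(rows - 1, d - 1)
        for i in range(i_lo, i_hi + 1):
            j = d - i
            h[i][j] = _cell(h, a, b, i, j, match, mismatch, gap)
            best = max(best, h[i][j])
    return best
```

Both traversal orders yield the same score; the anti-diagonal version merely exposes which cells can be computed concurrently.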
In this article, we review recent advances in the technology of writing fiber Bragg gratings (FBGs) in selected cores of multicore fibers (MCFs) using femtosecond laser pulses. The ability to write such a key element as the FBG opens up wide opportunities for the creation of next-generation fiber lasers and sensors based on MCFs. The advantages of the technology are illustrated by the examples of 3D shape sensors, acoustic emission sensors with spatially multiplexed channels, and multicore fiber Raman lasers.
Specific and sustained release of nutrients from capsules to the gastrointestinal tract has attracted much attention in the field of food and drug delivery. In this work, we report a monoaxial dispersion electrospraying-ionotropic gelation technique to prepare multicore millimeter-sized spherical capsules for the specific and sustained release of fish oil. The diameters of the spherical capsules decreased from 2.05 mm to 0.35 mm as the applied voltage increased. The capsules consisted of uniform (at applied voltages of ≤10 kV) or nonuniform (at applied voltages of >10 kV) multicores. The obtained capsules had reasonable loading ratios (9.7%-6.3%) owing to the multicore structure. In addition, the capsules showed specific and sustained release of fish oil in the small intestinal phase of in vitro gastrointestinal tract and small intestinal tract models. The simple monoaxial dispersion electrospraying-ionotropic gelation technique does not involve complicated preparation formulations or polymer modification, which gives it potential applications in fish oil preparations and in the encapsulation of functional active substances in the food and drug industries.
This paper presents the design of a new seven-core fiber with lower crosstalk and loss, together with a space-division multiplexing transmission experiment based on this fiber. Crosstalk is known to have the most serious influence on multicore fiber transmission. Before the fiber was designed, the factors affecting crosstalk, such as the core-to-core distance, bending radius, and fiber length, were analyzed through simulation. Based on this simulation analysis, a design scheme for a multicore fiber with low crosstalk was obtained. Space-division multiplexing and wavelength-division multiplexing were then adopted to carry out seven-core optical fiber transmission over 58.7 km. The crosstalk between adjacent cores was suppressed to as low as 45 dB/km, the attenuation of the inner core was 0.24 dB/km, and that of the outer cores was 0.32 dB/km. Bit error rate (BER) performance was also studied under different conditions; by designing the system appropriately, the error rate was reduced, the system performance was improved, and long-distance, large-capacity fiber transmission was realized.
Multicore systems now use Dynamic Voltage/Frequency Scaling (DV/FS) technology to allow individual cores to operate at voltages and/or frequencies different from those of other cores, in order to save power and enhance performance. In this paper, an effective and reliable hybrid model to reduce the energy and makespan in multicore systems is proposed. The proposed hybrid model enhances and integrates the greedy approach with dynamic programming to achieve optimal Voltage/Frequency (Vmin/F) levels. The allocation process is then applied based on the available workloads. The hybrid model consists of three stages: the first stage finds the optimum safe voltage, the second stage sets the level of energy efficiency, and the third is the allocation stage. Experimental results on various benchmarks show that the proposed model can generate optimal solutions that save energy while minimizing the makespan penalty. Comparisons with other competitive algorithms show that the proposed model provides, on average, a 48% improvement in energy saving and an 18% reduction in computation time while ensuring a high degree of system reliability.
Great strides have been made over the past decade to establish femtosecond lasers in advanced manufacturing systems for enabling new forms of non-contact processing of transparent materials. Research advances have shown that a myriad of additive and subtractive techniques is now possible for flexible 2D and 3D structuring of such materials with micro- and nano-scale precision. In this paper, these techniques have been refined and scaled up to demonstrate the potential for 3D writing of high-density optical packaging components, specifically addressing the major bottleneck for efficiently connecting optical fibres to silicon photonic (SiP) processors for use in telecom and data centres. An 84-channel fused silica interposer was introduced for high-density edge coupling of multicore fibres (MCFs) to a SiP chip. Femtosecond laser irradiation followed by chemical etching was further harnessed to open alignment sockets, permitting rapid assembly with precise locking of MCF positions for efficient coupling to laser-written optical waveguides in the interposer. A 3D waveguide fanout design provided an attractive balance of low losses, mode matching, high channel density, compact footprint, and low crosstalk. The 3D additive and subtractive processes thus demonstrated the potential for higher-scale integration and rapid photonic assembly and packaging of micro-optic components for telecom interconnects, with possible broader applications in integrated biophotonic chips or micro-displays.
We propose a method for shape sensing using a few multicore fiber Bragg grating (FBG) sensors in a single-port continuum surgical robot (CSR). The traditional method of using a forward kinematic model to calculate the shape of a single-port CSR is limited by the accuracy of the model. If FBG sensors are used for shape sensing, their accuracy is affected by their number, especially in long and flexible CSRs. A fusion method based on an extended Kalman filter (EKF) is proposed to solve this problem. Shape reconstruction is performed using both the CSR forward kinematic model and the FBG sensors, and the two results are fused using an EKF. The CSR reconstruction method adopts the incremental form of the forward kinematic model, while the FBG sensor method adopts the discrete arc-segment assumption. The fusion method can eliminate the inaccuracy of the kinematic model and obtain more accurate shape reconstruction results using only a small number of FBG sensors. We validated our algorithm through experiments on multiple bending shapes under different load conditions. The results show that our method significantly outperforms the traditional methods in terms of robustness and effectiveness.
Multicore fiber (MCF), which contains more than one core in a single fiber cladding, has attracted ever-increasing attention for application in optical sensing systems owing to its unique capability of independent light transmission in multiple spatial channels. Unlike in standard single-mode fiber (SMF), fiber bending gives rise to tangential strain in off-center cores, and this unique feature has been employed for directional bending and shape sensing, where strain is measured using fiber Bragg gratings (FBGs), optical frequency-domain reflectometry (OFDR), or Brillouin distributed sensing. In addition, the parallel spatial cores enable a space-division multiplexed (SDM) system configuration that allows multiple distributed sensing techniques to be multiplexed. As a result, multi-parameter sensing or performance-enhanced sensing can be achieved using MCF. In this paper, we review the research progress in MCF-based distributed fiber sensors. Brief introductions to MCF and the multiplexing/de-multiplexing methods are presented. The bending sensitivity of off-center cores is analyzed. Curvature and shape sensing, as well as various SDM distributed sensing schemes using MCF, are summarized, and the working principles of diverse MCF sensors are discussed. Finally, we present the challenges and prospects of MCF for distributed sensing applications.
The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. The only practical method for finding optimal solutions has been branch-and-bound algorithms. In this paper, we present an improved sequential algorithm based on a strict alternation of Generation and Exploration execution modes as well as hybrid Depth-First/Best-First strategies. The experimental results show that the proposed scheme performs better than the algorithm in [1]. More importantly, our method can be easily extended and implemented with lightweight threads to reduce execution times. Good speedups can be obtained on shared-memory multicore systems.
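The objective named in this abstract can be made concrete with a small sketch. In a permutation flowshop every job visits the machines in the same order and all machines process the jobs in the same sequence; the total flow time is the sum of the jobs' completion times on the last machine. The function names and the exhaustive search below are illustrative assumptions for tiny instances only; they are not the branch-and-bound algorithm of the paper.

```python
from itertools import permutations


def total_flow_time(proc_times, order):
    """Sum of completion times on the last machine for a job sequence.

    proc_times[j][k] is the processing time of job j on machine k.
    """
    num_machines = len(proc_times[0])
    machine_free = [0] * num_machines  # time at which each machine becomes idle
    total = 0
    for job in order:
        t = 0  # time at which this job leaves the previous machine
        for k in range(num_machines):
            t = max(t, machine_free[k]) + proc_times[job][k]
            machine_free[k] = t
        total += t
    return total


def best_order_bruteforce(proc_times):
    """Try every permutation; feasible only for a handful of jobs."""
    jobs = range(len(proc_times))
    return min(permutations(jobs), key=lambda o: total_flow_time(proc_times, o))
```

For two jobs with processing times `[[2, 3], [1, 2]]`, the sequence `(0, 1)` gives a total flow time of 12 and `(1, 0)` gives 9. The factorial growth of this search is why bounding techniques such as branch-and-bound are needed in practice.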
Recently, transmitting diverse signals in different cores of a multicore fiber (MCF) has greatly improved the communication capacity of a single fiber. In such an MCF-based communication system, mux/demux devices with broad bandwidth are of great significance. In this work, we design and fabricate a 19-channel mux/demux device based on femtosecond laser direct writing. The fabricated mux/demux device possesses an average insertion loss of 0.88 dB and intercore crosstalk of no more than −29.1 dB. Moreover, the fabricated device features a broad bandwidth across the C+L band. Such a mux/demux device enables low-loss 19-core fiber (de)multiplexing over the whole C+L band, showing convincing potential value in wavelength-space division multiplexing applications. In addition, a 19-core fiber fan-in/fan-out system is established based on a pair of mux/demux devices in this work.
This study produced a statistical analysis of multicore eddy structures based on 23 years of altimetry data in the global oceans. Multicore structures were identified using a threshold-free closed-contour algorithm of sea surface height, which was improved for this study with respect to certain technical details. In addition, a more accurate definition of the eddy boundary was used to estimate the eddy scale. Generally, multicore structures, which have two or more closed eddies of the same polarity within their boundaries, represent an important transitional stage in eddy lifetimes during which the component eddies might experience splitting or merging. In comparison with global eddies, the lifetimes and propagation distances of multicore eddies were found to be much smaller because of their inherent structural instability. However, at the same latitude, the spatial scale of multicore eddies was found to be larger than that of single-core eddies, i.e., the eddy area could be at least twice as large. Multicore eddies were found to exhibit some features similar to global eddies. For example, multicore eddies tend to occur in the Antarctic Circumpolar Current, some western boundary currents, and mid-latitude regions around 25°N/S; the majority (70%) of eddies propagate westward while only 30% propagate eastward; and large-amplitude eddies are restricted mainly to reasonably confined regions of highly unstable currents.
Graphics processing units (GPUs) have been widely recognized as cost-efficient co-processors with acceptable size, weight, and power consumption. However, adopting GPUs in real-time systems is still challenging, owing to the lack of a framework for real-time analysis. In order to guarantee real-time requirements while maintaining system utilization in modern heterogeneous systems, such as multicore multi-GPU systems, a novel suspension-based k-exclusion real-time locking protocol and the associated suspension-aware schedulability analysis are proposed. The proposed protocol provides a synchronization framework that enables multiple GPUs to be efficiently integrated in multicore real-time systems. Comparative evaluations show that the proposed methods improve upon existing work in terms of schedulability.
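The k-exclusion idea referred to in this abstract can be sketched as follows: at most k tasks may hold one of k interchangeable resources (here, GPUs) at a time, and any further task is suspended rather than busy-waiting. The sketch below uses a counting semaphore from Python's standard library to show only this basic constraint. It is an assumption-laden illustration: it does not model the paper's protocol, its queueing rules, task priorities, or any schedulability analysis, and the function name and the sleep standing in for GPU work are hypothetical.

```python
import threading
import time


def run_with_k_exclusion(num_tasks, k):
    """Run num_tasks threads that share k slots; return the peak number of holders."""
    slots = threading.Semaphore(k)  # k interchangeable resources
    guard = threading.Lock()        # protects the counters below
    active = 0
    peak = 0

    def task():
        nonlocal active, peak
        with slots:                 # blocks (suspends) while all k slots are taken
            with guard:
                active += 1
                peak = max(peak, active)
            time.sleep(0.001)       # placeholder for work on the acquired resource
            with guard:
                active -= 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Calling `run_with_k_exclusion(8, 2)` never reports more than two simultaneous holders, which is the invariant any k-exclusion protocol must maintain.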
In this paper, a typical experiment is carried out based on a high-resolution air-sea coupled model, namely, the coupled ocean-atmosphere-wave-sediment transport (COAWST) model, on both heterogeneous many-core (SW) and homogeneous multicore (Intel) supercomputing platforms. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between the two platforms, and compare the key elements of the atmospheric and ocean modules with reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and the general cluster yield almost no differences in the simulated typhoon path and intensity, and that the differences in surface pressure (PSFC) in the WRF model and in sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the differences in SST after 10 days are mainly caused by shortwave radiation variations, as influenced by typhoons subsequently generated in the system. These typhoons, generated in the hindcast after the first 10 days, follow clearly different trajectories on the two platforms.
Heterogeneous multicore clusters are becoming more popular for high-performance computing owing to their great computing power and cost-effectiveness. Nevertheless, parallel efficiency degradation is still a problem in large-scale structural analysis based on heterogeneous multicore clusters. To solve it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and the parallel sparse solver. In this new algorithm, a three-layer parallelization of the computational procedure is introduced to separate the communication between nodes, between heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to the various hardware layers. This approach not only achieves load balancing at the different layers efficiently but also improves the communication rate significantly through hierarchical communication. Additionally, the proposed hybrid parallel approach reduces the interface equation size and hence the solution time, which compensates for the communication overheads that grow with the interface equation size when CDDA is employed. Moreover, distributed sparse storage of large amounts of data is introduced to improve memory access. Benchmark instances solved on the Shenwei-Taihuzhiguang supercomputer show that the proposed method obtains higher speedup and parallel efficiency than CDDA and better extensibility of the parallel partition than the two-level parallel computing algorithm (TPCA).
Simulation is an important means of evaluating the performance of computer architectures. At present, the serial simulation of the general-purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform. We apply these two methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units in one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts to perform the simulation. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and that the inter-kernel parallelization method can accelerate the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. The orthogonality of these two methods allows them to be combined on multiple multicore hosts to obtain further performance improvements.
Multicore systems often use multiple levels of cache to bridge the gap between processor and memory speed. This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual-port content addressable memory (DPCAM). In addition, it proposes a new hardware-based replacement algorithm, called the near-far access replacement algorithm (NFRA), to reduce the cost overhead of the cache controller and improve the cache access latency. The experimental results indicate that the latencies of write and read operations are significantly lower than those of a set-associative cache memory. Moreover, the latency of a read operation is shown to be nearly constant regardless of the size of the DPCAM. However, an estimate of the power dissipation shows that DPCAM consumes about 7% more power than a set-associative cache memory of the same size. These results encourage embedding DPCAM within multicore processors as a small shared cache memory.
In this paper, a hybrid neural-genetic fuzzy system is proposed to control the flow and height of water in the reservoirs of water transfer networks. These controls avoid probable water waste in the reservoirs and pressure drops in water distribution networks. The proposed approach combines an artificial neural network, a genetic algorithm, and a fuzzy inference system to improve the performance of the supervisory control and data acquisition stations through a new control philosophy for instruments and control valves in the reservoirs of water transfer networks. First, a multi-core artificial neural network model, including a multi-layer perceptron and a radial basis function network, is proposed to forecast the daily water consumption of a reservoir. A genetic algorithm is proposed to optimize the parameters of the artificial neural networks. Then, the online height of water in the reservoir and the output of the artificial neural networks are used as inputs of a fuzzy inference system to estimate the flow rate of the reservoir inlet. Finally, the estimated inlet flow is translated into the input valve position using a transform control unit supported by a nonlinear autoregressive exogenous model. The proposed approach is applied to the Tehran water transfer network. The results of this study show that the proposed approach significantly reduces the deviation of the reservoir height from the desired levels.
The strict, high-standard requirements for the safety and stability of major engineering systems make large-scale finite element modal analysis a tough challenge. At the same time, realizing a systematic analysis of the entire large structure of these engineering systems is extremely meaningful in practice. This article proposes a multilevel hierarchical parallel algorithm for large-scale finite element modal analysis to reduce the loss of parallel computational efficiency when heterogeneous multicore distributed-storage computers are used. Based on two-level partitioning and four-transformation strategies, the proposed algorithm not only improves the memory access rate through sparsely distributed storage of large amounts of data but also reduces the solution time by reducing the scale of the generalized characteristic equations (GCEs). Moreover, a multilevel hierarchical parallelization approach is introduced in the computational procedure to separate the communication between nodes, within nodes, between heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to the various hardware layers. This method efficiently achieves load balancing at the different layers and significantly improves the communication rate through hierarchical communication. It can therefore enhance the parallel efficiency of large-scale finite element modal analysis by fully exploiting the architectural characteristics of heterogeneous multicore clusters. Finally, typical numerical experiments were used to validate the correctness and efficiency of the proposed method. A parallel modal analysis of a cross-river tunnel with over ten million degrees of freedom (DOFs) was then performed, and ten thousand processor cores were used to verify the feasibility of the algorithm.
3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This problem is addressed by providing parallel solutions for 3D RTM-TTI on multicores and many-cores. After data parallelism and memory optimization, the hot-spot function of 3D RTM-TTI gains a 35.99× speedup on two Intel Xeon CPUs, an 89.75× speedup on one Intel Xeon Phi, and an 89.92× speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is a stencil, the approaches also benefit a wide range of stencil-based applications.
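The stencil pattern mentioned at the end of this abstract can be illustrated with a generic example: each output point is computed from a fixed neighbourhood of input points. The sketch below applies a 2D five-point Laplacian with unit grid spacing; it is an assumption chosen for brevity and is not the 3D TTI kernel of the paper. Because every interior point reads only from the input grid, all points can be updated independently, which is why stencils map well onto multicore and many-core hardware.

```python
def laplacian_5pt(grid):
    """Apply a five-point Laplacian stencil to the interior of a 2D grid.

    Boundary points are left at 0.0; unit grid spacing is assumed.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each output point depends only on the input grid, so the
            # iterations of both loops are independent of one another.
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out
```

For a grid whose values are `i * i` (the row index squared), the interior result is 2.0 everywhere, matching the second derivative of a quadratic.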
This thesis presents the research and implementation of a traffic light and traffic sign recognition system based on a multicore FPGA. The system consists of four parts: the collection of dynamic images, gray-value preprocessing, the detection of edges and patterns, and the pattern-matching decision. The multicore system consists of three cores, which process the incoming camera images in parallel, each handling different colors and graphic elements. The image data read in from the camera serve as data shared among the three cores.
Funding: Supported by the Russian Ministry of Science and Higher Education (14.Y26.31.0017) and the Russian Foundation for Basic Research (18-52-7822); the work concerning MCF Raman lasers was supported by the Russian Science Foundation (21-72-30024).
Funding: Supported by research grants from the National Key R&D Program (2019YFD0902003).
Funding: National High Technology 863 Program of China (Nos. 2013AA013301, 2013AA013403, 2015AA015501, 2015AA015502, 2015AA015504, 2015AA016901); National NSFC (Nos. 61425022, 61522501, 61307086, 61475024, 61275158, 61201151, 61275074, 61205066); Beijing Nova Program (No. Z141101001814048); Beijing Excellent Ph.D. Thesis Guidance Foundation (No. 20121001302); the Universities Ph.D. Special Research Funds (Nos. 20120005110003, 20120005120007); the Fundamental Research Funds for the Central Universities (No. 2014RC0203); Fund of State Key Laboratory of IPOC (BUPT).
Abstract: Recently, multicore systems have used Dynamic Voltage/Frequency Scaling (DV/FS) technology to allow cores to operate at voltages and/or frequencies different from those of other cores, in order to save power and enhance performance. In this paper, an effective and reliable hybrid model to reduce the energy and makespan in multicore systems is proposed. The proposed hybrid model enhances and integrates the greedy approach with dynamic programming to achieve optimal Voltage/Frequency (Vmin/F) levels. Then, the allocation process is applied based on the available workloads. The hybrid model consists of three stages. The first stage obtains the optimum safe voltage, the second stage sets the level of energy efficiency, and the third is the allocation stage. Experimental results on various benchmarks show that the proposed model can generate optimal solutions to save energy while minimizing the makespan penalty. Comparisons with other competitive algorithms show that the proposed model provides on average a 48% improvement in energy saving and an 18% reduction in computation time while ensuring a high degree of system reliability.
Funding: Financial support from Huawei Technologies Co., Ltd, China (Project YB2016020025) is gratefully acknowledged.
Abstract: Great strides have been made over the past decade to establish femtosecond lasers in advanced manufacturing systems for enabling new forms of non-contact processing of transparent materials. Research advances have shown that a myriad of additive and subtractive techniques are now possible for flexible 2D and 3D structuring of such materials with micro- and nano-scale precision. In this paper, these techniques have been refined and scaled up to demonstrate the potential for 3D writing of high-density optical packaging components, specifically addressing the major bottleneck for efficiently connecting optical fibres to silicon photonic (SiP) processors for use in telecom and data centres. An 84-channel fused silica interposer was introduced for high-density edge coupling of multicore fibres (MCFs) to a SiP chip. Femtosecond laser irradiation followed by chemical etching was further harnessed to open alignment sockets, permitting rapid assembly with precise locking of MCF positions for efficient coupling to laser-written optical waveguides in the interposer. A 3D waveguide fanout design provided an attractive balance of low losses, mode matching, high channel density, compact footprint, and low crosstalk. The 3D additive and subtractive processes thus demonstrated the potential for higher-scale integration and rapid photonic assembly and packaging of micro-optic components for telecom interconnects, with possible broader applications in integrated biophotonic chips or micro-displays.
Funding: the National Natural Science Foundation of China (Nos. 61873257 and U20A20195); the Project of Natural Science Foundation of Liaoning Province (No. 2021-MS-033); the Foundation of Millions of Talents Project of the Department of Human Resources and Social Security of Liaoning Province (No. 2021921037).
Abstract: We propose a method for shape sensing using a few multicore fiber Bragg grating (FBG) sensors in a single-port continuum surgical robot (CSR). The traditional method of using a forward kinematic model to calculate the shape of a single-port CSR is limited by the accuracy of the model. If FBG sensors are used for shape sensing, their accuracy is affected by their number, especially in long and flexible CSRs. A fusion method based on an extended Kalman filter (EKF) is proposed to solve this problem. Shape reconstruction was performed using the CSR forward kinematic model and the FBG sensors, and the two results were fused using an EKF. The CSR reconstruction method adopted the incremental form of the forward kinematic model, while the FBG sensor method adopted the discrete arc-segment assumption. The fusion method can eliminate the inaccuracy of the kinematic model and obtain more accurate shape reconstruction results using only a small number of FBG sensors. We validated our algorithm through experiments on multiple bending shapes under different load conditions. The results show that our method significantly outperformed the traditional methods in terms of robustness and effectiveness.
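The fusion step described above can be illustrated with the measurement update of a Kalman filter. The sketch below is only a scalar, linear special case: `x_pred` stands in for a model-based estimate and `z` for a sensor-based estimate of the same quantity. All names and numbers are illustrative assumptions, not the authors' implementation; a real EKF would linearize the nonlinear kinematic and sensor models with Jacobians.

```python
def kalman_update(x_pred, p_pred, z, r):
    """Fuse a prediction (mean x_pred, variance p_pred) with a measurement z of variance r.

    Scalar case with a direct measurement (H = 1); hypothetical helper for illustration.
    """
    gain = p_pred / (p_pred + r)        # Kalman gain: how much to trust the measurement
    x = x_pred + gain * (z - x_pred)    # corrected estimate
    p = (1.0 - gain) * p_pred           # variance after fusion, never larger than p_pred
    return x, p


# Equal uncertainty on both sources, so the fused value is their midpoint.
x, p = kalman_update(10.0, 4.0, 12.0, 4.0)
```

The estimate with the smaller variance dominates the result, which is why a few accurate sensor readings can correct an imprecise model.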
Abstract: Multicore fiber (MCF), which contains more than one core in a single fiber cladding, has attracted ever-increasing attention for application in optical sensing systems owing to its unique capability of independent light transmission in multiple spatial channels. Unlike in standard single mode fiber (SMF), fiber bending gives rise to tangential strain in off-center cores, and this unique feature has been employed for directional bending and shape sensing, where strain measurement is achieved by using fiber Bragg gratings (FBGs), optical frequency-domain reflectometry (OFDR), or the Brillouin distributed sensing technique. On the other hand, the parallel spatial cores enable a space-division multiplexed (SDM) system configuration that allows multiple distributed sensing techniques to be multiplexed. As a result, multi-parameter sensing or performance-enhanced sensing can be achieved by using MCF. In this paper, we review the research progress in MCF-based distributed fiber sensors. Brief introductions to MCF and the multiplexing/de-multiplexing methods are presented. The bending sensitivity of off-center cores is analyzed. Curvature and shape sensing, as well as various SDM distributed sensing schemes using MCF, are summarized, and the working principles of diverse MCF sensors are discussed. Finally, we present the challenges and prospects of MCF for distributed sensing applications.
Abstract: The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. The only practical method for finding optimal solutions has been branch-and-bound algorithms. In this paper, we present an improved sequential algorithm based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The experimental results show that the proposed scheme exhibits improved performance compared with the algorithm in [1]. More importantly, our method can be easily extended and implemented with lightweight threads to reduce execution time. Good speedups can be obtained on shared-memory multicore systems.
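To make the objective concrete, the sketch below evaluates the total flow time of one permutation and finds the optimum by exhaustive search. It is an illustration only: the data layout `proc[j][k]` and both function names are assumptions, and the brute-force search is a stand-in for the branch-and-bound method the abstract describes, usable only for a handful of jobs.

```python
from itertools import permutations


def total_flow_time(proc, order):
    """Sum of completion times on the last machine for a permutation schedule.

    proc[j][k] is the processing time of job j on machine k (illustrative layout).
    """
    m = len(proc[0])
    finish = [0] * m                  # finish[k]: completion time of the latest job on machine k
    total = 0
    for j in order:
        for k in range(m):
            ready = finish[k - 1] if k > 0 else 0   # when job j leaves machine k-1
            finish[k] = max(finish[k], ready) + proc[j][k]
        total += finish[-1]           # job j completes on the last machine
    return total


def brute_force(proc):
    """Try every job order; factorial cost, so only for tiny instances."""
    return min(permutations(range(len(proc))),
               key=lambda order: total_flow_time(proc, order))


proc = [[3, 2], [1, 4]]               # two jobs, two machines
best = brute_force(proc)              # (1, 0), with total flow time 12
```

A branch-and-bound solver explores the same permutation tree but prunes partial orders whose lower bound already exceeds the best total found so far.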
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62125503 and 62261160388); the Key R&D Program of Hubei Province of China (Grant Nos. 2020BAB001 and 2021BAA024); the Key R&D Program of Guangdong Province (Grant No. 2018B030325002); the Science and Technology Innovation Commission of Shenzhen (Grant No. JCYJ20200109114018750); the Open Projects Foundation (No. SKLD2201) of State Key Laboratory of Optical Fiber and Cable Manufacture Technology (YOFC); the Innovation Project of Optics Valley Laboratory (Grant No. OVL2021BG004).
Abstract: Recently, transmitting diverse signals in different cores of a multicore fiber (MCF) has greatly improved the communication capacity of a single fiber. In such an MCF-based communication system, mux/demux devices with broad bandwidth are of great significance. In this work, we design and fabricate a 19-channel mux/demux device based on femtosecond laser direct writing. The fabricated mux/demux device has an average insertion loss of 0.88 dB and intercore crosstalk of no more than −29.1 dB. Moreover, the fabricated device features a broad bandwidth across the C+L band. Such a mux/demux device enables low-loss 19-core fiber (de)multiplexing over the whole C+L band, showing convincing potential in wavelength-space division multiplexing applications. In addition, a 19-core fiber fan-in/fan-out system is established based on a pair of mux/demux devices.
Funding: The National Key Research and Development Program of China under contract No. 2016YFC1401800; the National Natural Science Foundation of China under contract No. 41576176; the National Programme on Global Change and Air-Sea Interaction; Dragon 4 Project under contract No. 32292.
Abstract: This study produced a statistical analysis of multicore eddy structures based on 23 years of altimetry data in the global oceans. Multicore structures were identified using a threshold-free closed-contour algorithm of sea surface height, which was improved for this study in respect of certain technical details. A more accurate definition of the eddy boundary was also used to estimate eddy scale. Generally, multicore structures, which have two or more closed eddies of the same polarity within their boundaries, represent an important transitional stage in their lives during which the component eddies might experience splitting or merging. In comparison with global eddies, the lifetimes and propagation distances of multicore eddies were found to be much smaller because of their inherent structural instability. However, at the same latitude, the spatial scale of multicore eddies was found to be larger than that of single-core eddies, i.e., the eddy area could be at least twice as large. Multicore eddies were found to exhibit some features similar to global eddies. For example, multicore eddies tend to occur in the Antarctic Circumpolar Current, some western boundary currents, and mid-latitude regions around 25°N/S; the majority (70%) of eddies propagate westward while only 30% propagate eastward; and large-amplitude eddies are restricted mainly to reasonably confined regions of highly unstable currents.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61003032/F020207.
Abstract: Graphic processing units (GPUs) have been widely recognized as cost-efficient co-processors with acceptable size, weight, and power consumption. However, adopting GPUs in real-time systems is still challenging, due to the lack of a framework for real-time analysis. In order to guarantee real-time requirements while maintaining system utilization in modern heterogeneous systems, such as multicore multi-GPU systems, a novel suspension-based k-exclusion real-time locking protocol and the associated suspension-aware schedulability analysis are proposed. The proposed protocol provides a synchronization framework that enables multiple GPUs to be efficiently integrated in multicore real-time systems. Comparative evaluations show that the proposed methods improve upon the existing work in terms of schedulability.
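The core idea of k-exclusion (at most k tasks hold one of k interchangeable resources, and further tasks suspend until one is released) can be sketched with a counting semaphore. This is not the protocol proposed in the paper: it has no priority-ordered queue and gives no real-time guarantees, and `run_jobs` and the sleep standing in for GPU work are hypothetical.

```python
import threading
import time


def run_jobs(num_jobs, k):
    """Run num_jobs threads that each need one of k interchangeable devices.

    Returns the peak number of threads that held a device at the same time.
    """
    slots = threading.Semaphore(k)    # at most k holders at once: k-exclusion
    guard = threading.Lock()          # protects the counters below
    state = {"active": 0, "peak": 0}

    def job():
        with slots:                   # suspends while all k devices are busy
            with guard:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)          # stand-in for work on the device
            with guard:
                state["active"] -= 1

    threads = [threading.Thread(target=job) for _ in range(num_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


peak = run_jobs(8, 2)                 # never exceeds 2
```

A real-time locking protocol would additionally bound how long each task can remain suspended, which is what makes a schedulability analysis possible.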
Funding: This work is supported by the National Key Research and Development Plan program of the Ministry of Science and Technology of China (No. 2016YFB0201100). Additionally, this work is supported by the National Laboratory for Marine Science and Technology (Qingdao) Major Project of the Aoshan Science and Technology Innovation Program (No. 2018ASKJ01-04) and the Open Foundation of Key Laboratory of Marine Science and Numerical Simulation, Ministry of Natural Resources (No. 2021-YB-02).
Abstract: In this paper, a typical experiment is carried out based on a high-resolution air-sea coupled model, namely, the coupled ocean-atmosphere-wave-sediment transport (COAWST) model, on both heterogeneous many-core (SW) and homogeneous multicore (Intel) supercomputing platforms. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between the two platforms, and compare the key elements of the atmospheric and ocean modules with reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and the general cluster yield almost no differences in the simulated typhoon path and intensity, and the differences in surface pressure (PSFC) in the WRF model and sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the differences in SST after 10 days are mainly caused by shortwave radiation variations, as influenced by subsequently generated typhoons in the system. These typhoons, generated in the hindcast after the first 10 days, follow clearly different trajectories on the two platforms.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11772192).
Abstract: Heterogeneous multicore clusters are becoming more popular for high-performance computing due to their great computing power and cost-to-performance effectiveness. Nevertheless, parallel efficiency degradation is still a problem in large-scale structural analysis based on heterogeneous multicore clusters. To solve it, a hybrid hierarchical parallel algorithm (HHPA) is proposed on the basis of the conventional domain decomposition algorithm (CDDA) and the parallel sparse solver. In this new algorithm, a three-layer parallelization of the computational procedure is introduced to separate the communication among nodes, among heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to the corresponding hardware layers. This approach not only achieves load balancing at different layers efficiently but also improves the communication rate significantly through hierarchical communication. Additionally, the proposed hybrid parallel approach reduces the interface equation size and thereby the solution time, which compensates for the growth of communication overhead with interface equation size when employing CDDA. Moreover, distributed sparse storage of a large amount of data is introduced to improve memory access. Benchmark instances solved on the Shenwei-Taihuzhiguang supercomputer show that the proposed method obtains higher speedup and parallel efficiency than CDDA and superior extensibility of the parallel partition compared with the two-level parallel computing algorithm (TPCA).
Funding: the National Natural Science Foundation of China (Nos. 61572508, 61272144, 61303065 and 61202121); the National High Technology Research and Development Program (863) of China (No. 2012AA010905); the Research Project of National University of Defense Technology (No. JC13-06-02); the Doctoral Fund of Ministry of Education of China (No. 20134307120028); the Research Fund for the Doctoral Program of Higher Education of China (No. 20114307120013).
Abstract: Simulation is an important means of evaluating the performance of a computer architecture. Nowadays, the serial simulation of general purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform. We apply these two methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units in one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts to perform the simulation. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method can accelerate the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. The orthogonality of the two methods allows them to be combined on multiple multi-core hosts for further performance improvements.
Abstract: Multicore systems often use multiple levels of cache to bridge the gap between processor and memory speed. This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual port content addressable memory (DPCAM). In addition, it proposes a new hardware-based replacement algorithm, called the near-far access replacement algorithm (NFRA), to reduce the cost overhead of the cache controller and improve the cache access latency. The experimental results indicate that the latencies of write and read operations are significantly lower than those of a set-associative cache memory. Moreover, the latency of a read operation is nearly constant regardless of the size of the DPCAM. However, an estimation of the power dissipation showed that DPCAM consumes about 7% more power than a set-associative cache memory of the same size. These results encourage embedding DPCAM within multicore processors as a small shared cache memory.
Abstract: In this paper, a hybrid neural-genetic fuzzy system is proposed to control the flow and height of water in the reservoirs of water transfer networks. These controls avoid probable water waste in the reservoirs and pressure drops in water distribution networks. The proposed approach combines an artificial neural network, a genetic algorithm, and a fuzzy inference system to improve the performance of the supervisory control and data acquisition stations through a new control philosophy for instruments and control valves in the reservoirs of the water transfer networks. First, a multi-core artificial neural network model, including a multi-layer perceptron and a radial basis function network, is proposed to forecast the daily water consumption of a reservoir. A genetic algorithm is proposed to optimize the parameters of the artificial neural networks. Then, the online height of water in the reservoir and the output of the artificial neural networks are used as inputs of a fuzzy inference system to estimate the flow rate of the reservoir inlet. Finally, the estimated inlet flow is translated into the input valve position using a transform control unit supported by a nonlinear autoregressive exogenous model. The proposed approach is applied in the Tehran water transfer network. The results of this study show that the proposed approach significantly reduces the deviation of the reservoir height from the desired levels.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11772192).
Abstract: The strict, high-standard requirements for the safety and stability of major engineering systems make large-scale finite element modal analysis a tough challenge. At the same time, a systematic analysis of the entire large structure of these engineering systems is extremely meaningful in practice. This article proposes a multilevel hierarchical parallel algorithm for large-scale finite element modal analysis to reduce the loss of parallel computational efficiency when heterogeneous multicore distributed storage computers are used to solve such problems. Based on two-level partitioning and four-transformation strategies, the proposed algorithm not only improves the memory access rate through the sparsely distributed storage of a large amount of data but also reduces the solution time by reducing the scale of the generalized characteristic equations (GCEs). Moreover, a multilevel hierarchical parallelization approach is introduced during the computational procedure to separate the communication among nodes, within nodes, among heterogeneous core groups (HCGs), and inside HCGs by mapping computing tasks to the corresponding hardware layers. This method can efficiently achieve load balancing at different layers and significantly improve the communication rate through hierarchical communication. Therefore, it can enhance the efficiency of parallel computing for large-scale finite element modal analysis by fully exploiting the architecture characteristics of heterogeneous multicore clusters. Finally, typical numerical experiments were used to validate the correctness and efficiency of the proposed method. A parallel modal analysis of a cross-river tunnel with over ten million degrees of freedom (DOFs) was then performed, using ten thousand core processors, to verify the feasibility of the algorithm.
Funding: Supported by the National Natural Science Foundation of China (No. 61432018).
Abstract: 3D reverse time migration in tilted transversely isotropic media (3D RTM-TTI) is the most precise model for complex seismic imaging. However, the vast computing time of 3D RTM-TTI prevents it from being widely used. This problem is addressed by providing parallel solutions for 3D RTM-TTI on multicores and many-cores. After data parallelism and memory optimization, the hot spot function of 3D RTM-TTI gains a 35.99X speedup on two Intel Xeon CPUs, an 89.75X speedup on one Intel Xeon Phi, and an 89.92X speedup on one NVIDIA K20 GPU compared with the serial CPU baseline. This study makes RTM-TTI practical in industry. Since the computation pattern in RTM is a stencil, the approaches also benefit a wide range of stencil-based applications.
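For readers unfamiliar with the stencil pattern mentioned above, the sketch below applies a 3-point second-derivative stencil to a 1D grid. It is a deliberately small illustration, not the RTM-TTI kernel: real RTM code uses higher-order 3D stencils, and the function name and zero boundary handling are assumptions.

```python
def laplacian_1d(u, dx=1.0):
    """Apply the 3-point stencil (u[i-1] - 2*u[i] + u[i+1]) / dx**2 to interior points."""
    out = [0.0] * len(u)              # boundary entries are left at zero
    for i in range(1, len(u) - 1):
        # Each output depends only on a fixed neighbourhood of the input,
        # so all iterations are independent and can run in parallel.
        out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx)
    return out


# The second derivative of x**2 is 2 everywhere in the interior.
result = laplacian_1d([0, 1, 4, 9, 16, 25])
```

Because every output point reads only nearby inputs, the loop splits naturally across CPU threads, vector lanes, or GPU threads, which is the property that data-parallel optimizations of stencil codes exploit.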
Abstract: This paper presents the research and practice of a traffic light and traffic sign recognition system based on a multicore FPGA. The system consists of four parts: the collection of dynamic images, gray-value preprocessing, the detection of edges and patterns, and pattern-matching judgment. The multicore system consists of three cores. Each core processes, in parallel, the incoming images collected from the camera in terms of different colors and graphic elements. The image data read in from the camera serve as data shared by the three cores.