2 articles found
Optimized Parallel Execution of Declarative Programs on Distributed Memory Multiprocessors
1
Authors: 沈美明, 田新民, 王鼎兴, 郑纬民, 温冬婵. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1993, No. 3, pp. 233-242 (10 pages)
In this paper, we focus on the compiling implementation of the parallel logic language PARLOG and the functional language ML on distributed memory multiprocessors. Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model (HPGREM) is first presented. Then, based on HPGREM, a parallel abstract machine, PAM/TGR, is described. Furthermore, several optimizing compilation schemes for executing declarative programs on a transputer array are proposed. Performance statistics on a transputer array demonstrate the effectiveness of our model, parallel abstract machine, optimizing compilation strategies, and compiler.
Keywords: declarative language; parallel graph rewriting execution model; optimized parallel compiler; distributed memory multiprocessors; parallel abstract machine
Efficient Handling of Lock Hand-off in DSM Multiprocessors with Buffering Coherence Controllers (cited by: 1)
2
Authors: Benjamín Sahelices, Agustín de Dios, Pablo Ibáñez, Víctor Viñals-Yúfera, José María Llabería. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, No. 1, pp. 75-91 (17 pages)
Synchronization in parallel programs is a major performance bottleneck in multiprocessor systems. Shared data is protected by locks, and a lot of time is spent on the competition arising at the lock hand-off. In order to be serialized, requests to the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper, we focus mainly on systems whose coherence controllers buffer requests. In a lock hand-off, a burst of requests to the same line arrives at the coherence controller. During lock hand-off, only the requests from the winning processor contribute to the progress of the computation, since the winning processor is the only one that will advance the work. This key observation leads us to propose a hardware mechanism we call request bypassing, which allows requests from the winning processor to bypass the requests buffered in the coherence controller keeping the lock line. We present an inexpensive implementation of request bypassing that reduces the time spent on all the execution phases of a critical section (acquiring the lock, accessing shared data, and releasing the lock) and which, as a consequence, speeds up the whole parallel computation. This mechanism requires neither compiler or programmer support nor ISA or coherence protocol changes. By simulating a 32-processor system, we show that using request bypassing does not degrade but rather improves performance in three applications with low synchronization rates, while in those having a large amount of synchronization activity (the remaining four), we see reductions in execution time and in lock stall time ranging from 14% to 39% and from 52% to 71%, respectively.
We compare request bypassing with a previously proposed technique called read combining and with a system that bounces requests, observing a significantly lower execution time with the bypassing scheme. Finally, we analyze the sensitivity of our results to some key hardware and software parameters.
Keywords: distributed shared memory multiprocessors; synchronization; buffer coherence controller; request bypass
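The request bypassing idea in the abstract above can be illustrated with a toy software model. This is only a sketch under strong assumptions: the paper describes a hardware mechanism, and the function name `service_order`, the `(processor_id, kind)` request tuples, and the simplification that all requests are buffered before any is served are hypothetical choices made here for illustration. It shows how letting the winning processor's requests jump ahead of buffered competitors changes the order in which a coherence controller would serve them.

```python
def service_order(requests, winner, bypass):
    """Return the order in which buffered requests to one lock line are served.

    requests: list of (processor_id, kind) tuples in arrival order (hypothetical format).
    winner:   id of the processor that currently holds the lock.
    bypass:   if True, the winner's requests are placed ahead of the
              requests buffered from other processors.
    """
    queue = []
    for req in requests:
        if bypass and req[0] == winner:
            # Insert behind earlier winner requests, ahead of everyone else,
            # so the winner's own requests keep their relative order.
            position = sum(1 for r in queue if r[0] == winner)
            queue.insert(position, req)
        else:
            # Plain buffering: first come, first served.
            queue.append(req)
    return queue


def requests_ahead_of_release(order, winner):
    """Count how many requests are served before the winner's release."""
    return order.index((winner, "release"))


# Processor 0 holds the lock; processors 1-3 compete for it before
# processor 0's release request reaches the controller.
burst = [(1, "acquire"), (2, "acquire"), (3, "acquire"), (0, "release")]
fifo = service_order(burst, winner=0, bypass=False)
bypassed = service_order(burst, winner=0, bypass=True)
print(requests_ahead_of_release(fifo, 0))      # 3
print(requests_ahead_of_release(bypassed, 0))  # 0
```

In this model the release no longer waits behind the three competing acquires, which corresponds to the abstract's observation that only the winning processor's requests advance the computation during a hand-off.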