Funding: Supported by the National Science and Technology Major Projects of the Ministry of Science and Technology of China under Grant No. 2012ZX03005003; the State Key Program of National Natural Science of China under Grant No. 61232017; the National Basic Research Program of China (973 Program) under Grant No. 2013CB329101; the National Natural Science Foundation of China under Grants No. 61102049 and No. 61271202; the Beijing Natural Science Foundation under Grants No. 4132053 and No. 4122060; and the Scientific Research Foundation of the Returned Overseas Chinese Scholars of State Education Ministry under Grant No. W13C300010.
Abstract: Massive information flows are generated by interactive processing and visualization. To support information transmission over the Internet efficiently, information-centric architectures have recently been proposed. In this paper, we consider one such architecture, the data-centric networking architecture, which provides communication services for big data and uses a service identifier to name data objects. We propose several approaches for disseminating data objects in a large-scale data-centric network; in particular, we propose approaches that tie data dissemination to the topology of the Internet. We evaluate the proposed approaches with respect to data delivery efficiency, round-trip time improvement, and deployment cost. The results show that disseminating data objects to small ISPs significantly improves data delivery efficiency at an acceptable deployment cost.
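The naming idea in the abstract above, where a service identifier names a data object and copies are disseminated to ISPs, can be illustrated with a toy lookup. This is only a sketch under assumptions: the ISP names, the hop distances, and the names `ReplicaDirectory`, `disseminate`, and `resolve` are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaDirectory:
    """Toy map from a service identifier (SID) to the ISPs holding a copy."""

    # Hypothetical hop distance from a single client to each ISP.
    hops_to_isp: dict[str, int]
    replicas: dict[str, set[str]] = field(default_factory=dict)

    def disseminate(self, sid: str, isp: str) -> None:
        """Record that a copy of the object named `sid` is stored at `isp`."""
        if isp not in self.hops_to_isp:
            raise KeyError(f"unknown ISP: {isp}")
        self.replicas.setdefault(sid, set()).add(isp)

    def resolve(self, sid: str) -> tuple[str, int]:
        """Return the closest ISP holding `sid` and its hop distance."""
        holders = self.replicas.get(sid)
        if not holders:
            raise LookupError(f"no replica registered for {sid}")
        # Ties on distance are broken by ISP name to keep the result stable.
        best = min(holders, key=lambda isp: (self.hops_to_isp[isp], isp))
        return best, self.hops_to_isp[best]


# Assumed distances: the small access ISP is closest to the client.
directory = ReplicaDirectory({"large-isp": 6, "regional-isp": 4, "small-isp": 1})
directory.disseminate("sid:video-42", "large-isp")
before = directory.resolve("sid:video-42")
directory.disseminate("sid:video-42", "small-isp")
after = directory.resolve("sid:video-42")
print(before, after)
```

In this toy setting, adding a replica at the small ISP shortens the path from the client to the data object. The paper's actual dissemination strategies and evaluation metrics are not reproduced here.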
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA010902, the National Natural Science Foundation of China (NSFC) under Grant No. 61432018, and the Innovation Research Group of NSFC under Grant No. 61221062.
Abstract: GPUs have become a ubiquitous choice of coprocessor because they excel at concurrent processing. In the GPU architecture, shared memory plays an important role in system performance, as it can greatly improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications with regular access patterns, optimizing for shared memory is not easy: it often requires programmer expertise and nontrivial parameter selection, and improper shared memory usage can even leave GPU resources underutilized. Even with state-of-the-art high-level programming models (e.g., OpenACC and OpenHMPP), shared memory remains hard to exploit, because these models lack inherent support for describing shared memory optimizations and selecting suitable parameters, let alone maintaining high resource utilization. Targeting higher productivity for affine applications, we propose a data-centric approach to shared memory optimization on GPUs. We design a pragma extension to OpenACC that conveys the programmer's data management hints to the compiler. We also devise a compiler framework that uses the polyhedral model to automatically select optimal parameters for shared arrays. We further propose optimization techniques that expose more memory-level and instruction-level parallelism. Experimental results show that our shared-memory-centric approaches improve the performance of five typical GPU applications across four widely used platforms by 3.7x on average, without burdening programmers with many pragmas.
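The parameter-selection problem mentioned above can be made concrete with a small sketch. It is not the paper's polyhedral method: it is a brute-force search over square tile sizes. The limits used here (48 KiB of shared memory per multiprocessor, at least two resident blocks, at most 1024 threads per block) and the function name `pick_tile` are illustrative assumptions.

```python
def pick_tile(shared_bytes: int, elem_bytes: int, arrays: int,
              min_blocks: int, max_threads: int,
              candidates=(8, 16, 32, 64)) -> int:
    """Return the largest square tile edge that fits the assumed limits.

    Each block is assumed to stage `arrays` tiles of edge*edge elements in
    shared memory and to use one thread per tile element.
    """
    best = 0
    for edge in candidates:
        per_block = arrays * edge * edge * elem_bytes
        # Keep enough headroom for `min_blocks` blocks to be resident at once.
        fits_memory = per_block * min_blocks <= shared_bytes
        fits_threads = edge * edge <= max_threads
        if fits_memory and fits_threads:
            best = max(best, edge)
    if best == 0:
        raise ValueError("no candidate tile satisfies the limits")
    return best


# Two float32 tiles per block, as in a tiled matrix multiply (assumed workload).
tile = pick_tile(shared_bytes=48 * 1024, elem_bytes=4, arrays=2,
                 min_blocks=2, max_threads=1024)
print(tile)
```

Even this simplified search shows why manual tuning is tedious: the best tile size changes with element width, the number of staged arrays, and the residency target, which is the kind of choice the abstract proposes to delegate to the compiler.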