Funding: Supported in part by the National Science Foundation of the USA under Grant Nos. EIA-0224377, CNS-0406328, CNS-0509118, and CCF-0621435.
Abstract: Data prefetching is an effective technique for hiding data access latency: it masks the CPU stalls caused by cache misses and helps bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of state-of-the-art prefetching techniques and proposes a taxonomy that classifies the design concerns in developing a prefetching strategy, especially for multicore processors. We also compare various existing methods through analysis.
Funding: The Ocean Public Welfare Scientific Research Project of the State Oceanic Administration of China under contract No. 20110533.
Abstract: Data pre-deployment in HDFS (the Hadoop Distributed File System) is more complicated than in traditional file systems. Several key issues need to be addressed, such as determining the target location of prefetched data, the amount of data to be prefetched, and the balance between data prefetching services and normal data accesses. To solve these problems, we exploit the characteristics of digital ocean information service flows and propose a deployment scheme that combines input data prefetching with output-data-oriented storage strategies. The method lets data preparation and data processing run in parallel, thereby greatly reducing the I/O time cost of digital ocean cloud computing platforms when processing multi-source information synergistic tasks. The experimental results show that the scheme has a higher degree of parallelism than traditional Hadoop mechanisms, shortens the waiting time of a running service node, and significantly reduces data access conflicts.
Funding: Supported by the National Key R&D Program of China (2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Science Challenge Project (No. TZ2016002), and the fund of the State Key Laboratory of Software Development Environment (SKLSDE-2017ZX-10).
Abstract: With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific computing applications, such as financial computing, business, and public administration. Because parallel file systems provide storage services for multiple applications, they must meet a variety of requirements. However, parallel file systems usually provide a unified storage solution, which cannot meet specific application needs. In this paper, an extended file handle scheme is proposed to deal with this problem. The original file handle is extended to record I/O optimization information, which allows file systems to specify optimizations for a file or directory based on workload characteristics. Fine-grained management of I/O optimizations can therefore be achieved. On the basis of the extended file handle scheme, data prefetching and small file optimization mechanisms are proposed for parallel file systems. The experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.