This paper describes a method for building hot snapshot copy based on windows-file system (HSCF). The architecture and running mechanism of HSCF are discussed after giving a comparison with other on-line backup tecb...This paper describes a method for building hot snapshot copy based on windows-file system (HSCF). The architecture and running mechanism of HSCF are discussed after giving a comparison with other on-line backup tecbnology. HSCF, based on a file system filter driver, protects computer data and ensures their integrity and consistency with following three steps: access to open files, synchronization and copy on-write. Its strategies for improving system performance are analyzed including priority setting, incremental snapshot and load balance. HSCF is a new kind of snapshot technology to solve the data integrity and consistency problem in online backup, which is different from other storage-level snapshot and Open File Solution.展开更多
In this paper, we explored a load-balancing algorithm in a cluster file system contains two levels of metadata-server, primary-level server quickly distributestasks to second-level servers depending on the closest loa...In this paper, we explored a load-balancing algorithm in a cluster file system contains two levels of metadata-server, primary-level server quickly distributestasks to second-level servers depending on the closest load-balancing information. At the same time, we explored a method which accurately reflect I/O traffic and storage of storage-node: computing the heat-value of file, according to which we realized a more logical storage allocation. According to the experiment result, we conclude that this new algorithm shortens the executing time of tasks and improves the system performance compared with other load algorithm.展开更多
Data layout in a file system is the organization of data stored in external storages. The data layout has a huge impact on performance of storage systems. We survey three main kinds of data layout in traditional file ...Data layout in a file system is the organization of data stored in external storages. The data layout has a huge impact on performance of storage systems. We survey three main kinds of data layout in traditional file systems: in-place update file system, log-structured file system, and copy-on-write file sys- tem. Each file system has its own strengths and weaknesses under different circumstances. We also include a recent us- age of persistent layout in a file system that combines both flash memory and byte- addressable non- volatile memory. With this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.展开更多
Working with files and the safety of information has always been relevant, especially in financial institutions where the requirements for the safety of information and security are especially important. And in today...Working with files and the safety of information has always been relevant, especially in financial institutions where the requirements for the safety of information and security are especially important. And in today’s conditions, when an earthquake can destroy the floor of a city in an instant, or when a missile hits an office and all servers turn into scrap metal, the issue of data safety becomes especially important. Also, you can’t put the cost of the software and the convenience of working with files in last place. Especially if an office worker needs to find the necessary information on a client, a financial contract or a company’s financial product in a few seconds. Also, during the operation of computer equipment, failures are possible, and some of them can lead to partial or complete loss of information. In this paper, it is proposed to create another level of abstraction for working with the file system, which will be based on a relational database as a storage of objects and access rights to objects. Also considered are possible protocols for transferring data to other programs that work with files, these can be both small sites and the operating system itself. This article will be especially interesting for financial institutions or companies operating in the banking sector. The purpose of this article is an attempt to introduce another level of abstraction for working with files. A level that is completely abstracted from the storage medium.展开更多
File systems are fundamental for computers and devices with data storage units. They allow operating systems to understand and organize streams of bytes and obtain readable files from them. There are numerous file sys...File systems are fundamental for computers and devices with data storage units. They allow operating systems to understand and organize streams of bytes and obtain readable files from them. There are numerous file systems available in the industry, all with their own unique features. Understanding how these file systems work is essential for computer science students, but their complex nature can be difficult and challenging to grasp, especially for students at the beginning of their career. The Zion File System Simulator was designed with this in mind. Zion is a teaching and experimenting tool, in the form of a small application, built to help students understand how the I/O manager of an operating system interacts with the drive through the file system. Users can see and analyze the structure of a simple, flat file system provided with Zion, or simulate the most common structures such as FAT or NTFS. Students can also create their own implementations and run them through the simulator to analyze the different behaviors. Zion runs on Windows, and the application is provided with dynamic-link libraries that include the interfaces of a file system and a volume manager. These interfaces allow programmers to build their own file system or volume manager in Visual Studio using any .NET language (3.0 or above). Zion gives the users the power to adjust simulated architectural parameters such as volume and block size, or performance factors such as seek and transfer time. Zion runs workloads of I/O operations such as “create,” “delete,” “read,” and “write,” and analyzes the resulting metrics including I/O operations, read/write time, and disk fragmentation. Zion is a learning tool. It is not designed for measuring accurate performance of file systems and volume managers. The robustness of the application, together with its expandability, makes Zion a potential laboratory tool for computer science classes, helping students learn how file systems work and interact with an operating system.展开更多
One of the most critical threats to the reliability and robustness for file system is harboring bug (silent data corruption). In this research we focus on checksum mismatch since it occurs not only in the user data bu...One of the most critical threats to the reliability and robustness for file system is harboring bug (silent data corruption). In this research we focus on checksum mismatch since it occurs not only in the user data but also in file system. Our proposed solution has the ability to check this bug in file system of Linux. In our proposed solution there is no need to invoke or revoke checker utility, it comes as the integrated part of file system and has the ability to check upcoming updates before harboring bug make unrecoverable changes that leads significant data loses. Demonstration testing shows satisfactory results in file server and web server environments in terms of less memory consumption and avoidable delay in system’s updating.展开更多
Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performanc...Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performance greatly. HDFS is a distributed file system on Hadoop which is the most popular platform for big data analytics. And HDFS adopts fixed-size chunking policy, which is inefficient facing incremental computing. Therefore, in this paper, we proposed iHDFS (incremental HDFS), a distributed file system, which can provide basic guarantee for big data parallel processing. The iHDFS is implemented as an extension to HDFS. In iHDFS, Rabin fingerprint algorithm is applied to achieve content defined chunking. This policy make data chunking has much higher stability, and the intermediate processing results can be reused efficiently, so the performance of incremental data processing can be improved significantly. The effectiveness and efficiency of iHDFS have been demonstrated by the experimental results.展开更多
Many enterprises and personals are inclining to outsource their data to public clouds, but security and privacy are two critical problems cannot be ignored. The door of cloud provider may be broken, and the data may a...Many enterprises and personals are inclining to outsource their data to public clouds, but security and privacy are two critical problems cannot be ignored. The door of cloud provider may be broken, and the data may also be dug into by providers to find valuable information. In this paper, a secure and efficient storage file (SES FS) system is proposed to distribute files in several clouds and allows users to search the files securely and efficiently. In the proposed system, keywords were transformed into integers and secretly shared in a defined finite field, then the shares were mapped to random numbers in specified random domain in each cloud. Files were encrypted with distinct secret key and scattered within different clouds. Information about keyword/file was secretly shared among cloud providers. Legal users can search in the clouds to find correct encrypted files and reconstruct corresponding secret key. No adversary can find or detect the real file information even they can collude all the servers. Manipulation on shares by one or more clouds can be detected with high probability. The system can also detect malicious servers through introduced virtual points. One interesting property for the scheme is that new keywords can be added easily, which is difficult and usually not efficient for many searchable symmetric encryption systems. Detailed experimental result shows, with tolerable uploading delay, the scheme exhibits excellent performance on data retrieving aspect.展开更多
Hadoop framework emerged at the right moment when traditional tools were powerless in terms of handling big data. Hadoop Distributed File System (HDFS) which serves as a highly fault-tolerance distributed file system ...Hadoop framework emerged at the right moment when traditional tools were powerless in terms of handling big data. Hadoop Distributed File System (HDFS) which serves as a highly fault-tolerance distributed file system in Hadoop, can improve the throughput of data access effectively. It is very suitable for the application of handling large amounts of datasets. However, Hadoop has the disadvantage that the memory usage rate in NameNode is so high when processing large amounts of small files that it has become the limit of the whole system. In this paper, we propose an approach to optimize the performance of HDFS with small files. The basic idea is to merge small files into a large one whose size is suitable for a block. Furthermore, indexes are built to meet the requirements for fast access to all files in HDFS. Preliminary experiment results show that our approach achieves better performance.展开更多
Persistent memory(PM)allows file systems to directly persist data on the memory bus.To increase the capacity of PM file systems,building a file system across sockets with each attached PM is attractive.However,accessi...Persistent memory(PM)allows file systems to directly persist data on the memory bus.To increase the capacity of PM file systems,building a file system across sockets with each attached PM is attractive.However,accessing data across sockets incurs impacts of the non-uniform memory access(NUMA)architecture,which will lead to significant performance degradation.In this paper,we first use experiments to understand the NUMA impacts on building PM file systems.And then,we propose four design principles for building a high-performance PM file system NapFS for the NUMA architecture.We architect NapFS with per-socket local PM file systems and per-socket dedicated IO thread pools.This not only allows applications to delegate data accesses to IO threads for avoiding remote PM accesses,but also fully reuses existing single-socket PM file systems to reduce implementation complexity.Additionally,NapFS utilizes fast DRAM to accelerate performance by adding a global cache and adopts a selective cache mechanism to eliminate the redundant double-copy overhead for synchronization operations.Lastly,we show that NapFS can adopt extended optimizations to improve scalability and the performance of critical requests.We evaluate NapFS against other multi-socket PM file systems.The evaluation results show that NapFS achieves 2.2x and 1.0x throughput improvement for Filebench and RocksDB,respectively.展开更多
The healthcare sector involves many steps to ensure efficient care for patients,such as appointment scheduling,consultation plans,online follow-up,and more.However,existing healthcare mechanisms are unable to facilita...The healthcare sector involves many steps to ensure efficient care for patients,such as appointment scheduling,consultation plans,online follow-up,and more.However,existing healthcare mechanisms are unable to facilitate a large number of patients,as these systems are centralized and hence vulnerable to various issues,including single points of failure,performance bottlenecks,and substantial monetary costs.Furthermore,these mechanisms are unable to provide an efficient mechanism for saving data against unauthorized access.To address these issues,this study proposes a blockchain-based authentication mechanism that authenticates all healthcare stakeholders based on their credentials.Furthermore,also utilize the capabilities of the InterPlanetary File System(IPFS)to store the Electronic Health Record(EHR)in a distributed way.This IPFS platform addresses not only the issue of high data storage costs on blockchain but also the issue of a single point of failure in the traditional centralized data storage model.The simulation results demonstrate that our model outperforms the benchmark schemes and provides an efficient mechanism for managing healthcare sector operations.The results show that it takes approximately 3.5 s for the smart contract to authenticate the node and provide it with the decryption key,which is ultimately used to access the data.The simulation results show that our proposed model outperforms existing solutions in terms of execution time and scalability.The execution time of our model smart contract is around 9000 transactions in just 6.5 s,while benchmark schemes require approximately 7 s for the same number of transactions.展开更多
FastDu is a file system service that tracks file system changes by intercepting file system calls to maintain directory summaries, which play important roles in both storage administration and improvement of user expe...FastDu is a file system service that tracks file system changes by intercepting file system calls to maintain directory summaries, which play important roles in both storage administration and improvement of user experiences for some applications. In most circumstances, directory summaries are independently harvested by applications via traversing the file system hierarchy and calling stat 0 on every file in each directory. For large file systems, this brute-force traverse-based approach can take many hours to complete, even if only a small percentage of the files have changed. This paper describes FastDu, which uses a pre-built database to store harvested directory summaries, and tracks the file system changes by intercept- ing file system calls, so that new harvesting is restricted to the small subset of directories that contain modified files. Tests using FastDu show that this approach reduces the time needed to get a directory summary by one or two orders of magnitude with almost negligible penalty to application-aware file system performance.展开更多
Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are lik...Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are likely to spawn millions of parallel processes,which usually generate a burst of I/O requests,introducing a great challenge into the metadata management of underlying parallel file systems.The traditional method used to overcome such a challenge is adopting multiple metadata servers in the scale-out manner,which will inevitably confront with serious network and consistence problems.This work instead pursues to enhance the metadata performance in the scale-up manner.Specifically,we propose to improve the performance of each individual metadata server by employing GPU to handle metadata requests in parallel.Our proposal designs a novel metadata server architecture,which employs CPU to interact with file system clients,while offloading the computing tasks about metadata into GPU.To take full advantages of the parallelism existing in GPU,we redesign the in-memory data structure for the name space of file systems.The new data structure can perfectly fit to the memory architecture of GPU,and thus helps to exploit the large number of parallel threads within GPU to serve the bursty metadata requests concurrently.We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal,and the experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50%under typical metadata operations.The superiority is strengthened further on high concurrent scenarios,e.g.,the high-performance computing systems supporting millions of parallel threads.展开更多
Distributed metadata consistency is one of the critical issues of metadata clusters in distributed file systems. Existing methods to maintain metadata consistency generally need several log forced write operations. Si...Distributed metadata consistency is one of the critical issues of metadata clusters in distributed file systems. Existing methods to maintain metadata consistency generally need several log forced write operations. Since synchronous disk IO is very inefficient, the average response time of metadata operations is greatly increased. In this paper, an asynchronous atomic commit protocol (ACP) named Dual-Log (DL) is presented. It does not need any log forced write operations. Optimizing for distributed metadata operations involving only two metadata servers, DL mutually records the redo log in counterpart metadata servers by transferring through the low latency network. A crashed metadata server can redo the metadata operation with the redundant redo log. Since the latency of the network is much lower than the latency of disk IO, DL can improve the performance of distributed metadata service significantly. The prototype of DL is implemented based on local journal. The performance is tested by comparing with two widely used protocols, EP and S2PC-MP, and the results show that the average response time of distributed metadata operations is reduced by about 40%-60%, and the recovery time is only I second under 10 thousands uncompleted distributed metadata operations.展开更多
Hadoop Distributed File System(HDFS)is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop.HDFS allows one to manage large volumes of data using low-cost commodity hardwa...Hadoop Distributed File System(HDFS)is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop.HDFS allows one to manage large volumes of data using low-cost commodity hardware.However,vulnerabilities in HDFS can be exploited for nefarious activities.This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop as well as having a trusted mechanism to check the authenticity of shared files.This is the focus of this paper,where we aim to improve the security of HDFS using a blockchain-enabled approach(hereafter referred to as BlockHDFS).Specifically,the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files'metadata for building trusted data security and traceability in HDFS.展开更多
Byte-addressable non-volatile memory(NVM),as a new participant in the storage hierarchy,gives extremely high performance in storage,which forces changes to be made on current filesystem designs.Page cache,once a signi...Byte-addressable non-volatile memory(NVM),as a new participant in the storage hierarchy,gives extremely high performance in storage,which forces changes to be made on current filesystem designs.Page cache,once a significant mechanism filling the performance gap between Dynamic Random Access Memory(DRAM)and block devices,is now a liability that heavily hinders the writing performance of NVM filesystems.Therefore state-of-the-art NVM filesystems leverage the direct access(DAX)technology to bypass the page cache entirely.However,the DRAM still provides higher bandwidth than NVM,which prevents skewed read workloads from benefiting from a higher bandwidth of the DRAM and leads to sub-optimal performance for the system.In this paper,we propose RCache,a readintensive workload-aware page cache for NVM filesystems.Different from traditional caching mechanisms where all reads go through DRAM,RCache uses a tiered page cache design,including assigning DRAM and NVM to hot and cold data separately,and reading data from both sides.To avoid copying data to DRAM in a critical path,RCache migrates data from NVM to DRAM in a background thread.Additionally,RCache manages data in DRAM in a lock-free manner for better latency and scalability.Evaluations on Intel Optane Data Center(DC)Persistent Memory Modules show that,compared with NOVA,RCache achieves 3 times higher bandwidth for read-intensive workloads and introduces little performance loss for write operations.展开更多
An adaptive dynamic load balancing algorithm based on QoS is proposed to improve the performance of load balancing in distributed file system,combining the advantages of a variety of load balancing algorithms.The new ...An adaptive dynamic load balancing algorithm based on QoS is proposed to improve the performance of load balancing in distributed file system,combining the advantages of a variety of load balancing algorithms.The new algorithm uses a tuple containing the number of files and the total file size as the QoS measure for the requested task.The master node sets a threshold for the requested task based on the QoS to filter storage nodes that meet the requirements of the task.In order to guarantee the reliability of the new algorithm,we consider the impact of CPU utilization,memory usage,disk IO occupancy rate,network bandwidth usage and hard disk usage on load balancing performance when calculating the real-time load balancing of storage nodes.The heterogeneity of the network is considered when the master node schedule task assignments to ensure the fairness of the algorithm.The comprehensive evaluation value is determined based the performance load ratio,which is calculated from the real-time load value of the storage node and a performance value after normalization.The master node assigns tasks to the storage node with the highest comprehensive evaluation value.The storage nodes provide adaptive feedback based on changes in the degree of connectivity,rather than periodic update of the load information.The actual distributed file system environment is set up on the server cluster,the performance of the new algorithm is tested through a contrast experiment.The experimental results show that the new algorithm can effectively reduce the average response time of the system,improve throughput,and enable the system load to reach a good balance.展开更多
Existing in-kernel distributed file systems cannot cope with the higher requirements in well- equipped cluster environments, especially when the system becomes larger and inevitably heterogeneous. TH-CluFS is a clus...Existing in-kernel distributed file systems cannot cope with the higher requirements in well- equipped cluster environments, especially when the system becomes larger and inevitably heterogeneous. TH-CluFS is a cluster file system designed for large heterogeneous systems. TH-CluFS is implemented completely in the user space by emulating the network file system (NFS) V2 server, and is easily portable to other portable operating system interface (POSIX)-compliant platforms with application programming/binary interface API/ABI compliance. In addition, TH-CluFS uses a serverless architecture which flexibly distributes data at file granularity and achieves a consistent file system view from distributed metadata. The global cache makes full use of the aggregated memories and disks in the cluster to optimize system performance. Experimental results suggest that although TH-CluFS is implemented as user-level components, it functions as a portable, single system image, and scalable cluster file system with acceptable performance sacrifices.展开更多
基金Supported by the National Natural Science Foun-dation of China (60473023) National Innovation Foundation forSmall Technology Based Firms(04C26214201280)
文摘This paper describes a method for building hot snapshot copy based on windows-file system (HSCF). The architecture and running mechanism of HSCF are discussed after giving a comparison with other on-line backup tecbnology. HSCF, based on a file system filter driver, protects computer data and ensures their integrity and consistency with following three steps: access to open files, synchronization and copy on-write. Its strategies for improving system performance are analyzed including priority setting, incremental snapshot and load balance. HSCF is a new kind of snapshot technology to solve the data integrity and consistency problem in online backup, which is different from other storage-level snapshot and Open File Solution.
基金Supported by the Industrialized Foundation ofHebei Province(020501) the Natural Science Foundation of HebeiUniversity(2005Q04)
文摘In this paper, we explored a load-balancing algorithm in a cluster file system contains two levels of metadata-server, primary-level server quickly distributestasks to second-level servers depending on the closest load-balancing information. At the same time, we explored a method which accurately reflect I/O traffic and storage of storage-node: computing the heat-value of file, according to which we realized a more logical storage allocation. According to the experiment result, we conclude that this new algorithm shortens the executing time of tasks and improves the system performance compared with other load algorithm.
基金supported by ZTE Industry-Academia-Research Cooperation Funds
文摘Data layout in a file system is the organization of data stored in external storages. The data layout has a huge impact on performance of storage systems. We survey three main kinds of data layout in traditional file systems: in-place update file system, log-structured file system, and copy-on-write file sys- tem. Each file system has its own strengths and weaknesses under different circumstances. We also include a recent us- age of persistent layout in a file system that combines both flash memory and byte- addressable non- volatile memory. With this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
文摘Working with files and the safety of information has always been relevant, especially in financial institutions where the requirements for the safety of information and security are especially important. And in today’s conditions, when an earthquake can destroy the floor of a city in an instant, or when a missile hits an office and all servers turn into scrap metal, the issue of data safety becomes especially important. Also, you can’t put the cost of the software and the convenience of working with files in last place. Especially if an office worker needs to find the necessary information on a client, a financial contract or a company’s financial product in a few seconds. Also, during the operation of computer equipment, failures are possible, and some of them can lead to partial or complete loss of information. In this paper, it is proposed to create another level of abstraction for working with the file system, which will be based on a relational database as a storage of objects and access rights to objects. Also considered are possible protocols for transferring data to other programs that work with files, these can be both small sites and the operating system itself. This article will be especially interesting for financial institutions or companies operating in the banking sector. The purpose of this article is an attempt to introduce another level of abstraction for working with files. A level that is completely abstracted from the storage medium.
文摘File systems are fundamental for computers and devices with data storage units. They allow operating systems to understand and organize streams of bytes and obtain readable files from them. There are numerous file systems available in the industry, all with their own unique features. Understanding how these file systems work is essential for computer science students, but their complex nature can be difficult and challenging to grasp, especially for students at the beginning of their career. The Zion File System Simulator was designed with this in mind. Zion is a teaching and experimenting tool, in the form of a small application, built to help students understand how the I/O manager of an operating system interacts with the drive through the file system. Users can see and analyze the structure of a simple, flat file system provided with Zion, or simulate the most common structures such as FAT or NTFS. Students can also create their own implementations and run them through the simulator to analyze the different behaviors. Zion runs on Windows, and the application is provided with dynamic-link libraries that include the interfaces of a file system and a volume manager. These interfaces allow programmers to build their own file system or volume manager in Visual Studio using any .NET language (3.0 or above). Zion gives the users the power to adjust simulated architectural parameters such as volume and block size, or performance factors such as seek and transfer time. Zion runs workloads of I/O operations such as “create,” “delete,” “read,” and “write,” and analyzes the resulting metrics including I/O operations, read/write time, and disk fragmentation. Zion is a learning tool. It is not designed for measuring accurate performance of file systems and volume managers. The robustness of the application, together with its expandability, makes Zion a potential laboratory tool for computer science classes, helping students learn how file systems work and interact with an operating system.
文摘One of the most critical threats to the reliability and robustness for file system is harboring bug (silent data corruption). In this research we focus on checksum mismatch since it occurs not only in the user data but also in file system. Our proposed solution has the ability to check this bug in file system of Linux. In our proposed solution there is no need to invoke or revoke checker utility, it comes as the integrated part of file system and has the ability to check upcoming updates before harboring bug make unrecoverable changes that leads significant data loses. Demonstration testing shows satisfactory results in file server and web server environments in terms of less memory consumption and avoidable delay in system’s updating.
文摘Big data are always processed repeatedly with small changes, which is a major form of big data processing. The feature of incremental change of big data shows that incremental computing mode can improve the performance greatly. HDFS is a distributed file system on Hadoop which is the most popular platform for big data analytics. And HDFS adopts fixed-size chunking policy, which is inefficient facing incremental computing. Therefore, in this paper, we proposed iHDFS (incremental HDFS), a distributed file system, which can provide basic guarantee for big data parallel processing. The iHDFS is implemented as an extension to HDFS. In iHDFS, Rabin fingerprint algorithm is applied to achieve content defined chunking. This policy make data chunking has much higher stability, and the intermediate processing results can be reused efficiently, so the performance of incremental data processing can be improved significantly. The effectiveness and efficiency of iHDFS have been demonstrated by the experimental results.
基金Demonstration on the Construction of Guangdong Survey and Geomatics Industry Technology Innovation Alliance (2017B090907030)The Demonstration of Big Data Application for Land Resource Management and Service (2015B010110006)+3 种基金Qiong Huang is supported by Guangdong Natural Science Funds for Distinguished Young Scholar (No. 2014A030306021)Guangdong Program for Special Support of Top-notch Young Professionals (No. 2015TQ01X796)Pearl River Nova Program of Guangzhou (No. 201610010037)and the National Natural Science Foundation of China (Nos. 61472146, 61672242).
文摘Many enterprises and personals are inclining to outsource their data to public clouds, but security and privacy are two critical problems cannot be ignored. The door of cloud provider may be broken, and the data may also be dug into by providers to find valuable information. In this paper, a secure and efficient storage file (SES FS) system is proposed to distribute files in several clouds and allows users to search the files securely and efficiently. In the proposed system, keywords were transformed into integers and secretly shared in a defined finite field, then the shares were mapped to random numbers in specified random domain in each cloud. Files were encrypted with distinct secret key and scattered within different clouds. Information about keyword/file was secretly shared among cloud providers. Legal users can search in the clouds to find correct encrypted files and reconstruct corresponding secret key. No adversary can find or detect the real file information even they can collude all the servers. Manipulation on shares by one or more clouds can be detected with high probability. The system can also detect malicious servers through introduced virtual points. One interesting property for the scheme is that new keywords can be added easily, which is difficult and usually not efficient for many searchable symmetric encryption systems. Detailed experimental result shows, with tolerable uploading delay, the scheme exhibits excellent performance on data retrieving aspect.
文摘Hadoop framework emerged at the right moment when traditional tools were powerless in terms of handling big data. Hadoop Distributed File System (HDFS) which serves as a highly fault-tolerance distributed file system in Hadoop, can improve the throughput of data access effectively. It is very suitable for the application of handling large amounts of datasets. However, Hadoop has the disadvantage that the memory usage rate in NameNode is so high when processing large amounts of small files that it has become the limit of the whole system. In this paper, we propose an approach to optimize the performance of HDFS with small files. The basic idea is to merge small files into a large one whose size is suitable for a block. Furthermore, indexes are built to meet the requirements for fast access to all files in HDFS. Preliminary experiment results show that our approach achieves better performance.
基金supported by the Major Research Plan of the National Natural Science Foundation of China under Grant No.92270202the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No.XDB44030200.
文摘Persistent memory(PM)allows file systems to directly persist data on the memory bus.To increase the capacity of PM file systems,building a file system across sockets with each attached PM is attractive.However,accessing data across sockets incurs impacts of the non-uniform memory access(NUMA)architecture,which will lead to significant performance degradation.In this paper,we first use experiments to understand the NUMA impacts on building PM file systems.And then,we propose four design principles for building a high-performance PM file system NapFS for the NUMA architecture.We architect NapFS with per-socket local PM file systems and per-socket dedicated IO thread pools.This not only allows applications to delegate data accesses to IO threads for avoiding remote PM accesses,but also fully reuses existing single-socket PM file systems to reduce implementation complexity.Additionally,NapFS utilizes fast DRAM to accelerate performance by adding a global cache and adopts a selective cache mechanism to eliminate the redundant double-copy overhead for synchronization operations.Lastly,we show that NapFS can adopt extended optimizations to improve scalability and the performance of critical requests.We evaluate NapFS against other multi-socket PM file systems.The evaluation results show that NapFS achieves 2.2x and 1.0x throughput improvement for Filebench and RocksDB,respectively.
基金supported by the Ongoing Research Funding program(ORF-2025-636),King Saud University,Riyadh,Saudi Arabia.
文摘The healthcare sector involves many steps to ensure efficient care for patients,such as appointment scheduling,consultation plans,online follow-up,and more.However,existing healthcare mechanisms are unable to facilitate a large number of patients,as these systems are centralized and hence vulnerable to various issues,including single points of failure,performance bottlenecks,and substantial monetary costs.Furthermore,these mechanisms are unable to provide an efficient mechanism for saving data against unauthorized access.To address these issues,this study proposes a blockchain-based authentication mechanism that authenticates all healthcare stakeholders based on their credentials.Furthermore,also utilize the capabilities of the InterPlanetary File System(IPFS)to store the Electronic Health Record(EHR)in a distributed way.This IPFS platform addresses not only the issue of high data storage costs on blockchain but also the issue of a single point of failure in the traditional centralized data storage model.The simulation results demonstrate that our model outperforms the benchmark schemes and provides an efficient mechanism for managing healthcare sector operations.The results show that it takes approximately 3.5 s for the smart contract to authenticate the node and provide it with the decryption key,which is ultimately used to access the data.The simulation results show that our proposed model outperforms existing solutions in terms of execution time and scalability.The execution time of our model smart contract is around 9000 transactions in just 6.5 s,while benchmark schemes require approximately 7 s for the same number of transactions.
基金Supported by the National Key Basic Research and Development Program (973) of China (No. 2011CB302505)the National Natural Science Foundation of China (Nos. 60803121 and 61073165)the National High-Tech Research and Development (863) Program of China (Nos. 2010AA012401 and 2009AA01A130)
文摘FastDu is a file system service that tracks file system changes by intercepting file system calls to maintain directory summaries, which play important roles in both storage administration and improvement of user experiences for some applications. In most circumstances, directory summaries are independently harvested by applications via traversing the file system hierarchy and calling stat 0 on every file in each directory. For large file systems, this brute-force traverse-based approach can take many hours to complete, even if only a small percentage of the files have changed. This paper describes FastDu, which uses a pre-built database to store harvested directory summaries, and tracks the file system changes by intercept- ing file system calls, so that new harvesting is restricted to the small subset of directories that contain modified files. Tests using FastDu show that this approach reduces the time needed to get a directory summary by one or two orders of magnitude with almost negligible penalty to application-aware file system performance.
基金Supported by the National Key Research and Development Program of China under Grant No. 2018YFB0203904the National Natural Science Foundation of China under Grant Nos. 61872392, U1811461 and 61832020+4 种基金the Pearl River Science and Technology Nova Program of Guangzhou under Grant No. 201906010008Guangdong Natural Science Foundation under Grant No. 2018B030312002the Major Program of Guangdong Basic and Applied Research under Grant No. 2019B030302002the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2016ZT06D211the Key-Area Research and Development Program of Guang Dong Province of China under Grant No. 2019B010107001.
文摘Driven by the increasing requirements of high-performance computing applications,supercomputers are prone to containing more and more computing nodes.Applications running on such a large-scale computing system are likely to spawn millions of parallel processes,which usually generate a burst of I/O requests,introducing a great challenge into the metadata management of underlying parallel file systems.The traditional method used to overcome such a challenge is adopting multiple metadata servers in the scale-out manner,which will inevitably confront with serious network and consistence problems.This work instead pursues to enhance the metadata performance in the scale-up manner.Specifically,we propose to improve the performance of each individual metadata server by employing GPU to handle metadata requests in parallel.Our proposal designs a novel metadata server architecture,which employs CPU to interact with file system clients,while offloading the computing tasks about metadata into GPU.To take full advantages of the parallelism existing in GPU,we redesign the in-memory data structure for the name space of file systems.The new data structure can perfectly fit to the memory architecture of GPU,and thus helps to exploit the large number of parallel threads within GPU to serve the bursty metadata requests concurrently.We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal,and the experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50%under typical metadata operations.The superiority is strengthened further on high concurrent scenarios,e.g.,the high-performance computing systems supporting millions of parallel threads.
基金supported by the National Basic Research 973 Program of China under Grant No.2011CB302304the NationalHigh Technology Research and Development 863 Program of China under Grant Nos.2011AA01A102 and 2013AA013205+1 种基金the StrategicPriority Research Program of the Chinese Academy of Sciences under Grant No.XDA06010401the Chinese Academy of SciencesKey Deployment Project under Grant No.KGZD-EW-103-5(7)
文摘Distributed metadata consistency is one of the critical issues of metadata clusters in distributed file systems. Existing methods to maintain metadata consistency generally need several log forced write operations. Since synchronous disk IO is very inefficient, the average response time of metadata operations is greatly increased. In this paper, an asynchronous atomic commit protocol (ACP) named Dual-Log (DL) is presented. It does not need any log forced write operations. Optimizing for distributed metadata operations involving only two metadata servers, DL mutually records the redo log in counterpart metadata servers by transferring through the low latency network. A crashed metadata server can redo the metadata operation with the redundant redo log. Since the latency of the network is much lower than the latency of disk IO, DL can improve the performance of distributed metadata service significantly. The prototype of DL is implemented based on local journal. The performance is tested by comparing with two widely used protocols, EP and S2PC-MP, and the results show that the average response time of distributed metadata operations is reduced by about 40%-60%, and the recovery time is only I second under 10 thousands uncompleted distributed metadata operations.
文摘Hadoop Distributed File System(HDFS)is one of the widely used distributed file systems in big data analysis for frameworks such as Hadoop.HDFS allows one to manage large volumes of data using low-cost commodity hardware.However,vulnerabilities in HDFS can be exploited for nefarious activities.This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop as well as having a trusted mechanism to check the authenticity of shared files.This is the focus of this paper,where we aim to improve the security of HDFS using a blockchain-enabled approach(hereafter referred to as BlockHDFS).Specifically,the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files'metadata for building trusted data security and traceability in HDFS.
基金supported by ZTE Industry⁃University⁃Institute Coopera⁃tion Funds under Grant No.HC⁃CN⁃20181128026.
文摘Byte-addressable non-volatile memory(NVM),as a new participant in the storage hierarchy,gives extremely high performance in storage,which forces changes to be made on current filesystem designs.Page cache,once a significant mechanism filling the performance gap between Dynamic Random Access Memory(DRAM)and block devices,is now a liability that heavily hinders the writing performance of NVM filesystems.Therefore state-of-the-art NVM filesystems leverage the direct access(DAX)technology to bypass the page cache entirely.However,the DRAM still provides higher bandwidth than NVM,which prevents skewed read workloads from benefiting from a higher bandwidth of the DRAM and leads to sub-optimal performance for the system.In this paper,we propose RCache,a readintensive workload-aware page cache for NVM filesystems.Different from traditional caching mechanisms where all reads go through DRAM,RCache uses a tiered page cache design,including assigning DRAM and NVM to hot and cold data separately,and reading data from both sides.To avoid copying data to DRAM in a critical path,RCache migrates data from NVM to DRAM in a background thread.Additionally,RCache manages data in DRAM in a lock-free manner for better latency and scalability.Evaluations on Intel Optane Data Center(DC)Persistent Memory Modules show that,compared with NOVA,RCache achieves 3 times higher bandwidth for read-intensive workloads and introduces little performance loss for write operations.
基金supported in part by the National Basic Research Program of China("973"Program)(No.2013CB329102).
文摘An adaptive dynamic load balancing algorithm based on QoS is proposed to improve the performance of load balancing in distributed file system,combining the advantages of a variety of load balancing algorithms.The new algorithm uses a tuple containing the number of files and the total file size as the QoS measure for the requested task.The master node sets a threshold for the requested task based on the QoS to filter storage nodes that meet the requirements of the task.In order to guarantee the reliability of the new algorithm,we consider the impact of CPU utilization,memory usage,disk IO occupancy rate,network bandwidth usage and hard disk usage on load balancing performance when calculating the real-time load balancing of storage nodes.The heterogeneity of the network is considered when the master node schedule task assignments to ensure the fairness of the algorithm.The comprehensive evaluation value is determined based the performance load ratio,which is calculated from the real-time load value of the storage node and a performance value after normalization.The master node assigns tasks to the storage node with the highest comprehensive evaluation value.The storage nodes provide adaptive feedback based on changes in the degree of connectivity,rather than periodic update of the load information.The actual distributed file system environment is set up on the server cluster,the performance of the new algorithm is tested through a contrast experiment.The experimental results show that the new algorithm can effectively reduce the average response time of the system,improve throughput,and enable the system load to reach a good balance.
基金Supported by the National Natural Science Foundation of China(No. 60073010) and China Grid Project
文摘Existing in-kernel distributed file systems cannot cope with the higher requirements in well- equipped cluster environments, especially when the system becomes larger and inevitably heterogeneous. TH-CluFS is a cluster file system designed for large heterogeneous systems. TH-CluFS is implemented completely in the user space by emulating the network file system (NFS) V2 server, and is easily portable to other portable operating system interface (POSIX)-compliant platforms with application programming/binary interface API/ABI compliance. In addition, TH-CluFS uses a serverless architecture which flexibly distributes data at file granularity and achieves a consistent file system view from distributed metadata. The global cache makes full use of the aggregated memories and disks in the cluster to optimize system performance. Experimental results suggest that although TH-CluFS is implemented as user-level components, it functions as a portable, single system image, and scalable cluster file system with acceptable performance sacrifices.