This paper describes a method for building hot snapshot copies based on the Windows file system (HSCF). The architecture and running mechanism of HSCF are discussed after a comparison with other online backup technologies. HSCF, based on a file system filter driver, protects computer data and ensures their integrity and consistency with the following three steps: access to open files, synchronization, and copy-on-write. Its strategies for improving system performance are analyzed, including priority setting, incremental snapshots, and load balancing. HSCF is a new kind of snapshot technology that solves the data integrity and consistency problem in online backup, and it differs from storage-level snapshots and Open File Solutions.
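The copy-on-write step can be illustrated with a minimal block-level sketch; the class names and the in-memory block store are assumptions for illustration, not the paper's filter-driver implementation.

```python
# Minimal copy-on-write sketch: preserve a block's original contents the
# first time it is overwritten after a snapshot is taken (illustrative only).
class CowVolume:
    def __init__(self, num_blocks, block_size=4096):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.snapshot = None          # block index -> original bytes

    def take_snapshot(self):
        self.snapshot = {}            # empty: blocks are copied lazily

    def write_block(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]   # copy-on-write
        self.blocks[index] = data

    def read_snapshot_block(self, index):
        if self.snapshot is not None and index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]     # unmodified since the snapshot

vol = CowVolume(num_blocks=8)
vol.take_snapshot()
vol.write_block(3, b"new data".ljust(4096, b"\0"))
assert vol.read_snapshot_block(3) == bytes(4096)   # snapshot still sees old data
```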
In this paper, we explore a load-balancing algorithm in a cluster file system that contains two levels of metadata servers: the primary-level server quickly distributes tasks to second-level servers based on the latest load-balancing information. We also explore a method that accurately reflects the I/O traffic and storage usage of each storage node by computing the heat value of a file, based on which a more reasonable storage allocation is realized. According to the experimental results, we conclude that this new algorithm shortens the execution time of tasks and improves system performance compared with other load-balancing algorithms.
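The abstract does not give the heat-value formula, so the sketch below assumes a simple model (hot files are small, frequently and recently accessed) purely to illustrate how a primary-level server could dispatch tasks to the currently least-loaded second-level server.

```python
# Hypothetical two-level dispatch sketch; the heat model is an assumption.
from dataclasses import dataclass

@dataclass
class StorageNode:
    name: str
    load: float = 0.0                 # running sum of heat assigned so far

def heat_value(access_count, size_bytes, recency_weight):
    # Assumed heat model: hot files are small, frequently and recently accessed.
    return recency_weight * access_count / max(size_bytes, 1)

def dispatch(tasks, nodes):
    # Primary-level server: send each task to the currently least-loaded node.
    for access_count, size, recency in tasks:
        h = heat_value(access_count, size, recency)
        target = min(nodes, key=lambda n: n.load)
        target.load += h
        yield target.name, round(h, 6)

nodes = [StorageNode("node-a"), StorageNode("node-b"), StorageNode("node-c")]
tasks = [(120, 4096, 0.9), (5, 1 << 20, 0.2), (300, 8192, 1.0)]
print(list(dispatch(tasks, nodes)))
```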
Data layout in a file system is the organization of data stored on external storage. The data layout has a huge impact on the performance of storage systems. We survey three main kinds of data layout in traditional file systems: the in-place-update file system, the log-structured file system, and the copy-on-write file system. Each has its own strengths and weaknesses under different circumstances. We also include a recent use of persistent layout in a file system that combines both flash memory and byte-addressable non-volatile memory. With this survey, we conclude that persistent data layouts in file systems may evolve dramatically in the era of emerging non-volatile memory.
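A toy contrast of the first two layouts surveyed, assuming simplified in-memory stores rather than real on-disk structures: in-place update overwrites the existing location, while a log-structured layout appends a new version and remaps the block.

```python
# Illustrative only: two tiny models of write behavior, not real file systems.
class InPlaceStore:
    def __init__(self, n):
        self.blocks = [b""] * n
    def write(self, block_no, data):
        self.blocks[block_no] = data          # overwrite in place

class LogStructuredStore:
    def __init__(self):
        self.log = []                         # append-only segment
        self.mapping = {}                     # block_no -> position in log
    def write(self, block_no, data):
        self.mapping[block_no] = len(self.log)
        self.log.append(data)                 # old version stays in the log
    def read(self, block_no):
        return self.log[self.mapping[block_no]]

ips = InPlaceStore(4)
ips.write(0, b"v1"); ips.write(0, b"v2")      # old version is gone
lss = LogStructuredStore()
lss.write(0, b"v1"); lss.write(0, b"v2")
print(lss.read(0), len(lss.log))              # b'v2' 2 -- both versions retained
```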
Working with files and the safety of information has always been relevant, especially in financial institutions, where the requirements for information safety and security are particularly strict. In today’s conditions, when an earthquake can destroy part of a city in an instant, or a missile strike can turn an office’s servers into scrap metal, the issue of data safety becomes especially important. The cost of the software and the convenience of working with files cannot be put in last place either, especially if an office worker needs to find the necessary information on a client, a financial contract, or a company’s financial product within a few seconds. Moreover, failures are possible during the operation of computer equipment, and some of them can lead to partial or complete loss of information. In this paper, it is proposed to create another level of abstraction for working with the file system, based on a relational database as a store of objects and of access rights to those objects. Possible protocols for transferring data to other programs that work with files are also considered; these can range from small websites to the operating system itself. This article will be of particular interest to financial institutions and companies operating in the banking sector. Its purpose is to introduce another level of abstraction for working with files, one that is completely abstracted from the storage medium.
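A minimal sketch of the proposed kind of abstraction, assuming a SQLite schema with one table for file objects and one for per-user access rights; the table and column names are illustrative, not taken from the paper.

```python
# Illustrative relational-database file abstraction (schema is an assumption).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES objects(id),
    content   BLOB
);
CREATE TABLE access_rights (
    object_id INTEGER REFERENCES objects(id),
    user      TEXT NOT NULL,
    can_read  INTEGER NOT NULL DEFAULT 0,
    can_write INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (object_id, user)
);
""")

def create_file(name, parent_id, content, owner):
    cur = conn.execute(
        "INSERT INTO objects (name, parent_id, content) VALUES (?, ?, ?)",
        (name, parent_id, content))
    conn.execute("INSERT INTO access_rights VALUES (?, ?, 1, 1)",
                 (cur.lastrowid, owner))
    return cur.lastrowid

def read_file(object_id, user):
    row = conn.execute("""SELECT o.content FROM objects o
                          JOIN access_rights a ON a.object_id = o.id
                          WHERE o.id = ? AND a.user = ? AND a.can_read = 1""",
                       (object_id, user)).fetchone()
    if row is None:
        raise PermissionError("no read access or object missing")
    return row[0]

fid = create_file("contract_2024.pdf", None, b"%PDF-...", owner="alice")
print(read_file(fid, "alice"))
```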
File systems are fundamental for computers and devices with data storage units. They allow operating systems to understand and organize streams of bytes and obtain readable files from them. There are numerous file systems available in the industry, all with their own unique features. Understanding how these file systems work is essential for computer science students, but their complex nature can be difficult and challenging to grasp, especially for students at the beginning of their careers. The Zion File System Simulator was designed with this in mind. Zion is a teaching and experimenting tool, in the form of a small application, built to help students understand how the I/O manager of an operating system interacts with the drive through the file system. Users can see and analyze the structure of a simple, flat file system provided with Zion, or simulate the most common structures such as FAT or NTFS. Students can also create their own implementations and run them through the simulator to analyze the different behaviors. Zion runs on Windows, and the application is provided with dynamic-link libraries that include the interfaces of a file system and a volume manager. These interfaces allow programmers to build their own file system or volume manager in Visual Studio using any .NET language (3.0 or above). Zion gives users the power to adjust simulated architectural parameters such as volume and block size, or performance factors such as seek and transfer time. Zion runs workloads of I/O operations such as “create,” “delete,” “read,” and “write,” and analyzes the resulting metrics, including I/O operations, read/write time, and disk fragmentation. Zion is a learning tool; it is not designed for measuring the accurate performance of file systems and volume managers. The robustness of the application, together with its expandability, makes Zion a potential laboratory tool for computer science classes, helping students learn how file systems work and interact with an operating system.
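A simulator of this kind typically exposes a simple I/O cost model; the sketch below assumes I/O time is approximated as seek time plus transfer time, with parameter values that are illustrative rather than Zion's defaults.

```python
# Assumed cost model: one seek plus sequential transfer of the touched blocks.
def io_time_ms(num_blocks, block_size=4096, seek_ms=9.0, transfer_mb_per_s=120.0):
    transfer_ms = (num_blocks * block_size) / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + transfer_ms

print(round(io_time_ms(num_blocks=256), 2))   # one seek + 1 MiB transfer
```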
One of the most critical threats to the reliability and robustness of a file system is the harboring bug (silent data corruption). In this research we focus on checksum mismatches, since they occur not only in user data but also in the file system itself. Our proposed solution has the ability to check for this bug in a Linux file system. With our solution there is no need to invoke or revoke a separate checker utility; it comes as an integrated part of the file system and is able to check upcoming updates before the harboring bug makes unrecoverable changes that lead to significant data loss. Demonstration testing shows satisfactory results in file server and web server environments in terms of low memory consumption and avoidable delay in the system's updates.
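The idea of checking updates before they are applied can be sketched as follows; the block store, the SHA-256 checksum, and the all-or-nothing apply rule are assumptions for illustration, not the paper's Linux implementation.

```python
# Illustrative check-before-apply sketch: a checksum mismatch rejects the
# update instead of silently corrupting stored state.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def apply_updates(store: dict, updates):
    """store maps block_id -> (data, checksum); updates are (block_id, data, checksum)."""
    for block_id, data, declared in updates:
        if checksum(data) != declared:
            raise ValueError(f"checksum mismatch on block {block_id}: update rejected")
    for block_id, data, declared in updates:      # apply only after all checks pass
        store[block_id] = (data, declared)

store = {}
good = (1, b"hello", checksum(b"hello"))
bad = (2, b"world", checksum(b"w0rld"))           # simulated silent corruption
apply_updates(store, [good])
try:
    apply_updates(store, [bad])
except ValueError as e:
    print(e)
```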
Big data are always processed repeatedly with small changes, which is a major form of big data processing. This feature of incremental change shows that an incremental computing mode can improve performance greatly. HDFS is the distributed file system of Hadoop, the most popular platform for big data analytics, and it adopts a fixed-size chunking policy, which is inefficient for incremental computing. Therefore, in this paper we propose iHDFS (incremental HDFS), a distributed file system that can provide a basic guarantee for big data parallel processing. iHDFS is implemented as an extension to HDFS. In iHDFS, the Rabin fingerprint algorithm is applied to achieve content-defined chunking. This policy makes data chunking much more stable, and intermediate processing results can be reused efficiently, so the performance of incremental data processing can be improved significantly. The effectiveness and efficiency of iHDFS have been demonstrated by the experimental results.
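Content-defined chunking of this kind can be sketched with a rolling hash; the sketch below uses a simple polynomial rolling hash standing in for the Rabin fingerprint, and the window, mask, and chunk-size parameters are illustrative.

```python
# Content-defined chunking sketch: declare a boundary wherever the hash of a
# sliding window hits a chosen mask (parameters are assumptions, not iHDFS's).
WINDOW, PRIME, MOD, MASK = 48, 257, 1 << 61, (1 << 13) - 1   # ~8 KiB average chunk

def chunk_boundaries(data: bytes, min_size=2048, max_size=65536):
    boundaries, start, h = [], 0, 0
    power = pow(PRIME, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        h = (h * PRIME + byte) % MOD
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * power * PRIME) % MOD   # slide the window
        size = i - start + 1
        if size >= max_size or (size >= min_size and (h & MASK) == 0):
            boundaries.append(i + 1)
            start, h = i + 1, 0
    if start < len(data):
        boundaries.append(len(data))
    return boundaries

data = bytes(200000 * [7]) + b"changed" + bytes(100000 * [9])
print(chunk_boundaries(data)[:5])
```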
Many enterprises and individuals are inclined to outsource their data to public clouds, but security and privacy are two critical problems that cannot be ignored. A cloud provider may be breached, and the data may also be mined by providers to find valuable information. In this paper, a secure and efficient storage file (SES FS) system is proposed that distributes files across several clouds and allows users to search the files securely and efficiently. In the proposed system, keywords are transformed into integers and secretly shared in a defined finite field, and the shares are then mapped to random numbers in a specified random domain in each cloud. Files are encrypted with distinct secret keys and scattered across different clouds. Information about keywords/files is secretly shared among cloud providers. Legal users can search in the clouds to find the correct encrypted files and reconstruct the corresponding secret key. No adversary can find or detect the real file information even if they collude with all the servers. Manipulation of shares by one or more clouds can be detected with high probability. The system can also detect malicious servers through introduced virtual points. One interesting property of the scheme is that new keywords can be added easily, which is difficult and usually inefficient for many searchable symmetric encryption systems. Detailed experimental results show that, with tolerable uploading delay, the scheme exhibits excellent performance in data retrieval.
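The keyword-sharing step can be sketched with standard Shamir secret sharing over a prime field; hashing the keyword to an integer, the choice of prime, and the threshold are assumptions for illustration, not the paper's exact construction.

```python
# Shamir secret sharing of a keyword-derived integer (illustrative parameters).
import hashlib, random

P = 2**61 - 1          # prime field modulus (assumed)

def keyword_to_int(keyword: str) -> int:
    return int.from_bytes(hashlib.sha256(keyword.encode()).digest()[:7], "big") % P

def split(secret: int, n: int, t: int):
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P   # Lagrange at x=0
    return secret

secret = keyword_to_int("invoice")
shares = split(secret, n=5, t=3)          # e.g., one share per cloud
assert reconstruct(random.sample(shares, 3)) == secret
```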
The Hadoop framework emerged at the right moment, when traditional tools were powerless to handle big data. The Hadoop Distributed File System (HDFS), which serves as a highly fault-tolerant distributed file system in Hadoop, can improve the throughput of data access effectively and is very suitable for applications handling large datasets. However, Hadoop has the disadvantage that the memory usage of the NameNode grows so high when processing large numbers of small files that it becomes the limit of the whole system. In this paper, we propose an approach to optimize the performance of HDFS with small files. The basic idea is to merge small files into a large one whose size is suitable for a block. Furthermore, indexes are built to meet the requirements for fast access to all files in HDFS. Preliminary experimental results show that our approach achieves better performance.
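The merge-and-index idea can be sketched as packing small files into a container of roughly one block and keeping an index of (offset, length) per file name; the block size and index layout are assumptions, not the paper's exact format.

```python
# Illustrative pack-and-index sketch for small files (format is assumed).
import io

BLOCK_SIZE = 128 * 1024 * 1024      # assumed HDFS block size

def pack(small_files):
    """small_files: iterable of (name, bytes). Yields (container_bytes, index)."""
    buf, index, offset = io.BytesIO(), {}, 0
    for name, data in small_files:
        if offset + len(data) > BLOCK_SIZE and offset > 0:
            yield buf.getvalue(), index
            buf, index, offset = io.BytesIO(), {}, 0
        index[name] = (offset, len(data))
        buf.write(data)
        offset += len(data)
    if offset:
        yield buf.getvalue(), index

def read(container: bytes, index: dict, name: str) -> bytes:
    off, length = index[name]
    return container[off:off + length]

files = [(f"log_{i}.txt", f"record {i}".encode()) for i in range(3)]
(container, index), = pack(files)
print(read(container, index, "log_1.txt"))
```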
Persistent memory (PM) allows file systems to persist data directly on the memory bus. To increase the capacity of PM file systems, building a file system across sockets, each with attached PM, is attractive. However, accessing data across sockets incurs the impact of the non-uniform memory access (NUMA) architecture, which leads to significant performance degradation. In this paper, we first use experiments to understand the NUMA impact on building PM file systems. We then propose four design principles for building a high-performance PM file system, NapFS, for the NUMA architecture. We architect NapFS with per-socket local PM file systems and per-socket dedicated IO thread pools. This not only allows applications to delegate data accesses to IO threads to avoid remote PM accesses, but also fully reuses existing single-socket PM file systems to reduce implementation complexity. Additionally, NapFS utilizes fast DRAM to accelerate performance by adding a global cache, and adopts a selective cache mechanism to eliminate the redundant double-copy overhead of synchronization operations. Lastly, we show that NapFS can adopt extended optimizations to improve scalability and the performance of critical requests. We evaluate NapFS against other multi-socket PM file systems. The evaluation results show that NapFS achieves 2.2x and 1.0x throughput improvements for Filebench and RocksDB, respectively.
The healthcare sector involves many steps to ensure efficient care for patients, such as appointment scheduling, consultation plans, online follow-up, and more. However, existing healthcare mechanisms are unable to accommodate a large number of patients, as these systems are centralized and hence vulnerable to various issues, including single points of failure, performance bottlenecks, and substantial monetary costs. Furthermore, these mechanisms do not provide an efficient way of protecting data against unauthorized access. To address these issues, this study proposes a blockchain-based authentication mechanism that authenticates all healthcare stakeholders based on their credentials. It also utilizes the capabilities of the InterPlanetary File System (IPFS) to store Electronic Health Records (EHRs) in a distributed way. The IPFS platform addresses not only the issue of high data storage costs on the blockchain but also the single point of failure of the traditional centralized data storage model. The simulation results demonstrate that our model outperforms the benchmark schemes and provides an efficient mechanism for managing healthcare sector operations. The results show that it takes approximately 3.5 s for the smart contract to authenticate a node and provide it with the decryption key, which is ultimately used to access the data. The simulation results also show that our proposed model outperforms existing solutions in terms of execution time and scalability: our smart contract executes around 9000 transactions in just 6.5 s, while benchmark schemes require approximately 7 s for the same number of transactions.
FastDu is a file system service that tracks file system changes by intercepting file system calls to maintain directory summaries, which play important roles both in storage administration and in improving user experience for some applications. In most circumstances, directory summaries are independently harvested by applications by traversing the file system hierarchy and calling stat() on every file in each directory. For large file systems, this brute-force traverse-based approach can take many hours to complete, even if only a small percentage of the files have changed. This paper describes FastDu, which uses a pre-built database to store harvested directory summaries and tracks file system changes by intercepting file system calls, so that new harvesting is restricted to the small subset of directories that contain modified files. Tests using FastDu show that this approach reduces the time needed to get a directory summary by one to two orders of magnitude, with an almost negligible penalty to application-aware file system performance.
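The incremental-summary idea can be modeled as keeping a per-directory size total and updating only the affected ancestor chain when a file changes, instead of re-walking the whole tree; the paths and the change-notification hook are assumptions for illustration.

```python
# Illustrative incremental directory-summary update on a simulated change event.
import os
from collections import defaultdict

summaries = defaultdict(int)        # directory path -> total bytes beneath it

def on_file_change(path: str, old_size: int, new_size: int):
    delta = new_size - old_size
    d = os.path.dirname(path)
    while True:                     # propagate the delta to every ancestor
        summaries[d] += delta
        parent = os.path.dirname(d)
        if parent == d:
            break
        d = parent

on_file_change("/data/projects/a/report.bin", 0, 4096)
on_file_change("/data/projects/b/raw.dat", 0, 10_000)
on_file_change("/data/projects/a/report.bin", 4096, 1024)
print(summaries["/data/projects"], summaries["/data"])   # 11024 11024
```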
With the increase in the quantity and scale of Static Random-Access Memory Field-Programmable Gate Arrays (SRAM-based FPGAs) for aerospace applications, the volume of FPGA configuration bit files that must be stored has increased dramatically. The use of compression techniques for these bitstream files is emerging as a key strategy to alleviate the burden on storage resources. Due to the severe resource constraints of space-based electronics and the unique application environment, the simplicity, efficiency, and robustness of the decompression circuitry are also key design considerations. Through comparative analysis of current bitstream file compression technologies, this research suggests that the Lempel-Ziv-Oberhumer (LZO) compression algorithm is more suitable for satellite applications. This paper also delves into the compression process and format of the LZO algorithm, as well as the inherent characteristics of configuration bitstream files. We propose an improved LZO-based algorithm for bitstream file compression, which optimises the compression process by refining the format and reducing the offset. Furthermore, a low-cost, robust decompression hardware architecture is proposed based on this method. Experimental results show that the compression speed of the improved LZO algorithm is increased by 3%, the decompression hardware cost is reduced by approximately 60%, and the compression ratio is slightly reduced by 0.47%.
[Objective] In response to the issue of insufficient integrity in hourly routine meteorological element data files, this paper aims to improve the availability and reliability of data files and provide high-quality data file support for meteorological forecasting and services. [Method] An efficient and accurate method for data file quality control and fusion processing is developed. By locating the missing measurement times, data are extracted from the "AWZ.db" database and the minute routine meteorological element data file and merged into the hourly routine meteorological element data file. [Result] Data processing efficiency and accuracy are significantly improved, and the problem of incomplete hourly routine meteorological element data files is solved. The paper also emphasizes the importance of ensuring the accuracy of the files used and of carefully checking and verifying the fusion results, and proposes strategies to improve data quality. [Conclusion] This method provides convenience for observation personnel and effectively improves the integrity and accuracy of data files. In the future, it is expected to provide more reliable data support for meteorological forecasting and services.
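The fusion step can be sketched as finding the hours missing from the hourly file and filling them from minute-level records; the column names, the aggregation rule, and the in-memory SQLite database standing in for "AWZ.db" are all assumptions.

```python
# Illustrative gap-filling sketch: locate missing hours and merge values derived
# from minute observations (schema and aggregation are assumed, not the paper's).
import sqlite3

hourly = {0: 12.1, 1: 12.4, 3: 13.0}          # hour -> temperature; hour 2 missing

minute_db = sqlite3.connect(":memory:")
minute_db.execute("CREATE TABLE minute_obs (hour INTEGER, minute INTEGER, temp REAL)")
minute_db.executemany("INSERT INTO minute_obs VALUES (?, ?, ?)",
                      [(2, m, 12.6 + 0.01 * m) for m in range(60)])

missing = [h for h in range(4) if h not in hourly]
for h in missing:
    value = minute_db.execute(
        "SELECT AVG(temp) FROM minute_obs WHERE hour = ?", (h,)).fetchone()[0]
    if value is not None:
        hourly[h] = round(value, 1)           # merge into the hourly file

print(sorted(hourly.items()))
```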
At present, the polymerase chain reaction (PCR) amplification-based file retrieval method is the most commonly used and effective means of DNA file retrieval. The number of orthogonal primers limits the number of files that can be accurately accessed, which in turn affects the density of a single oligo pool in digital DNA storage. In this paper, a multi-mode DNA sequence design method based on PCR file retrieval in a single oligonucleotide pool is proposed for high-capacity DNA data storage. First, by analyzing the maximum number of orthogonal primers at each predicted primer length, it was found that the relationship between primer length and the maximum available number of primers does not increase linearly, and the maximum number of orthogonal primers is on the order of 10^4. Next, this paper analyzes the maximum address space capacity of DNA sequences with different types of primer binding sites for file mapping. When the capacity of the primer library is R (where R is even), the number of address spaces that can be mapped by the single-primer DNA sequence design scheme proposed in this paper is four times that of the previous one, and the two-level-primer DNA sequence design scheme can reach [R/2·(R/2−1)]^2. Finally, a multi-mode DNA sequence generation method is designed based on the number of files to be stored in the oligonucleotide pool, in order to meet the requirements of random retrieval of target files in an oligonucleotide pool with a large-scale number of files. The performance of the primers generated by the orthogonal primer library generator proposed in this paper is verified, and the average Gibbs free energy of the most stable heterodimer formed between the produced orthogonal primers is −1 kcal·(mol·L^(−1))^(−1) (1 kcal = 4.184 kJ). At the same time, by selectively PCR-amplifying the DNA sequences of the two-level primer binding sites for random access, the target sequence can be accurately read with a minimum of 10^3 reads when the primer binding site sequences at different positions are mutually different. This paper provides a pipeline for orthogonal primer library generation and multi-mode mapping schemes between files and primers, which can help achieve precise random access to files in large-scale DNA oligo pools.
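The address-space count quoted above is easy to check numerically; the sketch below plugs the stated ~10^4 orthogonal-primer bound into the two-level formula [R/2·(R/2−1)]^2, with R = 10,000 chosen only for illustration.

```python
# Quick check of the two-level address-space count: [R/2 * (R/2 - 1)]**2,
# using R = 10_000 to reflect the stated ~10^4 orthogonal-primer bound.
R = 10_000
half = R // 2
two_level_addresses = (half * (half - 1)) ** 2
print(f"{two_level_addresses:.3e}")    # ~6.2e+14 addressable files
```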
Images and videos play an increasingly vital role in daily life and are widely utilized as key evidentiary sources in judicial investigations and forensic analysis. Simultaneously, advancements in image and video processing technologies have made powerful editing tools, such as Deepfakes, widely available, enabling anyone to easily create manipulated or fake visual content, which poses an enormous threat to social security and public trust. To verify the authenticity and integrity of images and videos, numerous approaches have been proposed. They are primarily based on content analysis, and their effectiveness is susceptible to interference from various image or video post-processing operations. Recent research has highlighted the potential of file container analysis as a promising forensic approach that offers efficient and interpretable results. However, there is still a lack of review articles on this kind of approach. To fill this gap, we present a comprehensive review of file-container-based image and video forensics in this paper. Specifically, we categorize the existing methods into two distinct stages: qualitative analysis and quantitative analysis. In addition, an overall framework is proposed to organize the existing approaches. We then discuss the advantages and disadvantages of the schemes used across different forensic tasks. Finally, we outline the trends in this research area, aiming to provide valuable insights and technical guidance for future research.
The size of the Audio and Video (AV) content of an 8K program is four times larger than that of 4K content, providing viewers with a more ideal audiovisual experience while placing higher demands on the capability and efficiency of document preparation and processing, signal transmission, and scheduling. However, it is difficult to meet the high robustness requirements of 8K broadcast services because the existing broadcast system architecture is limited by efficiency, cost, and other factors. In this study, an 8K Ultra-High-Definition (UHD) TV program broadcast scheme was designed. The verification results show that the scheme is high quality, highly efficient, and robust. In particular, in this research, the file format normalizing module was first placed in the broadcast area instead of the file preparation area, and a low-compression transmission scheme for the all-IP JPEG XS signal was designed in the signal transmission area to improve the efficiency of the scheme. Next, to reduce the impact on the robustness of broadcast services, the broadcast control logic of the core broadcast components was optimized. Finally, a series of 8K TV program broadcasting systems have been implemented and their performance has been verified. The results show that the system meets the efficiency and robustness requirements of a high-quality 8K AV broadcast system and thus has a high degree of practicability.
Driven by the increasing requirements of high-performance computing applications, supercomputers contain more and more computing nodes. Applications running on such a large-scale computing system are likely to spawn millions of parallel processes, which usually generate bursts of I/O requests, introducing a great challenge to the metadata management of the underlying parallel file systems. The traditional way to overcome this challenge is to adopt multiple metadata servers in a scale-out manner, which inevitably confronts serious network and consistency problems. This work instead pursues enhancing metadata performance in a scale-up manner. Specifically, we propose to improve the performance of each individual metadata server by employing a GPU to handle metadata requests in parallel. Our proposal designs a novel metadata server architecture that employs the CPU to interact with file system clients while offloading the metadata computing tasks to the GPU. To take full advantage of the parallelism in the GPU, we redesign the in-memory data structure for the file system namespace. The new data structure fits the memory architecture of the GPU well and thus helps exploit the large number of parallel threads within the GPU to serve bursty metadata requests concurrently. We implement a prototype based on BeeGFS and conduct extensive experiments to evaluate our proposal. The experimental results demonstrate that our GPU-based solution outperforms the CPU-based scheme by more than 50% under typical metadata operations. The superiority is strengthened further in highly concurrent scenarios, e.g., high-performance computing systems supporting millions of parallel threads.
Distributed metadata consistency is one of the critical issues for metadata clusters in distributed file systems. Existing methods for maintaining metadata consistency generally need several forced log writes. Since synchronous disk IO is very inefficient, the average response time of metadata operations is greatly increased. In this paper, an asynchronous atomic commit protocol (ACP) named Dual-Log (DL) is presented. It does not need any forced log writes. Optimized for distributed metadata operations involving only two metadata servers, DL mutually records the redo log in the counterpart metadata server by transferring it over a low-latency network. A crashed metadata server can redo the metadata operation with the redundant redo log. Since the latency of the network is much lower than the latency of disk IO, DL can improve the performance of the distributed metadata service significantly. A prototype of DL is implemented based on a local journal. Its performance is tested by comparison with two widely used protocols, EP and S2PC-MP, and the results show that the average response time of distributed metadata operations is reduced by about 40%-60%, and the recovery time is only 1 second with 10 thousand uncompleted distributed metadata operations.
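A conceptual sketch of the dual-log idea for a two-server metadata operation: each server hands its redo record to the counterpart over the network before applying, so a crashed server can be replayed from the peer's copy; the message format, the example rename operation, and the recovery trigger are assumptions, not the paper's protocol.

```python
# Illustrative mutual redo-logging between two metadata servers (no disk sync).
class MetadataServer:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.peer_redo_log = []          # redo records held on behalf of the peer

    def receive_redo(self, record):
        self.peer_redo_log.append(record)   # low-latency network write, no forced log write

    def apply(self, record):
        key, value = record
        self.state[key] = value

def distributed_rename(src_server, dst_server, old_name, new_name, inode):
    rec_src, rec_dst = (old_name, None), (new_name, inode)
    src_server.receive_redo(rec_dst)     # each side records the counterpart's redo log
    dst_server.receive_redo(rec_src)
    src_server.apply(rec_src)
    dst_server.apply(rec_dst)

def recover(crashed, peer):
    for record in peer.peer_redo_log:    # replay redo records kept by the survivor
        crashed.apply(record)

a, b = MetadataServer("mds-a"), MetadataServer("mds-b")
distributed_rename(a, b, "/proj/old.txt", "/proj/new.txt", inode=42)
print(a.state, b.state)                  # recover(a, b) would replay b's copy if mds-a crashed
```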