Abstract: A simple, fast method is given for sequentially retrieving all the records in a B tree. A file structure for databases is proposed in which the records in the primary data file are sorted in key order and a B tree serves as the dense index. With this structure it is easy to insert, delete, or search for a record, and it is also convenient to retrieve records in the sequential order of the keys. The merits and efficiencies of these methods and structures are discussed in detail.
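As a rough illustration of what sequential retrieval from a B tree involves, the sketch below walks a tree in key order. It is a textbook in-order traversal, not the method proposed in the paper; the `BTreeNode` class and the `in_order` function are hypothetical names introduced only for this example.

```python
class BTreeNode:
    """Hypothetical B-tree node: sorted keys plus len(keys) + 1 children (none for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def in_order(node):
    """Yield every key in the subtree rooted at node, in ascending order."""
    if not node.children:
        # Leaf: its keys are already sorted.
        yield from node.keys
        return
    # Internal node: child i holds keys smaller than keys[i].
    for i, key in enumerate(node.keys):
        yield from in_order(node.children[i])
        yield key
    # The last child holds keys larger than every key in this node.
    yield from in_order(node.children[-1])


root = BTreeNode(
    [10, 20],
    [BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)
print(list(in_order(root)))
```

When the primary data file is itself stored in key order, as the abstract describes, a sequential scan of that file would give the same ordering without touching the index.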
Abstract: Nowadays, an increasing number of people choose to outsource their computing and storage demands to the Cloud. To ensure the integrity of data in the untrusted Cloud, especially dynamic files that can be updated online, we propose an improved dynamic provable data possession model. We use homomorphic tags to verify the integrity of the file, and use hash values generated from secret values and tags to prevent replay and forgery attacks. Compared with previous work, our proposal reduces the computational and communication complexity from O(log n) to O(1). We conducted experiments to confirm this improvement and extended the model to the file-sharing scenario.
Abstract: To better understand different users' access intentions, a novel clustering and supervising method based on access paths is presented. This method partitions the users' interest space to express the distribution of users' interests, and directly guides the construction of web page indexes for better performance.
Abstract: By moving computations from computing nodes to storage nodes, active storage technology provides an efficient solution for data-intensive high-performance computing applications. Existing studies have neglected the effect of storage node heterogeneity on the performance of active storage systems. We introduce CADP, a capability-aware data placement scheme for heterogeneous active storage systems, to obtain high-performance data processing. The basic idea of CADP is to place data on storage nodes according to their computing and storage capabilities, so that load imbalance among heterogeneous servers can be avoided. We have implemented CADP in a parallel I/O system. The experimental results show that the proposed capability-aware data placement scheme can significantly improve the performance of active storage systems.
Abstract: In the Big Data era, numerous sources and environments generate massive amounts of data. This enormous amount of data necessitates specialized, advanced tools and procedures that effectively evaluate the information and anticipate decisions for future changes. Hadoop is used to process this kind of data. It is known to handle vast volumes of data more efficiently than tiny amounts, so workloads made up of many small files make the framework inefficient. This study proposes a novel solution to the problem by applying the Enhanced Best Fit Merging (EBFM) algorithm, which merges files according to predefined parameters (type and size). Applying this algorithm ensures that the generated file size stays in the same range as the maximum block size. Its primary goal is to dynamically merge files that meet the stated criteria, based on file type, to guarantee the efficacy and efficiency of the established system. This procedure takes place before the files are made available to the Hadoop framework. Additionally, the files generated by the system are named with specific keywords to ensure that no data is lost through file overwrites. The proposed approach guarantees the generation of the fewest possible large files, which reduces the input/output memory burden and matches the conditions under which the Hadoop framework is effective. The findings show that the proposed technique enhances the framework's performance by approximately 64% while accounting for all other potential performance-impairing variables. The proposed approach can be implemented in any environment that uses the Hadoop framework, including, but not limited to, smart cities and real-time data analysis.
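To make the idea of merging small files by type and size more concrete, here is a minimal sketch of a generic best-fit-decreasing packing. It is not the EBFM algorithm itself, whose details the abstract does not give; the function name `best_fit_merge`, the tuple layout `(name, type, size)`, and the block size of 128 are all assumptions made for illustration.

```python
from collections import defaultdict


def best_fit_merge(files, block_size):
    """Group small files of the same type into merged files of at most block_size.

    files is a list of (name, file_type, size) tuples. A file larger than
    block_size simply ends up alone in its own group.
    """
    groups_by_type = defaultdict(list)
    # Placing the largest files first tends to leave fewer half-empty groups.
    for name, file_type, size in sorted(files, key=lambda f: -f[2]):
        best = None
        best_leftover = None
        for group in groups_by_type[file_type]:
            leftover = block_size - group["size"] - size
            # Best fit: the group that would be left with the least free space.
            if leftover >= 0 and (best is None or leftover < best_leftover):
                best, best_leftover = group, leftover
        if best is None:
            best = {"type": file_type, "members": [], "size": 0}
            groups_by_type[file_type].append(best)
        best["members"].append(name)
        best["size"] += size
    return [g for groups in groups_by_type.values() for g in groups]


small_files = [
    ("a.txt", "txt", 60),
    ("b.txt", "txt", 50),
    ("c.txt", "txt", 40),
    ("d.log", "log", 30),
    ("e.txt", "txt", 70),
]
for group in best_fit_merge(small_files, block_size=128):
    print(group["type"], group["size"], group["members"])
```

Keeping one list of groups per file type mirrors the abstract's statement that merging depends on type as well as size; a real pipeline would also need the collision-free naming of merged files that the abstract mentions.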
Abstract: Data hiding (DH) is an important technology for securely transmitting secret data in networks, and has increasingly become a research hotspot throughout the world. However, for Joint Photographic Experts Group (JPEG) images, existing data hiding schemes find it difficult to balance embedding capacity, visual quality, and file size increment. To deal with this problem, a high-imperceptibility data hiding scheme for JPEG images is proposed based on direction modification. First, the proposed scheme sorts all of the quantized discrete cosine transform (DCT) blocks in ascending order according to the number of non-consecutive-zero alternating current (AC) coefficients. Then it selects non-consecutive-zero AC coefficients with absolute values less than or equal to 1 at the same frequency position in two adjacent blocks for pairing. Finally, 2 bits of secret data can be embedded into a coefficient pair by using the filled reference matrix and the designed direction modification rules. The experiment was conducted on 5 standard test images and 1000 images of the BOSSbase dataset. The experimental results showed that the visual quality of the proposed scheme was improved by 1 to 4 dB compared with the comparison schemes, and the file size increment was reduced to at most 15% of that of the comparison schemes.
Funding: Supported by the Fifth Batch of Innovation Teams of Wuzhou Meteorological Bureau, "Wuzhou Innovation Team for Enhancing the Comprehensive Meteorological Observation Ability through Digitization and Intelligence", and the Wuzhou Science and Technology Planning Project (202402122, 202402119).
Abstract: [Objective] In response to the issue of insufficient integrity in hourly routine meteorological element data files, this paper aims to improve the availability and reliability of data files, and to provide high-quality data file support for meteorological forecasting and services. [Method] An efficient and accurate method for data file quality control and fusion processing is developed. By locating the times of missing measurements, data are extracted from the "AWZ.db" database and the minute routine meteorological element data file, and merged into the hourly routine meteorological element data file. [Result] Data processing efficiency and accuracy are significantly improved, and the problem of incomplete hourly routine meteorological element data files is solved. The paper also emphasizes the importance of ensuring the accuracy of the files used and of carefully checking and verifying the fusion results, and proposes strategies to improve data quality. [Conclusion] This method provides convenience for observation personnel and effectively improves the integrity and accuracy of data files. In the future, it is expected to provide more reliable data support for meteorological forecasting and services.
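The locate-and-merge step described in the method can be sketched roughly as follows. The abstract does not describe the actual file formats or the schema of "AWZ.db", so this example stands in plain dictionaries keyed by timestamp for all three data sources; the function names and the priority order (database first, then minute file) are assumptions.

```python
from datetime import datetime, timedelta


def find_missing_hours(hourly, start, end):
    """Return every whole hour in [start, end] that has no record in hourly."""
    missing = []
    t = start
    while t <= end:
        if t not in hourly:
            missing.append(t)
        t += timedelta(hours=1)
    return missing


def fill_from_sources(hourly, start, end, *sources):
    """Fill missing hours from the first source that has them.

    Returns the merged records and the hours that no source could supply,
    so the result can still be checked by hand as the abstract recommends.
    """
    merged = dict(hourly)  # leave the caller's data untouched
    unresolved = []
    for t in find_missing_hours(hourly, start, end):
        for source in sources:
            if t in source:
                merged[t] = source[t]
                break
        else:
            unresolved.append(t)
    return merged, unresolved


base = datetime(2024, 1, 1, 0)
hourly = {base: 10.0, base + timedelta(hours=2): 12.0}
database = {base + timedelta(hours=1): 11.0}   # stand-in for records read from the database
minute_file = {base + timedelta(hours=3): 13.0}  # stand-in for on-the-hour minute records
merged, unresolved = fill_from_sources(
    hourly, base, base + timedelta(hours=4), database, minute_file
)
print(sorted(merged), unresolved)
```

Returning the unresolved hours separately, rather than silently skipping them, keeps the gaps visible for the manual verification of fusion results that the abstract stresses.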
Funding: Supported by the National Natural Science Foundation of China (60672089, 60772075).
Abstract: In the Consultative Committee for Space Data Systems (CCSDS) File Delivery Protocol (CFDP) recommendation for reliable transmission, there are no detailed transmission procedures or delay calculations for the prompted negative acknowledgment and asynchronous negative acknowledgment models. CFDP is designed to provide data and storage management, store and forward, custody transfer, and reliable end-to-end delivery over deep space links characterized by huge latency, intermittent connectivity, asymmetric bandwidth, and high bit error rate (BER). Four reliable transmission models are analyzed, and the expected file-delivery time is calculated for different transmission rates, numbers and sizes of packet data units, BERs, frequencies of external events, etc. By comparing the four CFDP models, the BER requirement for typical deep space missions is obtained, and rules for choosing among CFDP models under different uplink state information are given, which provides a reference for the selection, use, and modification of protocol models.
Abstract: In this paper, we analyze the complexity and entropy of several data compression algorithms: LZW, Huffman, fixed-length code (FLC), and Huffman after fixed-length code (HFLC). We test these algorithms on files of different sizes and conclude that LZW performs best at all compression scales we tested, especially on large files, followed by Huffman, HFLC, and FLC, respectively. Data compression remains an important research topic with many applications. Therefore, we suggest continuing research in this field, either by combining two techniques to obtain a better one, or by using another source mapping (Hamming), such as embedding a linear array into a hypercube, together with other good techniques such as Huffman coding.
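The relationship between entropy, Huffman coding, and fixed-length coding that this abstract analyzes can be illustrated with a small sketch. It computes the Shannon entropy of a byte string, the average code length of a standard Huffman code, and the bits per symbol of a fixed-length code over the symbols present. The function names are illustrative, and the fixed-length scheme here is a plain ceil(log2(alphabet size)) code, which may differ from the paper's FLC.

```python
import heapq
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per symbol: a lower bound for any symbol-by-symbol code."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    counts = Counter(data)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie_breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes each of their symbols one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths


def average_bits(data, lengths):
    """Average number of code bits per symbol of data."""
    counts = Counter(data)
    return sum(counts[s] * lengths[s] for s in counts) / len(data)


def fixed_length_bits(data):
    """Bits per symbol for a fixed-length code over the symbols present."""
    return max(1, math.ceil(math.log2(len(set(data)))))


sample = b"aaaabbc"
lengths = huffman_code_lengths(sample)
print(shannon_entropy(sample), average_bits(sample, lengths), fixed_length_bits(sample))
```

For any input, the entropy is at most the Huffman average, which is at most the fixed-length figure. LZW is left out because it is a dictionary method that codes strings rather than single symbols, so it is not bounded by this per-symbol entropy.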
Funding: Funded by the National Natural Science Foundation of China under Grant No. 62362019 and the Hainan Provincial Natural Science Foundation of China under Grant No. 624RC482.
Abstract: Metadata prefetching and data placement play a critical role in enhancing access performance for file systems operating over wide-area networks. However, developing effective strategies for metadata prefetching in environments with concurrent workloads and for data placement across distributed networks remains a significant challenge. This study introduces novel and efficient methodologies for metadata prefetching and data placement, leveraging fine-grained control of prefetching strategies and variable-sized data fragment writing to optimize the I/O bandwidth of distributed file systems. The proposed metadata prefetching technique employs dynamic workload analysis to identify dominant workload patterns and adaptively refines prefetching policies, thereby boosting metadata access efficiency under concurrent scenarios. Meanwhile, the data placement strategy improves write performance by storing data fragments locally within the nearest data center and transmitting only the fragment location metadata to the remote data center hosting the original file. Experimental evaluations using real-world system traces demonstrate that the proposed approaches reduce metadata access times by up to 33.5% and application data access times by 17.19% compared to state-of-the-art techniques.
Funding: Supported by the Natural Science Foundation of Hubei Province (2012FFC034, 2014CFC1100).
Abstract: In this paper, we present a distributed multi-level cache system based on cloud storage, which addresses the low access efficiency of small spatio-temporal data files in Smart City information service systems. Using the classification attributes of small spatio-temporal data files in a Smart City as the basis for selecting cache content, the cache system adopts different cache pool management strategies at different cache levels. Experimental results from a prototype system indicate that the proposed multi-level cache effectively increases the access bandwidth of small spatio-temporal files in a Smart City and greatly improves the quality of service under multiple concurrent accesses.
Funding: Supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Data layout in a file system is the organization of data stored in external storage. The data layout has a huge impact on the performance of storage systems. We survey three main kinds of data layout in traditional file systems: in-place update file systems, log-structured file systems, and copy-on-write file systems. Each file system has its own strengths and weaknesses under different circumstances. We also include a recent use of persistent layout in a file system that combines both flash memory and byte-addressable non-volatile memory. With this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
Funding: Supported by the National Natural Science Foundation of China (No. 50475117) and the Tianjin Natural Science Foundation (No. 06YFJMJC03700).
Abstract: Integrating heterogeneous data sources is a precondition for sharing data within enterprises. Highly efficient data updating can both reduce system costs and provide real-time data. Modifying data rapidly is one of the hot issues in the pre-processing area of the data warehouse. An extract-transform-load design is proposed based on a new data algorithm called Diff-Match, which is developed by utilizing pattern matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and identify the sets of data that differ. Its efficiency has been demonstrated by its successful application in an electric apparatus enterprise group.
Funding: Supported by the National Natural Science Foundation of China (60473023) and the National Innovation Foundation for Small Technology Based Firms (04C26214201280).
Abstract: This paper describes a method for building a hot snapshot copy based on the Windows file system (HSCF). The architecture and running mechanism of HSCF are discussed after a comparison with other online backup technologies. HSCF, based on a file system filter driver, protects computer data and ensures their integrity and consistency in three steps: access to open files, synchronization, and copy-on-write. Its strategies for improving system performance, including priority setting, incremental snapshots, and load balancing, are analyzed. HSCF is a new kind of snapshot technology that solves the data integrity and consistency problem in online backup, and it differs from other storage-level snapshot and Open File Solution approaches.
Funding: Supported by the Beforehand Research for National Defense of China (94J3. 4. 2. JW0 5 15 )
Abstract: Integration between file systems and multidatabase systems is a necessary approach to support data sharing among distributed and heterogeneous data sources. We first analyze the problems of data integration between file systems and multidatabase systems. Then, a common XML-oriented data model named XIDM (XML-based Integrating Data Model) is presented. XIDM is based on a series of XML standards, especially XML Schema, and can describe semistructured data well. XIDM is therefore highly practical and multipurpose.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2011AA1569).
Abstract: Satellite networking communications in navigation satellite systems and space-based deep space exploration feature long delays and a high bit error rate (BER). Through analyzing the advantages and disadvantages of the Consultative Committee for Space Data Systems (CCSDS) file delivery protocol (CFDP), a new, improved repeated sending file delivery protocol (RSFDP) based on adaptive repeated sending is put forward to build an efficient and reliable file transmission. According to the estimated BER of the transmission link, RSFDP repeatedly sends the lost protocol data units (PDUs) during the retransmission stage to improve the success rate and reduce the retransmission time. Theoretical analyses and the results of the Opnet simulation indicate that RSFDP achieves significant performance gains over CFDP on links with a long delay and high BER. Implementation results on a spaceborne field programmable gate array (FPGA) platform show the applicability of the proposed algorithm.
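A back-of-the-envelope sketch shows why the number of repeated copies can be tied to the estimated BER. It assumes independent bit errors and independent copies, and it picks the smallest number of copies whose combined delivery probability reaches a chosen target. This is a simplification for illustration, not the adaptive rule used by RSFDP; the function names, the PDU size, and the target are all assumptions.

```python
import math


def pdu_success_probability(ber, pdu_bits):
    """Probability that a PDU of pdu_bits bits arrives with no bit errors."""
    return (1.0 - ber) ** pdu_bits


def copies_needed(ber, pdu_bits, target):
    """Smallest number of copies so that at least one arrives intact with probability >= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    p = pdu_success_probability(ber, pdu_bits)
    if p <= 0.0:
        raise ValueError("link too noisy: a single PDU can never arrive intact")
    if p >= target:
        return 1
    # All k copies fail with probability (1 - p) ** k; solve (1 - p) ** k <= 1 - target.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Example: 1 KiB PDUs (8192 bits) on a link with BER 1e-5, aiming for 99.9% delivery.
print(copies_needed(ber=1e-5, pdu_bits=8192, target=0.999))
```

On long-delay links each extra copy costs only transmission time, while each failed retransmission round costs a full round trip, which is the trade-off the abstract points to.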
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2009AA01Z434) and the "Core Electronic Devices, High-End General Chip, and Fundamental Software" Major Project (2013JH00103).
Abstract: Protecting the security of sensitive information has become a matter of great concern to everyone. Data hiding techniques solve the problem to some extent, but some shortcomings remain to be studied. To hide huge data files on disk with high efficiency, we propose a novel approach called CryptFS, which hides data by utilizing the file access mechanism and modifying the cluster chain structure. CryptFS can hide a gigabyte-scale data file in less than 0.1 s. The time used for hiding and recovering data is independent of the size of the data file, and the reliability of the hidden file is high: it will not be overwritten by newly created files or by disk defragmentation.
Funding: Supported by the Applied Basic Research Programs of Sichuan Province under Grant No. 2010JY0001 and the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2010J068.
Abstract: An information hiding algorithm is proposed that hides information by embedding secret data into the palette of bitmap resources of portable executable (PE) files. This algorithm has higher security than some traditional ones because it integrates the secret data with the bitmap resources. By analyzing how an operating system parses bitmap resources and how resource data are laid out in PE files, a safe and practical solution is presented for two problems that arise during data embedding: bitmap resources being parsed incorrectly and other resource data being corrupted. The feasibility and effectiveness of the proposed algorithm are confirmed through computer experiments.
Abstract: The bane of achieving a scalable distributed file sharing system is the centralized data system and the single-server-oriented file sharing system. This paper presents a solution to these problems in a distributed environment. The major problems in the existing system are the inability to upload sizeable files, slow transfer of files, weak security, and lack of fault tolerance. Hence, the utmost need is to build a client-server network that runs on two or more server systems in order to achieve scalability, such that when one server is down, the other(s) still sustain the activities within the network. To achieve this reliable network environment, the Linux network operating system is recommended, among others, as a preferred platform for synchronizing the server systems, so that every user activity, such as sending internal memos or mails and publishing academic articles, is replicated, thereby achieving the proposed result. The Scalable Distributed File Sharing System thus provides the robustness required for a reliable intranet.
Abstract: MIXED is a digital preservation project. It uses a strategy of converting data to intermediate XML. In this paper we position this strategy with respect to the well-known emulation and migration strategies. Then we detail the MIXED strategy and explain why it is an optimized, economical way of migration. Finally, we describe how DANS is implementing a software tool that can perform the migrations needed for this strategy.
Abstract: A simple, fast method is given for sequentially retrieving all the records in a B-tree. A file structure for databases is proposed. The records in its primary data file are sorted in key order, and a B-tree is used as its dense index. It is easy to insert, delete, or search for a record, and it is also convenient to retrieve records in the sequential order of the keys. The merits and efficiencies of these methods and structures are discussed in detail.
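Retrieving every record of a B-tree in key order can be illustrated with a plain in-order traversal. The sketch below is not the method proposed in the paper, which the abstract does not detail; the `BTreeNode` layout and the `records` payloads (standing in for pointers into the primary data file) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class BTreeNode:
    """Hypothetical B-tree node: keys[i] is paired with records[i];
    an internal node has len(keys) + 1 children, a leaf has none."""
    keys: List[int]
    records: List[str]
    children: List["BTreeNode"] = field(default_factory=list)


def scan_in_key_order(node: BTreeNode) -> Iterator[Tuple[int, str]]:
    """Yield (key, record) pairs of the subtree in ascending key order."""
    if not node.children:  # leaf: its keys are already sorted
        yield from zip(node.keys, node.records)
        return
    for i, key in enumerate(node.keys):
        yield from scan_in_key_order(node.children[i])  # keys below keys[i]
        yield key, node.records[i]
    yield from scan_in_key_order(node.children[-1])  # keys above the last key


# A small hand-built tree used only as example data.
root = BTreeNode(
    keys=[20, 40],
    records=["r20", "r40"],
    children=[
        BTreeNode([5, 10], ["r5", "r10"]),
        BTreeNode([25, 30], ["r25", "r30"]),
        BTreeNode([45, 50], ["r45", "r50"]),
    ],
)

if __name__ == "__main__":
    for key, record in scan_in_key_order(root):
        print(key, record)
```

Because the primary data file described in the abstract is itself sorted by key, a scan in index order also visits the file's records sequentially.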
Funding: Supported by the Major Program of the Shanghai Science and Technology Commission under Grant No. 10DZ1500200 and the Collaborative Applied Research and Development Project between Morgan Stanley and Shanghai Jiao Tong University, China.
Abstract: Nowadays, an increasing number of people choose to outsource their computing and storage demands to the Cloud. To ensure the integrity of data in the untrusted Cloud, especially dynamic files that can be updated online, we propose an improved dynamic provable data possession model. We use homomorphic tags to verify the integrity of the file, and hash values generated from secret values and tags to prevent replay and forgery attacks. Compared with previous works, our proposal reduces the computational and communication complexity from O(log n) to O(1). We conducted experiments to confirm this improvement and extended the model to the file sharing scenario.
Abstract: To better understand different users' access intentions, a novel clustering and supervising method based on access paths is presented. This method partitions the users' interest space to express the distribution of users' interests and directly guides the construction of web page indexes for better performance.
Funding: Supported by the National Science and Technology Foundation of China (61572377), the Natural Science Foundation of Hubei Province (2014CFB239), the Open Fund from HPCL (201512-02), the Open Fund from SKLSE (2015-A-06), and the US National Science Foundation (CNS-1162540).
Abstract: By moving computations from computing nodes to storage nodes, active storage technology provides an efficient approach for data-intensive high-performance computing applications. Existing studies have neglected the effect of storage node heterogeneity on the performance of active storage systems. We introduce CADP, a capability-aware data placement scheme for heterogeneous active storage systems, to obtain high-performance data processing. The basic idea of CADP is to place data on storage nodes according to their computing capability and storage capability, so that load imbalance among heterogeneous servers can be avoided. We have implemented CADP in a parallel I/O system. The experimental results show that the proposed capability-aware data placement scheme can improve active storage system performance significantly.
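Placing data in proportion to node capability can be sketched as below. This is an illustrative simplification, not the CADP scheme itself: it collapses computing and storage capability into one hypothetical score per node and splits a number of equal-sized chunks with the largest-remainder method.

```python
from typing import Dict


def place_chunks(num_chunks: int, capability: Dict[str, float]) -> Dict[str, int]:
    """Split num_chunks across nodes in proportion to a capability score.
    The single score per node is an assumption of this sketch."""
    total = sum(capability.values())
    if total <= 0:
        raise ValueError("total capability must be positive")
    exact = {node: num_chunks * score / total for node, score in capability.items()}
    alloc = {node: int(share) for node, share in exact.items()}  # round down
    leftover = num_chunks - sum(alloc.values())
    # Hand the remaining chunks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda node: exact[node] - alloc[node], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    print(place_chunks(10, {"a": 3.0, "b": 1.0, "c": 1.0}))
    print(place_chunks(7, {"fast": 2.0, "slow": 1.0}))
```

A node with three times the score receives about three times the chunks, so the faster servers are not left idle while slower ones finish their share.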
Funding: This research was supported by Universiti Sains Malaysia (USM) and the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme (FRGS, Grant No. FRGS/1/2020/TK0/USM/02/1).
Abstract: In the Big Data era, numerous sources and environments generate massive amounts of data. This enormous amount of data necessitates specialized advanced tools and procedures that effectively evaluate the information and anticipate decisions for future changes. Hadoop is used to process this kind of data. It is known to handle vast volumes of data more efficiently than tiny amounts, which makes the framework inefficient for small files. This study proposes a novel solution to the problem by applying the Enhanced Best Fit Merging (EBFM) algorithm, which merges files according to predefined parameters (type and size). Implementing this algorithm ensures that the block size and the generated file size fall in the same range. Its primary goal is to dynamically merge files that meet the stated criteria, based on file type, to guarantee the efficacy and efficiency of the established system. This procedure takes place before the files are made available to the Hadoop framework. Additionally, the files generated by the system are named with specific keywords to ensure there is no data loss (file overwrite). The proposed approach generates the fewest possible large files, which reduces the input/output memory burden and matches the way the Hadoop framework works most effectively. The findings show that the proposed technique enhances the framework's performance by approximately 64% when compared against all other potential performance-impairing variables. The proposed approach can be implemented in any environment that uses the Hadoop framework, including but not limited to smart cities and real-time data analysis.
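Merging small files by type and size toward the block size can be illustrated with a classic best-fit-decreasing packing. The sketch below is not the EBFM algorithm, whose details the abstract does not give; the `(name, type, size)` tuple layout and the rule that an oversized file gets a group of its own are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def best_fit_merge(files: List[Tuple[str, str, int]],
                   block_size: int) -> Dict[str, List[List[str]]]:
    """Group files of the same type into merge groups whose total size
    stays within block_size, using best-fit-decreasing bin packing.
    files holds (name, file_type, size) tuples; returns type -> groups."""
    bins = defaultdict(list)  # file_type -> list of {"free": int, "names": [...]}
    for name, ftype, size in sorted(files, key=lambda f: f[2], reverse=True):
        fitting = [b for b in bins[ftype] if b["free"] >= size]
        if fitting:
            # Best fit: the group left with the least free space afterwards.
            target = min(fitting, key=lambda b: b["free"])
            target["free"] -= size
            target["names"].append(name)
        else:
            # Open a new group; an oversized file simply fills its own group.
            bins[ftype].append({"free": max(block_size - size, 0), "names": [name]})
    return {ftype: [b["names"] for b in groups] for ftype, groups in bins.items()}


if __name__ == "__main__":
    sample = [("a.log", "log", 60), ("b.log", "log", 50), ("c.log", "log", 40),
              ("d.csv", "csv", 30), ("e.log", "log", 70)]
    print(best_fit_merge(sample, block_size=128))
```

Each resulting group would then be written out as one merged file before being handed to Hadoop, so that fewer, larger files reach the framework.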
Funding: Supported by the National Natural Science Foundation of China (62072325), the Shanxi Scholarship Council of China (HGKY2019081), the Fundamental Research Program of Shanxi Province (202103021224272), and TYUST SRIF (20212039).
Abstract: Data hiding (DH) is an important technology for securely transmitting secret data in networks, and it has increasingly become a research hotspot throughout the world. However, for Joint Photographic Experts Group (JPEG) images, existing data hiding schemes find it difficult to balance embedding capacity, visual quality, and file size increment. To deal with this problem, a high-imperceptibility data hiding scheme for JPEG images is proposed based on direction modification. First, the proposed scheme sorts all of the quantized discrete cosine transform (DCT) blocks in ascending order according to the number of non-consecutive-zero alternating current (AC) coefficients. Then it selects non-consecutive-zero AC coefficients with absolute values less than or equal to 1 at the same frequency position in two adjacent blocks for pairing. Finally, 2 bits of secret data are embedded into each coefficient pair by using the filled reference matrix and the designed direction modification rules. The experiments were conducted on 5 standard test images and on 1000 images from the BOSSbase dataset. The experimental results showed that the visual quality of the proposed scheme was improved by 1∼4 dB compared with the comparison schemes, and the file size increment was reduced to at most 15% of that of the comparison schemes.