Abstract: The slow speed of next-generation sequencing (NGS) data analysis with the current industry-standard genome analysis pipeline, relative to the output of the latest high-throughput sequencers such as the HiSeq X system, has been the major cause of the data backlog that limits the real-time use of genomic data for precision medicine. This study demonstrates the DRAGEN Bio-IT Processor as a potential candidate to remove the "Big Data Bottleneck". DRAGEN™ completed variant calling for ~40× coverage WGS data in as little as ~30 minutes using a single command, achieving an over 50-fold increase in data analysis speed while maintaining similar or better variant calling accuracy than the standard GATK Best Practices workflow. This systematic comparison offers NGS-based healthcare industries and research institutes a faster and more efficient NGS data analysis alternative to meet the requirements of precision-medicine-based healthcare.
Funding: The National Natural Science Foundation of China (Nos. 61834005, 61772417, 61602377, 61802304 and 61874087) and the International Science and Technology Cooperation Program of Shaanxi, China (No. 2018KW-006).
Abstract: Graphics processors have received increasing attention with the growing demand for gaming, video streaming, and many other applications. During graphics rendering with OpenGL, the host CPU needs runtime attributes in order to move on to the next rendering procedure, and these attributes span almost all the function units of the graphics pipeline. Current methods suffer either from insufficient memory capacity to hold the variables or from a huge number of data parsing paths, which can cause congestion on the interface between the graphics processor and the host CPU. This paper draws on the operating principle of a commuting bus and proposes a bus-like data feedback mechanism (BFM) that traverses all the pipeline stages, collects the run-time status data or execution errors of graphics rendering, and sends them back to the host CPU. BFM can work in parallel with the graphics rendering logic. The method completes the data feedback task with only a 0.6% increase in resource utilization and no negative impact on performance, and it achieves a 1.3-times speedup compared with a traditional approach.
Abstract: To adapt to the globalized business landscape, organizations rely on third-party business relationships to enhance their operations, expand their capabilities, and drive innovation. While these collaborations offer numerous benefits, they also introduce a range of risks that organizations must carefully mitigate. When the obligation to meet regulatory requirements is added to the equation, mitigating the third-party risk related to data governance becomes one of the biggest challenges.
Funding: Supported by Pipeline Management Data Analysis and Typical Model Research [Grant Number 2016B-3105-0501] and the CNPC (China National Petroleum Corporation) project Research on Oil and Gas Pipeline Safety and Reliability Operating [Grant Number 2015-B025-0628].
Abstract: Damage caused by people and organizations unconnected with pipeline management is a major risk faced by pipelines, and its consequences can have a huge impact. However, the present monitoring measures have major problems such as time delays, overlooked threats, and false alarms. To overcome these disadvantages, analysis of big location data from mobile phone systems was applied to prevent third-party damage to pipelines, and a third-party damage prevention system for pipelines was developed that includes encryption of mobile phone data, data preprocessing, and extraction of characteristic patterns. By applying the system to natural gas pipelines, a large amount of location data was collected for data feature recognition and model analysis. Illegal third-party construction and occupation activities were discovered in a timely manner, which is important for preventing third-party damage to pipelines.
Funding: The Universiti Kebangsaan Malaysia (UKM) Research Grant Scheme FRGS/1/2020/ICT03/UKM/02/6 and GGPM-2020-028 funded this research.
Abstract: Many organizations use cloud computing to store and effectively process data for various applications. Data that users upload to the cloud are less secure because the process for verifying data integrity is unreliable. In this research, an enhanced Merkle hash tree method with an effective authentication model is proposed for the multi-owner cloud to increase the security of cloud data. In the Merkle hash tree, each leaf node carries a hash tag and each non-leaf node contains a table of hash information for its children, which is used to encrypt the large data. The Merkle hash tree provides an efficient mapping of the data and, because of its structure, easily identifies changes made to the data. The developed model supports privacy-preserving public auditing to provide a secure cloud storage system. The data owners upload the data to the cloud and edit the data using the private key. The enhanced Merkle hash tree method stores the data on the cloud server and splits it into batches. The data files requested by the data owner are audited by a third-party auditor, and the multi-owner authentication method is applied during the modification process to authenticate the user. The results show that the proposed method reduces the encryption and decryption time for cloud data storage by 2–167 ms compared to the existing Advanced Encryption Standard and Blowfish.
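The change-detection property of a Merkle hash tree described in this abstract can be illustrated with a minimal Python sketch. This is a generic textbook construction, not the paper's enhanced method: the use of SHA-256, the rule of duplicating the last node on odd-sized levels, and the names `sha256_hex` and `merkle_root` are all assumptions made for illustration.

```python
import hashlib
from typing import List, Sequence


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(blocks: Sequence[bytes]) -> str:
    """Compute the root hash of a binary Merkle tree over the given data blocks."""
    if not blocks:
        raise ValueError("at least one data block is required")
    # Leaf level: one hash per data block.
    level: List[str] = [sha256_hex(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed padding rule: duplicate the last hash on odd-sized levels.
            level.append(level[-1])
        # Each parent is the hash of its two children's concatenated hashes.
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    original = [b"batch-0", b"batch-1", b"batch-2"]
    tampered = [b"batch-0", b"batch-X", b"batch-2"]
    # An auditor holding only the root can tell that some block changed.
    print(merkle_root(original) == merkle_root(tampered))
```

Because every parent hash depends on its children, changing any single block changes the root, so an auditor who stores only the root can detect modification without holding the data itself.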
Funding: The National Natural Science Foundation of China (No. 60573165).
Abstract: This paper investigates Web data aggregation issues in multidimensional on-line analytical processing (MOLAP) and presents a rule-driven aggregation approach. The core of the approach is the definition of aggregate rules. To define the rules for reading warehouse data and computing aggregates, a rule definition language, the array aggregation language (AAL), is developed. This language treats an array as a function from indexes to values and provides syntax and semantics based on monads. External functions can be called in aggregation rules to specify array reading, writing, and aggregating. Based on the features of AAL, array operations are unified as function operations, which can be easily expressed and automatically evaluated. To implement the aggregation approach, a processor is built that computes aggregates over the base cube and materializes them in the data warehouse, and the component structure and working principle of this aggregation processor are introduced.
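The idea of treating an array as a function from indexes to values can be sketched in Python. This is only an illustration of that idea, not AAL syntax or its monad-based semantics: the names `from_dict`, `aggregate`, and `roll_up_rows`, the two-dimensional index, and the default value of 0.0 for missing cells are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Index = Tuple[int, int]            # (row, column) position in a 2-D cube slice
Array = Callable[[Index], float]   # an array is just a function index -> value


def from_dict(cells: Dict[Index, float]) -> Array:
    """Wrap stored cells as an array function; missing cells read as 0.0 (assumption)."""
    return lambda idx: cells.get(idx, 0.0)


def aggregate(arr: Array, indexes: Iterable[Index],
              combine: Callable[[Iterable[float]], float] = sum) -> float:
    """Apply an aggregate function to the values found at the given indexes."""
    return combine(arr(idx) for idx in indexes)


def roll_up_rows(arr: Array, n_cols: int) -> Callable[[int], float]:
    """Return a 1-D array (a function of the row index) holding each row's total."""
    return lambda row: aggregate(arr, ((row, col) for col in range(n_cols)))


if __name__ == "__main__":
    sales = from_dict({(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0})
    row_totals = roll_up_rows(sales, n_cols=2)
    print(row_totals(0), row_totals(1))
```

Since both the base array and the rolled-up result are plain functions, reading, aggregating, and composing them all reduce to function application, which is the unification the abstract refers to.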
Abstract: Architectures based on the data flow computing model provide an alternative to the conventional Von Neumann architecture that is widely used for general-purpose computing. Processors based on the data flow architecture employ fine-grain data-driven parallelism. These architectures have the potential to exploit the inherent parallelism in compute-intensive applications such as signal processing and image and video processing, and can thus achieve higher throughput and higher power efficiency. In this paper, several data flow computing architectures are explored and their main architectural features are studied. Furthermore, a classification of the processors is presented based on whether they employ the data flow execution model exclusively or in combination with the control flow model; they are accordingly grouped as exclusive data flow or hybrid architectures. The hybrid category is further subdivided into conjoint and accelerator-style architectures, depending on how they deploy and separate the data flow and control flow execution models within their execution blocks. Lastly, a brief comparison and discussion of their advantages and drawbacks is given. From this study we conclude that although data flow architectures have matured significantly, issues such as data-structure handling and the lack of efficient placement and scheduling algorithms have prevented them from becoming commercially viable.
Abstract: With the rapid growth of cloud computing models, many companies and users have started to share their data on cloud servers. However, when the model is not completely trusted, data owners face several security-related problems during data outsourcing, such as user privacy breaches, data disclosure, and data corruption. Several models have been proposed to address these security-related issues in the cloud. With that concern, this paper develops a Privacy-Preserved Data Security Approach (PP-DSA) to provide data security and data integrity for outsourced data in the cloud environment. Privacy preservation is ensured in this work by the Efficient Authentication Technique (EAT), which uses the Group Signature method applied with a Third-Party Auditor (TPA). The role of the auditor is to secure the data and guarantee the integrity of shared data. Additionally, the Cloud Service Provider (CSP) and the Data User (DU) can also be attackers, and these cases are handled by the EAT. The major objective of the work is to enhance cloud security and thereby increase Quality of Service (QoS). The results are evaluated in terms of model effectiveness, security, and reliability, and show that the proposed model provides better results than existing works.
Funding: Supported by the National High Technology Research and Development Program of China (No. 2015AA015308), the National Key Research and Development Plan of China (Nos. 2016YFB1000600, 2016YFB1000601), and the Major Program of the National Natural Science Foundation of China (No. 61432006).
Abstract: With its high computational capacity, e.g. many cores and wide floating-point SIMD units, Intel Xeon Phi shows promise for accelerating high-performance computing (HPC) applications. However, the application of Intel Xeon Phi to data analytics workloads in the data center is still an open question. PhiBench 2.0 is built for the latest generation of Intel Xeon Phi (KNL, Knights Landing), based on the prior work PhiBench (also named BigDataBench-Phi), which was designed for the former generation of Intel Xeon Phi (KNC, Knights Corner). The workloads of PhiBench 2.0 are carefully chosen based on BigDataBench 4.0 and PhiBench 1.0. In addition, these workloads are well optimized on KNL and run on real-world datasets to evaluate their performance and scalability. Further, microarchitecture-level characteristics including CPI, cache behavior, vectorization intensity, and branch prediction efficiency are analyzed, and the impact of affinity and scheduling policy on performance is investigated. It is believed that these observations will help other researchers working on Intel Xeon Phi and data analytics workloads.
Abstract: This report presents the design and implementation of a Distributed Data Acquisition, Monitoring and Processing System (DDAMAP). It is assumed that the operations of a factory are organized into two levels: client machines at the plant level collect real-time raw data from sensors and measurement instruments and transfer them to a central processor over Ethernet, and the central processor handles the tasks of real-time data processing and monitoring. The system utilizes the computation power of an Intel T2300 dual-core processor and parallel computation supported by multi-threading techniques. Our experiments show that these techniques can significantly improve system performance and are viable solutions for real-time high-speed data processing.