Journal Articles
366,695 articles found
Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem (cited 7 times)
1
Authors: Subramanian Appavu Alias Balamurugan, Ramasamy Rajaram — International Journal of Automation and Computing, EI, 2009, Issue 1, pp. 62–71 (10 pages)
This paper proposes a feature selection method based on Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two binary attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning repository. The proposed method outperforms most existing algorithms in number of selected features, classification accuracy, and running time.
Keywords: data mining, classification, feature selection, dimensionality reduction, Bayes' theorem
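The pairwise dependence test described in this abstract can be sketched as follows. This is our reading of the method, not the authors' code; the row/label representation and the 0.5 majority threshold are assumptions for illustration.

```python
from itertools import combinations
from collections import defaultdict

def dependent(rows, a, b, label="label"):
    """Our reading of the abstract's test: binary attributes a and b are
    dependent when some joint value pair and its opposing pair lead to
    opposing majority class decisions."""
    pos = defaultdict(int)
    tot = defaultdict(int)
    for r in rows:
        key = (r[a], r[b])
        tot[key] += 1
        pos[key] += r[label]
    for (va, vb) in list(tot):
        opp = (1 - va, 1 - vb)  # opposing binary values
        if opp in tot:
            d1 = pos[(va, vb)] / tot[(va, vb)] > 0.5
            d2 = pos[opp] / tot[opp] > 0.5
            if d1 != d2:  # opposing decisions -> dependent
                return True
    return False

def select_features(rows, features, label="label"):
    """Scan all attribute pairs; drop one attribute of each dependent pair."""
    keep = list(features)
    for a, b in combinations(features, 2):
        if a in keep and b in keep and dependent(rows, a, b, label):
            keep.remove(b)  # one of a dependent pair is redundant
    return keep
```

On a toy dataset where attribute `b` duplicates `a`, the sketch keeps a single representative attribute.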
Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform
2
Authors: Simone Hantke, Tobias Olenyi, Christoph Hausner, Tobias Appel, Bjorn Schuller — International Journal of Automation and Computing, EI, CSCD, 2019, Issue 4, pp. 427–436 (10 pages)
In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. In addition, via this associated speech analysis web interface, VoiLA encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits such as gender, valence, arousal, dominance, and 24 different discrete emotions is employed. The analysis results are visualised in a way that is easily interpretable by laymen, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations, which indicate that it is fun and easy to use, and that it provides accurate and informative results.
Keywords: human computation, speech analysis, crowdsourcing, gamified data collection, survey
Semi-supervised Affinity Propagation Clustering Based on Subtractive Clustering for Large-Scale Data Sets
3
Authors: Qi Zhu, Huifu Zhang, Quanqin Yang — 《国际计算机前沿大会会议论文集》, 2015, Issue 1, pp. 76–77 (2 pages)
Faced with a growing number of large-scale data sets, the affinity propagation (AP) clustering algorithm must build a full similarity matrix, which incurs huge storage and computation costs. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added, using the density values of the data points to obtain initial cluster points. Then, the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is performed on the cluster representative points until a suitable cluster division is reached. Experimental results show that the algorithm greatly reduces the amount of computation and the storage required for the similarity matrix, and outperforms the original algorithm in both clustering quality and processing speed.
Keywords: subtractive clustering, initial cluster, affinity propagation clustering, semi-supervised clustering, large-scale data sets
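The subtractive-clustering step that seeds the improved AP algorithm can be sketched as follows; the density (mountain) function and the radii `ra`/`rb` follow the standard subtractive clustering formulation, not the paper's exact settings.

```python
import math

def subtractive_centers(points, ra=1.0, rb=1.5, n_centers=2):
    """Subtractive clustering sketch: rank points by a density value,
    pick the densest as a center, then suppress density near it and
    repeat. ra and rb are the usual neighborhood radii."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2
    d2 = lambda p, q: sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    # mountain function: each point's density from all other points
    dens = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    centers = []
    for _ in range(n_centers):
        i = max(range(len(points)), key=lambda k: dens[k])
        centers.append(points[i])
        peak = dens[i]
        # suppress density around the chosen center
        dens = [dens[k] - peak * math.exp(-beta * d2(points[k], points[i]))
                for k in range(len(points))]
    return centers
```

On two well-separated blobs, the sketch returns one center per blob, which would then serve as the initial cluster points for the sparse AP step.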
Research on the Application of Industrial Robot Technology in Large-Scale Geodetic Data Acquisition
4
Authors: Guangxiang Zhang, Hongfang Cheng, Huiwei Yang — Journal of Environmental & Earth Sciences, 2026, Issue 2, pp. 157–182 (26 pages)
Large-scale geodetic data acquisition is fundamental to infrastructure lifecycle management, construction quality control, urban digital twins, and hazard monitoring, yet conventional surveying workflows remain labor-intensive and difficult to scale in complex or hazardous environments. Industrial robot technology is emerging as an enabler of repeatable, high-throughput, and safety-conscious geodetic acquisition, offering controllable motion, stable sensor deployment, and autonomy coupled with perception stacks. This review synthesizes recent studies on robot-based geodetic acquisition from a platform-workflow-application perspective. We summarize the main industrial robot platforms with potential applications in geodesy, distinguishing autonomous mobile robots, mobile manipulators, fixed-base manipulators, and cooperative multi-robot arrangements, together with the design considerations underlying them: geometric stability, payload capacity, and tightly constrained operational safety. We then consider sensing configurations, calibration and synchronization principles, and acquisition strategies that govern data completeness and measurement consistency. Core processing foundations are examined in terms of georeferencing, registration, Simultaneous Localization and Mapping (SLAM)-based localization, and uncertainty propagation, all essential to survey-grade outputs. Application evidence is discussed for infrastructure monitoring, construction, industrial facilities, urban/corridor mapping, mining, and indoor/underground settings, showing where robotics offers clear advantages in repeatability and risk mitigation, as well as its limitations due to Global Navigation Satellite System (GNSS) denial, drift, calibration sensitivity, and inconsistent evaluation practices. Finally, we identify research priorities such as benchmark datasets and metrics, accuracy-driven autonomy, robust multi-sensor fusion with uncertainty outputs, and closer integration with Building Information Modeling (BIM)/digital twin pipelines.
Keywords: industrial robots, geodetic data acquisition, mobile mapping, SLAM, sensor fusion
Leveraging Large-Scale Data for Efficient Low-Bit CUTLASS GEMM Optimization via Neural Networks
5
Authors: Hong Guo, Nianhui Guo, Christoph Meinel, Haojin Yang — Big Data Mining and Analytics, 2026, Issue 2, pp. 632–652 (21 pages)
Optimizing GEneral Matrix Multiplication (GEMM) on GPU platforms is becoming increasingly critical to meet the growing computational demands of modern deep neural network research. While significant progress has been made in accelerating high-precision GEMM, the optimization of low-bit GEMM remains a challenging open problem. The CUTLASS library provides highly optimized low-bit GEMM templates leveraging Tensor Cores; however, performance varies considerably depending on tile and pipeline configurations across different GPU architectures. In this work, we propose a novel auto-tuning framework for low-bit CUTLASS GEMM, utilizing a neural network model to predict optimal GEMM template parameters for target GPUs. Our model is trained on a synthetic dataset with up to 116,100 unique samples, encompassing diverse matrix sizes across various Ampere GPUs, and is thoroughly evaluated on these hardware platforms. Experimental results show that our method achieves an accuracy of up to 95.11% on the validation dataset. Furthermore, real-time evaluations of low-bit data types on the A100 GPU demonstrate speedups of up to 1.99× for GEMM operations and 1.28× for the linear layer, compared to the default CUTLASS templates.
Keywords: low-bit General Matrix Multiplication (GEMM), CUTLASS optimization, neural network auto-tuning, Tensor Cores, tile and pipeline, large-scale dataset
Are China's Classes Predominantly Centered Around Teacher-Presentation Instruction? A Large-Scale Data Analysis Based on Classroom Intelligent Analysis Systems
6
Authors: Yihe Gao, Xiaozhe Yang — ECNU Review of Education, 2025, Issue 2, pp. 349–355 (7 pages)
System architecture: The Intelligent Teaching Team of the Shanghai Institute (Laboratory) of AI Education and the Institute of Curriculum and Instruction of East China Normal University collaborated to develop the High-Quality Classroom Intelligent Analysis Standard system. The system is measured along the dimensions of Class Efficiency, Equity, and Democracy, and is referred to as the CEED system.
Keywords: large-scale data analysis, Chinese class, classroom intelligent analysis systems
Recent Progresses in Synthesis of Cyclic Polymers in Large-scale and Some Functionalized Composites
7
Authors: QU Kairu, GUO Lyuzhou, WANG Wenbin, YAN Xuzhou, CAO Xuezheng, YANG Zhenzhong — 《高等学校化学学报》 (Chemical Journal of Chinese Universities), PKU Core, 2026, Issue 1, pp. 42–57 (16 pages)
Among various polymer architectures, end-group-free rings have attracted growing interest due to their distinct physicochemical performance relative to their linear counterparts, exemplified by reduced hydrodynamic size and slower degradation. It is key to develop facile methods for the large-scale synthesis of polymer rings with tunable compositions and microstructures. Recent progress in the large-scale synthesis of polymer rings from single-chain dynamic nanoparticles, and example applications in simultaneously enhancing the toughness and strength of polymer nanocomposites, are summarized. Once a breakthrough is achieved in the rational design and effective large-scale synthesis of polymer rings and their functional derivatives, a family of cyclic functional hybrids would become available, providing a new paradigm for the development of polymer science and engineering.
Keywords: cyclic polymer, large-scale synthesis, single-chain nanoparticle, performance, composite
Four-dimensional integrated standardization practice in the construction of large-scale complex information systems
8
Authors: Zhang Qi, Chen Shuang, Ni Xibing — China Standardization, 2026, Issue 2, pp. 62–66 (5 pages)
Large-scale complex systems are integral to the functioning of various organizations within the national economy. Despite their significance, the lengthy construction cycles and the involvement of multiple entities often result in the deprioritization of standardized management practices, as they do not yield immediate benefits. The implementation of such systems typically encompasses the integrated phases of development, construction, utilization, and operation and maintenance. To enhance the overall delivery quality of these systems, it is imperative to dismantle the management barriers among these phases and adopt a holistic approach to standardized management. This paper takes a specific system project as a research object to identify common challenges, and proposes improvement strategies for the implementation of standardized management. Empirical results indicate a substantial reduction in the system's full-lifecycle costs.
Keywords: large-scale complex information systems, quality management, standardization
Two Unconventional Types of Large-scale Circulation Anomalies Inducing Heavy Rainfall over the Yangtze River Basin
9
Authors: Xinyu LI, Mengyao CHEN, Riyu LU — Advances in Atmospheric Sciences, 2026, Issue 3, pp. 565–577 (13 pages)
Summer rainfall in the Yangtze River basin (YRB) is favored by two key factors in the lower troposphere: the tropical anticyclonic anomaly over the western North Pacific and the extratropical northeasterly anomalies to the north of the YRB. This study, however, found that approximately 46% of heavy rainfall events in the YRB occur when only one factor appears and the other is oppositely signed. Accordingly, these heavy rainfall events can be categorized into two types: extratropical northeasterly anomalies with a tropical cyclonic anomaly (first unconventional type), and a tropical anticyclonic anomaly with extratropical southwesterly anomalies (second unconventional type). Anomalous water vapor convergence and upward motion exist for both types, but through different mechanisms. For the first type, the moisture convergence and upward motion are induced by a cyclonic anomaly over the YRB, which appears in the mid and lower troposphere and originates from the upstream region. For the second type, a mid-tropospheric cyclonic anomaly over Lake Baikal extends southward and results in southwesterly anomalies over the YRB, in conjunction with the tropical anticyclonic anomaly. The southwesterly anomalies transport water vapor to the YRB and lead to upward motion through warm advection. This study emphasizes the role of mid-tropospheric circulations in inducing heavy rainfall in the YRB.
Keywords: Yangtze River basin, heavy rainfall, large-scale circulation anomalies
Adaptive event-triggered decentralized control for nonlinear interconnected large-scale systems with actuator failures:a fully actuated system approach
10
Authors: Yueyao Ye, Yanan Qi, Yiyu Feng, Xiaofeng Xu, Xianfu Zhang — Control Theory and Technology, 2026, Issue 1, pp. 82–95 (14 pages)
This study develops an event-triggered control strategy utilizing the fully actuated system approach for nonlinear interconnected large-scale systems with actuator failures. First, to reduce the complexity of the design process, we transform the studied system into the form of a fully actuated system through a state transformation. Then, to address the unknown nonlinear functions and actuator fault parameters, we employ neural networks and adaptive estimation techniques, respectively. Moreover, to reduce the control cost and improve control efficiency, we introduce event-triggered inputs into the control strategy. Lyapunov stability analysis proves that all signals of the closed-loop system are bounded and the system output eventually converges to a bounded region. The efficacy of the control approach is ultimately demonstrated via the simulation of an actual machine feeding system.
Keywords: nonlinear interconnected large-scale systems, fully actuated system approach, actuator failures, neural networks
Handling missing data in large-scale TBM datasets: Methods, strategies, and applications (cited 1 time)
11
Authors: Haohan Xiao, Ruilang Cao, Zuyu Chen, Chengyu Hong, Jun Wang, Min Yao, Litao Fan, Teng Luo — Intelligent Geoengineering, 2025, Issue 3, pp. 109–125 (17 pages)
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; as the missing rate increases, the interpolation performance of all methods declines; and block missing is hardest to impute, followed by mixed missing, with sporadic missing imputed best. On-site application results validate the proposed interpolation strategy's capability to achieve robust missing value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
Keywords: tunnel boring machine (TBM), missing data imputation, machine learning (ML), time series interpolation, data preprocessing, real-time data stream
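The KNN imputation the study found effective can be sketched in a minimal form; the distance over jointly observed columns and the use of `None` as the missing marker are our assumptions, not the paper's implementation.

```python
import math

def knn_impute(rows, k=2):
    """Minimal KNN imputation sketch: a missing value is replaced by the
    mean of that column over the k rows nearest in the columns both rows
    observe. None marks a missing entry."""
    def dist(r, s):
        ds = [(a - b) ** 2 for a, b in zip(r, s)
              if a is not None and b is not None]
        return math.sqrt(sum(ds) / len(ds)) if ds else float("inf")

    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is None:
                # candidate donors: other rows that observe column j
                cand = [s for s in rows if s is not r and s[j] is not None]
                cand.sort(key=lambda s: dist(r, s))
                nn = cand[:k]
                out[i][j] = sum(s[j] for s in nn) / len(nn)
    return out
```

A row near two complete rows inherits the mean of their observed values, which is the behavior the study exploits for sporadic missing patterns.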
Trend Analysis of Large-Scale Twitter Data Based on Witnesses during a Hazardous Event: A Case Study on California Wildfire Evacuation
12
Authors: Syed A. Morshed, Khandakar Mamun Ahmed, Kamar Amine, Kazi Ashraf Moinuddin — World Journal of Engineering and Technology, 2021, Issue 2, pp. 229–239 (11 pages)
Social media data created a paradigm shift in assessing situational awareness during natural disasters or emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of the calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by classifying firsthand and credible information from Twitter users. The work investigates tweets on the wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful for law enforcement agencies and humanitarian organizations for communication and verification of situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning. Trend analysis ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to firsthand emergency responders.
Keywords: wildfire, evacuation, Twitter, large-scale data, topic model, sentiment analysis, trend analysis
Large-scale data archiving: At the interface of archive science and computer science
13
Authors: Chaolemen Borjigin, Qingwen Jin — Data Science and Informetrics, 2023, Issue 3, pp. 1–17 (17 pages)
Both computer science and archival science are concerned with archiving large-scale data, but they have different focuses. Large-scale data archiving in computer science focuses on technical aspects that can reduce the cost of data storage and improve the reliability and efficiency of Big Data management; its weaknesses lie in inadequate and non-standardized management. Archiving in archival science focuses on the management aspects and neglects the necessary technical considerations, resulting in high storage and retention costs and a poor ability to manage Big Data. Therefore, the integration of large-scale data archiving and archival theory can balance the existing research limitations of the two fields, suggesting two topics for related research: archival management of Big Data, and large-scale management of archived Big Data.
Keywords: data archiving, archival science, computer science, large-scale data, data storage
Minimum Epsilon-Kernel Computation for Large-Scale Data Processing
14
Authors: Hong-Jie Guo, Jian-Zhong Li, Hong Gao — Journal of Computer Science & Technology, SCIE, EI, CSCD, 2022, Issue 6, pp. 1398–1411 (14 pages)
Kernel is a kind of data summary which is elaborately extracted from a large dataset. Given a problem, the solution obtained from the kernel is an approximate version of the solution obtained from the whole dataset, with a provable approximation ratio. It is widely used in geometric optimization, clustering, and approximate query processing, etc., for scaling them up to massive data. In this paper, we focus on the minimum ε-kernel (MK) computation that asks for a kernel of the smallest size for large-scale data processing. For the open problem presented by Wang et al. of whether the minimum ε-coreset (MC) problem and the MK problem can be reduced to each other, we first formalize the MK problem and analyze its complexity. Due to the NP-hardness of the MK problem in three or higher dimensions, an approximate algorithm, namely the Set Cover-Based Minimum ε-Kernel algorithm (SCMK), is developed to solve it. We prove that the MC problem and the MK problem can be Turing-reduced to each other. Then, we discuss the update of MK under insertion and deletion operations, respectively. Finally, a randomized algorithm, called the Randomized Algorithm of the Set Cover-Based Minimum ε-Kernel algorithm (RA-SCMK), is utilized to further reduce the complexity of SCMK. The efficiency and effectiveness of SCMK and RA-SCMK are verified by experimental results on real-world and synthetic datasets. Experiments show that the kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets, respectively. The speedup ratio of SCMK over the ANN-based method is 5.67 on synthetic datasets. RA-SCMK runs up to three times faster than SCMK on synthetic datasets.
Keywords: approximate query processing, kernel, large-scale dataset, NP-hard
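SCMK is set-cover based; the greedy set cover subroutine at the heart of such algorithms looks like this. This is a generic sketch of the classic greedy heuristic, not the paper's SCMK itself.

```python
def greedy_set_cover(universe, subsets):
    """Classic greedy set cover: repeatedly take the subset covering the
    most still-uncovered elements. This is the standard ingredient behind
    set-cover-based kernel selection such as SCMK."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen
```

The greedy choice yields the well-known ln(n)-approximation, which is what makes the kernel sizes reported in the abstract provably close to minimal.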
Challenges in the Large-Scale Deployment of CCUS (cited 3 times)
15
Authors: Zhenhua Rui, Lianbo Zeng, Birol Dindoruk — Engineering, 2025, Issue 1, pp. 17–20 (4 pages)
1. Introduction: Climate change mitigation pathways aimed at limiting global anthropogenic carbon dioxide (CO2) emissions while striving to constrain the global temperature increase to below 2°C, as outlined by the Intergovernmental Panel on Climate Change (IPCC), consistently predict the widespread implementation of CO2 geological storage on a global scale.
Keywords: large-scale deployment, CCUS, challenges, climate change mitigation
Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments (cited 3 times)
16
Author: Najme MANSOURI — Frontiers of Computer Science, SCIE, EI, CSCD, 2014, Issue 3, pp. 391–408 (18 pages)
Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of Grid, traditional approaches are no longer effective in this environment, so it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. Utilizing these two concepts, in this paper we develop a job scheduling policy, called the hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called the advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in queue, file locations, and the disk read speed of storage drives at data sources. Moreover, due to limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not enough for the new replica: first, it deletes those files with the minimum transfer time; second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm outperforms other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
Keywords: data replication, data grid, OptorSim, job scheduling, simulation
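The two-step replica replacement described in the abstract might be sketched as follows. The field names and the exact tie-breaking order in step 2 are our assumptions, since the abstract only lists the criteria without weights.

```python
def free_space(replicas, needed, free):
    """Two-step replacement sketch (our reading of ADHRS): step 1 evicts
    the replicas tied at the minimum transfer time (cheapest to refetch);
    if space is still short, step 2 ranks the rest by last request time,
    access count, size, and transfer time. Each replica is a dict with
    the assumed fields size, transfer_time, last_request, accesses."""
    pool = list(replicas)
    evicted = []
    # step 1: delete files with the minimum transfer time
    if pool and free < needed:
        cheapest = min(r["transfer_time"] for r in pool)
        for r in [x for x in pool if x["transfer_time"] == cheapest]:
            if free >= needed:
                break
            pool.remove(r)
            evicted.append(r)
            free += r["size"]
    # step 2: oldest request, fewest accesses, largest size, cheapest refetch first
    pool.sort(key=lambda r: (r["last_request"], r["accesses"],
                             -r["size"], r["transfer_time"]))
    while pool and free < needed:
        r = pool.pop(0)
        evicted.append(r)
        free += r["size"]
    return evicted, free
```

Step 2 only runs when step 1's evictions do not free enough space for the incoming replica, mirroring the abstract's "if space is still insufficient" clause.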
Influence of ground fissures on metro shield tunnels: Large-scale experiment and numerical analysis (cited 2 times)
17
Authors: Yuxuan Gou, Qiangbing Huang, Nina Liu, Dongping Chen, Jianbing Peng — Journal of Rock Mechanics and Geotechnical Engineering, 2025, Issue 3, pp. 1356–1377 (22 pages)
The recent upsurge in metro construction emphasizes the necessity of understanding the mechanical performance of metro shield tunnels subjected to the influence of ground fissures. In this study, a large-scale experiment, in combination with numerical simulation, was conducted to investigate the influence of ground fissures on a metro shield tunnel. The results indicate that the lining contact pressure at the vault increases in the hanging wall while decreasing in the footwall, resulting in a two-dimensional stress state of vertical shear and axial tension-compression, and simultaneous vertical dislocation and axial tilt for the segments around the ground fissure. In addition, the damage to curved bolts includes tensile yield, flexural yield, and shear twist, leading to obvious concrete lining damage, particularly at the vault, arch bottom, and haunch, indicating that the joints in these positions are weak areas. The shield tunnel orthogonal to the ground fissure ultimately experiences shear failure, suggesting that the maximum actual dislocation of the ground fissure that the structure can withstand is approximately 20 cm; five segment rings in the hanging wall and six segment rings in the footwall also need to be reinforced. This study provides a reference for metro design at ground fissure sites.
Keywords: shield tunnel, ground fissure, large-scale experiment, mechanical performance, failure mode
Survey of Large-Scale Data Management Systems for Big Data Applications (cited 4 times)
18
Authors: 吴冷冬, 袁立言, 犹嘉槐 — Journal of Computer Science & Technology, SCIE, EI, CSCD, 2015, Issue 1, pp. 163–183 (21 pages)
Today, data is flowing into various organizations at an unprecedented scale. The ability to scale out to process an enhanced workload has become an important factor for the proliferation and popularization of database systems. Big data applications demand, and consequently lead to, the development of diverse large-scale data management systems in different organizations, ranging from traditional database vendors to newly emerging Internet-based enterprises. In this survey, we investigate, characterize, and analyze large-scale data management systems in depth and develop comprehensive taxonomies for various critical aspects covering the data model, the system architecture, and the consistency model. We map the prevailing highly scalable data management systems to the proposed taxonomies, not only to classify the common techniques but also to provide a basis for analyzing current system scalability limitations. To overcome these limitations, we predict and highlight the principles that future efforts need to follow for the next generation of large-scale data management systems.
Keywords: data model, system architecture, consistency model, scalability
Low-power task scheduling algorithm for large-scale cloud data centers (cited 3 times)
19
Authors: Xiaolong Xu, Jiaxing Wu, Geng Yang, Ruchuan Wang — Journal of Systems Engineering and Electronics, SCIE, EI, CSCD, 2013, Issue 5, pp. 870–878 (9 pages)
How to effectively reduce the energy consumption of large-scale data centers is a key issue in cloud computing. This paper presents a novel low-power task scheduling algorithm (L3SA) for large-scale cloud data centers. A winner tree is introduced, with the data nodes as its leaf nodes, and the final winner is selected with the aim of reducing energy consumption. The complexity of large-scale cloud data centers is fully considered, and a task comparison coefficient is defined to make the task scheduling strategy more reasonable. Experiments and performance analysis show that the proposed algorithm can effectively improve node utilization and reduce the overall power consumption of the cloud data center.
Keywords: cloud computing, data center, task scheduling, energy consumption
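The winner tree mentioned in the abstract is a tournament tree over the data nodes; the node at the root is the scheduling choice. A minimal sketch follows, where the node fields and the power-based comparison are our assumptions for illustration, not L3SA's actual criteria.

```python
def build_winner_tree(leaves, better):
    """Winner (tournament) tree sketch: data nodes sit at the leaves and
    each internal node holds the winner of its two children, so tree[1]
    is the overall winner. `better(a, b)` returns True when a should win."""
    size = 1
    while size < len(leaves):
        size *= 2
    # heap layout: internal nodes at 1..size-1, leaves at size..2*size-1
    tree = [None] * size + list(leaves) + [None] * (size - len(leaves))
    for i in range(size - 1, 0, -1):
        l, r = tree[2 * i], tree[2 * i + 1]
        tree[i] = l if r is None or (l is not None and better(l, r)) else r
    return tree
```

With a lowest-power-wins comparison, the root picks the node that minimizes energy for the next task, which is the selection step the abstract describes.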
A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce (cited 3 times)
20
Authors: Lun Hu, Shicheng Yang, Xin Luo, Huaqiang Yuan, Khaled Sedraoui, MengChu Zhou — IEEE/CAA Journal of Automatica Sinica, SCIE, EI, CSCD, 2022, Issue 1, pp. 160–172 (13 pages)
Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework by reimplementing one of the state-of-the-art algorithms, i.e., CoFex, using MapReduce. To do so, an in-depth analysis of its limitations is conducted from the perspectives of efficiency and memory consumption when applying it to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge sequence information of proteins. After that, its procedure is modified by following the MapReduce framework to carry out the prediction task distributively. A series of extensive experiments has been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework can considerably improve computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
Keywords: distributed computing, large-scale prediction, machine learning, MapReduce, protein-protein interaction (PPI)