This paper proposes a method of feature selection based on Bayes' theorem. The purpose of the proposed method is to reduce computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two binary attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, reducing the number of attributes. The process is repeated over all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on eight datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method outperforms most existing algorithms in terms of the number of selected features, classification accuracy, and running time.
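The pairwise dependence criterion described above can be sketched in a few lines. This is only an illustration of the stated rule, not the paper's implementation; the function and variable names are our own.

```python
from collections import Counter

def attributes_dependent(a_vals, b_vals, labels):
    """Crude sketch of the dependence test: two binary attributes are
    flagged dependent when opposing joint values, e.g. (0, 1) versus
    (1, 0), occur with opposing class labels with non-zero probability."""
    joint = Counter(zip(a_vals, b_vals, labels))
    n = len(labels)
    p = 0.0
    for a, b in [(0, 1), (1, 0)]:
        # chance that (a, b) supports class 1 while its opposite supports class 0
        p += (joint[(a, b, 1)] / n) * (joint[(1 - a, 1 - b, 0)] / n)
    return p > 0.0  # zero probability -> treat the attributes as independent

# toy data: attribute b mirrors the label, attribute a does not
a = [0, 0, 1, 1]
b = [1, 0, 1, 0]
y = [1, 0, 1, 0]
print(attributes_dependent(a, b, y))  # True: one of the pair is redundant
```

A dependent pair signals redundancy, so one attribute of the pair can be dropped, which is what shrinks the feature subset.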
In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, which includes the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. Through this associated speech analysis web interface, VoiLA also encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits, such as gender, valence, arousal, dominance, and 24 discrete emotions, is employed. The analysis results are visualised in a way that is easily interpretable by laymen, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations, which indicate that it is fun and easy to use, and that it provides accurate and informative results.
Faced with a growing number of large-scale data sets, the affinity propagation (AP) clustering algorithm must construct a full similarity matrix, which incurs enormous storage and computation costs. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is introduced, using the density values of the data points to obtain initial cluster centers. Then, the similarity distances between these initial centers are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is performed on the cluster representative points until a suitable cluster partition is obtained. Experimental results show that the algorithm greatly reduces both computation and similarity-matrix storage, and outperforms the original algorithm in clustering quality and processing speed.
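The density-based seeding step above can be sketched with the standard subtractive-clustering density formula; the paper's exact variant may differ, and the full method would also subtract density around each selected center before picking the next one, which this minimal sketch omits.

```python
import math

def density(points, i, ra=1.0):
    """Subtractive-clustering density of point i: points within roughly
    radius ra contribute most (standard form; illustrative only)."""
    xi = points[i]
    return sum(
        math.exp(-4.0 * sum((a - b) ** 2 for a, b in zip(xi, xj)) / ra ** 2)
        for xj in points
    )

def initial_centers(points, k, ra=1.0):
    """Pick the k highest-density points as initial cluster centers.
    (A faithful implementation would revise densities after each pick.)"""
    dens = [density(points, i, ra) for i in range(len(points))]
    order = sorted(range(len(points)), key=lambda i: dens[i], reverse=True)
    return [points[i] for i in order[:k]]

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(initial_centers(pts, 1))  # → [(0.0, 0.0)]
```

Only the chosen centers then enter the (sparse) similarity matrix, which is where the storage saving comes from.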
Large-scale geodetic data acquisition is fundamental to infrastructure lifecycle management, construction quality control, urban digital twins, and hazard monitoring, yet conventional surveying workflows remain labor-intensive and difficult to scale in complex or hazardous environments. Industrial robot technology is proving to be an enabler of repeatable, high-throughput, and safety-conscious geodetic acquisition through controllable motion, stable sensor deployment, and autonomy coupled with perception stacks. This review synthesizes recent studies on robot-based geodetic acquisition from a platform-workflow-application perspective. We first summarize the industrial robot platforms with potential applications in geodesy, distinguishing among autonomous mobile robots, mobile manipulators, fixed-base manipulators, and cooperative multi-robot arrangements, together with the design considerations underlying their construction: geometric stability, payload capacity, and tightly constrained operational safety. We then consider sensing configurations, principles of calibration and synchronization, and acquisition strategies that govern data completeness and measurement consistency. The foundations of core processing are examined with respect to georeferencing, registration, Simultaneous Localization and Mapping (SLAM)-based localization, and uncertainty propagation, which are essential for survey-grade outputs. Application evidence is discussed across infrastructure monitoring, construction, industrial facilities, urban/corridor mapping, mining, and indoor/underground settings, highlighting areas where robotics offers clear advantages in repeatability and risk mitigation, as well as limiting conditions arising from Global Navigation Satellite System (GNSS) denial, drift, calibration sensitivity, and inconsistent evaluation practices. Lastly, we identify research priorities, including benchmark datasets and metrics, accuracy-motivated autonomy, robust multisensor fusion with uncertainty estimates, and closer integration with Building Information Modeling (BIM)/digital twin pipelines.
Optimizing GEneral Matrix Multiplication (GEMM) on GPU platforms is becoming increasingly critical to meet the growing computational demands of modern deep neural network research. While significant progress has been made in accelerating high-precision GEMM, the optimization of low-bit GEMM remains a challenging open problem. The CUTLASS library provides highly optimized low-bit GEMM templates leveraging Tensor Cores; however, performance varies considerably with the tile and pipeline configuration across different GPU architectures. In this work, we propose a novel auto-tuning framework for low-bit CUTLASS GEMM that uses a neural network model to predict optimal GEMM template parameters for target GPUs. The model is trained on a synthetic dataset of up to 116,100 unique samples, encompassing diverse matrix sizes across various Ampere GPUs, and is thoroughly evaluated on these hardware platforms. Experimental results show that our method achieves an accuracy of up to 95.11% on the validation dataset. Furthermore, real-time evaluations of low-bit data types on the A100 GPU demonstrate speedups of up to 1.99× for GEMM operations and 1.28× for the linear layer, compared to the default CUTLASS templates.
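The interface of such a tuner, mapping a problem size to a template choice, can be illustrated with a deliberately simple stand-in. The paper uses a trained neural network; here a nearest-logged-size lookup plays its role, and every size and configuration string below is invented for illustration, not a measured CUTLASS result.

```python
# Hypothetical training pairs: (M, N, K) problem sizes -> best tile config.
# The values are illustrative assumptions, not benchmarked CUTLASS data.
samples = {
    (128, 128, 64): "128x128x32_stage3",
    (4096, 4096, 1024): "256x128x64_stage4",
    (64, 64, 64): "64x64x32_stage2",
}

def predict_config(m, n, k):
    """Predict a GEMM template for an unseen size from the nearest logged
    size (a toy stand-in for the paper's neural-network predictor)."""
    def dist(key):
        return sum((a - b) ** 2 for a, b in zip(key, (m, n, k)))
    return samples[min(samples, key=dist)]

print(predict_config(100, 120, 60))  # → 128x128x32_stage3
```

The point of the learned predictor is the same: avoid exhaustively benchmarking every tile/pipeline combination for each new matrix shape.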
System architecture: The Intelligent Teaching Team of the Shanghai Institute (Laboratory) of AI Education and the Institute of Curriculum and Instruction of East China Normal University collaborated to develop the High-Quality Classroom Intelligent Analysis Standard system. The system measures classrooms along the dimensions of Class Efficiency, Equity, and Democracy, and is referred to as the CEED system.
Among the various architectures of polymers, end-group-free rings have attracted growing interest due to their distinct physicochemical performance over linear counterparts, exemplified by reduced hydrodynamic size and slower degradation. It is therefore key to develop facile methods for the large-scale synthesis of polymer rings with tunable compositions and microstructures. Recent progress in the large-scale synthesis of polymer rings against single-chain dynamic nanoparticles, and example applications in simultaneously enhancing the toughness and strength of polymer nanocomposites, are summarized. Once a breakthrough is achieved in the rational design and effective large-scale synthesis of polymer rings and their functional derivatives, a family of cyclic functional hybrids would become available, providing a new paradigm for the development of polymer science and engineering.
Large-scale complex systems are integral to the functioning of various organizations within the national economy. Despite their significance, the lengthy construction cycles and the involvement of multiple entities often result in the deprioritization of standardized management practices, as they do not yield immediate benefits. The implementation of such systems typically encompasses the integrated phases of "development, construction, utilization, and operation and maintenance". To enhance the overall delivery quality of these systems, it is imperative to dismantle the management barriers among these phases and adopt a holistic approach to standardized management. This paper takes a specific system project as its research object to identify common challenges and proposes improvement strategies for the implementation of standardized management. Empirical results indicate a substantial reduction in the system's full-lifecycle costs.
Summer rainfall in the Yangtze River basin (YRB) is favored by two key factors in the lower troposphere: the tropical anticyclonic anomaly over the western North Pacific and the extratropical northeasterly anomalies to the north of the YRB. This study, however, found that approximately 46% of heavy rainfall events in the YRB occur when only one factor appears and the other is opposite-signed. Accordingly, these heavy rainfall events can be categorized into two types: the extratropical northeasterly anomalies but a tropical cyclonic anomaly (first unconventional type), and the tropical anticyclonic anomaly but extratropical southwesterly anomalies (second unconventional type). Anomalous water vapor convergence and upward motion exist for both types, but through different mechanisms. For the first type, the moisture convergence and upward motion are induced by a cyclonic anomaly over the YRB, which appears in the mid and lower troposphere and originates from the upstream region. For the second type, a mid-tropospheric cyclonic anomaly over Lake Baikal extends southward and produces southwesterly anomalies over the YRB, in conjunction with the tropical anticyclonic anomaly. These southwesterly anomalies transport water vapor to the YRB and drive upward motion through warm advection. This study emphasizes the role of mid-tropospheric circulations in inducing heavy rainfall in the YRB.
This study develops an event-triggered control strategy based on the fully actuated system approach for nonlinear interconnected large-scale systems with actuator failures. First, to reduce the complexity of the design process, we transform the studied system into a fully actuated form through a state transformation. Then, to address the unknown nonlinear functions and actuator fault parameters, we employ neural networks and adaptive estimation techniques, respectively. Moreover, to reduce the control cost and improve control efficiency, we introduce event-triggered inputs into the control strategy. Lyapunov stability analysis proves that all signals of the closed-loop system are bounded and that the system output eventually converges to a bounded region. The efficacy of the control approach is ultimately demonstrated through the simulation of an actual machine feeding system.
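The event-triggering idea, refreshing the control input only when the state drifts sufficiently far from the last transmitted value, can be sketched as follows. The fixed threshold and the sample signal are illustrative assumptions; the paper's triggering rule is state-dependent and derived within the Lyapunov design.

```python
def event_triggered_updates(signal, threshold=0.5):
    """Return the sample indices at which the controller would transmit:
    an event fires when the measurement deviates from the last
    transmitted value by at least `threshold` (illustrative rule)."""
    last = signal[0]
    events = [0]  # the initial value is always transmitted
    for i, x in enumerate(signal[1:], start=1):
        if abs(x - last) >= threshold:
            last = x
            events.append(i)
    return events

sig = [0.0, 0.1, 0.3, 0.6, 0.7, 1.3, 1.4]
print(event_triggered_updates(sig))  # → [0, 3, 5]
```

Only three of seven samples trigger a transmission here, which is the communication saving the abstract refers to as reduced control cost.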
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms behind missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and missing rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored to TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; that the performance of all methods declines as the missing rate increases; and that, among missing patterns, block missing is interpolated worst, followed by mixed missing, while sporadic missing is interpolated best. On-site application results validate the proposed strategy's ability to achieve robust missing-value interpolation, applicable in machine learning scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings enhance the efficiency of TBM missing-data processing, offering more effective support for large-scale TBM monitoring datasets.
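KNN imputation, one of the methods compared above, can be illustrated with a minimal self-contained sketch: each missing entry is filled with the mean of that column over the k rows closest in the jointly observed columns. This is a simplified textbook version, not the study's code, and the distance and k are illustrative choices.

```python
def knn_impute(rows, k=2):
    """Fill None entries using the k nearest rows (by mean squared
    difference over columns observed in both rows)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is None:
                def dist(other):
                    ds = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                    return sum(ds) / len(ds) if ds else float("inf")
                donors = sorted(
                    (r for r in rows if r is not row and r[j] is not None),
                    key=dist)[:k]
                filled[i][j] = sum(r[j] for r in donors) / len(donors)
    return filled

data = [[1.0, 2.0], [1.1, 2.2], [0.9, None], [10.0, 20.0]]
print(knn_impute(data, k=2))
```

The missing value is reconstructed from the two nearby rows rather than the distant outlier, which is why KNN copes well with sporadic missing but degrades for block missing, where whole neighborhoods of values are absent.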
Social media data have created a paradigm shift in assessing situational awareness during natural disasters and emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by classifying firsthand and credible information from Twitter users. The work investigates tweets on the recent California wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful to law enforcement agencies and humanitarian organizations for communication and verification of situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling, performed through domain-expert manual annotation and machine learning, and it ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to firsthand emergency responders.
Both computer science and archival science are concerned with archiving large-scale data, but they have different focuses. Large-scale data archiving in computer science focuses on technical aspects that can reduce the cost of data storage and improve the reliability and efficiency of Big Data management; its weakness lies in inadequate and non-standardized management. Archiving in archival science focuses on management aspects and neglects the necessary technical considerations, resulting in high storage and retention costs and a poor ability to manage Big Data. Therefore, integrating large-scale data archiving with archival theory can balance the existing research limitations of the two fields, and it suggests two topics for related research: the archival management of Big Data, and the large-scale management of archived Big Data.
A kernel is a data summary carefully extracted from a large dataset. Given a problem, the solution obtained from the kernel approximates the solution obtained from the whole dataset with a provable approximation ratio. Kernels are widely used in geometric optimization, clustering, and approximate query processing, among other areas, to scale such tasks up to massive data. In this paper, we focus on the minimum ε-kernel (MK) computation, which asks for a kernel of the smallest size for large-scale data processing. Addressing the open problem posed by Wang et al. of whether the minimum ε-coreset (MC) problem and the MK problem can be reduced to each other, we first formalize the MK problem and analyze its complexity. Owing to the NP-hardness of the MK problem in three or more dimensions, an approximation algorithm, the Set Cover-Based Minimum ε-Kernel algorithm (SCMK), is developed to solve it. We prove that the MC problem and the MK problem can be Turing-reduced to each other. We then discuss updating the MK under insertion and deletion operations, respectively. Finally, a randomized algorithm, the Randomized Algorithm of the Set Cover-Based Minimum ε-Kernel algorithm (RA-SCMK), is used to further reduce the complexity of SCMK. The efficiency and effectiveness of SCMK and RA-SCMK are verified by experiments on real-world and synthetic datasets. The experiments show that the kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets, respectively. The speedup of SCMK over the ANN-based method is 5.67 on synthetic datasets, and RA-SCMK runs up to three times faster than SCMK on synthetic datasets.
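The ε-kernel idea can be made concrete with a classic directional-width construction: keep, for each of a set of sampled directions, the extreme points of the dataset, so that the extent of the point set in every direction is approximately preserved. This sketch illustrates the notion of a kernel only; it is not SCMK or the paper's algorithm.

```python
import math

def directional_kernel(points, n_dirs=8):
    """Keep the extreme 2-D points along n_dirs sampled directions, so
    each direction's extent is (approximately) preserved by the summary."""
    kernel = set()
    for t in range(n_dirs):
        ang = math.pi * t / n_dirs
        d = (math.cos(ang), math.sin(ang))
        proj = lambda p: p[0] * d[0] + p[1] * d[1]
        kernel.add(max(points, key=proj))
        kernel.add(min(points, key=proj))
    return kernel

pts = [(0, 0), (4, 0), (2, 1), (2, 3), (1, 1), (3, 1)]
k = directional_kernel(pts)
print(sorted(k))  # → [(0, 0), (2, 3), (4, 0)]
```

The three hull-extreme points survive while the interior points are discarded; width-based queries answered on the kernel then approximate those on the full set, and "minimum kernel" asks how few such points one can keep.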
1. Introduction

Climate change mitigation pathways aimed at limiting global anthropogenic carbon dioxide (CO2) emissions while striving to constrain the global temperature increase to below 2 °C, as outlined by the Intergovernmental Panel on Climate Change (IPCC), consistently predict the widespread implementation of CO2 geological storage on a global scale.
Data Grids integrate geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems; however, due to the special issues and goals of Grids, traditional approaches are no longer effective in this environment, and methods specialized for this kind of parallel and distributed system are needed. Another solution is to use a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. Combining these two concepts, in this paper we develop a job scheduling policy, called the hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called the advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node; it considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for a new replica: first, it deletes the files with the minimum transfer time; second, if space is still insufficient, it considers the time the replica was last requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that the proposed algorithm outperforms other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
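The two-step eviction described above can be sketched as follows. The cheapness threshold and the way the second-step factors are combined into one score are illustrative assumptions, since the abstract lists the factors but not their weighting.

```python
def choose_victims(replicas, needed, now, cheap_threshold=1.0):
    """Pick replicas to delete until `needed` space is freed.
    Step 1: evict replicas that are cheap to re-transfer.
    Step 2: if still short, evict by a combined score favouring old,
    rarely accessed, large, quick-to-refetch replicas (weights assumed)."""
    victims, freed, rest = [], 0.0, []
    for r in sorted(replicas, key=lambda x: x["transfer_time"]):
        if freed < needed and r["transfer_time"] <= cheap_threshold:
            victims.append(r["name"])
            freed += r["size"]
        else:
            rest.append(r)
    rest.sort(key=lambda r: (now - r["last_request"]) * r["size"]
              / (r["accesses"] * r["transfer_time"]), reverse=True)
    for r in rest:
        if freed >= needed:
            break
        victims.append(r["name"])
        freed += r["size"]
    return victims

reps = [
    {"name": "a", "transfer_time": 0.5, "size": 2.0, "last_request": 90, "accesses": 5},
    {"name": "b", "transfer_time": 3.0, "size": 4.0, "last_request": 10, "accesses": 1},
    {"name": "c", "transfer_time": 2.0, "size": 4.0, "last_request": 95, "accesses": 50},
]
print(choose_victims(reps, 5.0, now=100))  # → ['a', 'b']
```

Replica "a" goes first because it is cheap to refetch; "b" goes next because it is stale and rarely accessed, while the hot replica "c" survives.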
The recent upsurge in metro construction emphasizes the necessity of understanding the mechanical performance of metro shield tunnels subjected to the influence of ground fissures. In this study, a large-scale experiment, in combination with numerical simulation, was conducted to investigate the influence of ground fissures on a metro shield tunnel. The results indicate that the lining contact pressure at the vault increases in the hanging wall and decreases in the footwall, resulting in a two-dimensional stress state of vertical shear and axial tension-compression, with simultaneous vertical dislocation and axial tilt of the segments around the ground fissure. In addition, the damage to the curved bolts includes tensile yield, flexural yield, and shear twist, leading to obvious concrete lining damage, particularly at the vault, arch bottom, and hance, indicating that the joints at these positions are weak areas. The shield tunnel orthogonal to the ground fissure ultimately experiences shear failure, suggesting that the maximum actual ground-fissure dislocation the structure can withstand is approximately 20 cm, and that five segment rings in the hanging wall and six segment rings in the footwall need to be reinforced. This study can serve as a reference for metro design at ground fissure sites.
Today, data is flowing into various organizations at an unprecedented scale. The ability to scale out to process an increased workload has become an important factor in the proliferation and popularization of database systems. Big Data applications demand, and consequently drive, the development of diverse large-scale data management systems in different organizations, ranging from traditional database vendors to newly emerging Internet-based enterprises. In this survey, we investigate, characterize, and analyze large-scale data management systems in depth and develop comprehensive taxonomies for various critical aspects covering the data model, the system architecture, and the consistency model. We map the prevailing highly scalable data management systems to the proposed taxonomies, not only to classify the common techniques but also to provide a basis for analyzing current limitations on system scalability. To overcome these limitations, we predict and highlight the principles that future efforts will need to follow for the next generation of large-scale data management systems.
How to effectively reduce the energy consumption of large-scale data centers is a key issue in cloud computing. This paper presents a novel low-power task scheduling algorithm (L3SA) for large-scale cloud data centers. A winner tree is introduced with the data nodes as its leaf nodes, and the final winner is selected with the aim of reducing energy consumption. The complexity of large-scale cloud data centers is fully considered, and a task comparison coefficient is defined to make the task scheduling strategy more reasonable. Experiments and performance analysis show that the proposed algorithm can effectively improve node utilization and reduce the overall power consumption of the cloud data center.
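The winner-tree selection idea, a tournament over the leaf nodes whose root is the node chosen for the task, can be sketched as below. The energy criterion and node fields are illustrative assumptions; the paper's L3SA additionally weighs its task comparison coefficient, which is omitted here.

```python
def winner_tree_select(nodes, better):
    """Play pairwise rounds over the leaves; each round keeps the
    'winner' of a pair, so the surviving root is the selected node."""
    level = list(nodes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] if better(level[i], level[i + 1]) else level[i + 1])
        if len(level) % 2:          # odd leaf gets a bye into the next round
            nxt.append(level[-1])
        level = nxt
    return level[0]

# toy data-center nodes: (name, idle power draw, current utilization)
nodes = [("n1", 120, 0.9), ("n2", 80, 0.4), ("n3", 95, 0.2), ("n4", 200, 0.1)]
# winner = lower expected extra energy (an illustrative criterion)
pick = winner_tree_select(nodes, lambda a, b: a[1] * (1 - a[2]) < b[1] * (1 - b[2]))
print(pick[0])  # → n1
```

A real winner tree keeps the internal rounds materialized so that, after one leaf changes, the winner can be recomputed along a single root-to-leaf path instead of replaying the whole tournament.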
Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of its limitations is conducted from the perspectives of efficiency and memory consumption when applying it to large-scale PPI data analysis and prediction, and solutions are devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge sequence information of proteins. The procedure is then modified to follow the MapReduce framework so that the prediction task is carried out distributively. A series of extensive experiments has been conducted to evaluate the performance of the framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
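The memory saving from a tree-based structure over raw sequence features can be illustrated with a simple prefix tree: short sequence fragments that share prefixes share storage instead of being kept as separate strings. This sketches the general idea only, not CoFex's actual data layout, and the fragments below are invented.

```python
def build_trie(seqs):
    """Prefix tree over sequence fragments: common prefixes are stored
    once, cutting memory versus one string per fragment."""
    root = {}
    for s in seqs:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-fragment marker
    return root

def contains(trie, s):
    """Check whether fragment s was inserted as a complete entry."""
    node = trie
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

kmers = ["MKT", "MKV", "MPA"]  # hypothetical amino-acid fragments
t = build_trie(kmers)
print(contains(t, "MKV"), contains(t, "MKA"))  # → True False
```

Here "MKT" and "MKV" share the "MK" path; over millions of overlapping protein subsequences, that sharing is what turns an infeasible in-memory feature table into a tractable one.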
文摘This paper proposes one method of feature selection by using Bayes' theorem. The purpose of the proposed method is to reduce the computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two attributes (binary) is determined based on the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), then the two attributes are considered independent of each other, otherwise dependent, and one of them can be removed and thus the number of attributes is reduced. The process must be repeated on all combinations of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms over 8 datasets from University of California, Irvine (UCI) machine learning databases. The proposed method shows better results in terms of number of selected features, classification accuracy, and running time than most existing algorithms.
基金supported by the European Community’s Seventh Framework Programme(No.338164)(ERC Starting Grant iHEARu)
文摘In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. Via this associated speech analysis web interface, in addition, VoiLA encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser, which are then analysed in a state-of-the-art classification pipeline. A set of pre-trained models targeting a range of speaker states and traits such as gender, valence, arousal, dominance, and 24 different discrete emotions is employed. The analysis results are visualised in a way that they are easily interpretable by laymen, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations which indicate that it is fun and easy to use, and that it provides accurate and informative results.
基金This research has been partially supported by the national natural science foundation of China (51175169) and the national science and technology support program (2012BAF02B01).
文摘In the face of a growing number of large-scale data sets, affinity propagation clustering algorithm to calculate the process required to build the similarity matrix, will bring huge storage and computation. Therefore, this paper proposes an improved affinity propagation clustering algorithm. First, add the subtraction clustering, using the density value of the data points to obtain the point of initial clusters. Then, calculate the similarity distance between the initial cluster points, and reference the idea of semi-supervised clustering, adding pairs restriction information, structure sparse similarity matrix. Finally, the cluster representative points conduct AP clustering until a suitable cluster division.Experimental results show that the algorithm allows the calculation is greatly reduced, the similarity matrix storage capacity is also reduced, and better than the original algorithm on the clustering effect and processing speed.
基金funded by the 2026 School-level Sci-entific Research Project(wzdzrzd202611)the 2024 Anhui Province University Scientific Research Projects(2024AH052016,2024AH052017).
Abstract: Large-scale geodetic data acquisition is fundamental to infrastructure lifecycle management, construction quality control, urban digital twins, and hazard monitoring, yet conventional surveying workflows remain labor-intensive and difficult to scale in complex or hazardous environments. Industrial robotics is emerging as an enabling technology for repeatable, high-throughput, and safety-conscious geodetic acquisition, owing to its controllable motion, stable sensor deployment, and autonomy coupled with perception stacks. This review synthesizes recent studies on robot-based geodetic acquisition from a platform-workflow-application perspective. We first summarize the principal industrial robot platforms with potential applications in geodesy, distinguishing autonomous mobile robots, mobile manipulators, fixed-base manipulators, and cooperative multi-robot arrangements, together with the design considerations underlying them: geometric stability, payload capacity, and tightly constrained operational safety. We then consider sensing configurations, principles of calibration and synchronization, and acquisition strategies that govern data completeness and measurement consistency. Core processing foundations are examined with respect to georeferencing, registration, Simultaneous Localization and Mapping (SLAM)-based localization, and uncertainty propagation, all of which are essential for survey-grade outputs. Application evidence is discussed across infrastructure monitoring, construction, industrial facilities, urban/corridor mapping, mining, and indoor/underground settings, highlighting where robotics offers clear advantages in repeatability and risk mitigation, as well as its limitations under Global Navigation Satellite System (GNSS) denial, drift, calibration sensitivity, and inconsistent evaluation practices. Finally, we identify research priorities, including benchmark datasets and metrics, accuracy-driven autonomy, robust multi-sensor fusion with uncertainty estimates, and closer integration with Building Information Modeling (BIM)/digital twin pipelines.
Funding: Supported by the Federal Ministry of Research, Technology and Space under the funding code "KI-Servicezentrum Berlin-Brandenburg" 16IS22092.
Abstract: Optimizing GEneral Matrix Multiplication (GEMM) on GPU platforms is becoming increasingly critical to meet the growing computational demands of modern deep neural network research. While significant progress has been made in accelerating high-precision GEMM, the optimization of low-bit GEMM remains a challenging open problem. The CUTLASS library provides highly optimized low-bit GEMM templates leveraging Tensor Cores; however, performance varies considerably depending on tile and pipeline configurations across different GPU architectures. In this work, we propose a novel auto-tuning framework for low-bit CUTLASS GEMM, utilizing a neural network model to predict optimal GEMM template parameters for target GPUs. Our model is trained on a synthetic dataset with up to 116,100 unique samples, encompassing diverse matrix sizes across various Ampere GPUs, and is thoroughly evaluated on these hardware platforms. Experimental results show that our method achieves an accuracy of up to 95.11% on the validation dataset. Furthermore, real-time evaluations of low-bit data types on the A100 GPU demonstrate speedups of up to 1.99× for GEMM operations and 1.28× for the linear layer, compared to the default CUTLASS templates.
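The tuning loop behind such a framework can be pictured with a much simpler stand-in: enumerate candidate tile/pipeline configurations and pick the one minimizing a cost model. Here a toy analytic model replaces the paper's trained neural network, and the tile sizes, stage counts, and cost formula are purely illustrative:

```python
import itertools

# Hypothetical tile sizes and pipeline stage counts, standing in for CUTLASS
# template parameters; the real search space and cost model differ.
TILE_M = [64, 128, 256]
TILE_N = [64, 128, 256]
STAGES = [2, 3, 4]

def simulated_cost(m, n, k, tile_m, tile_n, stages):
    """Toy analytic cost: work per tile divided by pipeline depth, plus a
    penalty for rows/columns added to pad the matrix up to a tile multiple."""
    waste_m = (-m) % tile_m          # padding rows to reach a tile multiple
    waste_n = (-n) % tile_n          # padding columns likewise
    tiles = ((m + tile_m - 1) // tile_m) * ((n + tile_n - 1) // tile_n)
    return tiles * k / stages + (waste_m + waste_n) * k

def auto_tune(m, n, k):
    """Pick the (tile_m, tile_n, stages) triple with the lowest modeled cost."""
    return min(itertools.product(TILE_M, TILE_N, STAGES),
               key=lambda cfg: simulated_cost(m, n, k, *cfg))
```

In the paper's framework a trained model replaces this exhaustive scan, so the prediction stays cheap even when the configuration space is large.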
Funding: Supported by the China National Social Science Foundation (BHA220144).
Abstract: System architecture. The Intelligent Teaching Team of the Shanghai Institute (Laboratory) of AI Education and the Institute of Curriculum and Instruction of East China Normal University collaborated to develop the High-Quality Classroom Intelligent Analysis Standard system. The system is measured along the dimensions of Class Efficiency, Equity, and Democracy, and is hence referred to as the CEED system.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52293472, 22473096 and 22471164).
Abstract: Among the various architectures of polymers, end-group-free rings have attracted growing interest due to their distinct physicochemical performance relative to their linear counterparts, exemplified by reduced hydrodynamic size and slower degradation. It is therefore key to develop facile methods for the large-scale synthesis of polymer rings with tunable compositions and microstructures. Recent progress in the large-scale synthesis of polymer rings against single-chain dynamic nanoparticles, and example applications in simultaneously enhancing the toughness and strength of polymer nanocomposites, are summarized. Once breakthroughs are achieved in the rational design and effective large-scale synthesis of polymer rings and their functional derivatives, a family of cyclic functional hybrids will become available, providing a new paradigm for the development of polymer science and engineering.
Abstract: Large-scale complex systems are integral to the functioning of various organizations within the national economy. Despite their significance, lengthy construction cycles and the involvement of multiple entities often cause standardized management practices to be deprioritized, as they do not yield immediate benefits. The implementation of such systems typically encompasses the integrated phases of development, construction, utilization, and operation and maintenance. To enhance the overall delivery quality of these systems, it is imperative to dismantle the management barriers among these phases and adopt a holistic approach to standardized management. This paper takes a specific system project as its research object, identifies common challenges, and proposes improvement strategies for the implementation of standardized management. Empirical results indicate a substantial reduction in the system's full-lifecycle costs.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 42275041) and the Hainan Province Science and Technology Special Fund (Grant No. SOLZSKY2025006).
Abstract: Summer rainfall in the Yangtze River basin (YRB) is favored by two key factors in the lower troposphere: the tropical anticyclonic anomaly over the western North Pacific and the extratropical northeasterly anomalies to the north of the YRB. This study, however, found that approximately 46% of heavy rainfall events in the YRB occur when only one factor appears and the other is oppositely signed. Accordingly, these heavy rainfall events can be categorized into two types: events with the extratropical northeasterly anomalies but a tropical cyclonic anomaly (first unconventional type), and events with the tropical anticyclonic anomaly but extratropical southwesterly anomalies (second unconventional type). Anomalous water vapor convergence and upward motion exist for both types, but through different mechanisms. For the first type, the moisture convergence and upward motion are induced by a cyclonic anomaly over the YRB, which appears in the mid and lower troposphere and originates from the upstream region. For the second type, a mid-tropospheric cyclonic anomaly over Lake Baikal extends southward and produces southwesterly anomalies over the YRB, in conjunction with the tropical anticyclonic anomaly. These southwesterly anomalies transport water vapor to the YRB and drive upward motion through warm advection. This study emphasizes the role of mid-tropospheric circulations in inducing heavy rainfall in the YRB.
Funding: Supported by the Science Center Program of the National Natural Science Foundation of China under Grant 62188101, and the National Natural Science Foundation of China under Grant 62573265.
Abstract: This study develops an event-triggered control strategy based on the fully actuated system approach for nonlinear interconnected large-scale systems subject to actuator failures. First, to reduce the complexity of the design process, we transform the studied system into fully actuated form through a state transformation. Then, to address the unknown nonlinear functions and actuator fault parameters, we employ neural networks and adaptive estimation techniques, respectively. Moreover, to reduce the control cost and improve control efficiency, we introduce event-triggered inputs into the control strategy. Lyapunov stability analysis proves that all signals of the closed-loop system are bounded and that the system output eventually converges to a bounded region. The efficacy of the control approach is finally demonstrated via the simulation of a real machine feeding system.
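Stripped of the fully actuated transformation, neural networks, and fault adaptation, the event-triggered idea reduces to refreshing the controller's copy of the state only when the true state drifts far enough from the last transmitted value. A minimal scalar sketch, in which the plant, gain, and threshold are invented for illustration and are not the paper's system:

```python
def simulate(x0=1.0, steps=50, dt=0.1, threshold=0.05, k=2.0):
    """Scalar plant x' = u under event-triggered feedback u = -k * x_hat,
    where x_hat is refreshed only when |x - x_hat| exceeds the threshold.
    Returns the final state and the number of triggering events."""
    x, x_hat, events = x0, x0, 0
    for _ in range(steps):
        if abs(x - x_hat) > threshold:   # triggering condition
            x_hat = x                    # transmit the state, update control
            events += 1
        u = -k * x_hat                   # control uses the last transmitted state
        x = x + dt * u                   # forward-Euler step of the plant
    return x, events
```

The point of the mechanism is visible in the event count: the controller communicates far fewer than once per step, yet the state still settles into a small neighborhood of the origin.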
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52409151), the Programme of Shenzhen Key Laboratory of Green, Efficient and Intelligent Construction of Underground Metro Station (Programme No. ZDSYS20200923105200001), and the Science and Technology Major Project of Xizang Autonomous Region of China (XZ202201ZD0003G).
Abstract: Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms behind missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and missing rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored to TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; that the performance of all methods degrades as the missing rate increases; and that block missing is the hardest pattern to interpolate, followed by mixed missing, with sporadic missing the easiest. On-site application results validate the proposed strategy's ability to achieve robust missing-value interpolation, applicable in machine learning scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings enhance the efficiency of TBM missing-data processing, offering more effective support for large-scale TBM monitoring datasets.
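The KNN imputation the study found effective can be sketched without any ML library: fill each missing entry with the mean of that column over the k nearest complete donors, measuring distance on the jointly observed columns. This is a generic illustration of the technique, not the authors' implementation:

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest
    donor rows, where distance is computed on jointly observed columns."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is None:
                # rank donor rows that observe column j by distance to row i
                donors = sorted((r for r in rows if r is not row and r[j] is not None),
                                key=lambda r: dist(row, r))[:k]
                if donors:
                    filled[i][j] = sum(r[j] for r in donors) / len(donors)
    return filled
```

The sketch also shows why block missing is hard: when long contiguous stretches of a sensor channel are absent, few jointly observed columns remain, so the distance computation has little to work with.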
Abstract: Social media data have created a paradigm shift in assessing situational awareness during natural disasters and emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by classifying firsthand and credible information from Twitter users. Tweets on the wildfire are classified by witness type into two categories: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful to law enforcement agencies and humanitarian organizations for communication and verification of situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning. It ultimately builds a fine-grained analysis to assess evacuation routes and provide valuable information to first emergency responders.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 72074214).
Abstract: Both computer science and archival science are concerned with archiving large-scale data, but they have different focuses. Large-scale data archiving in computer science focuses on technical aspects that can reduce the cost of data storage and improve the reliability and efficiency of Big Data management; its weaknesses lie in inadequate and non-standardized management. Archiving in archival science focuses on management aspects and neglects the necessary technical considerations, resulting in high storage and retention costs and a poor ability to manage Big Data. Therefore, integrating large-scale data archiving with archival theory can balance the existing research limitations of the two fields, suggesting two topics for related research: archival management of Big Data, and large-scale management of archived Big Data.
Funding: The National Natural Science Foundation of China under Grant Nos. 61732003, 61832003, 61972110 and U19A2059, the National Key Research and Development Program of China under Grant No. 2019YFB2101902, and the CCF-Baidu Open Fund under Grant No. OF2021011.
Abstract: A kernel is a data summary elaborately extracted from a large dataset. Given a problem, the solution obtained from the kernel approximates the solution obtained from the whole dataset with a provable approximation ratio. Kernels are widely used to scale geometric optimization, clustering, approximate query processing, and similar tasks up to massive data. In this paper, we focus on minimum ε-kernel (MK) computation, which asks for a kernel of the smallest size for large-scale data processing. For the open problem presented by Wang et al. of whether the minimum ε-coreset (MC) problem and the MK problem can be reduced to each other, we first formalize the MK problem and analyze its complexity. Due to the NP-hardness of the MK problem in three or more dimensions, we develop an approximation algorithm, the Set Cover-Based Minimum ε-Kernel algorithm (SCMK), to solve it. We prove that the MC problem and the MK problem can be Turing-reduced to each other. We then discuss updating the MK under insertion and deletion operations, respectively. Finally, a randomized algorithm, the Randomized Set Cover-Based Minimum ε-Kernel algorithm (RA-SCMK), is used to further reduce the complexity of SCMK. The efficiency and effectiveness of SCMK and RA-SCMK are verified by experiments on real-world and synthetic datasets. The kernel sizes of SCMK are 2x and 17.6x smaller than those of an ANN-based method on real-world and synthetic datasets, respectively, and the speedup ratio of SCMK over the ANN-based method is 5.67 on synthetic datasets. RA-SCMK runs up to three times faster than SCMK on synthetic datasets.
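A set-cover-based algorithm such as SCMK can build on the classic greedy set-cover routine, which repeatedly picks the subset covering the most still-uncovered elements and achieves a logarithmic approximation ratio. The following is the generic greedy subroutine only, not the paper's exact algorithm:

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until the universe is covered.
    At each round, the subset with the largest marginal coverage wins."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("universe not coverable by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

In a kernel-selection setting, the universe would correspond to constraints (e.g., directions to approximate) and each subset to the constraints a candidate point satisfies, so minimizing the number of chosen subsets minimizes the kernel size.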
Funding: Supported by the National Key Research and Development Program of China (2022YFE0206700).
Abstract: 1. Introduction. Climate change mitigation pathways aimed at limiting global anthropogenic carbon dioxide (CO₂) emissions while striving to constrain the global temperature increase to below 2°C, as outlined by the Intergovernmental Panel on Climate Change (IPCC), consistently predict the widespread implementation of CO₂ geological storage on a global scale.
Abstract: Data Grids integrate geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are already available. Scheduling is a traditional problem in parallel and distributed systems; however, due to the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, and methods specialized for this kind of parallel and distributed system are needed. A complementary solution is a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. Combining these two ideas, this paper develops a job scheduling policy, the hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, the advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node, considering network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of storage drives at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for a new replica: first, it deletes those files with the minimum transfer time; second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. Simulation results show that the proposed algorithm outperforms other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
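The two-step replacement rule can be sketched directly from the description above. The dictionary field names, the cheap-transfer cutoff, and the step-2 scoring formula are assumptions introduced for illustration, not parameters from the paper:

```python
def choose_victims(replicas, need, cheap_cutoff=1.0):
    """Return names of replicas to evict until `need` units of space are freed."""
    freed, victims = 0.0, []
    remaining = list(replicas)
    # Step 1: delete replicas with minimum transfer time first -- they are
    # cheap to re-fetch if needed again later.
    for r in sorted(remaining, key=lambda r: r["transfer_time"]):
        if freed >= need or r["transfer_time"] >= cheap_cutoff:
            break
        victims.append(r["name"])
        freed += r["size"]
        remaining.remove(r)
    # Step 2: if space is still short, prefer stale, rarely accessed, large
    # replicas that are cheap to transfer back (higher score = evict first).
    if freed < need:
        def score(r):
            return r["idle_time"] + r["size"] / (1 + r["accesses"]) - r["transfer_time"]
        for r in sorted(remaining, key=score, reverse=True):
            if freed >= need:
                break
            victims.append(r["name"])
            freed += r["size"]
    return victims
```

Separating the cheap-to-refetch pass from the scored pass captures the strategy's intent: never spend the multi-criteria ranking on a file that costs almost nothing to bring back.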
Funding: Supported by the National Key Research & Development Program of China (Grant No. 2023YFC3008404) and the Key Laboratory of Earth Fissures Geological Disaster, Ministry of Natural Resources, China (Grant Nos. EFGD20240609 and EFGD20240610).
Abstract: The recent upsurge in metro construction emphasizes the necessity of understanding the mechanical performance of metro shield tunnels subjected to the influence of ground fissures. In this study, a large-scale experiment, in combination with numerical simulation, was conducted to investigate the influence of ground fissures on a metro shield tunnel. The results indicate that the lining contact pressure at the vault increases in the hanging wall and decreases in the footwall, resulting in a two-dimensional stress state of vertical shear and axial tension-compression, and simultaneous vertical dislocation and axial tilt of the segments around the ground fissure. In addition, the damage to curved bolts includes tensile yield, flexural yield, and shear twist, leading to obvious concrete lining damage, particularly at the vault, arch bottom, and hance, indicating that the joints in these positions are weak areas. The shield tunnel orthogonal to the ground fissure ultimately experiences shear failure, suggesting that the maximum actual dislocation of the ground fissure that the structure can withstand is approximately 20 cm, and that five segment rings in the hanging wall and six segment rings in the footwall need to be reinforced. This study provides a reference for metro design at ground fissure sites.
Abstract: Today, data is flowing into various organizations at an unprecedented scale. The ability to scale out to process an increased workload has become an important factor in the proliferation and popularization of database systems. Big Data applications demand, and consequently lead to, the development of diverse large-scale data management systems in different organizations, ranging from traditional database vendors to new, emerging Internet-based enterprises. In this survey, we investigate, characterize, and analyze large-scale data management systems in depth and develop comprehensive taxonomies for various critical aspects covering the data model, the system architecture, and the consistency model. We map the prevailing highly scalable data management systems to the proposed taxonomies, not only to classify common techniques but also to provide a basis for analyzing the scalability limitations of current systems. To overcome these limitations, we predict and highlight the principles that future efforts will need to adopt for the next generation of large-scale data management systems.
Funding: Supported by the National Natural Science Foundation of China (61202004, 61272084), the National Key Basic Research Program of China (973 Program) (2011CB302903), the Specialized Research Fund for the Doctoral Program of Higher Education (20093223120001, 20113223110003), the China Postdoctoral Science Foundation Funded Project (2011M500095, 2012T50514), the Natural Science Foundation of Jiangsu Province (BK2011754, BK2009426), the Jiangsu Postdoctoral Science Foundation Funded Project (1102103C), the Natural Science Fund of Higher Education of Jiangsu Province (12KJB520007), and the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (yx002001).
Abstract: How to effectively reduce the energy consumption of large-scale data centers is a key issue in cloud computing. This paper presents a novel low-power task scheduling algorithm (L3SA) for large-scale cloud data centers. A winner tree is introduced in which the data nodes form the leaf nodes of the tree, and the final winner is selected so as to reduce energy consumption. The complexity of large-scale cloud data centers is fully considered, and a task comparison coefficient is defined to make the task scheduling strategy more reasonable. Experiments and performance analysis show that the proposed algorithm can effectively improve node utilization and reduce the overall power consumption of the cloud data center.
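A winner tree keeps the winner of each pairwise match at an internal node, so the overall winner is always available at the root in O(1) and re-selection after one leaf changes costs O(log n). A minimal sketch of the structure, not the paper's L3SA; "load" here stands in for whatever energy metric the algorithm compares:

```python
class WinnerTree:
    """Array-based winner tree over n data nodes: leaves hold node indices,
    each internal slot holds the index of the lighter-loaded child subtree."""
    def __init__(self, loads):
        self.n = len(loads)
        self.loads = list(loads)
        self.tree = [0] * (2 * self.n)
        for i in range(self.n):
            self.tree[self.n + i] = i            # leaves store data-node indices
        for pos in range(self.n - 1, 0, -1):     # play matches bottom-up
            l, r = self.tree[2 * pos], self.tree[2 * pos + 1]
            self.tree[pos] = l if self.loads[l] <= self.loads[r] else r

    def winner(self):
        return self.tree[1]                      # least-loaded node index

    def update(self, i, load):
        """Change node i's load and replay matches on its leaf-to-root path."""
        self.loads[i] = load
        pos = (self.n + i) // 2
        while pos >= 1:
            l, r = self.tree[2 * pos], self.tree[2 * pos + 1]
            self.tree[pos] = l if self.loads[l] <= self.loads[r] else r
            pos //= 2
        return self.winner()
```

Each incoming task can thus be assigned to `winner()` and the tree repaired with one `update`, which is what keeps selection cheap at data-center scale.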
Funding: This work was supported in part by the National Natural Science Foundation of China (61772493), the CAAI-Huawei MindSpore Open Fund (CAAIXSJLJJ-2020-004B), the Natural Science Foundation of Chongqing, China (cstc2019jcyjjqX0013), the Chongqing Research Program of Technology Innovation and Application (cstc2019jscx-fxydX0024, cstc2019jscx-fxydX0027, cstc2018jszx-cyzdX0041), the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019), the Pioneer Hundred Talents Program of the Chinese Academy of Sciences, and the Deanship of Scientific Research (DSR) at King Abdulaziz University (G-21-135-38).
Abstract: Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of CoFex's limitations is conducted from the perspectives of efficiency and memory consumption when applying it to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge sequence information of proteins. After that, the procedure is modified to follow the MapReduce framework and carry out the prediction task distributively. A series of extensive experiments evaluates the performance of the framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
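The memory-saving idea behind a tree-based store for redundant sequence data can be illustrated with a plain trie, where sequences that share a prefix share storage. This is a sketch of the general idea only, not CoFex's actual data structure:

```python
def build_trie(sequences):
    """Insert sequences into a nested-dict trie; shared prefixes are stored once."""
    root = {}
    for seq in sequences:
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})  # reuse the child if it already exists
        node["$"] = True                    # end-of-sequence marker
    return root

def count_nodes(trie):
    """Count character nodes, showing the saving versus storing each string whole."""
    return sum(1 + count_nodes(v) for k, v in trie.items() if k != "$")
```

For two toy sequences "MKT" and "MKV", the trie stores 4 character nodes instead of the 6 characters a flat list would hold; on real protein datasets, where short amino-acid substrings recur heavily, this kind of sharing is what tames memory growth.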