1,075 articles found
Data cleaning method for the process of acid production with flue gas based on improved random forest (cited by: 3)
1
Authors: Xiaoli Li, Minghua Liu, Kang Wang, Zhiqiang Liu, Guihai Li. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2023, Issue 7, pp. 72-84 (13 pages)
Acid production with flue gas is a complex nonlinear process with multiple variables and strong coupling. The operation data is an important basis for state monitoring, optimal control, and fault diagnosis. However, the operating environment of acid production with flue gas is complex and involves a large amount of equipment. The data obtained by the detection equipment is seriously polluted and prone to abnormal phenomena such as data loss and outliers. Therefore, to solve the problem of abnormal data in the process of acid production with flue gas, a data cleaning method based on improved random forest is proposed. Firstly, an outlier recognition model based on isolation forest is designed to identify and eliminate the outliers in the dataset. Secondly, an improved random forest regression model is established. A genetic algorithm is used to optimize the hyperparameters of the random forest regression model. Then the optimal parameter combination is found in the search space and the trend of the data is predicted. Finally, the improved random forest data cleaning method is used to compensate for the data missing after abnormal data are eliminated, which completes the data cleaning. Results show that the proposed method can accurately eliminate and compensate for the abnormal data in the process of acid production with flue gas. The method improves the accuracy of compensation for missing data. With the cleaned data, a more accurate model can be established, which is significant for the subsequent temperature control. The conversion rate of SO₂ can be further improved, thereby improving the yield of sulfuric acid and the economic benefits.
Keywords: acid production; data cleaning; isolation forest; random forest; data compensation
A Review of Data Cleaning Methods for Web Information System (cited by: 1)
2
Authors: Jinlin Wang, Xing Wang, Yuchen Yang, Hongli Zhang, Binxing Fang. Computers, Materials & Continua (SCIE, EI), 2020, Issue 3, pp. 1053-1075 (23 pages)
The web information system (WIS) is frequently used and indispensable in daily social life. WIS provides information services in many scenarios, such as electronic commerce, communities, and edutainment. Data cleaning plays an essential role in various WIS scenarios to improve the quality of data service. In this paper, we present a review of the state-of-the-art methods for data cleaning in WIS. According to the characteristics of data cleaning, we extract the critical elements of WIS, such as interactive objects, application scenarios, and core technology, to classify the existing works. Then, after elaborating and analyzing each category, we summarize the descriptions and challenges of data cleaning methods with sub-elements such as data & user interaction, data quality rule, model, crowdsourcing, and privacy preservation. Finally, we analyze various types of problems and provide suggestions for future research on data cleaning in WIS from the technology and interaction perspectives.
Keywords: data cleaning; web information system; data quality rule; crowdsourcing; privacy preservation
Data Cleaning Based on Stacked Denoising Autoencoders and Multi-Sensor Collaborations (cited by: 1)
3
Authors: Xiangmao Chang, Yuan Qiu, Shangting Su, Deliang Yang. Computers, Materials & Continua (SCIE, EI), 2020, Issue 5, pp. 691-703 (13 pages)
Wireless sensor networks are increasingly used in sensitive event monitoring. However, various abnormal data generated by sensors greatly decrease the accuracy of event detection. Although many methods have been proposed to deal with abnormal data, they generally detect and/or repair all abnormal data without further differentiation. In fact, besides the abnormal data caused by events, it is well known that sensor nodes are prone to generating abnormal data due to factors such as sensor hardware drawbacks and random effects of external sources. Dealing with all abnormal data without differentiation will result in false detection or missed detection of events. In this paper, we propose a data cleaning approach based on Stacked Denoising Autoencoders (SDAE) and multi-sensor collaborations. We detect all abnormal data by SDAE, then differentiate the abnormal data by multi-sensor collaborations. The abnormal data caused by events are left unchanged, while the abnormal data caused by other factors are repaired. Simulations based on real data show the efficiency of the proposed approach.
Keywords: data cleaning; wireless sensor networks; stacked denoising autoencoders; multi-sensor collaborations
An Improvement of Data Cleaning Method for Grain Big Data Processing Using Task Merging (cited by: 1)
4
Authors: Feiyu Lian, Maixia Fu, Xingang Ju. Journal of Computer and Communications, 2020, Issue 3, pp. 1-19 (19 pages)
Data quality strongly influences the application of grain big data, so data cleaning is necessary and important work. In the MapReduce framework, parallel techniques are often used to execute data cleaning in a highly scalable mode, but due to the lack of effective design, the data cleaning process contains a large amount of redundant computation, which lowers performance. In this research, we found that some tasks are often carried out multiple times on the same input files, or require the same operation results during data cleaning. To address this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of loop computations in MapReduce can be reduced greatly. Experiments show that this significantly reduces the overall system runtime, which indicates that the data cleaning process is optimized. In this paper, we optimized several data cleaning modules, such as entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method can increase the efficiency of grain big data cleaning.
Keywords: grain big data; data cleaning; task merging; Hadoop; MapReduce
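The task-merging idea in the abstract above can be illustrated with a small plain-Python sketch. It is not the authors' MapReduce implementation: the record fields (`variety`, `moisture`), the default moisture value, and all function names are hypothetical, and an in-memory list stands in for the shared input file. The point is only that several per-record cleaning tasks over the same input can share one scan instead of one scan each.

```python
def strip_whitespace(rec):
    """Cleaning task 1: trim stray whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}

def fill_missing_moisture(rec, default=13.0):
    """Cleaning task 2: fill a missing moisture value (default is an assumption)."""
    return {**rec, "moisture": default} if rec.get("moisture") is None else rec

def run_separately(records, tasks):
    """Baseline: one full scan of the input per task."""
    scans = 0
    for task in tasks:
        records = [task(r) for r in records]
        scans += 1
    return records, scans

def run_merged(records, tasks):
    """Merged: apply every task to each record during a single scan."""
    out = []
    for rec in records:
        for task in tasks:
            rec = task(rec)
        out.append(rec)
    return out, 1
```

Both functions return the cleaned records together with the number of scans, so the saving is visible directly: with two tasks the baseline scans the input twice and the merged version once, and the cleaned records are identical as long as each task works record by record.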
A Rule Management System for Knowledge Based Data Cleaning
5
Authors: Louardi BRADJI, Mahmoud BOUFAIDA. Intelligent Information Management, 2011, Issue 6, pp. 230-239 (10 pages)
In this paper, we propose a rule management system for data cleaning that is based on knowledge. This system combines features of both rule-based systems and rule-based data cleaning frameworks. The important advantages of our system are threefold. First, it aims at proposing a strong and unified rule form based on first-order structure that permits the representation and management of all types of rules and their quality via some characteristics. Second, it increases the quality of rules, which conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule-based and knowledge-based systems. As several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data to better determine the main components of the proposed system. To illustrate our system, we also present a first experiment with a case study in the health sector, where we demonstrate how the system is useful for improving data quality. The autonomy, extensibility, and platform independence of the proposed rule management system facilitate its incorporation into any system that is interested in data quality management.
Keywords: rule; data quality; data cleaning; knowledge; rule management system; rule-based system; structure
Big Data Cleaning Based on Improved CLOF and Random Forest for Distribution Networks (cited by: 1)
6
Authors: Jie Liu, Yijia Cao, Yong Li, Yixiu Guo, Wei Deng. CSEE Journal of Power and Energy Systems (SCIE, EI, CSCD), 2024, Issue 6, pp. 2528-2538 (11 pages)
To improve data quality, this paper studies a big data cleaning method for distribution networks. First, the Local Outlier Factor (LOF) algorithm based on DBSCAN clustering is used to detect outliers. However, because the LOF threshold is difficult to determine, a method of dynamically calculating the threshold based on the transformer districts and time is proposed. In addition, the LOF algorithm is combined with the statistical distribution method to reduce the misjudgment rate. Aiming at the diversity and complexity of data missing forms in power big data, this paper improves the Random Forest imputation algorithm so that it can be applied to various forms of missing data, especially blocked missing data and even some completely missing horizontal or vertical data. The data in this paper are real data from 44 transformer districts of a 10 kV line in a distribution network. Experimental results show that outlier detection is accurate and suitable for power big data of any shape and dimension. The improved Random Forest imputation algorithm is suitable for all missing forms, with higher imputation accuracy and better model stability. Comparing network loss prediction on data cleaned with this method against data from which outliers and missing values were simply removed shows that the proposed data cleaning method improves the accuracy of network loss prediction by nearly 4%. Additionally, as the proportion of bad data increases, the difference between the prediction accuracy of cleaned data and that of uncleaned data becomes more significant.
Keywords: data cleaning; DBSCAN; LOF; missing data imputation; outlier detection; Random Forest
A method for cleaning wind power anomaly data by combining image processing with community detection algorithms
7
Authors: Qiaoling Yang, Kai Chen, Jianzhang Man, Jiaheng Duan, Zuoqi Jin. Global Energy Interconnection (EI, CSCD), 2024, Issue 3, pp. 293-312 (20 pages)
Current methodologies for cleaning wind power anomaly data exhibit limited capabilities in identifying abnormal data within extensive datasets and struggle to accommodate the considerable variability and intricacy of wind farm data. Consequently, a method for cleaning wind power anomaly data by combining image processing with community detection algorithms (CWPAD-IPCDA) is proposed. To precisely identify and initially clean anomalous data, wind power curve (WPC) images are converted into graph structures, which employ the Louvain community recognition algorithm and graph-theoretic methods for community detection and segmentation. Furthermore, the mathematical morphology operation (MMO) determines the main part of the initially cleaned wind power curve images and maps them back to the normal wind power points to complete the final cleaning. The CWPAD-IPCDA method was applied to clean datasets from 25 wind turbines (WTs) in two wind farms in northwest China to validate its feasibility. A comparison was conducted using the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved isolation forest algorithm, and an image-based (IB) algorithm. The experimental results demonstrate that the CWPAD-IPCDA method surpasses the other three algorithms, achieving an approximately 7.23% higher average data cleaning rate. The mean value of the sum of the squared errors (SSE) of the dataset after cleaning is approximately 6.887 lower than that of the other algorithms. Moreover, the mean of overall accuracy, as measured by the F1-score, exceeds that of the other methods by approximately 10.49%; this indicates that the CWPAD-IPCDA method is more conducive to improving the accuracy and reliability of wind power curve modeling and wind farm power forecasting.
Keywords: wind turbine power curve; abnormal data cleaning; community detection; Louvain algorithm; mathematical morphology operation
Cleaning of Multi-Source Uncertain Time Series Data Based on PageRank
8
Authors: 高嘉伟, 孙纪舟. Journal of Donghua University (English Edition) (CAS), 2023, Issue 6, pp. 695-700 (6 pages)
There are errors in multi-source uncertain time series data. Truth discovery methods for time series data are effective in finding more accurate values, but some have limitations in their usability. To tackle this challenge, we propose a new and convenient truth discovery method to handle time series data. A more accurate sample is closer to the truth and, consequently, to other accurate samples. Because the mutual-confirm relationship between sensors is very similar to the mutual-quote relationship between web pages, we evaluate sensor reliability based on PageRank and then estimate the truth by sensor reliability. Therefore, this method does not rely on smoothness assumptions or prior knowledge of the data. Finally, we validate the effectiveness and efficiency of the proposed method on real-world and synthetic data sets, respectively.
Keywords: big data; data cleaning; time series; truth discovery; PageRank
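The mutual-confirmation idea described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the confirmation weight `1 / (1 + mean absolute difference)`, the damping factor 0.85, the fixed iteration count, and the function names are all choices made here for the example. It also assumes at least two sensors with equally long, aligned series.

```python
def sensor_reliability(series, damping=0.85, iters=100):
    """PageRank-style score: a sensor is reliable if reliable sensors confirm it."""
    n = len(series)
    # Confirmation weight between sensors i and j (assumed similarity kernel).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = sum(abs(a - b) for a, b in zip(series[i], series[j]))
                w[i][j] = 1.0 / (1.0 + diff / len(series[i]))
    out = [sum(row) for row in w]  # total outgoing weight of each sensor
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Standard damped power iteration on the row-normalised weights.
        rank = [
            (1 - damping) / n
            + damping * sum(rank[i] * w[i][j] / out[i] for i in range(n))
            for j in range(n)
        ]
    return rank

def estimate_truth(series, rank):
    """Reliability-weighted average of the sensors at each time step."""
    total = sum(rank)
    steps = len(series[0])
    return [sum(r * s[t] for r, s in zip(rank, series)) / total for t in range(steps)]
```

With three sensors that track each other closely and one that disagrees, the disagreeing sensor receives the lowest score, so the weighted estimate stays nearer to the agreeing sensors than a plain average would.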
IoT data cleaning techniques: A survey
9
Authors: Xiaoou Ding, Hongzhi Wang, Genglong Li, Haoxuan Li, Yingze Li, Yida Liu. Intelligent and Converged Networks (EI), 2022, Issue 4, pp. 325-339 (15 pages)
Data cleaning is considered an effective approach to improving data quality, helping practitioners and researchers devote themselves to downstream analysis and decision-making without worrying about data trustworthiness. This paper provides a systematic summary of the two main stages of data cleaning for Internet of Things (IoT) data with time series characteristics: error data detection and data repairing. With respect to error data detection techniques, it gives a categorized overview of quantitative data error detection methods for detecting single-point errors, continuous errors, and multidimensional time series data errors, and of qualitative data error detection methods for detecting rule-violating errors. Besides, it provides a detailed description of error data repairing techniques, involving statistics-based repairing, rule-based repairing, and human-involved repairing. We review the strengths and the limitations of the current data cleaning techniques under IoT data applications and conclude with an outlook on the future of IoT data cleaning.
Keywords: Internet of Things (IoT); data quality; data cleaning; error detection; data repairing
Data Cleaning About Student Information Based on Massive Open Online Course System
10
Authors: Shengjun Yin, Yaling Yi, Hongzhi Wang. 《国际计算机前沿大会会议论文集》, 2020, Issue 1, pp. 33-43 (11 pages)
Recently, Massive Open Online Courses (MOOCs) have become a major way of online learning for millions of people around the world, generating a large amount of data in the meantime. However, due to errors arising from collection, the system, and so on, these data have various inconsistencies and missing values. In order to support accurate analysis, this paper studies the data cleaning technology for online open curriculum systems, including missing-value filling for time series and rule-based input error correction. The data cleaning algorithm designed in this paper is divided into six parts: pre-processing, missing data processing, format and content error processing, logical error processing, irrelevant data processing, and correlation analysis. This paper designs and implements a missing-value-filling algorithm based on time series in the missing data processing part. For the large number of descriptive variables existing in the format and content error processing module, it proposes the one-based and separability-based criteria Hot+J3+PCA. The online course data cleaning algorithm is analyzed in detail in terms of algorithm design, implementation, and testing. After extensive rigorous testing, each module functions normally, and the cleaning performance of the algorithm meets expectations.
Keywords: MOOC; data cleaning; time series; intermittent missing; dimension reduction
A Survey of Data Debugging (数据调试综述)
11
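Authors: 李晨阳, 马超红, 孟小峰. 《计算机研究与发展》 (PKU Core), 2026, Issue 1, pp. 41-65 (25 pages)
The rapid development of artificial intelligence (AI) has profoundly affected fields such as healthcare, bioinformatics, and financial services. The main paradigm of AI applications is to build machine learning models that explore rules and patterns in data for inference and decision-making. The effectiveness and efficiency of an AI system depend on two key aspects. The first is the model (model-centric), including enhancements to network structure, such as the shift from RNN to LSTM, and model hyperparameter tuning. The second is the data (data-centric), such as standardizing data formats, increasing data volume, and reducing data noise. Debugging AI systems has long focused on optimizing models. However, the digital era, represented by social networks and e-commerce, produces huge and diverse data, so model-centric debugging can no longer meet the demands placed on AI systems. The research community and industry have therefore shifted their attention from models to data to close this gap, and "data debugging" has emerged. Unlike model optimization, data debugging focuses on inspecting data, that is, understanding how erroneous data affect downstream tasks at each stage of the machine learning pipeline, and then debugging the corresponding errors to improve model performance. Based on a comprehensive survey of work related to data debugging, this paper first proposes a research framework for data debugging and, according to how data debugging methods interact with the machine learning pipeline, divides existing methods into three categories: closed, immersive, and hybrid data debugging. It then gives a detailed overview of related work in the field. Next, it experimentally evaluates data debugging methods and summarizes the datasets and evaluation metrics commonly used in this research area. Finally, it points out the challenges facing data debugging and future directions.
Keywords: data debugging; machine learning; data quality; data cleaning; artificial intelligence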
Can Automatic Classification Help to Increase Accuracy in Data Collection?
12
Authors: Frederique Lang, Diego Chavarro, Yuxian Liu. Journal of Data and Information Science, 2016, Issue 3, pp. 42-58 (17 pages)
Purpose: The authors aim to test the performance of a set of machine learning algorithms that could improve the process of data cleaning when building datasets. Design/methodology/approach: The paper is centered on cleaning datasets gathered from publishers and online resources by the use of specific keywords. In this case, we analyzed data from the Web of Science. The accuracy of various forms of automatic classification was tested in comparison with manual coding to determine their usefulness for data collection and cleaning. We assessed the performance of seven supervised classification algorithms (Support Vector Machine (SVM), Scaled Linear Discriminant Analysis, Lasso and elastic-net regularized generalized linear models, Maximum Entropy, Regression Tree, Boosting, and Random Forest) and analyzed two properties: accuracy and recall. We assessed not only each algorithm individually, but also their combinations through a voting scheme. We also tested the performance of these algorithms with different sizes of training data. When assessing the performance of different combinations, we used an indicator of coverage to account for the agreement and disagreement on classification between algorithms. Findings: We found that the performance of the algorithms used varies with the size of the training sample. However, for the classification exercise in this paper the best performing algorithms were SVM and Boosting. The combination of these two algorithms achieved a high agreement on coverage and was highly accurate. This combination performs well with a small training dataset (10%), which may reduce the manual work needed for classification tasks.
Research limitations: The dataset gathered has significantly more records related to the topic of interest than records on unrelated topics. This may affect the performance of some algorithms, especially in their identification of unrelated papers. Practical implications: Although the classification achieved by this means is not completely accurate, the amount of manual coding needed can be greatly reduced by using classification algorithms. This can be of great help when the dataset is big. With the help of accuracy, recall, and coverage measures, it is possible to estimate the error involved in this classification, which could open the possibility of incorporating these algorithms in software specifically designed for data cleaning and classification.
Keywords: disambiguation; machine learning; data cleaning; classification; accuracy; recall; coverage
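The voting scheme and the coverage indicator mentioned in the abstract above can be sketched as follows. The definitions here are assumptions made for illustration, since the abstract does not spell them out: two classifiers "vote", a label is kept only where they agree, coverage is the share of items on which they agree, and accuracy and recall are computed on that agreed subset. Function names and the 0/1 label encoding are hypothetical.

```python
def agreement_vote(pred_a, pred_b):
    """Keep a label only where both classifiers agree; None marks items
    left for manual coding."""
    return [a if a == b else None for a, b in zip(pred_a, pred_b)]

def coverage(voted):
    """Share of items on which the classifiers agreed."""
    return sum(v is not None for v in voted) / len(voted)

def accuracy_on_covered(voted, truth):
    """Accuracy against manual coding, restricted to agreed items."""
    pairs = [(v, t) for v, t in zip(voted, truth) if v is not None]
    return sum(v == t for v, t in pairs) / len(pairs)

def recall_on_covered(voted, truth, positive=1):
    """Recall of the positive class, restricted to agreed items."""
    covered = [v for v, t in zip(voted, truth) if v is not None and t == positive]
    return sum(v == positive for v in covered) / len(covered)
```

Items where the vote is `None` are the ones that would still need manual coding, so a high coverage with a high accuracy on the covered part is what reduces manual work. The two ratio functions divide by the number of covered items and would need a guard if the classifiers never agree.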
Intelligent Data Pre-processing Model in Integrated Ocean Observing Network System
13
Authors: 韩华, 丁永生, 刘凤鸣. Journal of Donghua University (English Edition) (EI, CAS), 2009, Issue 5, pp. 499-502 (4 pages)
The observation data sets derived from the integrated ocean observing network system contain a considerable amount of dirty data. Thus, the data must be carefully and reasonably processed before they are used for forecasting or analysis. This paper proposes a data pre-processing model based on intelligent algorithms. Firstly, we introduce the integrated network platform of ocean observation. Next, the data pre-processing model is presented, and an intelligent data cleaning model is proposed. Based on fuzzy clustering, the Kohonen clustering network is improved to fulfill the parallel calculation of fuzzy c-means clustering. The proposed dynamic algorithm can automatically find the new clustering center with the updated sample data. The rapid and dynamic performance of the model makes it suitable for real-time calculation, and the efficiency and accuracy of the model are demonstrated by test results from observation data analysis.
Keywords: integrated ocean observing network; intelligent data pre-processing; data cleaning; fuzzy soft clustering
Improve Data Quality by Processing Null Values and Semantic Dependencies
14
Authors: Houda Zaidi, Faouzi Boufarès, Yann Pollet. Journal of Computer and Communications, 2016, Issue 5, pp. 78-85 (8 pages)
Today, the quantity of data continues to increase. Furthermore, the data are heterogeneous, come from multiple sources (structured, semi-structured, and unstructured), and have different levels of quality. Therefore, data are very likely to be manipulated without knowledge of their structure and semantics. In fact, the metadata may be insufficient or totally absent. Data anomalies may be due to the poverty of their semantic descriptions, or even the absence of any description. In this paper, we propose an approach to better understand the semantics and the structure of the data. Our approach helps to automatically correct intra-column and inter-column anomalies. We aim to improve data quality by processing null values and the semantic dependencies between columns.
Keywords: data quality; big data; contextual semantics; semantic dependencies; functional dependencies; null values; data cleaning
Design and Implementation of a Deep Cleaning Platform for Legacy Real Estate Data (不动产存量数据深度清理平台设计与实现)
15
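Authors: 陈旭帅, 周翔, 李程春, 吴希欢. 《工程勘察》, 2026, Issue 1, pp. 46-49, 61 (5 pages)
Legacy real estate data carry many historical problems and are huge in volume, so deep data cleaning is a complex, time-consuming task that involves many staff. To keep this work orderly, this paper designs and develops a deep cleaning platform for legacy real estate data. The platform adopts an "offline + online" mode to achieve efficient, process-based, and traceable data cleaning and full-process control of project implementation without affecting routine registration business. It has been applied with good results in the comprehensive legacy real estate data cleaning project of Changsha.
Keywords: real estate; legacy data; deep cleaning; offline + online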
Research on big data applications in Global Energy Interconnection (cited by: 9)
16
Authors: Dongxia Zhang, Robert Caiming Qiu. Global Energy Interconnection, 2018, Issue 3, pp. 352-357 (6 pages)
Construction of Global Energy Interconnection (GEI) is regarded as an effective way to utilize clean energy and it has been a hot research topic in recent years. As one of the enabling technologies for GEI, big data is accompanied by the sharing, fusion, and comprehensive application of energy-related data all over the world. The paper analyzes the technology innovation direction of GEI and the advantages of big data technologies in supporting GEI development, and then gives some typical application scenarios to illustrate the application value of big data. Finally, the architecture for applying random matrix theory in GEI is presented.
Keywords: Global Energy Interconnection; big data; clean energy; random matrix theory
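Data Cleaning Method for Vibrating Wire Sensors Based on Sliding Windows and Slope Features (基于滑动窗口和斜率特征的振弦式传感器数据清洗方法) (cited by: 1)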
17
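Authors: 陈建勋, 陈辉, 罗彦斌, 罗华, 陈浩, 李昌鹏. 《中国公路学报》 (PKU Core), 2025, Issue 5, pp. 134-145 (12 pages)
Self-diagnosis and condition assessment in tunnel structural health monitoring both rest on data analysis, but the data acquired by sensors inevitably contain many anomalies. These abnormal data include not only interference caused by non-structural factors but also damage information caused by structural factors. The key issue is how to extract the useful damage information buried in the abnormal data while removing the useless interference. Exploiting the instability and discontinuity of the rate of change of abnormal data, a sliding-slope anomaly detection and data reconstruction method is proposed. First, a sliding window dynamically segments the data, and a least-squares linear fit is applied to the data in each window to obtain slope and intercept vectors. Second, thresholds are set according to the variance of the slope and the goodness of fit, and slope and intercept values with large dispersion or poor fit are detected and removed. Finally, the data are reconstructed using regression calculation and the median method. Using damage test data from reinforced concrete specimens and abnormal data from field monitoring, the proposed method is compared with the traditional 3σ method, sliding median filtering, wavelet transform, and empirical mode decomposition. The results show that the proposed method can effectively clean gain and offset data that the other methods struggle to handle and can identify and clean abnormal trends in the data without destroying the original structural damage data, which ensures the quality of monitoring data and meets the needs of practical application.
Keywords: tunnel engineering; time series data; data cleaning; sliding slope detection method; vibrating wire sensor; sliding window; damage test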
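The first two steps in the abstract above (a least-squares fit in each sliding window, then a threshold on the slopes) can be sketched as follows. This is a simplified illustration, not the paper's method: it uses only the slope, replaces the paper's variance and goodness-of-fit thresholds with a median-absolute-deviation rule, and omits the reconstruction step. The window width, the factor `k`, the tolerance `tol`, and the function names are assumptions.

```python
from statistics import median

def window_slopes(y, width):
    """Least-squares slope of y against sample index in each sliding window."""
    x_mean = (width - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(width))
    slopes = []
    for start in range(len(y) - width + 1):
        seg = y[start:start + width]
        y_mean = sum(seg) / width
        sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(seg))
        slopes.append(sxy / sxx)
    return slopes

def flag_abnormal_windows(y, width=5, k=5.0, tol=1e-9):
    """Return start indices of windows whose slope is far from the median slope."""
    slopes = window_slopes(y, width)
    centre = median(slopes)
    mad = median([abs(s - centre) for s in slopes])
    # tol keeps floating-point noise from being flagged when the MAD is ~0.
    threshold = max(k * mad, tol)
    return [i for i, s in enumerate(slopes) if abs(s - centre) > threshold]
```

A spike that sits exactly at the centre of a window leaves that window's slope unchanged, so a slope-only check misses that one window even though it flags the neighbouring ones. That is one reason a goodness-of-fit criterion, as in the abstract, is a useful second test.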
Research on a Data Cleaning Framework for Environmental Health Risk Assessment (环境健康风险评估数据清洗框架研究) (cited by: 1)
18
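Authors: 刘悦, 郝舒欣, 刘婕, 徐东群. 《环境卫生学杂志》, 2025, Issue 10, pp. 878-884, 907 (8 pages)
Objective: To propose and standardize the workflow and operating steps of data cleaning for environmental health risk assessment, select appropriate processing methods, and improve the efficiency of data cleaning. Methods: Cleaning methods for environmental health data and their applications were reviewed and summarized through a search of domestic and international literature; combined with the data requirements of risk assessment and practical experience, a data cleaning framework for environmental health risk assessment was proposed. Results: A data cleaning framework for environmental health risk assessment was established, comprising work preparation, data exploration, data detection, data cleaning, and the final database. Conclusion: The proposed framework standardizes the cleaning workflow and operating steps and provides reference and technical support for personnel engaged in environmental health risk assessment.
Keywords: environmental health; risk assessment; data detection; data cleaning; data cleaning framework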
Research on a Blockchain-Enabled Refined Management System for Smart City Communities (区块链技术赋能智慧城市社区精细化管理系统研究) (cited by: 2)
19
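Author: 张惠峰. 《软件》, 2025, Issue 3, pp. 169-171 (3 pages)
This paper discusses the design and application of a blockchain-enabled refined management system for smart city communities. With its core properties of decentralization, immutability, and transparency, blockchain technology provides solid technical support for building smart communities. The system architecture comprises a perception layer, a network layer, a data layer, and an application layer, which cooperate closely to make community management intelligent and efficient. The front-end interface design emphasizes user-friendliness, while the back-end system adopts a microservice architecture to ensure stability and security. Data storage and sharing are implemented through blockchain technology, which ensures the trustworthiness and security of the data.
Keywords: blockchain technology; smart community; management system; data cleaning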
An XGBoost-Based Automatic Identification Model for Head-Missing Seismic Records (基于XGBoost的丢头地震记录自动识别模型)
20
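Authors: 李山有, 谢博楠, 卢建旗, 谢志南, 李伟, 陈欣. 《应用基础与工程科学学报》 (PKU Core), 2025, Issue 2, pp. 338-348 (11 pages)
More than about half of strong-motion observation data suffer from a missing signal head. Automatically removing head-missing seismic records from massive record sets is an important requirement for research on algorithms related to seismic P-wave parameters. Based on the extreme gradient boosting (XGBoost) method, an automatic identification model for head-missing ground motion records is established. A total of 83,825 vertical-component acceleration records from 970 earthquakes recorded by the K-NET network in Japan are used as the training/test dataset for the XGBoost model. The model's identification success rate is 92.07% for positive samples (records without a missing head) and 98.93% for negative samples (head-missing records). On the same test dataset, compared with a traditional model based on Fisher linear discrimination, the XGBoost model greatly improves the identification success rate for positive samples while maintaining a high success rate for negative samples. The results show that the model identifies head-missing and intact seismic records with high accuracy; when P-wave parameters need to be extracted automatically from massive strong-motion observation data, the model can be used to remove head-missing records automatically and prevent them from contaminating data quality.
Keywords: massive seismic data; head-missing seismic records; XGBoost; ensemble learning; seismic P wave; data cleaning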