4,750 articles found
Question classification in question answering based on real-world web data sets (Cited by: 1)
1
Authors: 袁晓洁, 于士涛, 师建兴, 陈秋双 《Journal of Southeast University(English Edition)》 EI CAS 2008, No. 3, pp. 272-275 (4 pages)
To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a pre-defined dictionary and grammatical analysis, the question classifier brings both semantic and grammatical information into information retrieval and machine learning methods in the form of training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun. The QA query results are then re-ranked using the question class information. Experiments show that questions in real-world web data sets can be classified accurately by the classifier and that the QA results improve noticeably after re-ranking. The results indicate that, with both semantic and grammatical information, applications such as QA built upon real-world web data sets can achieve better performance.
Keywords: question classification; question answering; real-world web data sets; question and answer web forums; re-ranking model
Reconstruction of incomplete satellite SST data sets based on EOF method (Cited by: 2)
2
Authors: DING Youzhuan, WEI Zhihui, MAO Zhihua, WANG Xiaofei, PAN Delu 《Acta Oceanologica Sinica》 SCIE CAS CSCD 2009, No. 2, pp. 36-44 (9 pages)
For satellite remote sensing data obtained by visible and infrared band inversion, cloud cover over the ocean often causes large-scale gaps in the inversion products, and thin clouds that are difficult to detect can make the inversion products abnormal. Alvera et al. (2005) proposed a method for reconstructing missing data based on Empirical Orthogonal Function (EOF) decomposition, but their method could not process images with extreme cloud coverage (more than 95%) and required a long time for reconstruction. In addition, abnormal data in the images strongly affected the reconstruction result. This paper therefore improves on that method by reconstructing missing data sets with two applications of EOF decomposition. First, the times at which abnormalities occur are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets, excluding the abnormal data, are analyzed using EOF decomposition, and the temporal modes are then filtered to improve the reconstruction of images that contain little or no data. Finally, the method is applied to a large data set, namely 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas; the total reconstruction root mean square error (RMSE) is 0.82℃. The improved EOF reconstruction method is shown to be robust for reconstructing missing and unreliable satellite data.
Keywords: EOF; SST; Changjiang River estuary; missing data sets
Traffic Flow Data Forecasting Based on Interval Type-2 Fuzzy Sets Theory (Cited by: 5)
3
Authors: Runmei Li, Chaoyang Jiang, Fenghua Zhu, Xiaolong Chen 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI 2016, No. 2, pp. 141-148 (8 pages)
This paper proposes a long-term forecasting scheme and implementation method for traffic flow data based on interval type-2 fuzzy sets theory. Type-2 fuzzy sets have advantages in modeling uncertainties because their membership functions are themselves fuzzy. The scheme includes a traffic flow data preprocessing module, a type-2 fuzzification operation module, and a long-term traffic flow data forecasting output module, in which the Interval Approach acts as the core algorithm. The central limit theorem is used to convert point data of mass traffic flow in a given time range into interval data for the same time range (also called confidence interval data), which serve as the input of the Interval Approach. The confidence interval data retain the uncertainty and randomness of traffic flow while reducing the influence of noise in the detection data. The proposed scheme not only produces the traffic flow forecast but also shows the possible range of traffic flow variation with high precision, using upper and lower limit forecasts. The effectiveness of the proposed scheme is verified using an actual sample application. © 2014 Chinese Association of Automation.
Keywords: data handling; forecasting; fuzzy sets; membership functions; uncertainty analysis
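Note on entry 3: the abstract says the central limit theorem is used to turn point data for a time range into confidence interval data. A minimal sketch of that one step is given here, assuming a normal approximation for the sample mean and a 95% level; the function name `confidence_interval`, the level, and the sample counts are illustrative assumptions, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Return (lower, upper) bounds of a CLT-based confidence interval for the mean."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(samples) / sqrt(n)
    centre = mean(samples)
    return centre - half_width, centre + half_width


# Hypothetical vehicle counts observed in the same time slot on several days.
counts = [112, 98, 105, 120, 101, 109, 115, 103]
low, high = confidence_interval(counts)
print(f"interval data for this slot: [{low:.1f}, {high:.1f}]")
```

Each time slot then contributes a pair (lower, upper) instead of a single value; how the paper feeds these pairs into the Interval Approach is not described in the abstract and is not modelled here.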
An Evaluation of the Reliability of Complex Systems Using Shadowed Sets and Fuzzy Lifetime Data (Cited by: 3)
4
Authors: Olgierd Hryniewicz 《International Journal of Automation and Computing》 EI 2006, No. 2, pp. 145-150 (6 pages)
In this paper, we consider the problem of evaluating system reliability using statistical data obtained from reliability tests of its elements, in which the lifetimes of the elements are described by an exponential distribution. We assume that these lifetime data may be reported imprecisely and that this lack of precision may be described using fuzzy sets. As the direct application of the fuzzy sets methodology leads in this case to very complicated and time-consuming calculations, we propose simple approximations of fuzzy numbers using shadowed sets introduced by Pedrycz (1998). The proposed methodology may be simply extended to the case of general lifetime probability distributions.
Keywords: estimation of reliability; fuzzy reliability data; shadowed sets
Frequent item sets mining from high-dimensional dataset based on a novel binary particle swarm optimization (Cited by: 2)
5
Authors: 张中杰, 黄健, 卫莹 《Journal of Central South University》 SCIE EI CAS CSCD 2016, No. 7, pp. 1700-1708 (9 pages)
A novel binary particle swarm optimization for frequent item sets mining from high-dimensional datasets (BPSO-HD) was proposed, incorporating two improvements. First, dimensionality reduction of the initial particles was designed to ensure reasonable initial fitness; then, dynamic dimensionality cutting of the dataset was built to decrease the search space. On four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and with the ordinary BPSO and quantum swarm evolutionary (QSE) algorithms to demonstrate its advantages. The experiments show that the results given by BPSO-HD are reliable and better than those generated by BPSO and QSE.
Keywords: data mining; frequent item sets; particle swarm optimization
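Note on entry 5: as background for the abstract, the sketch below shows how a binary vector can encode a candidate item set and how its support can serve as a fitness value. This is not BPSO-HD itself (no swarm update, no dimensionality reduction); the helper names `support` and `decode`, the toy transactions, and the minimum support of 0.5 are assumptions made for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def decode(bits, items):
    """Map a binary position vector (one bit per item) to the item set it selects."""
    return frozenset(item for bit, item in zip(bits, items) if bit)


items = ["a", "b", "c", "d"]
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("acd"),
    frozenset("abd"),
]

particle = [1, 1, 0, 0]                  # selects the item set {a, b}
candidate = decode(particle, items)
fitness = support(candidate, transactions)
is_frequent = fitness >= 0.5             # 0.5 is an arbitrary minimum support for this demo
print(sorted(candidate), fitness, is_frequent)
```

With one bit per item, the search space grows as 2 to the power of the number of items, which is why the abstract emphasizes reducing dimensionality before and during the search.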
Domain-Oriented Data-Driven Data Mining Based on Rough Sets (Cited by: 1)
6
Authors: Guoyin Wang 《南昌工程学院学报》 CAS 2006, No. 2, p. 46 (1 page)
Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. The aim of data mining is to discover knowledge of interest to users. Data mining is a useful tool in many domains such as marketing and decision making. However, some basic issues of data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? In order to discover patterns and knowledge that are really interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process. Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a knowledge transformation process that transforms knowledge from data format into symbol format. Thus, no new knowledge can be generated in a data mining process. In a data mining process, knowledge is simply transformed from data format, which humans cannot understand, into symbol format, which humans can understand and easily use. It is similar to translating a book from Chinese into English. In this translation process, the knowledge in the book should remain unchanged; only its format changes. That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, there must be some mistake in the translation process. In the same way, in a data mining process we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is a representation format of knowledge). Unfortunately, we cannot read, understand, or use it, since we cannot understand data. With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets. It also improved the performance of classical knowledge acquisition methods. In fact, we also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining. They can be integrated into domain-oriented data-driven data mining. It is just like the views of a database. Users with different views can look at different parts of the data in a database. Thus, users with different tasks or objectives wish to, or can, discover different knowledge (partial knowledge) from the same database. However, all this partial knowledge should originally exist in the database. So, a domain-oriented data-driven data mining method would help us to extract the knowledge that really exists in a database and is really interesting and actionable in the real world.
Keywords: data mining; data-driven; user-driven; domain-driven; KDD; machine learning; knowledge acquisition; rough sets
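Scaling up Kernel Grower Clustering Method for Large Data Sets via Core-sets (Cited by: 2)
7
Authors: CHANG Liang, DENG Xiao-Ming, ZHENG Sui-Wu, WANG Yong-Qing 《自动化学报》 EI CSCD PKU Core 2008, No. 3, pp. 376-382 (7 pages)
Kernel grower is a novel kernel clustering method proposed recently by Camastra and Verri. It shows good performance on a variety of data sets and compares favorably with popular clustering algorithms. However, the main drawback of the method is its weak scalability when dealing with large data sets, which greatly restricts its application. In this paper, we propose a scaled-up kernel grower method using core-sets, which is significantly faster than the original method for clustering large data. At the same time, it can handle very large data sets. Numerical experiments on benchmark data sets as well as synthetic data sets show the efficiency of the proposed method. The method is also applied to real image segmentation to illustrate its performance.
Keywords: large data sets; image segmentation; pattern recognition; core-sets; kernel clustering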
Rough Sets Probabilistic Data Association Algorithm and its Application in Multi-target Tracking (Cited by: 1)
8
Authors: Long-qiang NI, She-sheng GAO, Peng-cheng FENG, Kai ZHAO 《Defence Technology(防务技术)》 SCIE EI CAS 2013, No. 4, pp. 208-216 (9 pages)
A rough set probabilistic data association (RS-PDA) algorithm is proposed to reduce the complexity and time consumption of data association and to enhance the accuracy of tracking results in multi-target tracking applications. In this new algorithm, measurements lying in the intersection of two or more validation regions are allocated to the corresponding targets through rough set theory, and the multi-target tracking problem is transformed into single-target tracking after the measurements lying in the intersection region have been classified. Several typical multi-target tracking applications are given. The simulation results show that the algorithm not only reduces the complexity and time consumption but also enhances the accuracy and stability of the tracking results.
Keywords: data association algorithm; multi-target tracking; rough set theory; application; probability; time consumption; problem transformation; simulation results
Evolution algorithm for water storage forecasting response to climate change with little data sets: the Wolonghu Wetland, China
9
Authors: 尼庆伟, 叶人珍, 杨凤林, 雷坤 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2011, No. 2, pp. 127-133 (7 pages)
A novel genetic programming (GP) technique, a new member of the family of evolution algorithms, is applied to predict how the water storage of the Wolonghu wetland in northeastern China responds to climate change when only a small data set is available. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions showed a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP), a popular artificial neural network model, and a grey model (GM) were applied to the same data set. The GP technique performed better than the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations in water storage in wetlands for which few data are available.
Keywords: water storage; little data set; evolution algorithm; Wolonghu wetland
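Note on entry 9: the abstract reports goodness of fit as MAPE and r. The sketch below shows how these two measures are commonly computed, assuming MAPE is expressed in percent and r is the Pearson correlation coefficient; the observed and predicted values are made up and do not come from the paper.

```python
from math import sqrt


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical annual water storage values (arbitrary units).
observed = [50.0, 40.0, 80.0, 100.0]
predicted = [55.0, 38.0, 76.0, 110.0]
print(f"MAPE = {mape(observed, predicted):.2f}%, r = {pearson_r(observed, predicted):.3f}")
```

A low MAPE together with an r close to 1 indicates that the predictions are both close to and well aligned with the observations, which is the sense in which the abstract describes a good fit.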
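A Preliminary Study of UK2006 Datasets, a Large Real-World Web Reference Data Set
10
Authors: 曾刚, 李宏 《企业技术开发》 2009, No. 5, pp. 16-17, 31 (3 pages)
This article introduces the WEBSPAM-UK2006 data set, a large real-world web data collection in which some spam behavior has been manually labeled. It analyzes the composition of the data set in detail and performs preliminary preprocessing of it in Python, providing useful experience and a reference for future research on algorithms for detecting and judging web spam.
Keywords: search engine spam; web data set; link analysis; web graph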
Threshold Selection Study on Fisher Discriminant Analysis Used in Exon Prediction for Unbalanced Data Sets
11
Authors: Yutao Ma, Yanbing Fang, Ping Liu, Jianfu Teng 《Communications and Network》 2013, No. 3, pp. 601-605 (5 pages)
In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein coding regions (exons) from non-coding regions (introns). Usually, the positive and negative data sets are of the same size if enough data are available. In situations where the data are insufficient or the two sets are unequal in size, however, the threshold used in FDA may have an important influence on the prediction results. This paper presents a study on the selection of the threshold. The eigen value of each exon/intron sequence is computed using the Z-curve method with 69 variables. The experimental results suggest that the size of the data sets, their standard deviation, and the threshold are the three key elements to take into consideration to improve the prediction results.
Keywords: Fisher discriminant analysis; threshold selection; gene prediction; Z-curve; size of data set
Contrasting Vertical Structure of Recent Arctic Warming in Different Data Sets
12
Authors: Igor Esau, Vladimir Alexeev, Irina Repina, Svetlana Sorokina 《Atmospheric and Climate Sciences》 2013, No. 1, pp. 1-5 (5 pages)
The Arctic region is experiencing strong warming and related changes in the state of sea ice, permafrost, tundra, the marine environment and terrestrial ecosystems. These changes are found in any climatological data set comprising the Arctic region. This study compares the temperature trends in several surface, satellite and reanalysis data sets. We demonstrate large differences in the 1979-2002 temperature trends. The data sets disagree on the magnitude of the trends as well as on their seasonal, zonal and vertical patterns. The surface temperature trends are stronger than the trends in tropospheric temperature for each latitude band north of 50°N in every month except those of the ice-melting season. These results emphasize that conclusions of climate studies drawn from the analysis of a single data set should be treated with caution, as they may be affected by artificial biases in the data.
Keywords: Arctic warming; data set intercomparison; atmospheric vertical structure
The Solution to Poor Data Bank Using Rough Sets Theory (Cited by: 1)
13
Authors: Zhang Shilin 《工程科学(英文版)》 2006, No. 1, pp. 94-97 (4 pages)
This article addresses poor databases, which are commonly encountered in practice. A database in use needs to be a comprehensive and effective collection of data; when the database supplied is poor, the application of decision support is affected. To address this problem, the author proposes a solution based on rough sets theory. It can supplement a poor database scientifically, correctly, and effectively, and can greatly help strengthen the application of data and artificial intelligence.
Keywords: database; decision table; rough set theory; degree of association
Semi-supervised Affinity Propagation Clustering Based on Subtractive Clustering for Large-Scale Data Sets
14
Authors: Qi Zhu, Huifu Zhang, Quanqin Yang 《国际计算机前沿大会会议论文集》 2015, No. 1, pp. 76-77 (2 pages)
As the number of large-scale data sets grows, building the similarity matrix required by the affinity propagation (AP) clustering algorithm incurs a large storage and computation cost. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added, using the density values of the data points to obtain the initial cluster points. Then, the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is run on the cluster representative points until a suitable cluster division is obtained. Experimental results show that the algorithm greatly reduces the amount of computation and the storage required for the similarity matrix, and that it outperforms the original algorithm in clustering quality and processing speed.
Keywords: subtractive clustering; initial cluster; affinity propagation clustering; semi-supervised clustering; large-scale data sets
An Interpretation of Multi-pole Sonic Logging Data Mining Based on Rough Sets
15
Authors: ZENG Xiao-hui, SHI Yi-bing, LIAN Yi 《通讯和计算机(中英文版)》 2007, No. 1, pp. 8-10 (3 pages)
Keywords: sonic logging; data mining; numerical simulation; oil field
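Processing of Abnormal Sensor Data Based on Rough Sets (Cited by: 3)
16
Authors: 雷霖, 陈锋, 代传龙, 王厚军 《电子科技大学学报》 EI CAS CSCD PKU Core 2006, No. S1, pp. 678-681 (4 pages)
In many sensor applications, the measurement data from the sensors must be processed to ensure the reliability of the measurement results. To exploit the strength of rough set theory in handling uncertain data, a decision table is first extracted from the known measurement data following the ideas of rough set theory; preprocessing such as completion and discretization is then performed; finally, attribute reduction is carried out and classification rules are extracted, which are used to classify the measurement data and remove abnormal data from it. Experimental results show that this method of detecting abnormal data is more objective, accurate, and reliable than commonly used methods of handling abnormal data.
Keywords: rough sets; data processing; classification rules; decision table; abnormal data; sensor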
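Research and Application of Traditional Chinese Medicine Symptom Mining Based on Rough Sets (Cited by: 2)
17
Authors: 丁卫平, 管致锦, 顾春华 《计算机工程与应用》 CSCD PKU Core 2008, No. 7, pp. 234-237 (4 pages)
Symptom samples in traditional Chinese medicine (TCM) medical record databases have high dimensionality and considerable redundancy in data features and attributes. Building on a study of the basic theory of Rough Sets and of attribute reduction algorithms, this paper proposes GENRED_GROWTH, a TCM symptom mining algorithm that combines attribute frequency with attribute significance, and designs and implements a prototype TCM symptom mining system based on GENRED_GROWTH. Analysis and experimental results show that the algorithm performs attribute reduction on TCM symptoms well, achieves high classification accuracy, and can extract diagnostic rules related to TCM symptoms to assist doctors in diagnosis and treatment.
Keywords: Rough Sets; attribute reduction; TCM symptoms; data mining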
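TVBPS: A Parallel Sets-Based Visualization Method for Multivariate Temporal Data with Measure Attributes (Cited by: 2)
18
Authors: 孙宁伟, 刘海峰, 赵瑜, 刘勇, 王璐, 肖卫东 《计算机应用研究》 CSCD PKU Core 2014, No. 5, pp. 1591-1596, 1600 (7 pages)
To address the shortcomings of existing visualization methods for "multivariate temporal data with measure attributes", this paper proposes ACLEARCR, an algorithm for optimizing the ordering of category values in Parallel Sets; VABC, a correlation-based algorithm for configuring the variable axes of Parallel Sets; and depth-information Parallel Sets (DCPS). Together these form TVBPS, a Parallel Sets-based visualization method for multivariate temporal data with measure attributes. Experiments with the proposed method on concrete data sets show that the resulting views can reveal knowledge implicit in the data, demonstrating the method's effectiveness. TVBPS provides an effective means of analyzing multivariate temporal data sets and offers high applicability and ease of use.
Keywords: measure attributes; multivariate; temporal; information visualization; Parallel Sets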
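Application of the Variable Precision Rough Sets Model in Data Mining
19
Authors: 王志龙, 李向新 《兰州工业高等专科学校学报》 2011, No. 1, pp. 19-22 (4 pages)
Classical rough set theory draws strict boundaries between classes when performing classification. This improves the precision with which knowledge attributes identify and classify the objects under study, but it makes the approach poorly tolerant of errors, which weakens the model's practical applicability. The variable precision rough set is an extension of classical rough set theory. This study derives two reduction methods: one that keeps the saturation value of the knowledge dependency degree unchanged, and one in which information entropy does not decrease.
Keywords: data mining; rough set theory; variable precision rough set theory; knowledge reduction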
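Note on entry 19: the contrast the abstract draws between strict class boundaries and a more error-tolerant model can be illustrated with a beta-lower approximation, in which an equivalence class is accepted when a sufficient fraction of it lies inside the target set. This is a generic sketch of the variable precision idea, not the reduction methods of the paper; the function name `beta_lower`, the inclusion threshold convention (beta in (0.5, 1]), and the toy classes are assumptions.

```python
def beta_lower(classes, target, beta=0.8):
    """Union of equivalence classes whose overlap ratio with `target` is at least `beta`."""
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta is assumed to lie in (0.5, 1]")
    region = set()
    for cls in classes:  # classes are assumed to be non-empty sets
        if len(cls & target) / len(cls) >= beta:
            region |= cls
    return region


# Hypothetical partition of ten objects into three equivalence classes.
classes = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9, 10}]
target = {1, 2, 3, 4, 6, 7, 8}

strict = beta_lower(classes, target, beta=1.0)     # classical lower approximation
tolerant = beta_lower(classes, target, beta=0.8)   # tolerates some misclassification per class
print(sorted(strict), sorted(tolerant))
```

With beta equal to 1 the definition reduces to the classical lower approximation, so a single stray object excludes a whole class; lowering beta is what gives the model its tolerance to noisy data.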
Rough Sets, Their Extensions and Applications (Cited by: 6)
20
Authors: Richard Jensen 《International Journal of Automation and Computing》 EI 2007, No. 3, pp. 217-228 (12 pages)
Rough set theory provides a useful mathematical foundation for developing automated computational systems that can help understand and make use of imperfect knowledge. Despite its recency, the theory and its extensions have been widely applied to many problems, including decision analysis, data mining, intelligent control and pattern recognition. This paper presents an outline of the basic concepts of rough sets and their major extensions, covering variable precision, tolerance and fuzzy rough sets. It also shows the diversity of successful applications these theories have entailed, ranging from finance and business, through biology and medicine, to physics, art, and meteorology.
Keywords: rough sets; data processing; fuzzy sets
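Note on entry 20: the basic concepts of rough sets that the abstract refers to are usually introduced through lower and upper approximations of a target set under an indiscernibility relation. A minimal sketch follows; the table, attribute names, and target set are invented for illustration, and the helper names `equivalence_classes` and `approximations` are assumptions.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target` as sets of object ids."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:      # class lies entirely inside the target
            lower |= cls
        if cls & target:       # class overlaps the target
            upper |= cls
    return lower, upper


# Hypothetical information table: four objects described by two attributes.
table = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
}
flu = {"x1", "x3"}  # the concept to approximate

lower, upper = approximations(table, ["headache", "temp"], flu)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

Objects in the lower approximation certainly belong to the concept, those outside the upper approximation certainly do not, and the boundary region holds the objects the available attributes cannot decide; a non-empty boundary is what makes the set "rough".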