Funding: Supported by the National Natural Science Foundation of China under Grant No. 60673139 and the National High Technology Development and Research 863 Program of China under Grant No. 2008AA01Z146.
Abstract: With the rapid growth of Web databases, it has become necessary to automatically extract and integrate the large-scale data available in the Deep Web. Current Web search engines, however, rank at the page level, which is becoming inadequate for entity-oriented vertical search. In this paper, we present LG-ERM, an entity-level ranking mechanism for Deep Web queries based on local scoring and global aggregation. Unlike traditional approaches, LG-ERM considers more rank-influencing factors, including the uncertainty of entity extraction, the style information of the entities, the importance of the Web sources, and the relationships between entities. By combining local scoring and global aggregation in ranking, the query results can be more accurate and better meet users' needs. The experiments demonstrate the feasibility and effectiveness of the key techniques of LG-ERM.
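The abstract does not give LG-ERM's formulas, so the following is only a minimal sketch of the general "local scoring, then global aggregation" idea. It assumes a hypothetical local score (extraction confidence multiplied by a style score) and a hypothetical global score (a sum of local scores weighted by source importance). Entity relationships are left out, and all names and numbers are illustrative.

```python
from collections import defaultdict


def local_score(extraction_confidence, style_score):
    # Assumed local rule: discount the style score by extraction uncertainty.
    return extraction_confidence * style_score


def global_rank(observations, source_importance):
    """Aggregate local scores per entity, weighted by source importance.

    observations: iterable of (entity, source, extraction_confidence, style_score)
    source_importance: dict mapping source name to a weight (assumed given)
    """
    totals = defaultdict(float)
    for entity, source, confidence, style in observations:
        totals[entity] += source_importance[source] * local_score(confidence, style)
    # Highest aggregated score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: two hypothetical sources reporting two entities.
observations = [
    ("Camera A", "shop1", 0.9, 0.8),
    ("Camera A", "shop2", 0.6, 0.5),
    ("Camera B", "shop1", 0.7, 0.9),
]
source_importance = {"shop1": 0.7, "shop2": 0.3}
ranking = global_rank(observations, source_importance)
```

In this toy data, "Camera A" ranks first because two sources contribute to its aggregated score, even though "Camera B" has the higher style score in its single source.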
Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2003CB317003; the Research Grants Council of the Hong Kong Special Administrative Region, China under Grant No. 9041350 (CityU 114908); CityU Applied R&D Funding (ARD-(Ctr-)) under Grant Nos. 9681001 and 9678002; the Hunan Provincial Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 07JJ1010; the National Natural Science Foundation of China for Major Research Plan under Grant No. 90718034; and the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT0661.
Abstract: Survivability refers to the ability of a network system to deliver critical services to end users in a timely manner in the presence of failures and/or attacks. To build a highly survivable system, its survivability must be measured so that the performance of its services under adverse conditions can be evaluated. Based on the survivability requirements of large-scale mobile ad-hoc networks (MANETs), we propose a novel model for the quantitative evaluation of survivability. The proposed model considers various types of faults and connection states of mobile hosts, and uses a continuous-time Markov chain (CTMC) to describe the survivability of MANETs precisely. We apply reliability theory to perform quantitative analysis and survivability evaluation of segment-by-segment routing (SSR), multipath-based segment-by-segment routing (MP-SSR), and segment-by-segment-based multipath routing (SS-MPR) in large-scale MANETs. The proposed model can be used to analyze network performance much more easily than a simulation-based approach. Numerical validation shows that the proposed model yields a better evaluation of the survivability of large-scale MANETs.
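The paper's actual state space and routing models are not given in the abstract. As a rough illustration of the ingredients it names, the sketch below uses the textbook two-state CTMC (up/failed) for a single host and the standard series and parallel formulas from reliability theory. The rates, the independence assumption, and the path layout are all hypothetical.

```python
import math


def availability(t, failure_rate, repair_rate):
    """P(host is up at time t) for a two-state CTMC that starts in the up state."""
    total = failure_rate + repair_rate
    steady = repair_rate / total
    return steady + (failure_rate / total) * math.exp(-total * t)


def steady_state_availability(failure_rate, repair_rate):
    """Long-run fraction of time the host is up."""
    return repair_rate / (failure_rate + repair_rate)


def series(probabilities):
    """A single path works only if every hop works (independence assumed)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def parallel(probabilities):
    """A multipath route works if at least one path works (independence assumed)."""
    all_fail = 1.0
    for p in probabilities:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


# Illustrative rates only: each host fails at 0.1/h and is repaired at 0.9/h.
hop = steady_state_availability(0.1, 0.9)
single_path = series([hop, hop, hop])            # three hops in sequence
two_paths = parallel([single_path, single_path])  # two disjoint three-hop paths
```

Under these assumptions the disjoint two-path route is more likely to be usable than either path alone, which is the kind of comparison a closed-form model makes possible without simulation.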
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60833003 and 60773156.
Abstract: Ambiguous words are words that have multiple meanings, such as "apple" and "window". In text classification they are usually removed by feature reduction methods such as Information Gain. Sometimes, however, the corpus contains so many ambiguous words that discarding all of them is not a viable option, as is the case when classifying documents from the Web. In this paper we look for a method to classify titled documents with the help of ambiguous words. Titled documents are documents with a simple structure consisting of a title and an excerpt; news items, messages, and paper abstracts with titles are examples. Instead of introducing another feature reduction method, we describe a framework that makes the best use of the ambiguous words in titled documents. The framework improves the performance of a traditional bag-of-words classifier with the help of a bag-of-word-pairs classifier. As an example, the framework is implemented using one of the most popular classifiers, Multinomial Naive Bayes (MNB). Experiments with three real-life datasets show that within our framework the MNB model performs much better than both a traditional MNB classifier and a naive weighted algorithm that simply puts more weight on words in the title.
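The abstract does not say how word pairs are built or how the two classifiers are combined. The sketch below is therefore only one plausible reading: it assumes each title word is paired with each excerpt word, trains one small Laplace-smoothed multinomial Naive Bayes model on words and another on pairs, and adds their log scores. The class name `TinyMNB`, the pairing rule, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def words(title, excerpt):
    return (title + " " + excerpt).lower().split()


def word_pairs(title, excerpt):
    # Assumed pairing rule: every title word with every excerpt word.
    return [f"{t}|{e}" for t in title.lower().split() for e in excerpt.lower().split()]


class TinyMNB:
    """Minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, feature_lists, labels):
        for feats, label in zip(feature_lists, labels):
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(count / n_docs)
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            scores[label] = score
        return scores


def predict(word_model, pair_model, title, excerpt):
    # Assumed combination rule: sum the log scores of the two models.
    w = word_model.log_scores(words(title, excerpt))
    p = pair_model.log_scores(word_pairs(title, excerpt))
    return max(w, key=lambda label: w[label] + p[label])


# Toy training set in which "apple" is ambiguous between two classes.
train = [
    ("apple pie", "bake the fruit with sugar", "food"),
    ("apple harvest", "farmers pick fruit in autumn", "food"),
    ("apple launch", "the company unveils a new phone", "tech"),
    ("apple earnings", "the company reports phone sales", "tech"),
]
labels = [label for _, _, label in train]
word_model = TinyMNB().fit([words(t, e) for t, e, _ in train], labels)
pair_model = TinyMNB().fit([word_pairs(t, e) for t, e, _ in train], labels)
```

Pairs such as `apple|phone` and `apple|fruit` carry the context that the bare word "apple" lacks, which is why the ambiguous word can help rather than hurt in this setup.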