This paper presents a new way to extract concepts that can be used to improve text classification performance (precision and recall). The computation is divided into two layers. The bottom layer, called the document layer, is concerned with extracting the concepts of a particular document, and the upper layer, called the category layer, is concerned with finding the description and subject concepts of a particular category. The relevant implementation algorithm, which dramatically decreases the search space, is discussed in detail. An experiment based on real-world data collected from InfoBank shows that the approach is superior to traditional ones.
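The abstract does not say how concepts are scored in either layer, so the following is only a minimal sketch of the two-layer idea under stated assumptions. The document-layer rule (top-k frequent non-stopword terms), the category-layer rule (concepts shared by at least `min_share` of a category's documents), and the names `document_concepts`, `category_concepts`, and `STOPWORDS` are all hypothetical. It shows one plausible way the search space shrinks: the category layer only considers concepts already produced by the document layer, never the full vocabulary.

```python
from collections import Counter

# Hypothetical stopword list: the paper does not describe its preprocessing.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def document_concepts(text: str, top_k: int = 3) -> set[str]:
    """Document layer (assumed rule): the top_k most frequent non-stopword terms."""
    tokens = [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]
    return {term for term, _ in Counter(tokens).most_common(top_k)}


def category_concepts(docs: list[str], min_share: float = 0.5) -> set[str]:
    """Category layer (assumed rule): document concepts shared by >= min_share of docs.

    Candidates come only from the per-document concept sets, so the category
    layer never scans the full vocabulary (one way to shrink the search space).
    """
    per_doc = [document_concepts(d) for d in docs]
    counts = Counter(c for concepts in per_doc for c in concepts)
    return {c for c, n in counts.items() if n / len(docs) >= min_share}


sports = [
    "team match goal team match goal referee",
    "goal team goal team stadium stadium crowd",
    "team coach coach team match match",
]
print(sorted(category_concepts(sports)))  # ['goal', 'match', 'team']
```

Here "stadium" and "coach" are document-level concepts of one document each, so they are dropped at the category layer.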
The cost and strict input format requirements of GraphRAG make it less efficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document, with a focus on simplicity and cost-effectiveness. The process involves splitting the document into chunks, extracting concepts within each chunk using a large language model (LLM), and building relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities such as “Shanghai”, the proposed method identifies concepts such as “Convenient transportation in Shanghai”, which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive: it relies on locally hosted tools such as Mistral 7B openorca instruct and Ollama for model inference, so the entire graph generation process is cost-free. The paper also introduces a method for assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with an associated weight and relation details. Additionally, node degrees and communities are calculated for node sizing and coloring. This approach offers a scalable, cost-effective solution for generating meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining accessible on personal machines.
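The proximity-based edge construction and edge merging described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the LLM has already returned a list of concepts per chunk (the model call is omitted), that every co-occurrence in a chunk adds weight 1, and that the chunk ids serve as the "relation details". The function name `build_graph` and the sample chunks are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(chunk_concepts: dict[str, list[str]]):
    """Merge same-chunk concept pairs into weighted, undirected edges.

    chunk_concepts maps a chunk id to the concepts an LLM extracted from that
    chunk (the LLM call is out of scope). Assumption: each co-occurrence adds
    weight 1, and the contributing chunk ids are kept as the relation details.
    """
    edges = defaultdict(lambda: {"weight": 0, "chunks": []})
    for chunk_id, concepts in chunk_concepts.items():
        # sorted(set(...)) drops repeats and fixes pair order, so (a, b) and
        # (b, a) from different chunks are grouped into one edge.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)]["weight"] += 1
            edges[(a, b)]["chunks"].append(chunk_id)
    # Node degree (number of distinct neighbours) can drive node size.
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(edges), dict(degree)


chunks = {
    "c1": ["convenient transportation in Shanghai", "metro network", "tourism"],
    "c2": ["metro network", "convenient transportation in Shanghai"],
    "c3": ["tourism", "hotel prices"],
}
edges, degree = build_graph(chunks)
print(edges[("convenient transportation in Shanghai", "metro network")])
# {'weight': 2, 'chunks': ['c1', 'c2']}
print(degree["tourism"])  # 3
```

The community step used for node coloring is not shown; it would typically be delegated to a community-detection routine from a graph library such as NetworkX.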
An information integration method for the semantic web based on agent ontology (the SWAO method) is proposed to address problems in the current network environment. It integrates, analyzes, and processes large volumes of web information and extracts answers on a semantic basis. Taking the SWAO method as the common thread, the following technologies were studied: concept extraction based on semantic term mining, multi-point agent ontology construction, and answer extraction based on semantic inference. A structural model of an ontology-based question answering (QA) system is also presented; it uses the OWL language to describe domain knowledge, from which the QA system infers and extracts answers with the Jena inference engine. In system testing, precision reached 86% and recall reached 93%. The experimental results show that the method is feasible for developing a question answering system and merits further in-depth study.
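Jena is a Java framework, and the abstract does not list which inference rules the system uses. As a rough, hypothetical stand-in for a small slice of what an RDFS/OWL reasoner does, the sketch below forward-chains two RDFS-style rules (subclass transitivity and type propagation) over plain Python triples until a fixpoint, then answers a "which individuals are a X?" question over the inferred facts. The predicate strings, function names, and the toy knowledge base are assumptions for illustration.

```python
def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain two RDFS-style rules until no new triple appears.

    (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    (x type A),       (A subClassOf B) -> (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, "type", b) for x, p, a in facts if p == "type"
                for a2, b in sub if a == a2}
        if new <= facts:  # fixpoint reached: nothing new can be derived
            return facts
        facts |= new


def answer(kb: set[tuple[str, str, str]], cls: str) -> list[str]:
    """Answer 'which individuals are a <cls>?' over the inferred facts."""
    return sorted(x for x, p, o in infer(kb) if p == "type" and o == cls)


kb = {
    ("Shanghai", "type", "Municipality"),
    ("Suzhou", "type", "City"),
    ("Municipality", "subClassOf", "City"),
    ("City", "subClassOf", "Place"),
}
print(answer(kb, "Place"))  # ['Shanghai', 'Suzhou']
```

Neither city is asserted to be a Place; the answer exists only because the rules derive it, which is the point of answering from inferred rather than stored facts.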
A new ontology-based question expansion (OBQE) method is proposed for question similarity calculation in a frequently asked question (FAQ) answering system. Traditional question similarity methods compose the question vector from words alone, so the semantic relations between words are ignored. OBQE treats these relations as an important part of the calculation. The new system works as follows: ① build a two-layered domain ontology with reference to WordNet and a domain corpus; ② expand question trunks into domain cases; ③ calculate question similarity using vectors composed of domain cases. Experimental results show that OBQE improves the performance of question similarity calculation.
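To make the contrast between word-only vectors and expanded vectors concrete, here is a minimal sketch. It is not the OBQE implementation: the tiny `ONTOLOGY` dictionary stands in for the two-layered WordNet-plus-corpus ontology, mapping each known word to a single domain concept is a simplification of expanding question trunks into domain cases, and cosine similarity is assumed as the vector similarity measure.

```python
import math
from collections import Counter

# Hypothetical mini ontology: surface word -> domain concept. The paper builds
# a two-layered ontology from WordNet and a domain corpus; this is a stand-in.
ONTOLOGY = {
    "laptop": "computer", "notebook": "computer",
    "battery": "power", "charge": "power",
}


def word_vector(question: str) -> Counter:
    """Traditional baseline: a bag of words, with no semantic relations."""
    return Counter(question.lower().split())


def expanded_vector(question: str) -> Counter:
    """Assumed expansion: add the ontology concept of every known word."""
    vec = word_vector(question)
    vec.update(ONTOLOGY[w] for w in question.lower().split() if w in ONTOLOGY)
    return vec


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


q1, q2 = "laptop battery dies", "notebook charge problem"
print(cosine(word_vector(q1), word_vector(q2)))  # 0.0
print(round(cosine(expanded_vector(q1), expanded_vector(q2)), 3))  # 0.4
```

The two questions share no words, so the word-only baseline scores them as unrelated; after expansion they share the concepts "computer" and "power" and receive a nonzero similarity.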
To use educational resources efficiently and uncover the nature of the relations among MOOCs (massive open online courses), a knowledge graph was built for MOOCs on four major platforms: Coursera, EDX, XuetangX, and ICourse. This paper demonstrates the whole process of educational knowledge graph construction for reference. This knowledge graph, the largest knowledge graph of MOOC resources at present, stores and represents five classes, 11 kinds of relations, and 52,779 entities with their corresponding properties, amounting to more than 300,000 triples. Notably, 24,188 concepts are extracted from the text attributes of MOOCs and linked directly to the corresponding Wikipedia entries, or to the closest entries calculated semantically; this provides a normalized representation of knowledge and a far more precise description of MOOCs than merely enriching words with explanatory links. In addition, prerequisites discovered by direct extraction serve as an essential supplement that augments the connectivity of the knowledge graph. This knowledge graph can be considered a collection of unified MOOC resources for learners and a source of abundant data for researchers working on MOOC-related applications such as prerequisite mining.
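The concept-linking step (an exact match to a Wikipedia entry, otherwise the closest entry) and the resulting triples can be sketched as follows. This is an illustrative approximation only: the paper computes closeness semantically, whereas this sketch substitutes `difflib.SequenceMatcher`'s character-level ratio as a purely lexical stand-in. The `threshold` value, the relation names `hasConcept` and `linkedTo`, the course id `ML101`, and the title list are hypothetical.

```python
from difflib import SequenceMatcher


def link_concept(concept: str, titles: list[str], threshold: float = 0.6):
    """Link a concept to an entry title: exact match first, else the closest title.

    Assumption: difflib's character-level ratio is a lexical stand-in for the
    semantic similarity used in the paper; below threshold, no link is made.
    """
    by_lower = {t.lower(): t for t in titles}
    if concept.lower() in by_lower:
        return by_lower[concept.lower()], 1.0

    def score(t: str) -> float:
        return SequenceMatcher(None, concept.lower(), t.lower()).ratio()

    best = max(titles, key=score)
    return (best, score(best)) if score(best) >= threshold else (None, score(best))


def build_triples(courses: dict[str, list[str]], titles: list[str]) -> set[tuple[str, str, str]]:
    """Emit (course, hasConcept, concept) and (concept, linkedTo, title) triples."""
    triples = set()
    for course, concepts in courses.items():
        for concept in concepts:
            triples.add((course, "hasConcept", concept))
            title, _ = link_concept(concept, titles)
            if title is not None:
                triples.add((concept, "linkedTo", title))
    return triples


titles = ["Machine learning", "Linear algebra", "Calculus"]
courses = {"ML101": ["machine learning", "linear algebras"]}  # hypothetical course
for triple in sorted(build_triples(courses, titles)):
    print(triple)
```

"machine learning" links by exact (case-insensitive) match, while "linear algebras" has no exact entry and falls back to the closest title, "Linear algebra".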
Funding (concept extraction for text classification): Project supported by the National Natural Science Foundation of China (No. 60082003) and the National High Technology Research and Development Program of China (No. 863-306-ZD03-04-1).
Funding (SWAO question answering): Projects (60773462, 60672171) supported by the National Natural Science Foundation of China; Projects (2009AA12143, 2009AA012136) supported by the National High-Tech Research and Development Program of China; Project (20080430250) supported by the Foundation of Post-Doctor in China.
Funding (MOOC knowledge graph): Supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004502 and the National Natural Science Foundation of China under Grant Nos. 61532001, 61702532, and 61303190.