Abstract: The cost and strict input format requirements of GraphRAG make it inefficient for processing large documents. This paper proposes an alternative approach for constructing a knowledge graph (KG) from a PDF document, with a focus on simplicity and cost-effectiveness. The process splits the document into chunks, extracts concepts within each chunk using a large language model (LLM), and builds relationships based on the proximity of concepts in the same chunk. Unlike traditional named entity recognition (NER), which identifies entities such as "Shanghai", the proposed method identifies concepts such as "Convenient transportation in Shanghai", which are found to be more meaningful for KG construction. Each edge in the KG represents a relationship between concepts occurring in the same text chunk. The process is computationally inexpensive: it relies on locally installed tools, Mistral 7B openorca instruct and Ollama, for model inference, so the entire graph generation process is cost-free. The paper also introduces a method for assigning weights to relationships, grouping similar pairs, and summarizing multiple relationships into a single edge with an associated weight and relation details. In addition, node degrees and communities are calculated for node sizing and coloring. The approach offers a scalable, cost-effective way to generate meaningful knowledge graphs from large documents, achieving results comparable to GraphRAG while remaining usable on personal machines.
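The chunk-level co-occurrence step described in this abstract can be illustrated with a minimal sketch. It assumes the LLM has already returned a list of concepts per chunk. The function names, the unit weight per co-occurrence, and the sample concepts are hypothetical and not taken from the paper. Community detection is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(chunk_concepts, weight_per_cooccurrence=1.0):
    """Build weighted, undirected edges between concepts that share a chunk.

    chunk_concepts maps a chunk id to the list of concepts extracted from it.
    Repeated pairs are merged into a single edge whose weight is the sum of
    the per-chunk contributions (an assumed weighting scheme).
    """
    edges = defaultdict(lambda: {"weight": 0.0, "chunks": []})
    for chunk_id, concepts in chunk_concepts.items():
        # Sort so that each unordered pair always maps to the same key.
        unique = sorted(set(concepts))
        for a, b in combinations(unique, 2):
            edge = edges[(a, b)]
            edge["weight"] += weight_per_cooccurrence
            edge["chunks"].append(chunk_id)
    return dict(edges)


def node_degrees(edges):
    """Count distinct neighbours per concept, e.g. for node sizing."""
    degrees = defaultdict(int)
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return dict(degrees)


# Hypothetical concepts, standing in for LLM output.
chunk_concepts = {
    "chunk-0": ["convenient transportation in Shanghai", "metro network", "tourism"],
    "chunk-1": ["metro network", "tourism"],
}
edges = build_concept_graph(chunk_concepts)
degrees = node_degrees(edges)
```

Here the pair ("metro network", "tourism") appears in both chunks, so it collapses into one edge with a larger weight than pairs seen only once.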
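Abstract: The safe and stable operation of the power system is key to national energy security and economic development, and it depends heavily on accurate fault prediction for power Internet of Things (IoT) devices. With the development of power IoT technology, large volumes of data are being collected, but their potential value has not been fully exploited, which limits the accuracy of fault prediction and affects the reliable operation of the power system. To address this problem, this paper proposes a fault prediction method for power IoT devices based on the GraphSAGE (Graph Sample and Aggregate) algorithm. Using the PowerGraph dataset, the method divides power IoT device fault scenarios into four classes and uses the GraphSAGE model to learn from both node features and edge features, enabling effective fault prediction. Experimental results show that the method achieves an accuracy of 97.5%, an improvement of 0.39% to 6.21% over other traditional methods, and that the GraphSAGE model trains quickly. The method provides important decision support for the safe and stable operation of power IoT devices, enables finer-grained analysis of dynamic, interconnected complex systems, and strengthens the ability of power system operators to anticipate and respond to potential disturbances.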
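As a rough illustration of the aggregation idea behind GraphSAGE, the sketch below implements one layer with a mean aggregator in plain Python. It is not the model from the paper: it uses node features only, skips neighbour sampling and edge features, and the weight matrices, the function name `sage_mean_layer`, and the toy graph are all assumptions made for the example.

```python
def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: ReLU(W_self * h_v + W_neigh * mean(h_u)).

    features maps node -> feature vector, neighbors maps node -> neighbour list.
    Nodes without neighbours receive a zero vector as their aggregate.
    """
    def matvec(matrix, vec):
        return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Element-wise mean of the neighbours' feature vectors.
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(h))]
        else:
            mean = [0.0] * len(h)
        z = [s + n for s, n in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[node] = [max(0.0, x) for x in z]  # ReLU
    return out


# Toy graph with identity weights, so the result is self + neighbour mean.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
hidden = sage_mean_layer(features, neighbors, identity, identity)
```

In a fault classifier, one or more such layers would typically be followed by a layer that maps each node embedding to scores for the fault classes, with the weights learned by gradient descent rather than fixed as here.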
Funding: the National Natural Science Foundation of China (No. 51175284)
Abstract: Equipment has a dual nature: it is both a physical object existing in nature and an artificial object designed by humans. Engineers decide the configuration and structural parameters of equipment based on the technical-physical effects that control its behavioral parameters. Sensors are mounted on the equipment to monitor its state. Current methods for state monitoring and diagnosis mostly use mathematics and artificial intelligence techniques to construct evaluation methods. This paper presents an integrated design and state maintenance method in which a graph and a dual graph are used to record design data and sensor arrangement and to map signals to substructures and connection pairs. An example of state maintenance for hydropower generating equipment is given.
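Abstract: Based on tourism big data, and considering both lag effects over long time ranges and the multi-task influences among different Search Intensity Indices (SII), this paper proposes a Multi-tasking Tourism Information Analysis Based on Big Data (MTIABD) framework. Tourism demand is predicted using a fused-information re-ranking technique; specifically, a graph-guided structure models the lagged influence of historical variables on future variables. Each variable is encoded independently by a convolutional neural network (CNN) along the time dimension, lag effects are modeled dynamically with a bipartite graph, and information is mined through graph aggregation to predict tourism demand accurately. Based on these techniques, a tourism demand forecasting system is built in which tourists can retrieve information on different attractions according to their needs. Extensive experiments on real-world datasets show that the proposed MTIABD framework outperforms existing methods in both one-step and multi-step forecasting. Under the mean absolute percentage error (MAPE) metric, compared with the Instance-wise Graph-based Framework for Multivariate Time Series Forecasting (IGMTF), MTIABD improves performance by 16.75% on the HK-2021 dataset and by 19.79% on the MO-2021 dataset.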
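For readers unfamiliar with the metric, the sketch below computes MAPE and a relative improvement between two error values. The abstract does not say how its improvement percentages were derived, so treating them as a relative reduction in MAPE is an assumption, and the function names and sample numbers are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction of new_error relative to baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up values, not data from the paper.
example_mape = mape([100.0, 200.0], [110.0, 180.0])
example_gain = relative_improvement(10.0, 8.0)
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing forecasts across datasets with different visitor volumes.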
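Abstract: Objective: To identify the direct and indirect factors influencing adverse outcomes after treatment in patients with colorectal cancer, and to explore the causal effects of these factors on adverse outcomes, providing a basis for improving patient outcomes. Methods: Case information was collected for patients admitted to and diagnosed with colorectal cancer at Harbin Medical University Cancer Hospital from 2013 to 2015; death, metastasis, or recurrence within two years after treatment was defined as an adverse outcome. A causal graph model was constructed with the fast greedy equivalence search algorithm to analyze the direct and indirect factors influencing adverse outcomes, and on this basis the intervention calculus when the directed acyclic graph is absent (IDA) algorithm was used to estimate the causal effects of these factors on adverse outcomes. Results: A total of 2332 patients were included, with a mean age of (68.0±10.9) years and an adverse outcome rate of 6.22%. The causal graph contained 20 nodes and 36 edges. The direct influencing factors were chemotherapy, pathological type, surgical treatment, and length of hospital stay (|IDA| = 0.039, 0.059, 0.255, and 0.054, respectively); the indirect influencing factors were age, alcohol consumption, body mass index, degree of differentiation, radiotherapy, and nature of surgery (|IDA| = 0.011, 0.021, 0.012, 0.042, 0.021, and 0.030, respectively). Conclusion: Building on the key factors for adverse outcomes in colorectal cancer identified by the causal graph, the IDA algorithm can quantify the causal effects of influencing factors on adverse outcomes. The study suggests that, in the clinical treatment of colorectal cancer, increasing the uptake of surgery and chemotherapy among patients without contraindications to them can reduce the incidence of adverse outcomes and thereby improve prognosis.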