Funding: Supported by the Project of China Southern Power Grid Digital Grid Research Institute Co., Ltd. (210002KK52222026).
Abstract: Modeling the spatiotemporal data of the power grid makes it possible to better understand its operational status, identify potential issues and risks, and take timely measures to adjust and optimize the system. Compared to the bus-branch model, the node-breaker model describes grid components at a finer granularity and can dynamically reflect changes in equipment status, thus improving the efficiency of grid dispatching and operation. This paper proposes a spatiotemporal data modeling method based on a graph database. It elaborates on constructing graph nodes, graph ontology models, and graph entity models from grid dispatch data, and describes the construction of the spatiotemporal node-breaker graph model and its transformation to the bus-branch model. Subsequently, by integrating spatiotemporal data attributes into the pre-built static grid graph model, a spatiotemporal evolving graph of the power grid is constructed. Furthermore, the concept of the “Power Grid One Graph” and its requirements in modern power systems are elucidated. Leveraging the constructed spatiotemporal node-breaker graph model and graph computing technology, the paper explores the feasibility of grid situational awareness. Finally, typical applications in an operational provincial grid are showcased, and potential application scenarios of the proposed spatiotemporal graph model are discussed.
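The abstract does not say how the node-breaker model is transformed into the bus-branch model. A common way to do this kind of topology reduction is to merge all connectivity nodes joined by closed switches into a single bus. The following is a minimal sketch under that assumption, using a union-find structure; the function name `reduce_to_buses` and the tuple layouts are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def reduce_to_buses(nodes, switches, branches):
    """Collapse a node-breaker topology into a bus-branch topology.

    nodes:    iterable of connectivity-node ids (comparable, e.g. strings)
    switches: list of (node_a, node_b, closed) tuples for breakers/disconnectors
    branches: list of (name, node_a, node_b) tuples for lines or transformers
    Returns (buses, bus_branches), where buses maps a bus id to its member
    nodes and bus_branches re-expresses each branch between bus ids.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the set representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, closed in switches:
        if not closed:
            continue  # an open switch does not connect its two nodes
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so bus ids are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    buses = defaultdict(list)
    for n in parent:
        buses[find(n)].append(n)
    bus_branches = [(name, find(a), find(b)) for name, a, b in branches]
    return dict(buses), bus_branches
```

Because switch states change over time, rerunning the reduction on each snapshot of switch states would give a time-dependent bus-branch view, which is one plausible reading of how equipment status changes propagate to the coarser model.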
Funding: Supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004.
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, is becoming more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that distinguishes user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on a large real-world dataset of 228.7 GB collected from a mobile core network and covering more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
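The abstract does not specify the heuristics used. As an illustration only, the sketch below assumes a simple rule: the `Referer` header defines a dependency edge from one request to another, and an HTML request counts as a user click if it has no known parent or arrives at least `min_gap` seconds after its parent. The `Request` fields, the `find_clicks` name, and the threshold are all hypothetical, and the sketch is sequential rather than parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    ts: float                 # request timestamp in seconds
    url: str                  # requested URL
    referer: Optional[str]    # value of the Referer header, if any
    content_type: str         # response Content-Type


def find_clicks(requests, min_gap=1.0):
    """Return the URLs judged to be user clicks under the assumed heuristic."""
    last_seen = {}  # url -> timestamp of the most recent request for it
    clicks = []
    for r in sorted(requests, key=lambda req: req.ts):
        # Dependency edge: referer -> url. The parent is unknown if the
        # referer is absent or was never observed in this trace.
        parent_ts = last_seen.get(r.referer) if r.referer else None
        is_page = r.content_type.startswith("text/html")
        if is_page and (parent_ts is None or r.ts - parent_ts >= min_gap):
            clicks.append(r.url)
        last_seen[r.url] = r.ts
    return clicks
```

Embedded resources such as images and quickly loaded frames are attached to their parent page and filtered out, while pages reached after a pause are treated as new clicks. In a parallel setting, one natural choice would be to partition the trace by user so that each partition can be processed independently.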
Funding: Supported by the National Natural Science Foundation of China (Nos. 41571387, 41201375, and 41501440); the Tianjin Research Program of Application Foundation and Advanced Technology (No. 14JCQNJC07900); the Tianjin Science and Technology Planning Project (Nos. 15ZCZDSF00390 and 14TXGCCX00015); and the Opening Fund of Tianjin Engineering Research Center of Geospatial Information Technology, "Modeling and analysis of path graph in 3D indoor spatial environment".
Abstract: Due to limitations in geometric representation and semantic description, current pedestrian route analysis models are inadequate. To express the geometry of geographic entities in a micro-spatial environment accurately, the concept of a grid is presented, and grid-based methods for modeling geospatial objects are described. The semantic constitution of a building environment and the methods for modeling rooms, corridors, and staircases with grid objects are then described. Based on the topological relationships between grid objects, a grid-based graph for a building environment is presented, and a corresponding route algorithm for pedestrians is proposed. The main advantages of the graph model proposed in this paper are as follows: 1) it considers both semantic and geometric information; 2) it balances the need for accurate geometric representation of the micro-spatial environment against the efficiency of pedestrian route analysis; 3) it applies to route analysis in both static and dynamic environments; and 4) its multi-hierarchical route analysis integrates multiple levels of pedestrian decision characteristics, from high to low, to determine the optimal path.
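The abstract does not give the route algorithm itself. To make the idea of a grid-based graph concrete, the sketch below treats every walkable cell as a graph node, links 4-adjacent walkable cells, and finds a shortest route with breadth-first search. This is a plain single-level search, not the multi-hierarchical analysis the paper describes; the `shortest_route` name and the `'.'`/`'#'` cell encoding are assumptions made for illustration.

```python
from collections import deque


def shortest_route(grid, start, goal):
    """Breadth-first search over a grid of cells.

    grid:  list of equal-length strings; '.' is walkable, '#' is blocked
    start, goal: (row, col) tuples
    Returns the route as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}  # cell -> predecessor on a shortest route
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

A dynamic environment could be approximated by changing cells between `'.'` and `'#'` and searching again. Attaching semantic labels such as room, corridor, or staircase to groups of cells would be one way to build the coarser levels that a hierarchical search needs.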
Abstract: Enterprise applications rely on relational databases and structured business processes, which requires slow and expensive conversion of inputs and outputs from business documents such as invoices, purchase orders, and receipts into known templates and schemas before processing. We propose a new LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects the inputs within them but also handles the extraction of structured and unstructured data by developing tools that deal with each data type efficiently and securely. We study the efficiency of the proposed pipeline and compare it with enterprise solutions that also use LLMs. We show that our approach is superior in timely and accurate data extraction and transformation when analyzing data from varied sources under nested and/or interlinked input constraints.
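Abstract: The surge of large language model (LLM) technology has raised data quality requirements to a new level. In real-world scenarios, data usually come from different sources and are highly correlated. However, because of data privacy and security concerns, cross-domain heterogeneous data often cannot be shared centrally and are therefore difficult for LLMs to use efficiently. To address this, a cross-domain heterogeneous data query framework based on the collaboration of LLMs and knowledge graphs (KGs) is proposed, providing a governance scheme for cross-domain heterogeneous data querying under the LLM+KG paradigm. To ensure that LLMs can adapt to cross-domain heterogeneous data in multiple scenarios, adapters are first used to fuse the data, and the corresponding knowledge graph is constructed. To improve query efficiency, linear knowledge graphs are introduced, and a homologous knowledge graph extraction algorithm, HKGE, is proposed to reconstruct the knowledge graph, which significantly improves query performance and keeps the governance of cross-domain heterogeneous data efficient. Next, to guarantee high credibility of multi-domain data queries, a trusted candidate subgraph matching algorithm, Trust HKGM, is proposed; it computes confidence for cross-domain homologous data, matches trusted candidate subgraphs, and removes low-quality nodes. Finally, a multi-domain data query algorithm based on linear knowledge graph prompting, MKLGP, is proposed to achieve efficient and trustworthy cross-domain querying under the LLM+KG paradigm. Extensive experiments on multiple real-world datasets verify the effectiveness and efficiency of the proposed method.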