Funding: Supported by the Project of China Southern Power Grid Digital Grid Research Institute Co., Ltd. (210002KK52222026).
Abstract: Modeling the spatiotemporal data of the power grid makes it possible to better understand its operational status, identify potential issues and risks, and take timely measures to adjust and optimize the system. Compared to the bus-branch model, the node-breaker model describes grid components at a higher granularity and can dynamically reflect changes in equipment status, thus improving the efficiency of grid dispatching and operation. This paper proposes a spatiotemporal data modeling method based on a graph database. It elaborates on constructing graph nodes, graph ontology models, and graph entity models from grid dispatch data, and describes the construction of the spatiotemporal node-breaker graph model and its transformation to the bus-branch model. Subsequently, by integrating spatiotemporal data attributes into the pre-built static grid graph model, a spatiotemporal evolving graph of the power grid is constructed. Furthermore, the concept of the “Power Grid One Graph” and its requirements in modern power systems are elucidated. Leveraging the constructed spatiotemporal node-breaker graph model and graph computing technology, the paper explores the feasibility of grid situational awareness. Finally, typical applications in an operational provincial grid are showcased, and potential scenarios for the proposed spatiotemporal graph model are discussed.
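The transformation from a node-breaker model to a bus-branch model mentioned in this abstract can be illustrated with a generic topology-processing sketch. It is not the paper's graph-database method: it assumes the common approach of merging connectivity nodes joined by closed switches with a union-find structure, and the function name `reduce_to_bus_branch` and the tuple layouts are hypothetical.

```python
from collections import defaultdict


def reduce_to_bus_branch(nodes, switches, branches):
    """Collapse a node-breaker topology into a bus-branch topology.

    nodes:    iterable of connectivity-node ids (hypothetical layout)
    switches: list of (node_a, node_b, is_closed) tuples
    branches: list of (name, node_a, node_b) tuples (lines, transformers)
    """
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Nodes joined by a closed switch belong to the same electrical bus.
    for a, b, is_closed in switches:
        if is_closed:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)

    # Number the buses in a stable order (by smallest member id).
    buses = sorted(groups.values(), key=min)
    bus_of = {n: i for i, members in enumerate(buses) for n in members}

    # Keep only branches that connect two different buses.
    bus_branches = [(name, bus_of[a], bus_of[b])
                    for name, a, b in branches
                    if bus_of[a] != bus_of[b]]
    return buses, bus_branches
```

Calling the function again after a switch changes state yields the updated bus-branch view, which is one simple way to picture how a node-breaker model reflects equipment status changes.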
Funding: Supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004.
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that distinguishes user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on real, massive data. The dataset, collected from a mobile core network, is 228.7 GB in size and covers more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
Abstract: Outlier detection has important applied value in data mining. Outlier detection algorithms based on distinct theories differ in their definitions and mining processes. Based on an analysis of existing outlier detection algorithms in terms of criterion and theory, a three-dimensional space graph for constructing applied algorithms and an improved GridOf algorithm are proposed.
Key words: outlier detection; three-dimensional space graph; data mining
CLC number: TP 311.13; TP 391
Foundation item: Supported by the National Natural Science Foundation of China (70371015)
Biography: ZHANG Jing (1975-), female, Ph.D., lecturer; research direction: data mining and knowledge discovery.
Funding: Supported by the National Natural Science Foundation of China (61702251, 61363049, 11571011); the State Scholarship Fund of China Scholarship Council (CSC) (201708360040); the Natural Science Foundation of Jiangxi Province (20161BAB212033); the Natural Science Basic Research Plan in Shaanxi Province of China (2018JM6030); the Doctor Scientific Research Starting Foundation of Northwest University (338050050); and the Youth Academic Talent Support Program of Northwest University.
Abstract: This paper proposes a graph regularized L_p smooth non-negative matrix factorization (GSNMF) method that incorporates graph regularization and an L_p smoothing constraint. The method considers the intrinsic geometric information of a data set and produces smooth and stable solutions. The main contributions are as follows. First, graph regularization is added to NMF to discover the hidden semantics while respecting the intrinsic geometric structure of a data set. Second, the L_p smoothing constraint is incorporated into NMF to combine the merits of isotropic (L_2-norm) and anisotropic (L_1-norm) diffusion smoothing, producing a smooth and more accurate solution to the optimization problem. Finally, the update rules and a proof of convergence for GSNMF are given. Experiments on several data sets show that the proposed method outperforms related state-of-the-art methods.
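As background for the graph regularization term described in this abstract, the sketch below implements the standard multiplicative updates of graph regularized NMF (GNMF), which minimizes ||X - U V^T||^2 + lam * Tr(V^T L V) with L = D - W. It is not GSNMF itself: the L_p smoothing constraint and the paper's own update rules are not reproduced here, and the function names, the affinity matrix `W`, and all parameter defaults are assumptions made for illustration.

```python
import numpy as np


def gnmf(X, W, k, lam=1.0, n_iter=100, seed=0, eps=1e-10):
    """Graph regularized NMF (without L_p smoothing), X ~ U @ V.T.

    X: non-negative data matrix, shape (m, n), one column per sample
    W: symmetric non-negative sample affinity matrix, shape (n, n)
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W.sum(axis=1))  # degree matrix of the sample graph
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def gnmf_objective(X, W, U, V, lam=1.0):
    """Reconstruction error plus graph smoothness penalty."""
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

In this formulation the affinity matrix `W` would typically come from a nearest-neighbor graph over the samples; how the paper builds its graph is not stated in the abstract.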
Funding: Supported by the National Natural Science Foundation of China (Nos. 41571387, 41201375 and 41501440); the Tianjin Research Program of Application Foundation and Advanced Technology (No. 14JCQNJC07900); the Tianjin Science and Technology Planning Project (Nos. 15ZCZDSF00390 and 14TXGCCX00015); and the Opening Fund of Tianjin Engineering Research Center of Geospatial Information Technology, "Modeling and analysis of path graph in 3D indoor spatial environment".
Abstract: Due to limitations in geometric representation and semantic description, current pedestrian route analysis models are inadequate. To express the geometry of geographic entities in a micro-spatial environment accurately, the concept of a grid is presented, and grid-based methods for modeling geospatial objects are described. The semantic constitution of a building environment and the methods for modeling rooms, corridors, and staircases with grid objects are described. Based on the topological relationships between grid objects, a grid-based graph for a building environment is presented, and a corresponding route algorithm for pedestrians is proposed. The main advantages of the graph model proposed in this paper are as follows: 1) consideration of both semantic and geometric information, 2) consideration of both the need for accurate geometric representation of the micro-spatial environment and the efficiency of pedestrian route analysis, 3) applicability of the graph model to route analysis in both static and dynamic environments, and 4) ability of the multi-hierarchical route analysis to integrate multiple levels of pedestrian decision characteristics, from high to low, to determine the optimal path.
Funding: Supported by the National Natural Science Foundation of China (601133010).
Abstract: In this paper, a new approach for visualizing multivariate categorical data is presented. The approach uses a graph to represent multivariate categorical data and draws the graph in such a way that patterns, trends, and relationships within the data can be identified. A mathematical model for the graph layout problem is derived, and a spectral graph drawing algorithm for visualizing multivariate categorical data is proposed. The experiments show that the drawings produced by the algorithm capture the structure of multivariate categorical data well and that the algorithm is fast.
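The idea of spectral graph drawing named in this abstract can be sketched with the textbook Laplacian eigenvector layout. The paper's own layout model and the way it builds a graph from categorical data are not given in the abstract, so this is a generic illustration only: the function name `spectral_layout` is hypothetical, and the input is assumed to be the symmetric adjacency matrix of a connected graph.

```python
import numpy as np


def spectral_layout(adjacency):
    """Return 2-D vertex coordinates from the graph Laplacian.

    Assumes a symmetric adjacency matrix of a connected graph
    with at least three vertices.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A       # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    # Column 0 is the constant eigenvector (eigenvalue 0); the next two
    # eigenvectors serve as x and y coordinates.
    return eigvecs[:, 1:3]


# Usage: a path graph on four vertices.
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
coords = spectral_layout(path)
```

Vertices that are strongly connected receive nearby coordinates, which is why such layouts tend to reveal cluster structure. The cost is dominated by a single eigendecomposition.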
Funding: This work was supported by the National Key R&D Program of China (2020YFB0905900).
Abstract: Integrating marketing and distribution businesses is crucial for improving the coordination of equipment and the efficient management of multi-energy systems. New energy sources are continuously being connected to distribution grids; this, however, increases the complexity of the information structure of marketing and distribution businesses. The existing unified data model and the coordinated application of marketing and distribution suffer from various drawbacks. As a solution, this paper presents a data model of "one graph of marketing and distribution" and a framework for graph computing, by analyzing the current trends of business and data in the marketing and distribution fields and using graph data theory. Specifically, this work aims to determine the correlation between distribution transformers and marketing users, which is crucial for elucidating the connection between marketing and distribution. To this end, a novel identification algorithm is proposed based on the collected marketing and distribution data. A forecasting application is then developed based on the proposed algorithm to realize the coordinated prediction and consumption of distributed photovoltaic power generation and distribution loads. Furthermore, an operation and maintenance (O&M) knowledge graph reasoning application is developed to improve the intelligent O&M capability of marketing and distribution equipment.
Abstract: The wide application of intelligent terminals in microgrids has fueled a surge in data volume in recent years. In real-world scenarios, microgrids must store large amounts of data efficiently while also withstanding malicious cyberattacks. To address the high hardware resource requirements, vulnerability to network attacks, and poor reliability of traditional centralized data storage schemes, this paper proposes a secure storage management method for microgrid data that considers node trust and a directed acyclic graph (DAG) consensus mechanism. Firstly, the microgrid data storage model is designed based on edge computing technology. The blockchain, deployed on the edge computing server and combined with cloud storage, ensures reliable data storage in the microgrid. Secondly, a blockchain consensus algorithm based on the directed acyclic graph data structure is proposed to improve the timeliness of data storage and avoid disadvantages of traditional blockchain topology such as long chain construction time and low consensus efficiency. Finally, considering the differences in tolerance to network attacks among the candidate chain-building nodes, a hash value update mechanism for the blockchain header with node trust identification is proposed to ensure data storage security. Experimental results from the microgrid data storage platform show that the proposed method can achieve a private key update time of less than 5 milliseconds. When the number of blockchain nodes is less than 25, the blockchain construction takes no more than 80 min, and the data throughput is close to 300 kbps. Compared with traditional chain-topology-based consensus methods that do not consider node trust, the proposed method has higher data storage efficiency and better resistance to network attacks.
Abstract: The join operation is a critical problem when processing sliding windows over data streams. Many optimization strategies for sliding window joins exist in the literature, but the join order of multiple sliding windows is usually selected with a simple heuristic, which is ineffective. A graph-based approach is proposed to address this problem. First, a sliding window join model is introduced. In this model, vertices represent join operators and edges indicate the join relationships among sliding windows. Vertex weights and edge weights represent the cost of a join and the reciprocity of join operators, respectively. A good query plan with minimal cost can then be found in the model. A complete join algorithm is presented that combines building the model, finding the optimal query plan, and executing the query plan. Experiments show that the graph-based approach is feasible and performs better in this setting.
Abstract: Much data, such as geometric image data and drawings, has a graph structure. Such data are called graph structured data. To manage graph structured data efficiently, we need to analyze and abstract its graph structures. The purpose of this paper is to find knowledge representations that indicate plural abstractions of graph structured data. Firstly, we introduce a term graph as a graph pattern having structural variables, and a substitution over term graphs, which is a graph rewriting system. Next, for a graph G, we define a multiple layer (g, (θ_1, …, θ_k)) of G as a pair of a term graph g and a list of k substitutions θ_1, …, θ_k such that G can be obtained from g by applying the substitutions θ_1, …, θ_k to g. In the same way, for a set S of graphs, we define a multiple layer for S as a pair (D, Θ) of a set D of term graphs and a list Θ of substitutions. Secondly, for a graph G and a set S of graphs, we present effective algorithms for extracting minimal multiple layers of G and S, which give us stratifying abstractions of G and S, respectively. Finally, we report experimental results obtained by applying our algorithms to both artificial data and drawings of power plants, which are real-world data.
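Abstract: Graph-structured data are ubiquitous in scenarios such as social networks, transportation systems, and bioinformatics. Graph neural networks (GNNs) use a message-passing mechanism to iteratively aggregate neighbor information and have shown good performance on tasks such as node classification, link prediction, and graph classification. However, as data scale keeps growing and application scenarios become increasingly complex, GNNs face key challenges such as limited expressive power and insufficient generalization. In recent years, foundation models represented by large language models (LLMs) have developed rapidly and demonstrated outstanding generalization and reasoning abilities, bringing new inspiration to graph machine learning. On this basis, this study proposes the concept of the graph foundation model (GFM), which aims to obtain a general model that can be flexibly adapted to a variety of downstream tasks by pre-training on large-scale graph data. It also systematically reviews recent research on graph foundation models and, according to their degree of dependence on GNNs and LLMs, groups existing methods into three categories, surveying their progress and describing the practical experience of the authors' team in related directions. Finally, the key challenges and prospects that graph foundation models may face in the future are discussed, with the aim of providing a reference for continued innovation in graph machine learning.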
Abstract: Enterprise applications utilize relational databases and structured business processes, requiring slow and expensive conversion of inputs and outputs from business documents, such as invoices, purchase orders, and receipts, into known templates and schemas before processing. We propose a new LLM agent-based intelligent data extraction, transformation, and load (IntelligentETL) pipeline that not only ingests PDFs and detects inputs within them but also addresses the extraction of structured and unstructured data by developing tools that deal with the respective data types efficiently and securely. We study the efficiency of our proposed pipeline and compare it with enterprise solutions that also utilize LLMs. We establish the superiority of our approach in timely and accurate data extraction and transformation when analyzing data from varied sources based on nested and/or interlinked input constraints.
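Abstract: As data security risks grow more complex in big-data environments, existing data security audit techniques make fragmented use of features and scale poorly, so they struggle to cover risks across the full data life cycle, which limits risk detection effectiveness. Therefore, a graph-embedded data security audit scheme based on risk elements (RE-GDSA) is proposed. First, a security risk element space is constructed comprising data attributes D (data), user characteristics U (user), carrier environment C (carrier), and operation behavior A (action), achieving a structured mapping of risk features across the full data life cycle. Then, graph embedding is used to map the risk elements to low-dimensional semantic vectors, and a cross-dimensional association model is built for efficient risk detection. Effectiveness analysis and performance analysis verify the feasibility of the scheme.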