Funding: This work was supported by the Kyonggi University Research Grant 2022.
Abstract: Recommendation Information Systems (RIS) are pivotal in helping users swiftly locate desired content among the vast amount of information available on the Internet. Graph Convolution Network (GCN) algorithms have been employed to implement RIS efficiently. However, the GCN algorithm faces limitations in performance enhancement owing to the embedding value-vanishing problem that occurs during the learning process. To address this issue, we propose a Weighted Forwarding method using the GCN (WF-GCN) algorithm. The proposed method multiplies the embedding results by a different weight for each hop layer during graph learning. Because the WF-GCN algorithm adjusts the weight of each hop layer before forwarding to the next, nodes with many neighbors achieve higher embedding values. This approach facilitates the learning of more hop layers within the GCN framework. The efficacy of WF-GCN was demonstrated through its application to various datasets. On the MovieLens dataset, implementing WF-GCN in LightGCN resulted in significant performance improvements, with recall and NDCG increasing by up to +163.64% and +132.04%, respectively. Similarly, on the Last.FM dataset, LightGCN enhanced with WF-GCN showed substantial improvements, with recall and NDCG rising by up to +174.40% and +169.95%, respectively. Furthermore, applying WF-GCN to Self-supervised Graph Learning (SGL) and Simple Graph Contrastive Learning (SimGCL) also yielded notable enhancements in both recall and NDCG across these datasets.
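The per-hop weighting described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the symmetric normalization, the layer-averaging readout (borrowed from LightGCN), and the `hop_weights` values are all assumptions, since the abstract does not specify how the weights are chosen.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def weighted_forward(adj, emb0, hop_weights):
    """Propagate embeddings hop by hop, scaling each hop's output by its
    own weight before it is forwarded to the next hop.

    hop_weights is a hypothetical list with one scalar per hop layer.
    """
    a_hat = normalized_adjacency(adj)
    current = np.asarray(emb0, dtype=float)
    layers = [current]
    for w in hop_weights:
        current = w * (a_hat @ current)  # weight applied before forwarding
        layers.append(current)
    # LightGCN-style readout (assumed): average the embeddings of all layers.
    return np.mean(layers, axis=0)
```

With all weights equal to 1 this reduces to plain unweighted propagation; weights above 1 counteract the shrinking of embedding magnitudes across hops.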
Abstract: In this paper, we explain the relevance of starant graphs, which we created in 2002. They are essentially circulant graphs with a star graph connected, from the inside, to all the vertices of the circulant graph; as far as we knew, they did not exist as a separate object of study in 2002. We now know that they can be used to model even social networking interactions, and that they do this job better than any other graph we might try to use there. With the development of our mathematical tools, many conclusions will become much more believable and therefore much more likely to gain support from the relevant industries when attached to new queries.
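Going only by the description in this abstract, a starant graph can be built as a circulant graph plus one hub vertex joined to every circulant vertex. The sketch below assumes that reading; the function name `starant_edges`, the `jumps` parameter, and the choice to label the hub `n` are illustrative, not taken from the paper.

```python
def starant_edges(n, jumps):
    """Edges of a circulant graph on vertices 0..n-1 with the given jumps,
    plus a hub vertex n joined to every circulant vertex (the star)."""
    edges = set()
    for v in range(n):
        for j in jumps:
            u = (v + j) % n
            if u != v:
                # Store each undirected edge once, smaller endpoint first.
                edges.add((min(u, v), max(u, v)))
        edges.add((v, n))  # star spoke from the hub n to vertex v
    return sorted(edges)
```

For example, `starant_edges(4, [1])` gives a 4-cycle whose vertices are all joined to the hub vertex 4.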
Funding: Supported by the Program of Support Xinjiang by Technology (2024E02028, B2-2024-0359), the Xinjiang Tianchi Talent Program of 2024, the Foundation of Chinese Academy of Sciences (B2-2023-0239), and the Youth Foundation of Shandong Natural Science (ZR2023QD070).
Abstract: As one of the main atmospheric pollutants, PM_(2.5) severely affects human health and has received widespread attention in recent years. Predicting variations in PM_(2.5) concentrations with high accuracy is an important topic. The PM_(2.5) monitoring stations in Xinjiang Uygur Autonomous Region, China, are unevenly distributed, which makes comprehensive analysis and prediction challenging. This study therefore addresses this limitation together with the poor generalization ability of PM_(2.5) concentration prediction models across different monitoring stations. We chose the northern slope of the Tianshan Mountains as the study area and January−December 2019 as the research period. On the basis of data from 21 PM_(2.5) monitoring stations as well as meteorological data (temperature, instantaneous wind speed, and pressure), we developed an improved model, namely GCN−TCN−AR (where GCN is the graph convolution network, TCN is the temporal convolutional network, and AR is the autoregression), for predicting PM_(2.5) concentrations on the northern slope of the Tianshan Mountains. The GCN−TCN−AR model is composed of an improved GCN model, a TCN model, and an AR model. The results revealed that the R^(2) values of the GCN−TCN−AR model at four monitoring stations (Urumqi, Wujiaqu, Shihezi, and Changji) were 0.93, 0.91, 0.93, and 0.92, respectively, and the RMSE (root mean square error) values were 6.85, 7.52, 7.01, and 7.28 μg/m^(3), respectively. The performance of the GCN−TCN−AR model was also compared with that of current neural network models, including GCN−TCN, GCN, TCN, Support Vector Regression (SVR), and AR. The GCN−TCN−AR model outperformed these models, with high prediction accuracy and good stability, making it especially suitable for predicting PM_(2.5) concentrations. This study also revealed significant spatiotemporal variations in PM_(2.5) concentrations. First, the PM_(2.5) concentrations exhibited clear seasonal fluctuations, with higher levels typically observed in winter and differences between months. Second, the spatial distribution analysis revealed that cities such as Urumqi and Wujiaqu have high PM_(2.5) concentrations, with a noticeable geographical clustering of pollution. Understanding the variations in PM_(2.5) concentrations is highly important for the sustainable development of the ecological environment in arid areas.
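The two evaluation metrics reported in this abstract can be computed as in the sketch below. It assumes the common definitions (RMSE as the root of the mean squared residual, R^(2) as one minus the ratio of residual to total sum of squares); the abstract does not state which R^(2) definition the authors used, and the function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when y_true is constant (SS_tot is zero).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```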
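Abstract: To establish the risk emergence characteristics of ship operation, the uncertain structure of the component factors of a complex system must be considered. From the perspective of complex systems, a functional resonance analysis model of risk for complex networks with uncertain structure is proposed. First, the Apriori algorithm is used to analyze the risk of ship system components and to compute the nonlinear interaction utility among component factors, generating an interaction strength matrix and thereby establishing a Functional Resonance Analysis Model (FRAM) of ship operational safety risk. Next, a Graph Convolutional Network (GCN) is used to build the system component network, identify key nodes, and reshape the network structure of the factor interaction relationships. Finally, the Depth First Search (DFS) algorithm is introduced to identify key risk paths and compute the degree of influence of the ship system component factors. The model is then applied in a simulation of ship operational risk using Port State Control (PSC) deficiency data. The results show that the unsafe state of a ship is affected by the attributes of internal and external component factors and that key resonance paths exist, with the fire-fighting system and the ship's structural condition, among others, being core nodes that affect the unsafe state. The constructed risk functional resonance analysis model can adaptively generate the corresponding risk path dependencies from different data inputs. The risk functional resonance model based on complex network structure helps analyze risk emergence in complex systems with uncertain structure.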
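The DFS step of this abstract, identifying key risk paths in a network of interacting factors, might look like the sketch below. The scoring rule (product of interaction strengths along a path) and the graph layout (a dict mapping each node to its neighbors and strengths) are assumptions made for illustration; the abstract does not say how the authors rank paths.

```python
def all_simple_paths(graph, start, goal):
    """Depth-first enumeration of simple paths from start to goal.

    graph maps node -> {neighbor: interaction strength}.
    """
    stack = [(start, [start])]
    paths = []
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))
    return paths

def path_strength(graph, path):
    """Hypothetical path score: product of the edge strengths along it."""
    strength = 1.0
    for a, b in zip(path, path[1:]):
        strength *= graph[a][b]
    return strength

def key_risk_path(graph, start, goal):
    """Return the highest-scoring simple path, or None if none exists."""
    paths = all_simple_paths(graph, start, goal)
    return max(paths, key=lambda p: path_strength(graph, p), default=None)
```

In practice the strengths would come from the interaction strength matrix that the abstract derives with Apriori.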
Abstract: With the continuous development of artificial intelligence and natural language processing technologies, traditional retrieval-augmented generation (RAG) techniques face numerous challenges in document answer precision and similarity measurement. This study, set against the backdrop of the shipping industry, combines top-down and bottom-up schema design strategies to achieve precise and flexible knowledge representation. The research adopts a semi-structured approach and constructs an adaptive schema generation mechanism based on reinforcement learning, which models the knowledge graph construction process as a Markov decision process. The method begins with general concepts, defining foundational industry concepts, and then abstracts core concepts specific to the maritime domain through an adaptive pattern generation mechanism that dynamically adjusts the knowledge structure. Specifically, the study designs a four-layer knowledge construction framework comprising the data layer, modeling layer, technology layer, and application layer. It draws on a mutual indexing strategy that integrates large language models and traditional information extraction techniques. By leveraging self-attention mechanisms and graph attention networks, it efficiently extracts semantic relationships. The introduction of logic-form-driven solvers and symbolic decomposition techniques for reasoning significantly enhances the model's ability to understand complex semantic relationships. Additionally, the use of open information extraction and knowledge alignment techniques further improves the efficiency and accuracy of information retrieval. Experimental results demonstrate that the proposed method not only achieves significant performance improvements in knowledge graph retrieval within the shipping domain but also holds important theoretical and practical value.
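Abstract: Existing automated essay scoring (AES) methods based on pre-trained language models (PLMs) tend to use the global semantic features extracted from the PLM directly to represent essay quality, while overlooking the relationship between essay quality and finer-grained features. Focusing on Chinese AES, this work analyzes and evaluates essay quality from multiple textual perspectives and proposes a Chinese AES method that uses graph neural networks (GNNs) to jointly learn multi-scale essay features. First, GNNs are used to obtain the discourse features of an essay at the sentence level and the paragraph level. Then, these discourse features are learned jointly with the essay's global semantic features to score the essay more accurately. Finally, a Chinese AES dataset is constructed to provide a data foundation for Chinese AES research. Experimental results on the constructed dataset show that the average Quadratic Weighted Kappa (QWK) of the proposed method over 6 essay topics is 1.1 percentage points higher than that of R2-BERT (Bidirectional Encoder Representations from Transformers model with Regression and Ranking), verifying the effectiveness of joint multi-scale feature learning for AES. Ablation results further show the contribution of essay features at different scales to scoring performance. To demonstrate the advantage of small models in task-specific scenarios, the method is also compared with the popular general-purpose large language models GPT-3.5-turbo and DeepSeek-V3. The results show that the BERT (Bidirectional Encoder Representations from Transformers) model using the proposed method achieves an average QWK over the 6 essay topics that is 65.8 and 45.3 percentage points higher than GPT-3.5-turbo and DeepSeek-V3, respectively, supporting the view that large language models (LLMs) perform poorly on domain-oriented, discourse-level essay scoring because they lack large-scale supervised fine-tuning data.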
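The QWK metric this abstract reports can be computed as in the following sketch, which uses the standard definition: 1 minus the ratio of weighted observed disagreement to weighted chance disagreement, with weights (i - j)^2 / (n - 1)^2. The function name and argument layout are illustrative, and the sketch assumes integer scores within [min_rating, max_rating] spanning at least two rating levels.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer scores."""
    a = np.asarray(rater_a, dtype=int) - min_rating
    b = np.asarray(rater_b, dtype=int) - min_rating
    n = max_rating - min_rating + 1

    # Observed confusion matrix of score pairs.
    observed = np.zeros((n, n))
    for i, j in zip(a, b):
        observed[i, j] += 1

    # Matrix expected under chance agreement, from the marginal histograms.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(a)

    # Quadratic disagreement weights.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    return float(1.0 - (weights * observed).sum() / (weights * expected).sum())
```

A value of 1 means perfect agreement, 0 means chance-level agreement, and negative values mean systematic disagreement.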
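Abstract: The deep application of artificial intelligence in education has become a core strategy of the national digital transformation of education. In practical computer science teaching, accurate recommendation of practice learning materials is an important way to improve students' learning efficiency and quality. To address the tension between the scaling of university education and the diversity of student needs, a personalized practice learning material recommendation model based on a lightweight educational large model, LightPLRec (Lightweight Personalized Learning Recommender for Dynamic Practice Materials), is proposed; it aims to intelligently recommend personalized practice learning materials according to dynamic changes in individual student characteristics. Built on a lightweight large model with low compute requirements, the educational large model SPIR (Student Profile & Interest-based Recommender) for personalized practice learning material recommendation is constructed through instruction fine-tuning and reinforcement learning. By integrating multi-source heterogeneous data and incorporating the course knowledge system, frontier developments in the discipline, industry trends, and national strategic orientation, an interdisciplinary, multimodal repository of practice learning materials is built, and a graph-to-topic-text method, gragh2topic, is designed. Supported by the SPIR model and the multi-source repository, a material recommendation method based on an intelligent workflow is proposed. A topic analysis method is designed to extract students' ability features from ability assessment results, and the graph convolutional network (GCN) algorithm is applied to mine students' interest features from learning behavior data. An "ability-recommendation agent" and an "interest-recommendation agent" are created, forming a workflow system driven by two cooperating agents that carries out the series of tasks from intelligent generation of personalized student profiles to dynamic recommendation of practice learning materials. A personalized material recommendation dataset is also constructed, on which the proposed model is verified to significantly outperform the baseline models. In particular, the LightPLRec model trained with Qwen2.5-3.0B as the base model achieves accuracies of 0.947 and 0.939 on the ability recommendation and interest recommendation tasks, respectively, both better than the results of DeepSeek-V3 on the same dataset. This research provides a technical paradigm for applying educational large models in vertical scenarios and, by creating a dynamic recommendation model for personalized practice learning materials, offers an innovative path for practicing the idea of "teaching students according to their aptitude" and for cultivating high-quality computing practitioners.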
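Abstract: Graph-structured data is ubiquitous in scenarios such as social networks, transportation systems, and bioinformatics. Graph neural networks (GNNs) use a message-passing mechanism to iteratively aggregate neighbor information and have shown good performance on tasks such as node classification, link prediction, and graph classification. However, as data scale keeps growing and application scenarios become increasingly complex, GNNs face key challenges such as limited expressive power and insufficient generalization. In recent years, foundation models, represented by large language models (LLMs), have developed rapidly and shown excellent generalization and reasoning abilities, bringing new inspiration to graph machine learning. On this basis, this study proposes the concept of the graph foundation model (GFM), which aims to obtain a general model that can be flexibly adapted to a variety of downstream tasks through pre-training on large-scale graph data. It also systematically reviews recent research on graph foundation models and, according to their degree of reliance on GNNs and LLMs, groups existing methods into 3 categories, surveying their progress and describing the authors' team's practical experience in related directions. Finally, the key challenges and prospects for the future development of graph foundation models are discussed, with the aim of providing a reference for continued innovation in graph machine learning.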
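The message-passing mechanism mentioned in this abstract can be illustrated with a single round of neighbor aggregation. The sketch below uses mean aggregation with a self-loop and no learned weights, which is one simple choice among many; it is not tied to any specific model in the survey.

```python
import numpy as np

def message_passing_step(adj, features):
    """One round of mean aggregation: each node averages its own
    features with those of its neighbors."""
    adj = np.asarray(adj, dtype=float)
    features = np.asarray(features, dtype=float)
    adj_self = adj + np.eye(adj.shape[0])      # add self-loops
    deg = adj_self.sum(axis=1, keepdims=True)  # neighborhood sizes
    return (adj_self @ features) / deg
```

Applying the step k times lets each node see information from nodes up to k hops away, which is the iterative aggregation the abstract refers to.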