Funding: Supported by the Program of Support Xinjiang by Technology (2024E02028, B2-2024-0359), the Xinjiang Tianchi Talent Program of 2024, the Foundation of Chinese Academy of Sciences (B2-2023-0239), and the Youth Foundation of Shandong Natural Science (ZR2023QD070).
Abstract: As one of the main atmospheric pollutants, PM2.5 severely affects human health and has received widespread attention in recent years. Predicting variations in PM2.5 concentrations with high accuracy is therefore an important topic. The PM2.5 monitoring stations in Xinjiang Uygur Autonomous Region, China, are unevenly distributed, which makes comprehensive analysis and prediction challenging. This study addresses that limitation as well as the poor generalization of PM2.5 concentration prediction models across different monitoring stations. We chose the northern slope of the Tianshan Mountains as the study area and January to December 2019 as the research period. Using data from 21 PM2.5 monitoring stations together with meteorological data (temperature, instantaneous wind speed, and pressure), we developed an improved model, GCN-TCN-AR (where GCN is the graph convolution network, TCN is the temporal convolutional network, and AR is the autoregression), for predicting PM2.5 concentrations on the northern slope of the Tianshan Mountains. The GCN-TCN-AR model is composed of an improved GCN model, a TCN model, and an AR model. At the four monitoring stations (Urumqi, Wujiaqu, Shihezi, and Changji), the GCN-TCN-AR model achieved R² values of 0.93, 0.91, 0.93, and 0.92, respectively, and RMSE (root mean square error) values of 6.85, 7.52, 7.01, and 7.28 μg/m³, respectively. The performance of the GCN-TCN-AR model was also compared with that of existing models, including GCN-TCN, GCN, TCN, Support Vector Regression (SVR), and AR. The GCN-TCN-AR model outperformed these models, with high prediction accuracy and good stability, making it especially suitable for predicting PM2.5 concentrations. This study also revealed significant spatiotemporal variations in PM2.5 concentrations. First, PM2.5 concentrations exhibited clear seasonal fluctuations, with higher levels typically observed in winter and differences between months. Second, the spatial distribution analysis revealed that cities such as Urumqi and Wujiaqu have high PM2.5 concentrations, with noticeable geographical clustering of pollution. Understanding the variations in PM2.5 concentrations is highly important for the sustainable development of the ecological environment in arid areas.
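The abstract does not say how its improved GCN differs from the standard formulation, so the following is only a minimal sketch of the widely used symmetric-normalized graph-convolution step, ReLU(D^-1/2 (A + I) D^-1/2 X W), applied to a toy station graph. The function name `gcn_layer`, the two-station adjacency matrix, the feature values, and the weights are all hypothetical illustrations and are not taken from the study.

```python
import math


def gcn_layer(adj, feats, weight):
    """One standard graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj    : n x n adjacency matrix (non-negative edge weights between stations)
    feats  : n x f node feature matrix (e.g. one row of readings per station)
    weight : f x h weight matrix (learned in a real model; fixed here)
    """
    n = len(adj)
    # Add self-loops so each station keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees of the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: a_hat[i][j] / sqrt(deg[i] * deg[j]).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_feat = len(feats[0])
    n_hidden = len(weight[0])
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_feat)] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(n_feat)))
             for h in range(n_hidden)] for i in range(n)]


# Hypothetical toy graph: two connected stations with one feature each.
adjacency = [[0, 1], [1, 0]]
features = [[2.0], [4.0]]
weights = [[1.0, -1.0]]
print(gcn_layer(adjacency, features, weights))  # [[3.0, 0.0], [3.0, 0.0]]
```

In this toy case each station ends up with the average of its own and its neighbor's value, which is the smoothing effect that lets sparse or unevenly placed stations borrow information from nearby ones.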
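Abstract: A GCN-LSTM model that adapts to wind-direction similarity is introduced to predict PM2.5 in Kunshan, and its predictions are compared with those of the GCN and LSTM models. The results show that, for simulated PM2.5 concentrations in Kunshan, the overall mean absolute error, root mean square error, and mean absolute percentage error of the wind-direction-similarity-adaptive GCN-LSTM model are 3.30 μg/m³, 5.16 μg/m³, and 15.6%, respectively, all lower than the corresponding values for the GCN and LSTM models. For predicting PM2.5 concentrations 1 h ahead, the wind-direction-similarity-adaptive GCN-LSTM model outperforms the GCN and LSTM models in several respects.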
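The three error measures reported above follow their textbook definitions, which can be sketched as below. The function names and the sample values are illustrative assumptions and are unrelated to the Kunshan data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the data (e.g. ug/m^3)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the data."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no observed value is zero, since each error is divided
    by the corresponding observation.
    """
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical observed and predicted hourly concentrations.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

MAE and RMSE carry the unit of the concentration, whereas MAPE is dimensionless, which is why the abstract reports the first two in μg/m³ and the third as a percentage.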
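Abstract: Traditional causal relation extraction in natural language processing relies mainly on pattern matching or machine learning algorithms, yields low accuracy, and can extract only explicit causal relations marked by causal cue words. To address these problems, an algorithm named BERT-GCN is proposed that combines a large-scale pre-trained model with a graph convolutional neural network. First, BERT (bidirectional encoder representation from transformers) encodes the corpus to generate word vectors; next, the generated word vectors are fed into the graph convolutional neural network for training; finally, the output is passed to a Softmax layer to complete the extraction of causal relations. Experimental results show that the model achieves good results on the SEDR-CE dataset and also performs well on implicit causal relations.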
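The final Softmax step of such a pipeline can be illustrated in isolation. The sketch below assumes the upstream network has already produced one raw score (logit) per relation label; the label names and the logit values are hypothetical, and nothing here reproduces the BERT or GCN stages.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label set and logits for one candidate sentence.
relation_labels = ["no_relation", "cause_effect"]
label, prob = classify([0.1, 2.0], relation_labels)
print(label, round(prob, 3))
```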
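Abstract: Chinese semantic errors differ from simple spelling and grammatical errors in that they are usually more subtle and complex. Chinese semantic error recognition (CSER) aims to determine whether a Chinese sentence contains semantic errors; because it precedes semantic proofreading, the performance of the recognition model is critical to semantic error correction. CSER models ignore the discrepancy between syntactic structure and contextual structure when incorporating syntactic information. To address this problem, a hierarchical information enhanced graph convolutional network (HIE-GCN) model is proposed, which embeds the hierarchical information of syntax-tree nodes into the context encoder and thereby narrows the gap between syntactic and contextual structure. First, a traversal algorithm extracts the hierarchical information of nodes in the syntax tree; second, the hierarchical information is embedded into the BERT (Bidirectional Encoder Representations from Transformers) model to generate character features, which the graph convolutional network (GCN) assigns to the nodes of the graph, yielding a feature vector for the whole sentence after graph convolution; finally, a fully connected layer performs single-class or multi-class error recognition. Semantic error recognition and correction experiments were conducted on the FCGEC (Fine-grained corpus for Chinese Grammatical Error Correction) and NaCGEC (Native Chinese Grammatical Error Correction) datasets. In the recognition task on the FCGEC dataset, compared with the baseline models, HIE-GCN improves accuracy by at least 0.10 percentage points and F1 by at least 0.13 percentage points in single-class error recognition, and improves accuracy by at least 1.05 percentage points and F1 by at least 0.53 percentage points in multi-class error recognition. Ablation experiments verify the effectiveness of hierarchical information embedding, and the proposed model achieves higher overall recognition performance than several large language models (LLMs) such as GPT and Qwen. In the correction experiments, compared with a direct sequence-to-sequence correction model, a two-stage recognition-correction pipeline improves correction precision by 8.01 percentage points; in addition, when the LLM GLM4 performs correction, prompting the model with the error type of the sentence improves correction precision by 4.62 percentage points.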
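The abstract does not specify which traversal algorithm is used or how hierarchical information is encoded. As one plausible reading, the sketch below treats the hierarchical information of a node as its depth in the syntax tree and collects it with a breadth-first traversal. The function name, the dictionary-based tree representation, and the toy tree are assumptions made for illustration only.

```python
from collections import deque


def node_depths(children, root):
    """Breadth-first traversal returning {node: depth}, with the root at depth 0.

    children maps each node to the list of its child nodes; leaves may be
    omitted from the mapping.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            depths[child] = depths[node] + 1
            queue.append(child)
    return depths


# Hypothetical toy syntax tree: S -> NP VP, VP -> V NP2.
toy_tree = {"S": ["NP", "VP"], "VP": ["V", "NP2"]}
print(node_depths(toy_tree, "S"))
# {'S': 0, 'NP': 1, 'VP': 1, 'V': 2, 'NP2': 2}
```

In a model of this kind, the per-node depth could then be mapped to an embedding and combined with the encoder input, but that step is not described in enough detail in the abstract to sketch here.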
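Abstract: Most existing question answering system models reason by template matching, which does not reason about questions thoroughly; therefore, a question answering reasoning model based on a cognitive graph is proposed. An ontology is built using professional domain knowledge as the knowledge source, and on the basis of this cognitive graph a one-to-one "question-relation" cognitive graph question answering system model is constructed. Questions are then divided into simple and complex questions and handled separately: simple questions are matched against templates using a BERT+CRF (Bidirectional Encoder Representations from Transformers+Conditional Random Field) model, whereas for complex questions Node2vec generates a subgraph, a GCN (Graph Convolutional Network) reasoning model reasons over it, and the resulting answer is returned as the output. Finally, the proposed model was evaluated in the downhole operation domain, and the results show that the cognitive graph question answering model outperforms the other algorithm models.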