Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60872123), the Joint Fund of the National Natural Science Foundation and the Guangdong Provincial Natural Science Foundation (Grant No. U0835001), and the Doctorate Foundation of South China University of Technology, China.
Abstract: This paper addresses the denoising of impure chaotic signals. It proposes a method based on sparse representation in which the random frame dictionary is generated by a chaotic random search algorithm. Numerical simulation shows that the proposed algorithm outperforms recently reported alternative denoising methods.
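As a rough illustration of the sparse-representation step, the sketch below runs plain matching pursuit over a small fixed dictionary. It is not the paper's method: the cosine atoms stand in for the random frame dictionary, the chaotic random search that builds that dictionary is not reproduced, and the names `matching_pursuit`, `atoms`, and `clean` are assumptions made for this example.

```python
import math

def normalize(v):
    # Scale a vector to unit Euclidean norm so correlations are comparable.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse approximation with unit-norm atoms.

    At each step, pick the atom most correlated with the residual,
    record its coefficient, and subtract its contribution.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        corrs = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(corrs[i]))
        coeffs[k] += corrs[k]
        residual = [r - corrs[k] * a for r, a in zip(residual, dictionary[k])]
    approx = [
        sum(c * atom[t] for c, atom in zip(coeffs, dictionary))
        for t in range(len(signal))
    ]
    return coeffs, approx

# Hypothetical dictionary: four unit-norm cosine atoms over 64 samples.
N = 64
atoms = [
    normalize([math.cos(2 * math.pi * k * t / N) for t in range(N)])
    for k in range(1, 5)
]
# A signal built from a single atom is recovered with one greedy step.
clean = [2.0 * a for a in atoms[1]]
coeffs, approx = matching_pursuit(clean, atoms, n_atoms=1)
```

In a denoising setting, the idea is that the signal is well captured by a few atoms while noise correlates weakly with all of them, so keeping only the leading coefficients suppresses the noise.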
Abstract: A new definition of the alternative coherent-mode representation is proposed for a random planar source whose statistical properties are unknown a priori. The definition is based on measurements of the source cross-spectral density, followed by optimal approximation of the measured results in a chosen basis of modal functions. The proposed definition is illustrated by numerical simulation results.
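Abstract: This study builds a high-quality dataset, named VamNER, for named entity recognition (NER) in the domain of Pacific white shrimp farming. To ensure diversity, the corpus was built from high-quality papers published over the past 10 years and collected from the CNKI database, combined with authoritative books. Experts were invited to discuss the entity types, and professionally trained annotators labeled the data in the IOB2 format; annotation was split into a pre-annotation stage and a formal annotation stage to improve efficiency. In the pre-annotation stage, inter-annotator agreement (IAA) reached 0.87, indicating high consistency among annotators. The final VamNER dataset contains 6,115 sentences totaling 384,602 characters, and covers 10 entity types with 12,814 entities. A comparison with several general-domain datasets and one domain-specific dataset reveals the distinctive characteristics of VamNER. Experiments used the pre-trained bidirectional encoder representations from Transformers (BERT) model, a bidirectional long short-term memory network (BiLSTM), and conditional random fields (CRF); the best model reached an F1 score of 82.8% on the test set. VamNER is the first NER dataset focused on Pacific white shrimp farming; it provides a rich resource for Chinese domain-specific NER research and is expected to advance NER research in aquaculture.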
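The IOB2 format mentioned in this abstract tags the first token of every entity with `B-` and the remaining tokens with `I-`, leaving everything else as `O`. A minimal sketch follows; the function name `spans_to_iob2`, the example sentence, and the entity types `SPECIES` and `DISEASE` are hypothetical and are not taken from VamNER.

```python
def spans_to_iob2(tokens, spans):
    """Convert entity spans to IOB2 tags.

    spans: list of (start, end_exclusive, entity_type) over token indices.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # continuation tokens
    return tags

# Character-level tokens, as is common for Chinese NER (illustrative sentence).
tokens = list("对虾患白斑病")
spans = [(0, 2, "SPECIES"), (3, 6, "DISEASE")]
tags = spans_to_iob2(tokens, spans)
```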
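Abstract: To identify risk elements quickly and accurately in public-opinion information about urban waterlogging, user comments and media posts on the Sina Weibo platform were first collected, organized, and annotated to build a corpus dataset of urban waterlogging disaster events. Then, to address the inconsistent format and complex semantics of such information and the high demands on professionalism and precision in risk-element identification, a recognition method based on Bidirectional Encoder Representations from Transformers-Bidirectional Long Short-Term Memory-Conditional Random Field (BERT-BiLSTM-CRF) was proposed, drawing on the risk-element framework of natural disaster system theory, and a series of model validation experiments was carried out. Comparative experiments show that the model performs well on precision, recall, and F1, with a precision of 84.62%, a recall of 86.19%, and an F1 of 85.35%, outperforming the other models compared. Ablation experiments show that the BERT pre-trained model has the more significant effect on model performance. Taken together, these results verify that the model can effectively identify the various risk elements in urban waterlogging public-opinion information, providing a research basis for the digital-intelligent transformation of urban waterlogging risk management.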
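For reference, F1 is the harmonic mean of precision and recall. The helper below is a generic sketch (the name `f1_score` is an assumption for this example) and it treats the first reported figure as precision.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Plugging in the precision and recall quoted in the abstract (percent).
f1_from_abstract = f1_score(84.62, 86.19)
```

With these inputs the harmonic mean comes out at roughly 85.40, close to the reported 85.35%. A small gap of this kind can arise when scores are averaged over entity classes before reporting, though the abstract does not say how its figures were computed.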
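Abstract: Existing Chinese named entity recognition algorithms do not fully consider the data characteristics of the entity recognition task and suffer from class imbalance in Chinese sample data, heavy noise in the training data, and large differences in the distribution of the data generated by the model on each pass. To address these problems, an improved Chinese named entity recognition model is proposed with BERT-BiLSTM-CRF (Bidirectional Encoder Representations from Transformers-Bidirectional Long Short-Term Memory-Conditional Random Field) as the baseline. First, the P-Tuning v2 technique is combined with the BERT-BiLSTM-CRF model to extract data features precisely. Then three loss functions, namely focal loss, label smoothing, and KL loss (Kullback-Leibler divergence loss), take part in the loss computation as regularization terms. Experimental results show that the improved model achieves F1 scores of 71.13%, 96.31%, and 95.90% on the Weibo, Resume, and MSRA (Microsoft Research Asia) datasets, respectively, verifying that the proposed algorithm performs better; in addition, the algorithm is easy to combine with and extend to other neural networks in different downstream tasks.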
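The three loss components named in this abstract have standard textbook forms, sketched below for a single prediction in pure Python. How the paper weights and combines them is not stated, so no combined loss is shown; the function names and default values (`gamma=2.0`, `eps=0.1`) are assumptions for illustration.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: down-weights well-classified cases.

    probs: predicted class probabilities; target: index of the true class.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def smooth_labels(target, num_classes, eps=0.1):
    """Label smoothing: move eps of the probability mass to a uniform prior."""
    return [
        (1.0 - eps) + eps / num_classes if k == target else eps / num_classes
        for k in range(num_classes)
    ]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

smoothed = smooth_labels(target=1, num_classes=4, eps=0.1)
easy = focal_loss([0.1, 0.9], target=1)   # confident, correct prediction
hard = focal_loss([0.7, 0.3], target=1)   # poorly classified example
```

Focal loss targets class imbalance by shrinking the contribution of easy examples, label smoothing softens noisy hard labels, and a KL term penalizes disagreement between two predicted distributions.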
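Abstract: For the task of Chinese text error detection and correction, a model is proposed that combines the knowledge-enhanced natural language representation model ERNIE (enhanced representation through knowledge integration) with sequence labeling. The model consists of a detection part and a correction part. In the detection stage, ERNIE encodes word vectors with a global attention mechanism and feeds them into a BiLSTM-CRF sequence labeling model: the bidirectional long short-term memory network (BiLSTM) extracts contextual information and concatenates it into bidirectional word vectors, and the conditional random field (CRF) then computes the joint probability, strengthening the dependence on neighboring labels and optimizing the whole sequence, which resolves erroneous labels caused by problems such as label bias. In the correction stage, different strategies are applied according to the output of the detection model: errors labeled as wrong characters or missing characters are predicted with the ERNIE masked language model and confusion-set matching, while redundant-character and word-order errors are corrected directly. Experimental results show that introducing sequence labeling and correcting by error type effectively improves the correction rate, with an F1 of 81.8% on the SIGHAN dataset.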
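The role of the CRF layer, scoring the label sequence as a whole rather than each position on its own, can be illustrated with Viterbi decoding. The sketch below is generic and is not the paper's implementation; the emission and transition scores are made-up numbers, and label 0 is assumed to mean "correct" and label 1 "error".

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label for ending at label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]

emissions = [[2.0, 0.0], [0.4, 0.5], [2.0, 0.0]]
penalize_switch = [[0.0, -1.0], [-1.0, 0.0]]   # discourages label changes
no_transitions = [[0.0, 0.0], [0.0, 0.0]]

with_crf = viterbi(emissions, penalize_switch)
without_crf = viterbi(emissions, no_transitions)
```

With zero transition scores the decoder follows the per-position preference and flags the middle token; with the transition penalty, the weak local preference is overridden by the neighboring labels.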
Abstract: An effective approximation scheme that combines a cluster approach with the discretized path-integral representation (DPIR) is used to study the random-bond Ising model in a transverse field (RTIM). Critical thermodynamic properties, such as the critical temperature, the critical transverse field, the average magnetization, the susceptibility, and the specific heat, are calculated, and some results are improved.
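For orientation, the random-bond Ising model in a transverse field is conventionally written with the Hamiltonian below. This is the standard textbook form rather than something quoted from the abstract, and the notation is assumed: $J_{ij}$ are the random bond couplings, $\Gamma$ is the transverse field, and $\sigma^z$, $\sigma^x$ are Pauli spin operators.

```latex
H = -\sum_{\langle i,j \rangle} J_{ij}\, \sigma_i^z \sigma_j^z \;-\; \Gamma \sum_i \sigma_i^x
```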
Funding: Supported by the National Key Research and Development Program (No. 2019YFE0105300), the National Natural Science Foundation of China (No. 62103143), the Hunan Province Key Research and Development Program (No. 2022WK2006), the Special Project for the Construction of Innovative Provinces in Hunan (Nos. 2020TP2018 and 2019GK4030), and the Scientific Research Fund of Hunan Provincial Education Department (No. 22B0471).
Abstract: Metapaths with specific complex semantics are critical to learning diverse semantic and structural information of heterogeneous networks (HNs) for most existing representation learning models. However, any metapath consisting of multiple simple metarelations must be specified by domain experts. These sensitive, expensive, and limited metapaths severely reduce the flexibility and scalability of existing models. To address this problem, a metapath-free, scalable representation learning model called Metarelation2vec is proposed for HNs, with biased joint learning of all metarelations. Specifically, a metarelation-aware biased walk strategy is first designed to obtain better training samples by using automatically generated cooperation probabilities for all metarelations rather than expert-given metapaths. Then, with nodes grouped by type, a common shallow skip-gram model is used to learn structural proximity separately for each node type. Next, with links grouped by type, a novel shallow model is used to learn semantic proximity separately for each link type. Finally, supervised by the cooperation probabilities of all meta-words, the biased training samples are fed into the shallow models to jointly learn the structural and semantic information in the HNs, ensuring the accuracy and scalability of the models. Extensive experimental results on three tasks and four open datasets demonstrate the advantages of the proposed model.
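The idea of a relation-aware biased walk can be sketched as follows: at each step, the next edge is sampled with a weight that depends on its relation type. This is only a minimal illustration under stated assumptions. The paper generates its cooperation probabilities automatically, whereas the weights in `probs` here are fixed, made-up values, and the graph, relation names, and function name `biased_walk` are all hypothetical.

```python
import random

def biased_walk(graph, probs, start, length, rng):
    """Random walk whose next step is biased by relation type.

    graph[node] -> list of (relation, neighbor) pairs.
    probs[relation] -> positive sampling weight for that relation.
    """
    walk = [start]
    for _ in range(length - 1):
        edges = graph.get(walk[-1], [])
        if not edges:
            break                                  # dead end: stop early
        weights = [probs[rel] for rel, _ in edges]
        _, nxt = rng.choices(edges, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

# Toy heterogeneous network: authors, papers, and a venue.
graph = {
    "a1": [("writes", "p1")],
    "a2": [("writes", "p1")],
    "p1": [("written_by", "a1"), ("written_by", "a2"), ("published_in", "v1")],
    "v1": [("publishes", "p1")],
}
probs = {"writes": 1.0, "written_by": 0.3, "published_in": 0.7, "publishes": 1.0}

walk = biased_walk(graph, probs, "a1", 6, random.Random(0))
```

The resulting walks would then serve as training sequences for skip-gram style models, with the relation weights controlling which kinds of neighbors appear together more often.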