Classifying job offers into occupational categories is a fundamental task in human resource information systems, as it improves and streamlines indexing, search, and matching between openings and job seekers. Comprehensive occupational databases such as O*NET or ESCO provide detailed taxonomies of interrelated positions that can be leveraged to align the textual content of postings with occupational categories, thereby facilitating standardization, cross-system interoperability, and access to metadata for each occupation (e.g., tasks, knowledge, skills, and abilities). In this work, we explore the effectiveness of fine-tuning existing language models (LMs) to classify job offers with occupational descriptors from O*NET. This enables a more precise assessment of candidate suitability by identifying the specific knowledge and skills required for each position, and helps automate recruitment processes by mitigating human bias and subjectivity in candidate selection. We evaluate three representative BERT-like models: BERT, RoBERTa, and DeBERTa. BERT serves as the baseline encoder-only architecture; RoBERTa incorporates advances in pretraining objectives and data scale; and DeBERTa introduces architectural improvements through disentangled attention mechanisms. The best performance was achieved with DeBERTa, although the other models also produced strong results, and no statistically significant differences were observed across models. We also find that these models typically reach optimal performance after only a few training epochs, and that training with smaller, balanced datasets is effective. Consequently, comparable results can be obtained with models that require fewer computational resources and less training time, facilitating deployment and practical use.
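The abstract reports that training on smaller, balanced datasets is effective, but it does not say how such subsets are built. The following is a minimal sketch under one assumption: a training set is balanced by randomly downsampling each occupation class to a fixed cap. The `balanced_subset` helper, its `per_class` and `seed` parameters, and the placeholder O*NET-SOC-style labels are all hypothetical and are not the authors' procedure.

```python
import random
from collections import defaultdict


def balanced_subset(examples, per_class, seed=0):
    """Return a shuffled subset with at most `per_class` examples per label.

    examples: list of (text, label) pairs, where label is an occupation code.
    Classes with fewer than `per_class` examples are kept in full.
    """
    # Group postings by their occupation label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        # Sample without replacement, capped at the class size.
        subset.extend(rng.sample(items, min(per_class, len(items))))
    rng.shuffle(subset)  # avoid label-ordered batches during fine-tuning
    return subset


# Hypothetical postings labelled with placeholder O*NET-SOC-style codes.
postings = [(f"software job {i}", "15-1252.00") for i in range(10)]
postings += [(f"nursing job {i}", "29-1141.00") for i in range(3)]
train = balanced_subset(postings, per_class=5)
```

Because the cap only downsamples, classes rarer than `per_class` stay whole, so the result is balanced only up to the size of the smallest classes. Oversampling minority classes would be an alternative design.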
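Existing research on "chokepoint" technologies has two main limitations: it lacks a mechanism for identifying substitute technologies, and it parses technical elements with insufficient precision. To address these limitations, this article proposes a method for identifying substitutes for chokepoint technologies that combines prompt engineering with a BERT-LSTM model. First, ECCN items are parsed based on the Commercial Control List (CCL), a patent search is conducted, and the key core patents on the main technological path are extracted with the SPC algorithm. Second, large language model prompt engineering is used to extract "problem-solution pairs", which are used to analyze technical effects. Combined with Function-Oriented Search (FOS), this yields a preliminary set of patents that may offer a substitute technical effect. Third, a BERT-LSTM model performs binary classification on the patent texts to identify the patent samples that actually provide a substitute technical effect, and prompt engineering is then used to extract "solution-category pairs" to systematically identify substitute technical solutions. Finally, a two-dimensional science-industry evaluation system is established to grade the substitution potential of each technology. Using lithography as a case study, the article illustrates the application workflow of the method and systematically identifies five substitute technologies for extreme ultraviolet (EUV) lithography, together with their substitution potential.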
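The first step relies on the SPC (Search Path Count) algorithm from main path analysis. In that algorithm, each citation arc is weighted by the number of source-to-sink paths that pass through it. The sketch below computes SPC weights for an acyclic citation network and then follows a greedy "local forward" search to obtain a main path. It assumes that arcs point from the cited patent to the citing patent, which is the direction of knowledge flow. The abstract does not state which search variant the authors use (local, global, or key-route), so the greedy search and the helper names `spc_weights` and `local_main_path` are illustrative assumptions.

```python
from collections import defaultdict, deque


def spc_weights(edges):
    """Search Path Count (SPC) weight for every arc of an acyclic citation network.

    Each arc (u, v) points from the cited patent u to the citing patent v.
    SPC(u, v) = N_minus(u) * N_plus(v): the number of source-to-sink paths
    that traverse the arc.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Kahn's algorithm yields a topological order and detects cycles.
    indeg = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("citation network must be acyclic")

    # N_minus[n]: number of paths from any source (no predecessors) to n.
    n_minus = {}
    for n in order:
        n_minus[n] = sum(n_minus[p] for p in pred[n]) if pred[n] else 1
    # N_plus[n]: number of paths from n to any sink (no successors).
    n_plus = {}
    for n in reversed(order):
        n_plus[n] = sum(n_plus[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): n_minus[u] * n_plus[v] for u, v in edges}


def local_main_path(edges):
    """Greedy forward search: start at the heaviest arc leaving a source,
    then keep following the heaviest outgoing arc until a sink is reached."""
    w = spc_weights(edges)
    succ = defaultdict(list)
    has_pred = set()
    for u, v in edges:
        succ[u].append(v)
        has_pred.add(v)
    start = max((e for e in edges if e[0] not in has_pred), key=lambda e: w[e])
    path = list(start)
    while succ[path[-1]]:
        tail = path[-1]
        path.append(max(succ[tail], key=lambda m: w[(tail, m)]))
    return path


# Toy network: A is the only source, E the only sink.
citations = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("C", "E")]
weights = spc_weights(citations)
main_path = local_main_path(citations)
```

In the toy network there are three source-to-sink paths (A-B-D-E, A-C-D-E, A-C-E). The arc A to C lies on two of them, so its SPC weight is 2. Ties in the greedy step are broken by arc order.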
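Existing non-intrusive load disaggregation methods capture load features insufficiently and disaggregate loads with limited accuracy. To address these problems, this article proposes a multi-head self-attention non-intrusive load disaggregation method based on an improved BERT (bidirectional encoder representations from transformers) model, named frequency and temporal attention-BERT (FAT-BERT). First, time-domain data are converted to the frequency domain with the Fourier transform. Multi-scale convolutions then capture both the time-domain and frequency-domain features of the load signal, which strengthens the model's ability to represent diverse load signals. Second, a frequency attention mechanism is introduced into the multi-head self-attention. This makes the model more sensitive to frequency components in the time-series data and further improves the representation of complex load patterns. In addition, local self-attention is added to the improved BERT model to reduce unnecessary global computation and increase running speed. Next, residual connections are combined with regularization, which makes training more stable and better prevents overfitting. Finally, experiments on the REDD and UK-DALE datasets verify the effectiveness of the proposed method.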
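The abstract states that local self-attention is added to reduce unnecessary global computation. The following pure-Python sketch shows the idea under two simplifying assumptions: each position attends only to a symmetric window of `window` neighbours on each side, and the input vectors are used directly as queries, keys, and values in a single head with no learned projections. `local_self_attention` is a hypothetical name, and this is not the FAT-BERT implementation.

```python
import math


def local_self_attention(x, window):
    """Scaled dot-product self-attention restricted to |i - j| <= window.

    x: list of T vectors (lists of floats), used as queries, keys and values.
    Returns (outputs, weights), where weights[i][j] is 0.0 outside the window.
    """
    T, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(T):
        lo, hi = max(0, i - window), min(T, i + window + 1)
        # Only positions inside the window are scored, so the cost is
        # O(T * window * d) instead of O(T^2 * d) for full attention.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        row = [0.0] * T
        for j, e in zip(range(lo, hi), exps):
            row[j] = e / z
        weights.append(row)
        # Weighted sum of value vectors within the window.
        outputs.append([sum(row[j] * x[j][k] for j in range(lo, hi)) for k in range(d)])
    return outputs, weights


# Five hypothetical 2-dimensional load-feature vectors.
signal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0]]
out, attn = local_self_attention(signal, window=1)
```

Full attention is recovered when `window >= T - 1`. A smaller window trades long-range context for speed, which is why such models usually keep some global or frequency-aware component alongside it.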
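Aspect-based sentiment analysis aims to identify the sentiment polarity expressed toward specific aspects in a text. Existing research still faces several challenges. BERT-based aspect-based sentiment analysis suffers from semantic overfitting and makes insufficient use of low-level semantics. The self-attention mechanism loses local information. Architectures that combine multiple encoding layers with multi-granularity semantics suffer from information redundancy. To address these issues, a multi-granular semantic aspect-based sentiment analysis model with fusion of BERT encoding layers (MSBEL) is proposed. Specifically, a pyramid attention mechanism exploits the semantic features of every encoding layer and incorporates the lower-layer encoders to reduce overfitting. Multi-scale gated convolution strengthens the model's ability to cope with local information loss. Cosine attention highlights sentiment features related to the aspect term, thereby reducing information redundancy. t-SNE visualization shows that MSBEL's sentiment representations cluster better than BERT's. The model was also compared with mainstream models on several benchmark datasets. Compared with LCF-BERT, it improves F1 by 1.53%, 3.94%, 1.39%, 6.68%, and 5.97% on five datasets, respectively. Compared with SenticGCN, F1 improves by 0.94% on average and by up to 2.12%. Compared with ABSA-DeBERTa, F1 improves by 1.16% on average and by up to 4.20%. These results verify the effectiveness and superiority of the model for aspect-based sentiment analysis.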
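MSBEL uses cosine attention to highlight sentiment features related to the aspect term. The following minimal sketch rests on three assumptions: the aspect is summarised as a single vector (for example, the mean of its token embeddings), each context token is scored by its cosine similarity to that vector, and the scores pass through a temperature-scaled softmax. The function names and the `temperature` hyperparameter are hypothetical, and the sketch does not reproduce the paper's exact formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def cosine_attention(tokens, aspect, temperature=0.1):
    """Weight each token vector by the softmax of its cosine similarity to
    the aspect vector; return the weights and the weighted sum of tokens."""
    scaled = [cosine(t, aspect) / temperature for t in tokens]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(aspect)
    pooled = [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(dim)]
    return weights, pooled


# Hypothetical 2-dimensional token and aspect representations.
context = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
attn_weights, aspect_summary = cosine_attention(context, aspect=[1.0, 0.0])
```

Cosine similarity is bounded in [-1, 1], so it ignores vector magnitude and the temperature alone controls how sharply the weights concentrate on aspect-related tokens. Plain dot-product attention, by contrast, also rewards tokens with large norms.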