In the era of the intelligent economy, a click-through rate (CTR) prediction system can evaluate massive amounts of service information based on a user's historical information and screen out the products the user is most likely to favor, thus enabling customized information push and achieving the ultimate goal of improving economic benefits. Sequence modeling is one of the main research directions for deep-learning-based CTR prediction models. Both the user's general interest, hidden in the entire click history, and the short-term interest, hidden in recent click behaviors, matter for CTR prediction, but they influence the result differently. When capturing the user's general interest, existing models have paid more attention to the relationships between item embedding vectors (point-level) while ignoring the relationships between elements within item embedding vectors (union-level). The Lambda layer-based Convolutional Sequence Embedding (LCSE) model proposed in this paper uses the Lambda layer to capture features from the click history through weight distribution, and on this basis uses horizontal and vertical filters to learn the user's general preferences at the union level and the point level. In addition, we incorporate the user's short-term preferences captured by the embedding-based convolutional model to further improve the prediction results. The AUC (Area Under Curve) values of the LCSE model on the Electronic, Movie & TV and MovieLens datasets are 0.8707, 0.9036 and 0.9467, improvements of 0.45%, 0.36% and 0.07% over the Caser model, respectively, which supports the effectiveness of the proposed model.
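The horizontal and vertical filters mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the click history is an L x d matrix `E` of item embeddings (one row per clicked item), in the style of convolutional sequence embedding models. The function names, the toy values and the max-pooling step are illustrative assumptions, and the Lambda layer itself is not modeled here.

```python
def horizontal_filter(E, F):
    """Slide an h x d filter F down the L x d matrix E.

    Each window spans h consecutive items and all d embedding dimensions,
    so one output value is produced per window position.
    """
    h, d = len(F), len(F[0])
    return [
        sum(E[t + r][c] * F[r][c] for r in range(h) for c in range(d))
        for t in range(len(E) - h + 1)
    ]


def vertical_filter(E, w):
    """Apply an L x 1 filter w: a weighted sum of the rows of E.

    One output value is produced per embedding dimension.
    """
    return [sum(w[t] * E[t][c] for t in range(len(E))) for c in range(len(E[0]))]


# Toy click history: 3 items with 2-dimensional embeddings (made-up values).
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A 2 x 2 horizontal filter, followed by max-pooling over window positions.
pooled = max(horizontal_filter(E, [[1.0, 1.0], [1.0, 1.0]]))

# A vertical filter that weights all 3 items equally.
per_dimension = vertical_filter(E, [1.0, 1.0, 1.0])
```

In a trained model the filter values would be learned parameters, and many filters of different heights would usually be used in parallel.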
Understanding the underlying goal behind a user's Web query has been shown to help improve the quality of search. This paper focuses on the problem of automatically identifying query types according to these goals. Four novel entropy-based features extracted from anchor data and click-through data are proposed, and a support vector machine (SVM) classifier is used to identify the user's goal based on these features. Experimental results show that the proposed entropy-based features are more effective than those reported in previous work. By combining multiple features, the goals of more than 97% of the queries studied can be correctly identified. In addition, this paper reaches the following conclusions: first, anchor-based features are more effective than click-through-based features; second, the number of sites is more reliable than the number of links; third, click-distribution-based features are more effective than session-based ones.
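As a rough illustration of what an entropy-based click feature might look like, the sketch below computes the Shannon entropy of the distribution of clicked URLs for a single query. The abstract above does not define its four features, so the function `click_entropy`, the input format and the example URLs are all assumptions made for illustration only.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the click distribution for one query.

    Low entropy means clicks concentrate on few URLs; high entropy means
    clicks are spread across many URLs.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical click logs: one query whose clicks concentrate on one site,
# and one whose clicks are spread evenly over four sites.
concentrated = ["a.example"] * 9 + ["b.example"]
spread = ["a.example", "b.example", "c.example", "d.example"]

low = click_entropy(concentrated)
high = click_entropy(spread)
```

A value like this could be one input column for a classifier such as an SVM, alongside other features.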
In recent years, deep learning has been widely applied to recommendation systems and click-through rate (CTR) prediction, and recommendation models incorporating deep learning have emerged. Designing recommendation models that use information about user behavior sequences is an important direction of current research: such models calculate the likelihood that a user clicks on a target item based on the user's behavior sequence. To explore the relationships between features, this paper improves on the deep interest network (DIN) proposed by Alibaba's team. Based on user behavior sequence information, the attentional factorization machine (AFM) is integrated to obtain richer and more accurate behavior sequence information. In addition, this paper designs a new way of calculating attention weights, which uses the relationship between the cosine similarity of any two vectors and the absolute value of the difference between their norms to measure their degree of relevance. Thus, a novel deep learning CTR prediction model is proposed: the CTR prediction network based on user behavior sequences and feature interactions, the deep interest and machines network (DIMN). We conduct extensive comparison experiments on three public datasets widely recognized in the industry and one private music dataset, and the results show that DIMN performs better than classical CTR prediction models.
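The abstract above says only that attention weights depend on the cosine similarity of two vectors and the absolute difference of their norms; it does not give the formula. The sketch below shows one hypothetical way to combine the two quantities, dividing the cosine similarity by one plus the norm gap and normalizing with a softmax. The combination rule and all function names are assumptions, not the paper's definition.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def relevance(u, v):
    """Hypothetical score: direction agreement, damped by the norm gap."""
    return cosine(u, v) / (1.0 + abs(norm(u) - norm(v)))


def attention_weights(target, behaviors):
    """Softmax over the relevance of each behavior vector to the target."""
    scores = [relevance(target, b) for b in behaviors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 2-dimensional embeddings for a target item and two past behaviors.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

With this rule, two vectors pointing the same way but with very different lengths score lower than two identical vectors, which is one plausible reading of the idea described.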
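News click-through rate (CTR) prediction is one of the key technologies of personalized news recommendation. To address the problems that existing news CTR prediction methods neglect the modeling of global news features and that information compression causes semantic loss, a news CTR prediction model that fuses multimodal information with long- and short-term historical behavior is proposed (MLSTH: Multimodal and Long-Short Term Historical Behavior News Click-Through Rate Prediction). MLSTH consists mainly of two parts: news encoding and user encoding. In news encoding, pretrained models first encode the multimodal features of the news; then a visual-semantic fusion module built on cross-modal attention produces global feature information and local feature information, respectively; finally, the local and global features are concatenated to form the multimodal news encoding. Validation on the public dataset V-MIND shows that, compared with the existing multimodal models MMRec, VLSNR and IM-Rec, the average AUC improves by 1.68%, 2.54% and 2.36%, respectively, demonstrating the effectiveness and superiority of the model.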
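Click-through rate (CTR) prediction is widely used in advertising and e-commerce, and numerous CTR prediction models have emerged. However, most existing CTR prediction models attend only to a fixed representation of each individual feature and overlook the differing importance of each feature in different contexts, which leads to poor model performance. In addition, existing models fall short in high-order feature interaction and fine-grained feature fusion, which limits their expressive power. To address these problems, a context-aware deep residual CTR prediction model (Context-aware Deep Residual, CDR) is proposed. First, the model uses a context aggregation unit (CAU) to capture context-related information and relational information among features, generating context-aware features that enrich the feature representation. Second, residual connections are combined with an MLP network to optimize the nonlinear transformation of feature interactions and strengthen the model's ability to learn high-order features. Finally, a bilinear fusion operation achieves finer-grained feature fusion, improving the comprehensiveness and robustness of the feature representation. In comparative experiments on public datasets including Criteo, Avazu, Movielens and Frappe, AUC improved by 1.04% on average and LogLoss improved by 2.27% on average. The results show that the model outperforms existing state-of-the-art models and effectively improves the accuracy of CTR prediction.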
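The combination of residual connections with an MLP described in the abstract above can be sketched as a generic residual block that computes x + MLP(x). This is a minimal illustration under stated assumptions, not the CDR architecture: the two-layer shape, the ReLU activation, the function names and the weights are all hypothetical.

```python
def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def linear(x, W, b):
    """Dense layer: each row of W produces one output unit."""
    return [sum(w * a for w, a in zip(row, x)) + bias for row, bias in zip(W, b)]


def residual_mlp_block(x, W1, b1, W2, b2):
    """Two-layer MLP whose output is added back to its input.

    The output width of the second layer must equal len(x) so the skip
    connection can be added element-wise.
    """
    hidden = relu(linear(x, W1, b1))
    transformed = linear(hidden, W2, b2)
    return [a + t for a, t in zip(x, transformed)]


# Made-up input and identity weights, just to trace the data flow.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
y = residual_mlp_block([1.0, -2.0], identity, zeros, identity, zeros)
```

Because the input is carried through unchanged, a block whose second layer outputs zeros reduces to the identity mapping, which is the usual motivation for stacking such blocks deeply.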
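Click-through rate (CTR) prediction enables precise recommendation of digital advertising by predicting the probability that a user clicks on an ad or product. To address the problems that existing CTR models leave the raw embedding vectors unrefined and use overly simple feature interactions, this paper proposes the self-attention deep field-embedded factorization machine (Self-AtDFEFM) model. First, multi-head self-attention weights the raw embedding vectors to refine the key low-level features. Second, a deep field-embedded factorization machine (FEFM) module is built, with symmetric field-pair matrices designed to strengthen the interactions between different feature fields and to select low-order feature combinations for high-order feature interaction. Third, a deep neural network (DNN) is built on the low-order feature combinations to perform implicit high-order feature interaction. Then, around the refined embedding vectors, multiple explicit high-order feature interaction layers are stacked using multi-head self-attention together with a residual mechanism; self-attention captures the complementary information of the same feature in different subspaces, completing explicit high-order feature interaction. Finally, the explicit and implicit high-order feature interactions are combined to predict the click-through rate. On the two public datasets Criteo and Avazu, the Self-AtDFEFM model is compared with mainstream baseline models on the AUC and LogLoss metrics; the number of explicit high-order feature interaction layers, the number of attention heads, the embedding dimension and the number of implicit high-order feature interaction layers are tuned; and ablation experiments are performed. The experimental results show that, on both datasets, the Self-AtDFEFM model outperforms the mainstream baseline models in AUC and LogLoss; all of its parameters have been tuned to their best values; and the modules work together to bring the model to its best performance, with the explicit high-order feature interaction layers contributing the most. The modules of the Self-AtDFEFM model are plug-and-play, easy to build and deploy, and strike a balance between performance and complexity, giving the model high practical value.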
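The symmetric field-pair matrices mentioned in the abstract above can be illustrated with a small sketch in which each pair of feature fields (i, j) has its own symmetric matrix W, and the pair's interaction strength is the bilinear form e_i^T W e_j. The helper names, the way symmetry is enforced and the toy embeddings are assumptions for illustration; the abstract does not specify these details.

```python
from itertools import combinations


def symmetrize(M):
    """Return (M + M^T) / 2, which is symmetric for any square matrix M."""
    n = len(M)
    return [[(M[r][c] + M[c][r]) / 2.0 for c in range(n)] for r in range(n)]


def bilinear(u, W, v):
    """Bilinear form u^T W v."""
    return sum(u[r] * W[r][c] * v[c] for r in range(len(u)) for c in range(len(v)))


def field_pair_interactions(embeddings, pair_matrices):
    """Interaction score for every unordered pair of fields (i < j)."""
    return {
        (i, j): bilinear(embeddings[i], pair_matrices[(i, j)], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    }


# Three fields with made-up 2-dimensional embeddings and one matrix per pair.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = symmetrize([[1.0, 2.0], [0.0, 1.0]])  # becomes [[1.0, 1.0], [1.0, 1.0]]
pair_matrices = {(0, 1): W, (0, 2): W, (1, 2): W}
scores = field_pair_interactions(embeddings, pair_matrices)
```

Symmetry makes the score independent of the order in which the two fields are taken, so one matrix per unordered pair is enough.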
Funding (LCSE study, first abstract above): Supported by the National Natural Science Foundation of China (62272214).
Funding (query-goal identification study, second abstract above): Supported by the Tianjin Applied Fundamental Research Plan (07JCYBJC14500).