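Abstract: [Objective] This study designs a missing-value imputation method for rice germplasm resource data based on full information maximum likelihood (FIML) and an adversarial autoencoder (DAE): the clustering-based comprehensive information selective filtering encoder data imputation algorithm (CFSM-DAE). [Methods] Clustering limits the influence of outliers on the algorithm, and a selective filtering layer identifies high-quality estimates while reducing the influence of low-quality ones. Traditional DAE frameworks usually lack such a layer, so they treat all estimates as equally important and cannot distinguish high-quality estimates from low-quality ones. To further improve accuracy, the study uses an ensemble framework that combines FIML with a multiple adversarial autoencoder (DAE). On top of the selective filtering layer, imputation is adaptive: when an estimate does not meet the preset threshold, the FIML imputation strategy is used instead. This keeps the results stable and accurate and further improves overall estimation accuracy. Simulated data and a real rice germplasm dataset were studied under three missing-data mechanisms (missing at random, MAR; missing completely at random, MCAR; and missing not at random, MNAR). CFSM-DAE was compared with several common imputation algorithms: FIML, the adversarial autoencoder (DAE), K-nearest neighbor imputation (KNN), random forest (RF), and multiple imputation by chained equations (MICE). [Results] On the simulated data, CFSM-DAE achieved $S_{\mathrm{RME}}$ = 0.0676, $E_{\mathrm{MA}}$ = 0.0093, and $R^2$ = 0.9958; on the rice germplasm data, it achieved $S_{\mathrm{RME}}$ = 0.0395, $E_{\mathrm{MA}}$ = 0.0078, and $R^2$ = 0.8913. By comparison, DAE obtained $S_{\mathrm{RME}}$ values of 0.8896 and 0.7707 on the two datasets, KNN obtained $E_{\mathrm{MA}}$ values of 0.1183 and 0.1305, and FIML obtained $R^2$ values of 0.3382 and 0.7321. CFSM-DAE therefore improved on the other algorithms across multiple evaluation metrics and outperformed them on both the simulated and the rice germplasm data. [Conclusion] By combining clustering, selective filtering, and full information maximum likelihood, CFSM-DAE substantially improves the accuracy of missing-value imputation for rice germplasm data, demonstrating its effectiveness and potential for complex missing-value problems.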
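The abstract does not say how the selective filtering layer scores estimate quality or how the FIML step is run. The following is therefore only a minimal Python sketch of the accept-or-fall-back idea, under stated assumptions. It hypothetically scores each autoencoder estimate by its spread over k stochastic reconstructions (`dae_passes`). It also substitutes a simpler stand-in for FIML: the conditional mean under a multivariate normal fitted to the fully observed rows. FIML proper would use every row, for example via EM. The clustering step is omitted. Reading $S_{\mathrm{RME}}$ and $E_{\mathrm{MA}}$ as root mean square error and mean absolute error over the masked entries is also an assumption. All function names are illustrative.

```python
import numpy as np

def gaussian_conditional_fill(X):
    """Stand-in for the FIML step (assumption, not FIML proper): fill missing
    entries with E[x_mis | x_obs] under a multivariate normal whose mean and
    covariance are estimated from the fully observed rows only."""
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    mu = complete.mean(axis=0)
    cov = np.atleast_2d(np.cov(complete, rowvar=False))
    out = X.copy()
    for i, row in enumerate(X):
        mis = np.isnan(row)
        if not mis.any():
            continue
        obs = ~mis
        if not obs.any():  # nothing observed in this row: use the mean
            out[i, mis] = mu[mis]
            continue
        # Conditional mean: mu_mis + S_mo S_oo^{-1} (x_obs - mu_obs)
        s_mo = cov[np.ix_(mis, obs)]
        s_oo = cov[np.ix_(obs, obs)]
        out[i, mis] = mu[mis] + s_mo @ np.linalg.solve(s_oo, row[obs] - mu[obs])
    return out

def selective_fill(X, dae_passes, threshold):
    """Keep autoencoder estimates whose spread across passes is <= threshold;
    replace the rest with the Gaussian stand-in for FIML.
    dae_passes: hypothetical array of shape (k, n, d) holding k stochastic
    reconstructions of X."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    dae_mean = dae_passes.mean(axis=0)
    dae_std = dae_passes.std(axis=0)  # spread used as the quality score
    accept = missing & (dae_std <= threshold)
    reject = missing & ~accept
    filled = X.copy()
    filled[accept] = dae_mean[accept]
    filled[reject] = gaussian_conditional_fill(X)[reject]
    return filled, accept

def imputation_scores(truth, filled, missing):
    """RMSE, MAE and R^2 computed over the originally missing entries only."""
    t, f = truth[missing], filled[missing]
    err = f - t
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((t - t.mean()) ** 2))
    return rmse, mae, r2
```

A smaller `threshold` sends more entries to the fallback, trading the autoencoder's flexibility for the stability of the likelihood-based estimate.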
Funding: Supported by the Henan Province Key Research and Development Project (231111211300); the Central Government of Henan Province Guides Local Science and Technology Development Funds (Z20231811005); the Henan Province Key Research and Development Project (231111110100); the Henan Provincial Outstanding Foreign Scientist Studio (GZS2024006); and the Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan (Application and Overcoming Technical Barriers) (242103810028).
Abstract: The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder is designed according to the optimization objective and consists of a base encoder and a detail encoder, which extract the low-frequency and high-frequency information of the image, respectively. Because this extraction may leave some information uncaptured, a compensation encoder is proposed to supplement the missing information. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines the low-frequency, high-frequency, and supplementary information to obtain multi-scale features. An attention strategy and a fusion module are then introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
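In the proposed network, encoders learn the base/detail split. As a hand-crafted analogue only, not the paper's architecture, attention strategy, or fusion module, a classical multi-scale decomposition can make the low-/high-frequency idea concrete. Repeated mean filtering gives a base layer, and the differences between successive scales give detail layers. The fusion rule below is an assumed common heuristic: average the bases and keep the larger-magnitude detail. The function names are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Summed-area table with a leading zero row and column
    s = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)

def decompose(img, radii=(1, 2, 4)):
    """Split img into a low-frequency base and one high-frequency detail layer
    per scale; base + sum(details) reproduces img exactly."""
    current = np.asarray(img, dtype=float)
    details = []
    for r in radii:
        smooth = box_blur(current, r)
        details.append(current - smooth)  # band removed at this scale
        current = smooth
    return current, details

def fuse(ir, vis, radii=(1, 2, 4)):
    """Average the base layers; at each scale keep the detail coefficient with
    the larger magnitude (a common hand-crafted rule, assumed here)."""
    base_ir, det_ir = decompose(ir, radii)
    base_vis, det_vis = decompose(vis, radii)
    fused = 0.5 * (base_ir + base_vis)
    for d_ir, d_vis in zip(det_ir, det_vis):
        fused += np.where(np.abs(d_ir) >= np.abs(d_vis), d_ir, d_vis)
    return fused
```

This fixed decomposition sums back to the input exactly, so nothing is lost. A learned base/detail encoder has no such guarantee, which is the gap the compensation encoder is described as filling.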
Funding: Supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) under the project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC/NOVA IMS).
Abstract: The application of machine learning in the healthcare domain has groundbreaking potential across a wide range of scenarios. However, data-related challenges often hinder this potential. One example is class imbalance, since critical outcomes in this domain tend to be inherently rare. To address this challenge, we propose a novel oversampling approach, the counterfactual synthetic minority oversampling technique (Counterfactual SMOTE), which combines SMOTE with a counterfactual generation framework. Our method performs oversampling near the decision boundary, within a safe region of the feature space, which allows it to generate informative but non-noisy minority samples. To validate the proposed framework, we conducted a rigorous experimental procedure on a set of highly imbalanced binary classification problems in healthcare. The results demonstrate the superiority of the proposed method over several of the most commonly used oversampling alternatives in the literature. Notably, Counterfactual SMOTE was the only method to show convincingly superior performance compared with the original SMOTE. The proposed method was validated specifically in the healthcare domain, chosen for its relevance and its frequently imbalanced data. Nevertheless, we expect the findings of this study to generalize to any imbalanced scenario.
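The abstract does not give enough detail to reproduce the counterfactual, boundary-seeking step. The sketch below therefore shows only the standard SMOTE interpolation that Counterfactual SMOTE starts from. Each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbours. The function name and parameters are illustrative.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: create n_new synthetic points by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    X = np.asarray(minority, dtype=float)
    if len(X) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(seed)
    # Brute-force pairwise Euclidean distances (O(n^2) memory; fine for a sketch)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    k = min(k, len(X) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base_idx = rng.integers(0, len(X), size=n_new)
    nn_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform position along each segment
    return X[base_idx] + gap * (X[nn_idx] - X[base_idx])
```

Plain SMOTE can place points anywhere on these segments, including in regions dominated by the majority class. The proposed method's stated aim is to restrict generation to a safe region near the decision boundary.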
Abstract: The rapid growth of the e-commerce industry, accelerated by the COVID-19 pandemic, has led to an alarming increase in digital fraud and associated losses. Robust cyber security and anti-fraud measures are crucial to establishing a healthy e-commerce ecosystem. However, research on fraud detection systems has struggled to keep pace because real-world datasets are limited. Advances in artificial intelligence, machine learning (ML), and cloud computing have revitalized research and applications in this domain. ML and data mining techniques are popular in fraud detection, yet reviews that focus specifically on their application to e-commerce platforms such as eBay and Facebook lack depth. Existing reviews provide broad overviews but fail to capture the intricacies of ML algorithms in the e-commerce context. To bridge this gap, our study conducts a systematic literature review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology. We aim to explore the effectiveness of these techniques for fraud detection in digital marketplaces and the broader e-commerce landscape. Given rising fraud incidents and associated costs, understanding the current state of the literature and emerging trends is crucial. Through our investigation, we identify research opportunities and provide industry stakeholders with insights into key ML and data mining techniques for combating e-commerce fraud. The paper examines research on these techniques published over the past decade. Following the PRISMA approach, we conducted a content analysis of 101 publications. This analysis identifies research gaps and recent techniques, and highlights the industry's increasing use of artificial neural networks for fraud detection.