Abstract: Age estimation using forensic odontology is an important step in identifying victims in criminal or mass-disaster cases. Traditionally, this process is carried out manually by a human expert. However, speed and accuracy may vary with the expert's level of expertise and with other human factors such as fatigue and attentiveness. To improve recognition speed and consistency, researchers have proposed automated age estimation using deep learning techniques such as the Convolutional Neural Network (CNN). A CNN requires many training images to achieve high recognition accuracy. Unfortunately, it is very difficult to obtain a large number of dental images for training because of the need to comply with privacy acts. A promising solution to this problem is the Generative Adversarial Network (GAN), a technique that can generate synthetic images with statistics similar to those of the training set. A variation of the GAN called the Conditional GAN (CGAN) allows the generation of synthetic images to be controlled more precisely, so that only the specified type of image is generated. This paper proposes a CGAN for generating new dental images to increase the number of images available for training a CNN model to perform age estimation. We also propose a pseudo-labelling technique to label the generated images with the proper age and gender. We used the combination of real and generated images to train Dental Age and Sex Net (DASNET), a CNN model for dental age estimation. In the experiment conducted, the accuracy, coefficient of determination (R²) and Absolute Error (AE) of DASNET improved to 87%, 0.85 and 1.18 years respectively, as opposed to 74%, 0.72 and 3.45 years when DASNET was trained on a smaller number of real images only.
Funding: the 111 Project (No. BP0719010); the Project of the Science and Technology Commission of Shanghai Municipality (No. 18DZ2270700).
Abstract: By leveraging data from a fully labeled source domain, unsupervised domain adaptation (UDA) improves classification performance on an unlabeled target domain through explicit discrepancy minimization of the data distributions or through adversarial learning. As an enhancement, category alignment is introduced during adaptation to reinforce target feature discrimination by using model predictions. However, two problems remain unexplored: pseudo-label inaccuracy caused by wrong category predictions on the target domain, and distribution deviation caused by overfitting on the source domain. In this paper, we propose a model-agnostic two-stage learning framework that greatly reduces flawed model predictions using a soft pseudo-label strategy and avoids overfitting on the source domain with a curriculum learning strategy. Theoretically, it decreases the combined risk in the upper bound of the expected error on the target domain. In the first stage, we train a model with a distribution alignment-based UDA method to obtain soft semantic labels on the target domain with rather high confidence. In the second stage, to avoid overfitting on the source domain, we propose a curriculum learning strategy that adaptively controls the weighting between the losses from the two domains, so that the focus of training gradually shifts from the source distribution to the target distribution as prediction confidence on the target domain increases. Extensive experiments on two well-known benchmark datasets validate the general effectiveness of the proposed framework in improving the performance of top-ranked UDA algorithms and demonstrate its consistently superior performance.
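The second-stage idea in the abstract above, shifting the training focus from source loss to target loss, can be illustrated with a minimal sketch. The function names and the linear schedule are assumptions made for illustration only; the abstract describes an adaptive, confidence-driven weighting whose exact form is not given here.

```python
import math

def soft_cross_entropy(probs, soft_label):
    """Cross-entropy between predicted class probabilities and a soft
    (probabilistic) pseudo-label of the same length."""
    eps = 1e-12  # guards against log(0)
    return -sum(q * math.log(p + eps) for p, q in zip(probs, soft_label))

def curriculum_weight(step, total_steps):
    """Hypothetical linear schedule (an assumption, not the paper's rule):
    0.0 at the start (source only), 1.0 at the end (target only)."""
    return min(max(step / total_steps, 0.0), 1.0)

def combined_loss(source_loss, target_loss, step, total_steps):
    """Weighted sum of the supervised source loss and the soft
    pseudo-label target loss for the current training step."""
    lam = curriculum_weight(step, total_steps)
    return (1.0 - lam) * source_loss + lam * target_loss

if __name__ == "__main__":
    target = soft_cross_entropy([0.7, 0.3], [0.9, 0.1])
    for step in (0, 5, 10):
        print(step, combined_loss(1.2, target, step, 10))
```

Using a soft label rather than a hard arg-max means that a wrong but low-confidence prediction contributes less to the target loss, which is the motivation the abstract gives for the soft pseudo-label strategy.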
Funding: supported by the Departmental Strategic Plan (PSD) of the University of Udine, Interdepartmental Project on Artificial Intelligence (2020-25). This study was carried out within the Agritech National Research Center and received funding from the European Union's Next-Generation EU (National Recovery and Resilience Plan (PNRR), Mission 4, Component 2, Investment 1.4, Decree No. 1032 of 17/06/2022, CN00000022).
Abstract: Predicting grapevine phenological stages (GPHS) is critical for precisely managing vineyard operations, including plant disease treatments, pruning, and harvest. Solutions commonly used to address viticulture challenges rely on image processing techniques, which have achieved significant results. However, they require the installation of dedicated hardware in the vineyard, which makes them invasive and difficult to maintain. Moreover, accurate prediction is influenced by the interplay of climatic factors, especially temperature, and by the impact of global warming, both of which are difficult to model using images. Another problem frequently found in GPHS prediction is the persistence of missing values in viticultural datasets, particularly for phenological stages. This paper proposes a semi-supervised approach that begins with a small set of labeled phenological stage examples and automatically generates new annotations for large volumes of unlabeled climatic data. This climatic data-based approach offers advantages over common image processing methods, as it is non-intrusive, cost-effective, and adaptable to vineyards of various sizes and technological levels. To assess the robustness of the proposed pseudo-labeling strategy, we integrated it into eight machine-learning algorithms and evaluated its performance across seven diverse datasets, each with a different percentage of missing values. The coefficient of determination (R²) and the root-mean-square error (RMSE) are used to assess the effectiveness of the models. The study demonstrates that integrating the proposed pseudo-labeling strategy with supervised learning approaches significantly improves predictive accuracy. Moreover, the study shows that the proposed methodology can be integrated with explainable artificial intelligence techniques to determine the importance of the input features. In particular, the investigation highlights that growing degree days are crucial for improved GPHS prediction.
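As a rough illustration of the two ingredients named in the abstract above, the sketch below computes growing degree days with the common averaging formula and applies one round of pseudo-labeling to unlabeled inputs. The 10 °C base temperature, the single cumulative-GDD feature, the numeric stage encoding, and the k-nearest-neighbour regressor are all assumptions for this example; the study itself reports using eight machine-learning algorithms that are not specified here.

```python
def growing_degree_days(t_min, t_max, t_base=10.0):
    """Daily growing degree days from the averaging formula
    max(0, (t_min + t_max) / 2 - t_base). The 10 degC base is an assumption."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)

def knn_predict(train, x, k=2):
    """Mean target of the k training pairs (feature, target) whose
    one-dimensional feature is closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def pseudo_label(labeled, unlabeled, k=2):
    """Return the labeled pairs plus one predicted (pseudo) label for each
    unlabeled feature value; the result can be used to retrain a model."""
    pseudo = [(x, knn_predict(labeled, x, k)) for x in unlabeled]
    return labeled + pseudo

if __name__ == "__main__":
    # Cumulative GDD over three hypothetical days of (t_min, t_max) in degC.
    days = [(8.0, 18.0), (10.0, 22.0), (12.0, 26.0)]
    cumulative = sum(growing_degree_days(lo, hi) for lo, hi in days)
    # Hypothetical labeled pairs: (cumulative GDD, numeric stage index).
    labeled = [(0.0, 0.0), (100.0, 2.0), (300.0, 5.0)]
    print(pseudo_label(labeled, [cumulative, 180.0]))
```

In practice the pseudo-labeled pairs would be added to the training set and the model refitted, possibly over several rounds; the sketch shows a single round only.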
Funding: supported in part by the Beijing Natural Science Foundation, China (No. L221013), and in part by the National Science Foundation of China (Nos. U20B2070 and 61832016).
Abstract: Few-shot learning attempts to identify novel categories from limited labeled training data, and the performance of existing methods still has much room for improvement. Because unlabeled data are cheap to obtain, many recent methods use additional unlabeled training data to boost performance, an approach known as semi-supervised few-shot learning (SSFSL). The general idea of SSFSL methods is to first generate pseudo labels for all unlabeled data and then augment the labeled training set with selected pseudo-labeled data. However, almost all previous SSFSL methods take their supervision signal only from pseudo-labeling, ignoring the fact that the distribution of the training data can also serve as an effective unsupervised regularization. In this paper, we propose a simple yet effective SSFSL method named feature reconstruction based regression method (TENET), which takes low-rank feature reconstruction as the unsupervised objective function and pseudo labels as the supervised constraint. We provide several theoretical insights into why TENET can mitigate overfitting on low-quality training data and why it can enhance robustness against inaccurate pseudo labels. Extensive experiments on four popular datasets validate the effectiveness of TENET.
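The general SSFSL recipe described in the abstract above (pseudo-label everything, then keep only selected samples) might look like the following minimal sketch. It does not implement TENET's low-rank feature reconstruction. The nearest-prototype classifier, the distance-margin selection rule, and all function names are assumptions chosen to keep the example short.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def prototypes(labeled):
    """Map each class to the mean of its labeled feature vectors."""
    return {c: mean_vector(vs) for c, vs in labeled.items()}

def pseudo_label(protos, x):
    """Return (nearest class, margin), where margin is the gap between the
    distances to the two nearest prototypes (a simple confidence proxy)."""
    dists = sorted((math.dist(x, p), c) for c, p in protos.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def augment(labeled, unlabeled, min_margin):
    """Copy the labeled set and add every unlabeled vector whose pseudo label
    has a margin of at least min_margin."""
    protos = prototypes(labeled)
    out = {c: list(vs) for c, vs in labeled.items()}
    for x in unlabeled:
        c, margin = pseudo_label(protos, x)
        if margin >= min_margin:
            out[c].append(x)
    return out

if __name__ == "__main__":
    support = {"a": [[0.0, 0.0], [0.0, 2.0]], "b": [[10.0, 0.0], [10.0, 2.0]]}
    query = [[1.0, 1.0], [5.0, 1.0], [9.0, 1.0]]
    print(augment(support, query, min_margin=1.0))
```

The ambiguous middle point is left out because its margin is zero, which shows why selection matters: inaccurate pseudo labels are the failure mode the abstract says TENET is designed to tolerate.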
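Abstract: With the rapid growth in the scale of network information, network structures and data flows are becoming increasingly complex, and effectively identifying anomalous behavior in such massive data has become an important challenge in network security. Current deep learning-based anomalous behavior detection methods mainly target static networks and rely on labeled data, ignoring the potential value of large amounts of unlabeled data. Therefore, a network anomaly behavior detection method based on Dynamic Graph embedding and Contrastive Learning (DGCL) is proposed. The method fuses global spatial features, local structural features, and temporal dynamic features, uses a Transformer to generate high-quality node representations, and combines pseudo-labels with a contrastive learning strategy to improve detection performance. Experiments on three datasets, Wikipedia, Reddit, and Mooc, show that DGCL achieves AUC values of 87.89%, 70.38%, and 70.11%, respectively, and outperforms comparable methods in dynamic network anomaly detection.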
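Abstract: To address two problems in unsupervised anomaly detection models for multivariate time series (MTS), namely that the autoencoder module easily fits anomalous samples and that latent-space features corresponding to normal MTS samples may be reconstructed as anomalous MTS, an MTS anomaly detection model with triple generative adversarial training is designed. An LSTM autoencoder serves as the generator, pseudo-labels are generated from the reconstruction error, and a discriminator distinguishes the reconstructed MTS that remain after pseudo-label filtering from the original MTS. Two rounds of adversarial training constrain the latent space of the LSTM autoencoder to a uniform distribution, reducing the likelihood that its latent features are reconstructed into anomalous MTS. Experimental results on several public MTS datasets show that T-GAN learns the distribution of normal MTS better on training sets containing contaminated data and achieves strong anomaly detection performance.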
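The pseudo-labeling step mentioned in the abstract above, deriving labels from reconstruction error and filtering with them, can be sketched as follows. The mean-squared-error measure, the fixed keep fraction, and the function names are assumptions for illustration; the abstract does not state the model's actual thresholding rule, and the LSTM autoencoder and discriminators are not reproduced here.

```python
def reconstruction_error(window, reconstruction):
    """Mean squared error between an MTS window and its reconstruction,
    each given as a list of time steps holding one value per variable."""
    total, count = 0.0, 0
    for row, rec_row in zip(window, reconstruction):
        for a, b in zip(row, rec_row):
            total += (a - b) ** 2
            count += 1
    return total / count

def pseudo_labels(errors, keep_fraction=0.8):
    """Label the keep_fraction of windows with the lowest error as
    pseudo-normal (0) and the rest as pseudo-anomalous (1).
    The fixed fraction is an assumption, not the paper's rule."""
    n_keep = int(len(errors) * keep_fraction)
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    labels = [1] * len(errors)
    for i in order[:n_keep]:
        labels[i] = 0
    return labels

def filter_pseudo_normal(windows, labels):
    """Keep only the windows labeled pseudo-normal, i.e. those that would be
    passed on to the discriminator in this sketch."""
    return [w for w, lab in zip(windows, labels) if lab == 0]

if __name__ == "__main__":
    errors = [0.10, 0.20, 5.00, 0.15, 0.05]
    labels = pseudo_labels(errors)
    print(labels, filter_pseudo_normal(["w0", "w1", "w2", "w3", "w4"], labels))
```

Filtering out high-error windows before the adversarial comparison is what lets training tolerate contaminated data, since suspected anomalies no longer shape what the generator learns to reconstruct.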