Funding: This work was supported by the Rural Development Administration (RDA) through the Cooperative Research Program for Agriculture Science and Technology Development [Project No. RS-2024-00440583].
Abstract: Accurate fruit size estimation is crucial for plant phenotyping, as it enables precise crop management and enhances agricultural productivity by providing essential data for growth and resource efficiency analysis. In this study, we estimated the size of on-plant oriental melons grown in a vertical cultivation system to address the challenges posed by leaf occlusion. Data augmentation was achieved using a diffusion model to generate synthetic leaves that cover existing fruits, creating an enriched dataset. Three instance segmentation models, namely mask region-based convolutional neural network (CNN), Mask2Former, and detection transformer (DETR), and six de-occlusion models derived from these architectures were implemented. These models successfully inferred both the visible and occluded areas of the fruit. Notably, Amodal Mask2Former and occlusion-aware RCNN (ORCNN) achieved average precision scores of 85.92% and 85.35%, respectively. The inferred masks were used to estimate the height and diameter of the fruit, with Amodal Mask2Former yielding a mean absolute error of 5.46 mm and 4.20 mm and a mean absolute percentage error of 4.86% and 5.33%, respectively. The results indicate enhanced performance of the transformer-based Amodal Mask2Former over CNN architectures in de-occlusion tasks and size estimation. Finally, the improvement of de-occlusion models over conventional models was assessed and demonstrated across occlusion ratios ranging from 0 to 70%. However, generating synthetic datasets with occlusion ratios over 70% remains a limitation.
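As an illustration of the size-estimation step described in the abstract, the following is a minimal sketch of how height and diameter might be read from an inferred mask and scored with mean absolute error (MAE) and mean absolute percentage error (MAPE). It is not the authors' implementation: the bounding-box measurement, the function names, the `MM_PER_PX` scale, and all sample values are assumptions made for this example.

```python
def mask_extent_px(mask):
    """Return (height, width) in pixels of the bounding box of a binary mask.

    `mask` is a list of equal-length rows of 0/1 values (hypothetical format).
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return 0, 0  # empty mask: nothing was segmented
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1


def mean_absolute_error(pred, true):
    """MAE: average of |prediction - ground truth|, in the input units."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mean_absolute_percentage_error(pred, true):
    """MAPE: average of |prediction - ground truth| / ground truth, in percent."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


# Toy amodal mask (visible + inferred occluded area); values are made up.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
MM_PER_PX = 10.0  # assumed camera scale, not taken from the paper

height_px, width_px = mask_extent_px(mask)
height_mm, diameter_mm = height_px * MM_PER_PX, width_px * MM_PER_PX

# Hypothetical predicted vs. measured heights (mm) for two fruits.
predicted = [110.0, 95.0]
measured = [100.0, 100.0]
mae = mean_absolute_error(predicted, measured)
mape = mean_absolute_percentage_error(predicted, measured)
```

A bounding box is the simplest measurement choice; a fitted ellipse or a rotated box would handle tilted fruit better, at the cost of extra code.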
Funding: Partly supported by the National Natural Science Foundation of China (62122058, 62171317, and 62231018).
Abstract: Existing multi-person reconstruction methods require the human bodies in the input image to occupy a considerable portion of the picture. However, low-resolution human objects are ubiquitous due to the trade-off between the field of view and target distance given a limited camera resolution. In this paper, we propose an end-to-end multi-task framework for multi-person inference from a low-resolution image (MILI). To perceive more information from a low-resolution image, we use pair-wise images at high resolution and low resolution for training, and design a restoration network with a simple loss for better feature extraction from the low-resolution image. To address the occlusion problem in multi-person scenes, we propose an occlusion-aware mask prediction network to estimate the mask of each person during 3D mesh regression. Experimental results on both small-scale and large-scale scenes demonstrate that our method outperforms state-of-the-art methods both quantitatively and qualitatively. The code is available at http://cic.tju.edu.cn/faculty/likun/projects/MILI.