Objective This study aimed to explore a novel method that integrates segmentation-guided classification and diffusion model augmentation to automatically classify tibial plateau fractures (TPFs). Methods YOLOv8n-cls was used to construct a baseline model on data from 3781 patients from the Orthopedic Trauma Center of Wuhan Union Hospital. Additionally, a segmentation-guided classification approach was proposed. To enhance the dataset, a diffusion model was further used for data augmentation. Results The novel method that integrated segmentation-guided classification and diffusion model augmentation significantly improved the accuracy and robustness of fracture classification. The average classification accuracy for TPFs rose from 0.844 to 0.896. The comprehensive performance of the dual-stream model was also significantly enhanced after many rounds of training, with both the macro-area under the curve (AUC) and the micro-AUC increasing from 0.94 to 0.97. By utilizing diffusion model augmentation and segmentation map integration, the model demonstrated superior efficacy in identifying Schatzker Ⅰ, achieving an accuracy of 0.880. It yielded an accuracy of 0.898 for Schatzker Ⅱ and Ⅲ and 0.913 for Schatzker Ⅳ; for Schatzker Ⅴ and Ⅵ, the accuracy was 0.887; and for intercondylar ridge fracture, the accuracy was 0.923. Conclusion The dual-stream attention-based classification network, which has been verified by many experiments, exhibited great potential in predicting the classification of TPFs. This method facilitates automatic TPF assessment and may assist surgeons in the rapid formulation of surgical plans.
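The abstract above reports both a macro-AUC and a micro-AUC. As a minimal sketch of how the two averages differ, assuming a one-vs-rest setup and a rank-based AUC, the following Python computes both on made-up toy scores. The function names, the three-class toy data, and the tie handling are illustrative assumptions, not the authors' evaluation code.

```python
def binary_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """y_true: list of class indices; y_score: one list of class scores per sample.

    Macro-AUC averages the one-vs-rest AUC of each class.
    Micro-AUC pools every (sample, class) decision into one binary problem.
    """
    per_class = []
    flat_labels, flat_scores = [], []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(binary_auc(labels, scores))
        flat_labels += labels
        flat_scores += scores
    macro = sum(per_class) / n_classes
    micro = binary_auc(flat_labels, flat_scores)
    return macro, micro


# Hypothetical toy data: 4 samples, 3 classes (not taken from the study).
toy_true = [0, 1, 2, 1]
toy_score = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]
macro_auc, micro_auc = macro_micro_auc(toy_true, toy_score, n_classes=3)
print(macro_auc, micro_auc)
```

On this toy input every class is perfectly ranked within its own column, so the macro value is 1.0, while pooling all decisions exposes one misordered pair and the micro value drops slightly. The two averages can therefore disagree even when per-class ranking looks ideal.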
To the Editor: Laparoscopic liver resection (LLR) is widely used as a standard procedure for liver malignancies and benign diseases. Consensus guidelines stated that LLR may be feasible and safe in experienced centers. Evidence has shown that LLR is less invasive and has a better patient prognosis than conventional procedures [1]. However, laparoscopic anatomic liver resection (LALR), such as segment 8 (S8) resection, is still challenging due to difficulties in segmental mapping and surgical techniques [2,3]. Liver S8 is in a deep-seated area surrounded by the ribs and the diaphragm, and is closely connected to the right and middle hepatic veins and the inferior vena cava. Furthermore, the Glissonean pedicle of segment 8 (G8) is located deep in the liver parenchyma and lacks anatomical landmarks, which makes forceps manipulation difficult. Therefore, LALR-S8 has been described as the most challenging procedure [4].
Background Lack of depth perception from medical imaging systems is one of the long-standing technological limitations of minimally invasive surgeries. The ability to visualize anatomical structures in 3D can improve conventional arthroscopic surgeries, as a full 3D semantic representation of the surgical site can directly improve surgeons' ability. It also brings the possibility of intraoperative image registration with preoperative clinical records for the development of semi-autonomous and fully autonomous platforms. This study aimed to present a novel monocular depth prediction model to infer depth maps from a single-color arthroscopic video frame. Methods We applied a novel technique that combines supervised and self-supervised loss terms and thus eliminates the drawbacks of each technique. It enabled the estimation of edge-preserving depth maps from a single untextured arthroscopic frame. The proposed image acquisition technique projected artificial textures on the surface to improve the quality of disparity maps from stereo images. Moreover, following the integration of the attention-aware multi-scale feature extraction technique along with scene global contextual constraints and multiscale depth fusion, the model could predict reliable and accurate tissue depth of the surgical sites that complies with scene geometry. Results A total of 4,128 stereo frames from a knee phantom were used to train a network, and during the pre-training stage, the network learned disparity maps from the stereo images. The fine-tuning phase used 12,695 knee arthroscopic stereo frames from cadaver experiments along with their corresponding coarse disparity maps obtained from the stereo matching technique. In a supervised fashion, the network learns the transformation from the left image to the disparity map, whereas the self-supervised loss term refines the coarse depth map by minimizing reprojection, gradient, and structural dissimilarity losses. Together, our method produces high-quality 3D maps with minimal re-projection losses of 0.0004132 (structural similarity index), 0.00036120156 (L1 error distance), and 6.591908 × 10^(−5) (L1 gradient error distance). Conclusion Machine learning techniques for monocular depth prediction were studied to infer accurate depth maps from a single-color arthroscopic video frame. Moreover, the study integrates a segmentation model; hence, 3D segmented maps are inferred that provide extended perception ability and tissue awareness.
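The abstract above describes a loss that mixes a supervised term (matching a coarse disparity map) with self-supervised terms, and it reports L1 and L1 gradient error distances. The Python below is a simplified sketch of those two distances and of a weighted sum of loss terms. The function names, the forward-difference gradients, and the weights `w_sup` and `w_self` are assumptions for illustration; the reprojection and structural dissimilarity terms are not implemented here and are passed in as precomputed numbers.

```python
def l1_error(pred, target):
    """Mean absolute difference between two equally sized 2D maps (lists of rows)."""
    h, w = len(pred), len(pred[0])
    total = sum(abs(pred[i][j] - target[i][j]) for i in range(h) for j in range(w))
    return total / (h * w)


def gradients(img):
    """Forward differences along x (columns) and y (rows); needs at least a 2x2 map."""
    h, w = len(img), len(img[0])
    gx = [[img[i][j + 1] - img[i][j] for j in range(w - 1)] for i in range(h)]
    gy = [[img[i + 1][j] - img[i][j] for j in range(w)] for i in range(h - 1)]
    return gx, gy


def l1_gradient_error(pred, target):
    """Sum of the L1 errors between the x-gradients and between the y-gradients."""
    pgx, pgy = gradients(pred)
    tgx, tgy = gradients(target)
    return l1_error(pgx, tgx) + l1_error(pgy, tgy)


def combined_loss(pred, coarse, self_sup_terms, w_sup=1.0, w_self=1.0):
    """Supervised L1 term against the coarse disparity map plus a weighted
    sum of self-supervised terms supplied by the caller (hypothetical weights)."""
    return w_sup * l1_error(pred, coarse) + w_self * sum(self_sup_terms)


# Hypothetical 2x2 disparity maps, not data from the study.
pred = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[1.0, 2.0], [3.0, 5.0]]
grad_term = l1_gradient_error(pred, coarse)
print(l1_error(pred, coarse), grad_term, combined_loss(pred, coarse, [grad_term]))
```

Keeping each term as a separate function makes it straightforward to reweight or swap terms, which is the practical benefit of combining supervised and self-supervised objectives in one scalar loss.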
Crowd density estimation, in general, is a challenging task due to the large variation of head sizes in crowds. Existing methods always use a multi-column convolutional neural network (MCNN) to adapt to this variation, which results in an averaging effect in areas with different densities and introduces considerable noise into the density map. To address this problem, we propose a new method called the segmentation-aware prior network (SAPNet), which generates a high-quality density map without noise based on a coarse head-segmentation map. SAPNet is composed of two networks, i.e., a foreground-segmentation convolutional neural network (FS-CNN) as the front end and a crowd-regression convolutional neural network (CR-CNN) as the back end. With only single dot annotations, we generate ground-truth segmentation masks of heads. Then, based on the ground truth, FS-CNN outputs a coarse head-segmentation map, which helps eliminate the noise in regions without people in the density map. Taking the head-segmentation map generated by the front end as input, CR-CNN performs accurate crowd counting and generates a high-quality density map. We demonstrate SAPNet on four datasets (i.e., ShanghaiTech, UCF-CC-50, WorldExpo’10, and UCSD), and show state-of-the-art performance on the ShanghaiTech part B and UCF-CC-50 datasets.
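The abstract above relies on two ideas: a density map whose sum equals the head count, and a coarse head mask derived from dot annotations that suppresses density in regions without people. The Python below sketches both on a tiny grid. The Gaussian kernel, `sigma`, the disc-shaped mask, and `radius` are illustrative assumptions; the abstract does not specify how SAPNet builds its ground truth.

```python
import math


def density_map(points, height, width, sigma=1.0):
    """Place one Gaussian per head dot, normalized so each head sums to 1."""
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap


def head_mask(points, height, width, radius=2):
    """Coarse segmentation: 1 for pixels within `radius` of any dot, else 0."""
    return [
        [1 if any((y - py) ** 2 + (x - px) ** 2 <= radius ** 2 for py, px in points) else 0
         for x in range(width)]
        for y in range(height)
    ]


def apply_mask(dmap, mask):
    """Zero out density wherever the mask marks background."""
    return [[d * m for d, m in zip(drow, mrow)] for drow, mrow in zip(dmap, mask)]


def count(dmap):
    """The estimated crowd count is the sum over the density map."""
    return sum(map(sum, dmap))


# Hypothetical dot annotations as (row, column) on a 10x10 image.
dots = [(2, 2), (6, 7)]
dm = density_map(dots, 10, 10)
masked = apply_mask(dm, head_mask(dots, 10, 10))
print(count(dm), count(masked))
```

The unmasked map sums to the number of annotated heads, while the masked map keeps only the density near each head; this is the sense in which a segmentation prior can remove background noise from a predicted density map.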
Funding (tibial plateau fracture classification study): Supported by the National Natural Science Foundation of China (Nos. 81974355 and 82172524); the Key Research and Development Program of Hubei Province (No. 2021BEA161); the National Innovation Platform Development Program (No. 2020021105012440); the Open Project Funding of the Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University (No. 2024BDIAA03); and the Free Innovation Preliminary Research Fund of Wuhan Union Hospital (No. 2024XHYN047).
Funding (arthroscopic depth prediction study): Supported by the Australian Indian Strategic Research Fund (Project AISRF53820).
Funding (crowd density estimation study): Supported by the National Natural Science Foundation of China (No. 61775048) and the Fundamental Research Funds for the Central Universities, China (No. ZDXMPY20180103).