Funding: Supported by NSFC, China (No. 62476143); the Fundamental Research Funds for the Central Universities, China (Nankai University, No. 63253218); the ANU-Optus Bushfire Research Centre of Excellence (BRCoE) (scholarship awarded to Ge-Peng Ji); and NSFC (No. 62306162).
Abstract: Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer. This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications. To this end, we begin by assessing the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception: classification, detection, segmentation, and vision-language understanding. Our assessment reveals domain-specific challenges and underscores the need for further multimodal research in colonoscopy. To address these gaps, we establish three foundational initiatives: a large-scale multimodal instruction-tuning dataset, ColonINST; a colonoscopy-designed multimodal language model, ColonGPT; and a multimodal benchmark. To facilitate continuous advancement in this rapidly evolving field, we maintain a public website with the latest updates: https://github.com/ai4colonoscopy/IntelliScope.
Funding: Partially supported by the National Natural Science Foundation of China (Grant Nos. 62322602 and 62172225); the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20230033); and the CAAI-Huawei MindSpore Open Fund.
Abstract: Detecting pedestrians in crowds remains a challenging task, and more effort is needed to understand why detectors fail. When we perform an error analysis based on the traditional evaluation strategy, we find that it produces many misleading false positives, which in fact cover occluded pedestrians. The reason is that datasets usually contain two kinds of annotations: regular pedestrians (detection targets), labeled by full-body boxes, and ignored pedestrians (NOT detection targets), labeled by visible boxes. Ignored pedestrians are labeled as an additional category termed the "ignore region". Nevertheless, detectors always predict a full-body box for each pedestrian. This gap results in the following case: when a detector successfully predicts a full-body box for an ignored pedestrian, a false positive is triggered due to the low overlap between the predicted full-body box and the labeled visible box. This becomes even more harmful as the detector improves and becomes more capable of locating occluded pedestrians. To alleviate this issue, we devise a new pedestrian detection pipeline that considers the additional visible box at both the detection and evaluation stages. During detection, we predict an extra visible box, apart from the full-body box, for every instance; during evaluation, we employ visible boxes instead of full-body boxes to match the "ignore region". We apply the new pipeline to dozens of detection methods and validate its effectiveness in reducing the over-reporting of false positives and providing more reliable evaluation results.
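The annotation mismatch described in the abstract can be illustrated with a minimal sketch (not the authors' code): an occluded pedestrian's visible box overlaps poorly with a correct full-body prediction, so the traditional protocol flags a false positive, whereas matching a predicted visible box against the visible annotation succeeds. The box coordinates and the 0.5 IoU matching threshold below are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# An occluded pedestrian in the "ignore region": only the upper body is
# visible, so the annotation is a visible box covering the top part.
ignore_visible_box = (100, 100, 140, 160)

# A strong detector correctly localizes the full body and (under the
# proposed pipeline) additionally predicts a visible box.
pred_full_body_box = (100, 100, 140, 260)
pred_visible_box = (101, 101, 139, 158)

THRESH = 0.5  # illustrative matching threshold

# Traditional evaluation: full-body prediction vs. visible annotation.
# Low overlap -> the match fails -> a misleading false positive.
traditional_match = iou(pred_full_body_box, ignore_visible_box) >= THRESH

# Proposed evaluation: visible prediction vs. visible annotation.
proposed_match = iou(pred_visible_box, ignore_visible_box) >= THRESH

print(traditional_match, proposed_match)  # prints: False True
```

Note that the correct full-body prediction scores an IoU of only 0.375 against the visible annotation here, so the better the detector localizes occluded pedestrians, the more such spurious false positives the traditional protocol reports.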