Flood disasters triggered by excessive rainfall cause severe damage to infrastructure and pose significant risks to human life. Within the context of disaster management, accurately identifying affected structures and providing interpretable analytical results are of critical importance. This study proposes a new disaster analysis framework that integrates the Multi-Atrous Self-Attention (MASA) mechanism, designed to capture multi-scale spatial features effectively, with vision-language models for explainable flood assessment. The proposed approach consists of two main components: the first performs segmentation to detect and quantify flood-affected structures, while the second employs a fine-tuned vision-language model to generate natural language descriptions of the disaster scene. The MASA module processes image-mask pairs from the FloodNet dataset to segment disaster-related structures, whereas the LoRA (Low-Rank Adaptation)-enhanced BLIP-2 (Bootstrapped Language-Image Pre-training) model learns from image-text pairs in the LADI-v2 dataset to produce textual disaster descriptions. Through this dual-stage structure, the system provides both quantitative and linguistic outputs, enabling interpretable flood impact assessment. Experimental results demonstrate that the proposed MASA-based segmentation model achieves a mean Intersection over Union (mIoU) of 73.78% on FloodNet, outperforming state-of-the-art segmentation models. Furthermore, the LoRA-fine-tuned BLIP-2 model achieves a BLEU score of 80.77% on the LADI-v2 dataset, indicating fluent, contextually relevant, and semantically coherent textual outputs. The proposed system contributes to disaster analysis by enhancing explainability and interpretability in flood damage assessment.
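As an illustration of the segmentation metric reported above, the following is a minimal sketch (not the authors' code) of mean Intersection over Union (mIoU): per class, the overlap between predicted and ground-truth pixels is divided by their union, and the per-class scores are averaged. The function name and the toy two-class label maps are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Compute mIoU over flat sequences of per-pixel class labels."""
    ious = []
    for c in range(num_classes):
        # Pixels where both maps assign class c (intersection) vs. either does (union)
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy example: two-class ("flooded" vs. "background") per-pixel label maps
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, 2))  # prints 0.5
```

A real evaluation would compute this over full-resolution FloodNet masks (typically with an array library for speed), but the metric itself reduces to exactly this per-class ratio.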
Funding: This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under project number 123E669.