This study proposed an optimization method for multimodal large language models(MLLMs)reasoning based on structured chain of thought,aiming to enhance the visual decision-making capability in tree falling scenarios.Th...This study proposed an optimization method for multimodal large language models(MLLMs)reasoning based on structured chain of thought,aiming to enhance the visual decision-making capability in tree falling scenarios.The research first analyzed challenges faced by existing MLLMs when processing complicated visual scenes,including insufficient reasoning performance and low integration efficiency with other systems.To address these issues,an innovative structured chain of thought approach was introduced,which significantly improved the reasoning accuracy of the model in handling complex visual scenarios.To validate the proposed method,a specialized dataset focusing on tree falling scenarios in social governance was constructed,and a practical agent workflow was designed based on this dataset.Experimental results demonstrated that the proposed approach achieved better performance in real-world applications.The findings provide a reliable and efficient technical solution to visual decision-making in social governance.展开更多
文摘This study proposed an optimization method for multimodal large language models(MLLMs)reasoning based on structured chain of thought,aiming to enhance the visual decision-making capability in tree falling scenarios.The research first analyzed challenges faced by existing MLLMs when processing complicated visual scenes,including insufficient reasoning performance and low integration efficiency with other systems.To address these issues,an innovative structured chain of thought approach was introduced,which significantly improved the reasoning accuracy of the model in handling complex visual scenarios.To validate the proposed method,a specialized dataset focusing on tree falling scenarios in social governance was constructed,and a practical agent workflow was designed based on this dataset.Experimental results demonstrated that the proposed approach achieved better performance in real-world applications.The findings provide a reliable and efficient technical solution to visual decision-making in social governance.