Recent advances in spatially resolved transcriptomic technologies have enabled unprecedented opportunities to elucidate tissue architecture and function in situ. Spatial transcriptomics can provide multimodal and complementary information simultaneously, including gene expression profiles, spatial locations, and histology images. However, most existing methods have limitations in efficiently utilizing spatial information and matched high-resolution histology images. To fully leverage the multi-modal information, we propose a SPAtially embedded Deep Attentional graph Clustering (SpaDAC) method to identify spatial domains while reconstructing denoised gene expression profiles. This method can efficiently learn low-dimensional embeddings for spatial transcriptomics data by constructing multi-view graph modules to capture both spatial location connectives and morphological connectives. Benchmark results demonstrate that SpaDAC outperforms other algorithms on several recent spatial transcriptomics datasets. SpaDAC is a valuable tool for spatial domain detection, facilitating the comprehension of tissue architecture and the cellular microenvironment. The source code of SpaDAC is freely available on GitHub (https://github.com/huoyuying/SpaDAC.git).
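The "spatial location connectives" above are typically encoded as a neighborhood graph over spot coordinates. A minimal sketch in plain NumPy (the function name, choice of k, and toy coordinates are illustrative assumptions, not SpaDAC's actual API):

```python
import numpy as np

def spatial_knn_adjacency(coords, k=2):
    """Binary k-nearest-neighbor adjacency matrix over spot coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)           # a spot is not its own neighbor
    n = coords.shape[0]
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        adj[i, np.argsort(dist[i])[:k]] = 1  # connect the k closest spots
    return np.maximum(adj, adj.T)            # symmetrize the graph

# Four toy spots: three clustered near the origin, one far away.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
adj = spatial_knn_adjacency(coords, k=2)
```

A morphological view would be built the same way, with image-feature distances replacing coordinate distances.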
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
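Hits@1 and MRR, the metrics reported above, are computed from the 1-based rank that each query entity's true counterpart receives in the candidate list. A minimal sketch with hypothetical ranks:

```python
def hits_and_mrr(ranks, k=1):
    """Hits@k: fraction of queries whose true counterpart ranks within
    the top k. MRR: mean of the reciprocal ranks."""
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits, mrr

# Hypothetical ranks of the correct counterpart for four alignment queries.
ranks = [1, 2, 1, 4]
hits1, mrr = hits_and_mrr(ranks, k=1)   # hits1 = 0.5, mrr = 0.6875
```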
1 General information Journal of Geographical Sciences is an international academic journal that publishes papers of the highest quality in physical geography, natural resources, environmental sciences, geographic information sciences, remote sensing and cartography. Manuscripts come from different parts of the world.
With the growing advancement of wireless communication technologies, WiFi-based human sensing has gained increasing attention as a non-intrusive and device-free solution. Among the available signal types, Channel State Information (CSI) offers fine-grained temporal, frequency, and spatial insights into multipath propagation, making it a crucial data source for human-centric sensing. Recently, the integration of deep learning has significantly improved the robustness and automation of feature extraction from CSI in complex environments. This paper provides a comprehensive review of deep learning-enhanced human sensing based on CSI. We first outline mainstream CSI acquisition tools and their hardware specifications, then provide a detailed discussion of preprocessing methods such as denoising, time-frequency transformation, data segmentation, and augmentation. Subsequently, we categorize deep learning approaches according to sensing tasks, namely detection, localization, and recognition, and highlight representative models across application scenarios. Finally, we examine key challenges including domain generalization, multi-user interference, and limited data availability, and we propose future research directions involving lightweight model deployment, multimodal data fusion, and semantic-level sensing.
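As a toy illustration of the denoising step mentioned above, a sliding-window mean over a synthetic CSI amplitude stream (a generic baseline for exposition, not a method from the surveyed literature; the signal and noise levels are invented):

```python
import numpy as np

def moving_average_denoise(csi_amp, window=5):
    """Smooth a 1-D CSI amplitude stream with a sliding-window mean,
    a common first denoising step before feature extraction."""
    kernel = np.ones(window) / window
    return np.convolve(csi_amp, kernel, mode="same")

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 3 * t)                  # idealized motion signature
noisy = clean + 0.3 * rng.standard_normal(t.size)  # additive measurement noise
denoised = moving_average_denoise(noisy, window=9)
```

In practice this is followed by a time-frequency transform (e.g., STFT) before segmentation and augmentation.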
With the increase of the elderly population and growing health care costs, the role of service robots in aiding the disabled and the elderly is becoming important. Many researchers around the world have paid much attention to healthcare robots and rehabilitation robots. To achieve natural and harmonious communication between the user and a service robot, the information perception/feedback ability and the interaction ability of service robots have become key issues.
This paper focuses on multi-modal Information Perception (IP) for Soft Robotic Hands (SRHs) using Machine Learning (ML) algorithms. A flexible Optical Fiber-based Curvature Sensor (OFCS) is fabricated, consisting of a Light-Emitting Diode (LED), a photosensitive detector, and an optical fiber. Bending the roughened optical fiber lowers the transmitted light intensity, which reflects the curvature of the soft finger. Together with the curvature and pressure information, multi-modal IP is performed to improve the recognition accuracy. Recognition of gesture, object shape, size, and weight is implemented with multiple ML approaches, including the Supervised Learning Algorithms (SLAs) of K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Logistic Regression (LR), and the unSupervised Learning Algorithm (un-SLA) of K-Means Clustering (KMC). Moreover, Optical Sensor Information (OSI), Pressure Sensor Information (PSI), and Double-Sensor Information (DSI) are adopted to compare the recognition accuracies. The experimental results demonstrate that the proposed sensors and recognition approaches are feasible and effective. The recognition accuracies obtained using the above ML algorithms and the three modes of sensor information are higher than 85 percent for almost all combinations. Moreover, DSI is more accurate than single-modal sensor information, and the KNN algorithm with DSI outperforms the other combinations in recognition accuracy.
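The KNN step can be sketched on hypothetical double-sensor (curvature, pressure) features; the sample values and class labels below are invented for illustration, not measurements from the paper:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Classify a query from combined curvature + pressure features by
    majority vote among its k nearest training samples (plain KNN)."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]                  # indices of k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                # majority label

# Hypothetical double-sensor samples: [curvature, pressure] per grasp.
train_x = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
                    [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]])
train_y = np.array([0, 0, 0, 1, 1, 1])   # 0 = small object, 1 = large object
pred = knn_predict(train_x, train_y, np.array([0.82, 0.88]), k=3)  # -> 1
```

Using both feature columns mirrors the DSI mode; dropping one column would reduce this to the OSI or PSI setting.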
With the rapid development of the economy, air pollution caused by industrial expansion has caused serious harm to human health and social development. Therefore, establishing an effective air pollution concentration prediction system is of great scientific and practical significance for accurate and reliable predictions. This paper proposes a combined point-interval prediction system for pollutant concentration by leveraging neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation technology is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The Golden Jackal Optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines training results from multiple models to obtain final point predictions, while quantile regression and kernel density estimation methods provide interval predictions on the test set. Experimental results demonstrate that the combined model achieves a high goodness of fit, with a coefficient of determination (R²) of 99.3%, and a maximum difference in mean absolute percentage error (MAPE) from the benchmark models of 12.6%. This suggests that the integrated learning system proposed in this paper can provide more accurate deterministic predictions as well as reliable uncertainty analysis compared to traditional models, offering a practical reference for air quality early warning.
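One simple way to realize the interval-prediction idea is to widen each point forecast by empirical quantiles of held-out residuals. This is a hedged stand-in for the paper's quantile-regression and kernel-density machinery, with invented numbers:

```python
import numpy as np

def residual_interval(point_preds, residuals, alpha=0.1):
    """Attach a (1 - alpha) prediction interval to point forecasts using
    empirical quantiles of held-out residuals."""
    lo = np.quantile(residuals, alpha / 2)       # lower residual quantile
    hi = np.quantile(residuals, 1 - alpha / 2)   # upper residual quantile
    return point_preds + lo, point_preds + hi

# Invented validation residuals (observed minus predicted concentration).
residuals = np.array([-3.0, -1.5, -0.5, 0.0, 0.4, 1.1, 2.2, 3.5])
preds = np.array([50.0, 60.0])                   # point forecasts
lower, upper = residual_interval(preds, residuals, alpha=0.2)
```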
With the development of smart cities and smart technologies, parks, as functional units of the city, are facing smart transformation. The development of smart parks can help address the challenges of integrating technology within urban spaces and serve as a testbed for exploring smart city planning and governance models. Information models facilitate the effective integration of technology into space. Building Information Modeling (BIM) and City Information Modeling (CIM) have been widely used in urban construction. However, existing information models have limitations when applied at the park scale, so it is necessary to develop an information model suited to parks. This paper first traces the evolution of park smart transformation, reviews the global landscape of smart park development, and identifies key trends and persistent challenges. Addressing the particularities of parks, the concept of Park Information Modeling (PIM) is proposed. PIM leverages smart technologies such as artificial intelligence, digital twins, and collaborative sensing to help form a 'space-technology-system' smart structure, enabling systematic management of diverse park spaces, addressing the deficiency in park-level information models, and aiming to achieve scale articulation between BIM and CIM. Finally, through a detailed top-level design application case study of the Nanjing Smart Education Park in China, this paper illustrates the translation of the PIM concept into practice, showcasing its potential to provide smart management tools for park managers and enhance services for park stakeholders, although further empirical validation is required.
Objective To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and western medicine data. Methods Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
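The adaptive weighting idea can be sketched as softmax-normalized per-modality weights applied to each branch's class logits. The branch names and all parameter values below are illustrative assumptions, not the AWMDF implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_fusion(modality_logits, weight_params):
    """Fuse per-modality class logits with softmax-normalized weights,
    a minimal stand-in for a learned adaptive weighting scheme."""
    w = softmax(weight_params)               # one weight per modality
    stacked = np.stack(modality_logits)      # (n_modalities, n_classes)
    return (w[:, None] * stacked).sum(axis=0)

# Hypothetical logits from clinical, echo, tongue, and face branches
# for two classes (mild vs. severe stenosis).
logits = [np.array([2.0, 0.5]), np.array([1.0, 1.0]),
          np.array([0.2, 1.8]), np.array([0.5, 0.4])]
theta = np.array([1.5, 0.0, 0.5, -1.0])     # hypothetical learned parameters
fused = adaptive_fusion(logits, theta)
pred_class = int(np.argmax(fused))
```

Fixed-weight or averaging baselines correspond to replacing `softmax(theta)` with constants, which is exactly the comparison the ablation reports.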
Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of “single drug, single target” presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. The recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. The advancement in computational and experimental techniques provides complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks utilizing multi-modal data for real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of the ATCOs, the ISA task is refined to the processing of two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct benchmarks on the real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
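Multi-horizon FTP methods are usually judged against a naive constant-velocity extrapolation. A sketch of that baseline with a toy track (not one of the paper's 32 benchmarked methods; the track values are invented):

```python
import numpy as np

def constant_velocity_ftp(track, horizons):
    """Multi-horizon flight trajectory prediction with a constant-velocity
    baseline: extrapolate the last observed displacement per step."""
    v = track[-1] - track[-2]                   # last observed displacement
    return np.stack([track[-1] + h * v for h in horizons])

# Toy 2-D track (e.g., position samples at fixed time intervals).
track = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
preds = constant_velocity_ftp(track, horizons=[1, 2, 3])
```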
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommenders do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former serves as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is effective for personalized outfit recommendation.
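The BPR objective used to unify style and compatibility scores can be sketched as follows; the scores are invented, whereas the real model derives them from multi-modal features:

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Bayesian personalized ranking loss: -log sigmoid(pos - neg),
    averaged over (positive, negative) outfit pairs."""
    diff = np.asarray(pos_scores) - np.asarray(neg_scores)
    return float(np.mean(np.log1p(np.exp(-diff))))  # = -log(sigmoid(diff))

loss_good = bpr_loss([3.0, 2.5], [0.5, 0.1])  # positives ranked higher
loss_bad = bpr_loss([0.5, 0.1], [3.0, 2.5])   # ranking reversed
```

Minimizing this loss pushes outfits the user preferred to score above sampled negatives, which is how the personal-style signal shapes the ranking.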
Information extraction (IE) aims to automatically identify and extract information about specific interests from raw texts. Despite the abundance of solutions based on fine-tuning pretrained language models, IE in few-shot and zero-shot scenarios remains highly challenging due to the scarcity of training data. Large language models (LLMs), on the other hand, can generalize well to unseen tasks with few-shot demonstrations or even zero-shot instructions, and have demonstrated impressive ability across a wide range of natural language understanding and generation tasks. Nevertheless, it is unclear whether such effectiveness can be replicated in IE, where the target tasks involve specialized schemas and quite abstract entity or relation concepts. In this paper, we first examine the validity of LLMs in executing IE tasks with an established prompting strategy, and further propose multiple types of augmented prompting methods, including the structured fundamental prompt (SFP), the structured interactive reasoning prompt (SIRP), and the voting-enabled structured interactive reasoning prompt (VESIRP). The experimental results demonstrate that while direct prompting yields inferior performance, the proposed augmented prompting methods significantly improve the extraction accuracy, achieving comparable or even better performance (e.g., on zero-shot FewNERD and FewNERD-INTRA) than state-of-the-art methods that require large-scale training samples. This study represents a systematic exploration of employing instruction-following LLMs for IE. It not only establishes a performance benchmark for this novel paradigm but, more importantly, validates a practical technical pathway through the proposed prompt enhancement methods, offering a viable solution for efficient IE in low-resource settings.
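A schema-guided prompt of the kind the SFP represents can be assembled mechanically. The wording and the JSON answer format below are illustrative assumptions, not the paper's exact template:

```python
def structured_prompt(entity_types, text):
    """Assemble a schema-guided zero-shot IE prompt: state the task,
    enumerate the schema's entity types, and fix the answer format."""
    lines = [
        "Extract entities from the text below.",
        "Entity types: " + ", ".join(entity_types),
        'Answer as JSON, e.g. {"type": ["mention", "..."]}.',
        "Text: " + text,
    ]
    return "\n".join(lines)

prompt = structured_prompt(["person", "location"],
                           "Marie Curie worked in Paris.")
```

The interactive variants (SIRP, VESIRP) would extend this single-turn template with reasoning turns and answer voting.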
Ecological monitoring vehicles are equipped with a range of sensors and monitoring devices designed to gather data on ecological and environmental factors. These vehicles are crucial in various fields, including environmental science research, ecological and environmental monitoring projects, disaster response, and emergency management. A key method employed in these vehicles for achieving high-precision positioning is LiDAR (Light Detection and Ranging)-Visual Simultaneous Localization and Mapping (SLAM). However, maintaining high-precision localization in complex scenarios, such as degraded environments or when dynamic objects are present, remains a significant challenge. To address this issue, we integrate both semantic and texture information from LiDAR and cameras to enhance the robustness and efficiency of data registration. Specifically, semantic information simplifies the modeling of scene elements, reducing the reliance on dense point clouds, which can be less efficient. Meanwhile, visual texture information complements LiDAR-Visual localization by providing additional contextual details. By incorporating semantic and texture details from paired images and point clouds, we significantly improve the quality of data association, thereby increasing the success rate of localization. This approach not only enhances the operational capabilities of ecological monitoring vehicles in complex environments but also contributes to improving the overall efficiency and effectiveness of ecological monitoring and environmental protection efforts.
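One concrete way semantic labels aid registration is by discarding points on dynamic objects before alignment. A minimal sketch with invented labels (the class names and points are illustrative, not the paper's pipeline):

```python
def filter_dynamic_points(points, labels, dynamic_classes=("car", "person")):
    """Drop LiDAR points whose semantic label marks a dynamic object,
    so registration relies only on stable scene structure."""
    return [p for p, lab in zip(points, labels) if lab not in dynamic_classes]

# Toy point cloud with per-point semantic labels.
points = [(1.0, 2.0, 0.5), (4.0, 0.0, 1.2), (2.5, 3.0, 0.8)]
labels = ["building", "car", "road"]
static_points = filter_dynamic_points(points, labels)
```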
Advanced geological prediction is a crucial means to ensure safety and efficiency in tunnel construction. However, different advanced geological forecasting methods have their own limitations, resulting in poor detection accuracy. Using multiple methods to carry out a comprehensive evaluation can effectively improve the accuracy of advanced geological prediction results. In this study, geological information is combined with the detection results of geophysical methods, including transient electromagnetic, induced polarization, and tunnel seismic prediction, to establish a comprehensive analysis method for adverse geology. First, the possible main adverse geological problems are determined according to the geological information. Subsequently, various physical parameters of the rock mass in front of the tunnel face can then be derived on the basis of multisource geophysical data. Finally, based on the analysis results of the geological information, the multisource data fusion algorithm is used to determine the type, location, and scale of adverse geology. Advanced geological prediction results that can provide effective guidance for tunnel construction can then be obtained.
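The multisource fusion step can be sketched as a weighted average of per-method anomaly scores along the tunnel axis. The weights and scores below are invented for illustration; the paper's fusion algorithm is more elaborate:

```python
import numpy as np

def fuse_anomaly_scores(method_scores, method_weights):
    """Weighted average of per-method anomaly scores per chainage zone;
    the zone with the highest fused score is flagged as adverse geology."""
    return np.average(np.stack(method_scores), axis=0,
                      weights=np.asarray(method_weights, dtype=float))

# Invented anomaly scores for four zones ahead of the tunnel face.
tem     = np.array([0.1, 0.8, 0.9, 0.2])   # transient electromagnetic
ip      = np.array([0.2, 0.7, 0.8, 0.1])   # induced polarization
seismic = np.array([0.0, 0.6, 0.9, 0.3])   # tunnel seismic prediction
fused = fuse_anomaly_scores([tem, ip, seismic], [0.4, 0.3, 0.3])
flagged_zone = int(np.argmax(fused))       # most likely adverse zone
```

Zone 2 is flagged here because all three methods agree on a strong anomaly there, which is the intuition behind comprehensive evaluation.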
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between the visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding the redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on two classical MNER datasets, even though the large models it employs have significantly fewer parameters than those of other baselines.
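Word-level fusion of text embeddings with auxiliary knowledge is often realized with a per-word gate. A minimal sketch in which the gating form, parameters, and random features are assumptions rather than MMAVK's exact mechanism:

```python
import numpy as np

def word_level_fusion(word_emb, knowledge_emb, gate_w):
    """Add knowledge features to each word embedding through a per-word
    sigmoid gate computed from the word embedding itself."""
    gate = 1.0 / (1.0 + np.exp(-(word_emb @ gate_w)))  # one gate per word
    return word_emb + gate[:, None] * knowledge_emb

rng = np.random.default_rng(2)
words = rng.standard_normal((5, 4))      # 5 word embeddings, dim 4
knowledge = rng.standard_normal((5, 4))  # aligned auxiliary-knowledge features
gate_w = rng.standard_normal(4)          # hypothetical learned gate weights
fused = word_level_fusion(words, knowledge, gate_w)
```

The per-word gate lets the model attenuate visual knowledge for words it does not help, a common remedy for the weak image-text correlation noted above.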
Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within knowledge graphs is generally of low quality, and some entities suffer from missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on how to facilitate modality interaction and fusion while neglecting the problems of low modality quality and missing modalities. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
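Completing a missing visual feature can be sketched, in the simplest possible form, as borrowing from the textually most similar entity. This is a crude stand-in for the paper's variational auto-encoder approach, with toy embeddings:

```python
import numpy as np

def impute_missing_visual(text_emb, vis_emb, missing):
    """Fill each missing visual embedding with the visual embedding of
    the complete entity whose text embedding is most similar (cosine)."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = t @ t.T                            # cosine similarity matrix
    out = vis_emb.copy()
    for i in np.where(missing)[0]:
        cand = sim[i].copy()
        cand[missing] = -np.inf              # only borrow from complete entities
        cand[i] = -np.inf
        out[i] = vis_emb[np.argmax(cand)]
    return out

text_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
vis_emb  = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]])
missing  = np.array([False, True, False])    # entity 1 lacks an image
imputed = impute_missing_visual(text_emb, vis_emb, missing)
```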
Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of the patient health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of patient condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, researchers have implemented hardware-aware acceleration strategies, including the use of TensorRT and ONNX-based model compression, to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
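The cross-attention mechanism described above reduces, in single-head form, to scaled dot-product attention of one modality's tokens over another's. A sketch with random features standing in for MRI and PET patch tokens (the dimensions and data are illustrative, not the paper's architecture):

```python
import numpy as np

def cross_attention(query_feats, context_feats):
    """Single-head scaled dot-product cross-attention: tokens of one
    modality attend over tokens of another and pull in their features."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)     # softmax over context tokens
    return w @ context_feats

rng = np.random.default_rng(1)
mri_tokens = rng.standard_normal((4, 8))      # stand-in MRI patch features
pet_tokens = rng.standard_normal((6, 8))      # stand-in PET patch features
fused = cross_attention(mri_tokens, pet_tokens)
```

Linear or sparse attention variants replace the dense `scores` matrix with cheaper approximations, which is the memory saving the abstract refers to.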
Funding (SpaDAC study): supported by the National Natural Science Foundation of China (62003028). X.L. was supported by a scholarship from the China Scholarship Council.
Funding (MPSEA study): partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (conceptualization and investigation); partially supported by the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (formal analysis and validation); partially supported by the Open Foundation of the Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (methodology and model design); partially supported by the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (validation and results verification).
文摘To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising frommodal heterogeneity during fusion,while also capturing shared information acrossmodalities,this paper proposes a Multi-modal Pre-synergistic Entity Alignmentmodel based on Cross-modalMutual Information Strategy Optimization(MPSEA).The model first employs independent encoders to process multi-modal features,including text,images,and numerical values.Next,a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information.This pre-fusion strategy enables unified perception of heterogeneous modalities at the model’s initial stage,reducing discrepancies during the fusion process.Finally,using cross-modal deep perception reinforcement learning,the model achieves adaptive multilevel feature fusion between modalities,supporting learningmore effective alignment strategies.Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset,and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset,compared to existing state-of-the-art methods.These results confirm the effectiveness of the proposed model.
Abstract: 1 General information. Journal of Geographical Sciences is an international academic journal that publishes papers of the highest quality in physical geography, natural resources, environmental sciences, geographic information sciences, remote sensing, and cartography. Manuscripts come from different parts of the world.
Funding: supported by the National Natural Science Foundation of China (NSFC) under Grant U23A20310.
Abstract: With the growing advancement of wireless communication technologies, WiFi-based human sensing has gained increasing attention as a non-intrusive and device-free solution. Among the available signal types, Channel State Information (CSI) offers fine-grained temporal, frequency, and spatial insights into multipath propagation, making it a crucial data source for human-centric sensing. Recently, the integration of deep learning has significantly improved the robustness and automation of feature extraction from CSI in complex environments. This paper provides a comprehensive review of deep learning-enhanced human sensing based on CSI. We first outline mainstream CSI acquisition tools and their hardware specifications, then provide a detailed discussion of preprocessing methods such as denoising, time-frequency transformation, data segmentation, and augmentation. Subsequently, we categorize deep learning approaches according to sensing tasks, namely detection, localization, and recognition, and highlight representative models across application scenarios. Finally, we examine key challenges including domain generalization, multi-user interference, and limited data availability, and we propose future research directions involving lightweight model deployment, multimodal data fusion, and semantic-level sensing.
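One of the preprocessing steps named above, data segmentation, is commonly done by slicing the CSI stream into fixed-length overlapping windows before feeding a model. The window length, stride, and amplitude values below are arbitrary example choices, not taken from any specific toolchain.

```python
# Illustrative sliding-window segmentation of a 1-D CSI amplitude sequence.
# Overlap (stride < win) preserves context between adjacent windows.

def segment(series, win=4, stride=2):
    """Split a sequence into overlapping fixed-length windows."""
    return [series[i:i + win]
            for i in range(0, len(series) - win + 1, stride)]

csi = [0.9, 1.1, 1.0, 1.3, 0.8, 1.2, 1.0, 0.95]
windows = segment(csi)   # three windows of length 4
```

Real pipelines apply this per subcarrier and per antenna pair, typically after denoising, but the indexing logic is the same.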
Abstract: With the growth of the elderly population and rising health care costs, the role of service robots in aiding the disabled and the elderly is becoming important. Many researchers around the world have paid much attention to healthcare robots and rehabilitation robots. To achieve natural and harmonious communication between the user and a service robot, the information perception/feedback and interaction abilities of service robots are becoming more important in many key issues.
Funding: support provided by the National Natural Science Foundation of China (Nos. 61803267 and 61572328); the China Postdoctoral Science Foundation (No. 2017M622757); the Beijing Science and Technology Program (No. Z171100000817007); and the National Science Foundation of China (NSFC) and the German Research Foundation (DFG) in the project Cross Modal Learning, NSFC 61621136008/DFG TRR-169.
Abstract: This paper focuses on multi-modal Information Perception (IP) for Soft Robotic Hands (SRHs) using Machine Learning (ML) algorithms. A flexible Optical Fiber-based Curvature Sensor (OFCS) is fabricated, consisting of a Light-Emitting Diode (LED), a photosensitive detector, and an optical fiber. Bending the roughened optical fiber generates lower light intensity, which reflects the curvature of the soft finger. Together with the curvature and pressure information, multi-modal IP is performed to improve the recognition accuracy. Recognition of gesture, object shape, size, and weight is implemented with multiple ML approaches, including the Supervised Learning Algorithms (SLAs) of K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Logistic Regression (LR), and the unSupervised Learning Algorithm (un-SLA) of K-Means Clustering (KMC). Moreover, Optical Sensor Information (OSI), Pressure Sensor Information (PSI), and Double-Sensor Information (DSI) are adopted to compare the recognition accuracies. The experimental results demonstrate that the proposed sensors and recognition approaches are feasible and effective. The recognition accuracies obtained using the above ML algorithms and the three modes of sensor information are higher than 85 percent for almost all combinations. Moreover, DSI is more accurate than single-modal sensor information, and the KNN algorithm with DSI outperforms the other combinations in recognition accuracy.
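KNN, the best-performing classifier reported above, is simple enough to sketch in a few lines: a sample of curvature/pressure features is labeled by majority vote among its k nearest training samples. The feature values and labels below are invented for illustration; they are not the paper's sensor data.

```python
# Minimal K-Nearest Neighbor classifier: Euclidean distance + majority vote.
from collections import Counter
import math

def knn_predict(train, sample, k=3):
    """train: list of (feature_vector, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda fv: math.dist(fv[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy two-feature training set (e.g., normalized curvature and pressure).
train = [([0.10, 0.20], "small"), ([0.15, 0.25], "small"),
         ([0.90, 0.80], "large"), ([0.85, 0.90], "large"), ([0.20, 0.10], "small")]
pred = knn_predict(train, [0.12, 0.22])   # → "small"
```

With double-sensor (DSI) inputs, the feature vector simply gains more dimensions; the algorithm itself is unchanged, which is part of why KNN pairs well with fused sensor information.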
Funding: supported by General Scientific Research Funding of the Science and Technology Development Fund (FDCT) in Macao (No. 0150/2022/A) and the Faculty Research Grants of Macao University of Science and Technology (No. FRG-22-074-FIE).
Abstract: With the rapid development of the economy, air pollution caused by industrial expansion has caused serious harm to human health and social development. Therefore, establishing an effective air pollution concentration prediction system is of great scientific and practical significance for accurate and reliable predictions. This paper proposes a combined point-interval prediction system for pollutant concentration prediction by leveraging neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation technology is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The Golden Jackal Optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines training results from multiple models to obtain final point predictions, while quantile regression and kernel density estimation methods are used for interval predictions on the test set. Experimental results demonstrate that the combined model achieves a high goodness-of-fit coefficient of determination (R^(2)) of 99.3% and a maximum difference in prediction-accuracy mean absolute percentage error (MAPE) from the benchmark model of 12.6%. This suggests that the integrated learning system proposed in this paper can provide more accurate deterministic predictions as well as reliable uncertainty analysis compared to traditional models, offering a practical reference for air quality early warning.
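The interval-prediction idea above can be sketched in its simplest form: take the empirical quantiles of model residuals and shift them around the point forecast to obtain a prediction interval. This illustrates the general quantile approach only; the paper's actual system uses quantile regression and kernel density estimation, and the residuals and forecast value below are invented.

```python
# Empirical-quantile prediction interval around a point forecast.

def quantile(xs, q):
    """Linear-interpolation empirical quantile of a list (0 <= q <= 1)."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

residuals = [-3.0, -1.5, -0.5, 0.2, 0.8, 1.1, 2.4, 3.9]  # held-out errors
point = 55.0                                              # point forecast
interval = (point + quantile(residuals, 0.05),
            point + quantile(residuals, 0.95))            # ~90% interval
```

Wider residual spread yields a wider interval, which is exactly the uncertainty signal an early-warning system needs alongside the point value.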
Funding: under the auspices of the National Natural Science Foundation of China (No. 42330510).
Abstract: With the development of smart cities and smart technologies, parks, as functional units of the city, are facing smart transformation. The development of smart parks can help address challenges of technology integration within urban spaces and serve as testbeds for exploring smart city planning and governance models. Information models facilitate the effective integration of technology into space. Building Information Modeling (BIM) and City Information Modeling (CIM) have been widely used in urban construction. However, existing information models have limitations when applied to parks, so it is necessary to develop an information model suited to them. This paper first traces the evolution of park smart transformation, reviews the global landscape of smart park development, and identifies key trends and persistent challenges. Addressing the particularities of parks, the concept of Park Information Modeling (PIM) is proposed. PIM leverages smart technologies such as artificial intelligence, digital twins, and collaborative sensing to help form a 'space-technology-system' smart structure, enabling systematic management of diverse park spaces, addressing the deficiency in park-level information models, and aiming to achieve scale articulation between BIM and CIM. Finally, through a detailed top-level design application case study of the Nanjing Smart Education Park in China, this paper illustrates the translation of the PIM concept into practice, showcasing its potential to provide smart management tools for park managers and enhance services for park stakeholders, although further empirical validation is required.
Funding: Construction Program of the Key Discipline of the State Administration of Traditional Chinese Medicine of China (ZYYZDXK-2023069); Research Project of Shanghai Municipal Health Commission (2024QN018); Shanghai University of Traditional Chinese Medicine Science and Technology Development Program (23KFL005).
Abstract: Objective To develop a non-invasive predictive model for coronary artery stenosis severity based on adaptive multi-modal integration of traditional Chinese and Western medicine data. Methods Clinical indicators, echocardiographic data, traditional Chinese medicine (TCM) tongue manifestations, and facial features were collected from patients who underwent coronary computed tomography angiography (CTA) in the Cardiac Care Unit (CCU) of Shanghai Tenth People's Hospital between May 1, 2023 and May 1, 2024. An adaptive weighted multi-modal data fusion (AWMDF) model based on deep learning was constructed to predict the severity of coronary artery stenosis. The model was evaluated using metrics including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC). Further performance assessment was conducted through comparisons with six ensemble machine learning methods, data ablation, model component ablation, and various decision-level fusion strategies. Results A total of 158 patients were included in the study. The AWMDF model achieved excellent predictive performance (AUC = 0.973, accuracy = 0.937, precision = 0.937, recall = 0.929, and F1 score = 0.933). Compared with model ablation, data ablation experiments, and various traditional machine learning models, the AWMDF model demonstrated superior performance. Moreover, the adaptive weighting strategy outperformed alternative approaches, including simple weighting, averaging, voting, and fixed-weight schemes. Conclusion The AWMDF model demonstrates potential clinical value in the non-invasive prediction of coronary artery disease and could serve as a tool for clinical decision support.
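The adaptive weighting strategy that the abstract contrasts with fixed weights and averaging can be sketched as softmax-normalized modality weights applied to per-modality class scores. The modality names, weights, and scores below are illustrative placeholders, not the AWMDF architecture, where such weights would be learned end to end.

```python
# Hedged sketch of adaptive weighted decision-level fusion: per-modality class
# scores are combined with softmax-normalized weights, letting the model
# emphasize the more reliable modality.
import math

def softmax(ws):
    m = max(ws)
    exps = [math.exp(w - m) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(scores_per_modality, raw_weights):
    """scores_per_modality: one per-class score list per modality."""
    weights = softmax(raw_weights)
    n_classes = len(scores_per_modality[0])
    return [sum(w * s[c] for w, s in zip(weights, scores_per_modality))
            for c in range(n_classes)]

clinical = [0.2, 0.8]   # toy (mild, severe) probabilities from one modality
tongue   = [0.6, 0.4]   # toy probabilities from another modality
fused = fuse([clinical, tongue], raw_weights=[1.0, 0.0])
```

Because the weights pass through a softmax, they stay positive and sum to one, so the fused scores remain a valid probability-like distribution regardless of how the raw weights move during training.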
Abstract: Traditional Chinese medicine (TCM) demonstrates distinctive advantages in disease prevention and treatment. However, analyzing its biological mechanisms through the modern medical research paradigm of "single drug, single target" presents significant challenges due to its holistic approach. Network pharmacology and its core theory of network targets connect drugs and diseases from a holistic and systematic perspective based on biological networks, overcoming the limitations of reductionist research models and showing considerable value in TCM research. The recent integration of network target computational and experimental methods with artificial intelligence (AI) and multi-modal multi-omics technologies has substantially enhanced network pharmacology methodology. These advances in computational and experimental techniques provide complementary support for network target theory in decoding TCM principles. This review, centered on network targets, examines the progress of network target methods combined with AI in predicting disease molecular mechanisms and drug-target relationships, alongside the application of multi-modal multi-omics technologies in analyzing TCM formulae, syndromes, and toxicity. Looking forward, network target theory is expected to incorporate emerging technologies while developing novel approaches aligned with its unique characteristics, potentially leading to significant breakthroughs in TCM research and advancing scientific understanding and innovation in TCM.
Funding: supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161); the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476); the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046); the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030); and the Open Fund of the Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks utilizing multi-modal data in real-time ATC environments. To address this gap, by analyzing the situation awareness procedure of ATCOs, the ISA task is refined to the processing of two primary elements, i.e., spoken instructions and flight trajectories. Subsequently, ISA is further formulated into Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP) tasks. For the CIU task, an innovative automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 unimodal/multi-modal advanced methods with extensive evaluation metrics are introduced to conduct benchmarks on a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
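The multi-horizon FTP task setting can be made concrete with a toy baseline: given recent positions, predict several future steps at once. A constant-velocity extrapolation stands in here for the learned predictors benchmarked in the paper; the track coordinates are invented.

```python
# Toy multi-horizon trajectory extrapolation: repeat the last observed
# per-step displacement for each future horizon.

def predict_horizons(track, horizons=3):
    """track: list of (x, y) positions; returns `horizons` future points."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0                 # last observed velocity
    return [(x1 + vx * h, y1 + vy * h) for h in range(1, horizons + 1)]

track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
future = predict_horizons(track)   # → [(3.0, 1.5), (4.0, 2.0), (5.0, 2.5)]
```

Such a physics baseline is also a common sanity check when evaluating learned single- and multi-horizon predictors: a model that cannot beat it is not exploiting the data.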
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of the Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendations do not fully exploit user style preferences. Typically, users prefer particular styles such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is effective for personalized outfit recommendation.
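The BPR objective used to tie the components together has a compact standard form: the loss is small when a user's positive (observed) outfit scores higher than a sampled negative one. The scores below are plain numbers for illustration; in PSGNet they would come from the style- and compatibility-aware scoring network.

```python
# Standard Bayesian personalized ranking (BPR) loss:
# -log sigmoid(s_pos - s_neg), small when the positive item outranks the negative.
import math

def bpr_loss(pos_score, neg_score):
    margin = pos_score - neg_score
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good = bpr_loss(2.0, -1.0)   # positive clearly preferred -> small loss
bad = bpr_loss(-1.0, 2.0)    # ranking inverted -> large loss
```

Training minimizes this loss over sampled (user, positive outfit, negative outfit) triples, so the model learns a personalized ranking rather than absolute ratings.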
Funding: supported by the National Natural Science Foundation of China (62222212).
Abstract: Information extraction (IE) aims to automatically identify and extract information of specific interest from raw texts. Despite the abundance of solutions based on fine-tuning pretrained language models, IE in few-shot and zero-shot scenarios remains highly challenging due to the scarcity of training data. Large language models (LLMs), on the other hand, can generalize well to unseen tasks with few-shot demonstrations or even zero-shot instructions and have demonstrated impressive ability on a wide range of natural language understanding and generation tasks. Nevertheless, it is unclear whether such effectiveness can be replicated in the task of IE, where the target tasks involve specialized schemas and quite abstract entity or relation concepts. In this paper, we first examine the validity of LLMs in executing IE tasks with an established prompting strategy and further propose multiple types of augmented prompting methods, including the structured fundamental prompt (SFP), the structured interactive reasoning prompt (SIRP), and the voting-enabled structured interactive reasoning prompt (VESIRP). The experimental results demonstrate that while direct prompting yields inferior performance, the proposed augmented prompting methods significantly improve extraction accuracy, achieving comparable or even better performance (e.g., zero-shot FewNERD, FewNERD-INTRA) than state-of-the-art methods that require large-scale training samples. This study represents a systematic exploration of employing instruction-following LLMs for the task of IE. It not only establishes a performance benchmark for this novel paradigm but, more importantly, validates a practical technical pathway through the proposed prompt enhancement methods, offering a viable solution for efficient IE in low-resource settings.
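The core of any structured prompting strategy for IE is composing an instruction that pins down the schema and the output format. The sketch below shows the general pattern only; the wording, tab-separated output format, and example schema are invented for demonstration and do not reproduce the paper's SFP, SIRP, or VESIRP prompts.

```python
# Illustrative zero-shot NER prompt builder with an explicit output schema.

def build_ner_prompt(text, entity_types):
    """Compose an extraction instruction constrained to the given entity types."""
    types = ", ".join(entity_types)
    return (
        f"Extract all entities of the types [{types}] from the text below.\n"
        f"Return one line per entity in the form <type>\t<span>.\n"
        f"Text: {text}"
    )

prompt = build_ner_prompt("Marie Curie worked in Paris.", ["person", "location"])
```

Constraining the output format this way is what makes the LLM's free-text response machine-parseable, which is a prerequisite for benchmarking prompting methods against supervised extractors.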
Funding: supported by the project "GEF9874: Strengthening Coordinated Approaches to Reduce Invasive Alien Species (IAS) Threats to Globally Significant Agrobiodiversity and Agroecosystems in China", with funding from the Excellent Talent Training Funding Project in Dongcheng District, Beijing, under project number 2024-dchrcpyzz-9.
Abstract: Ecological monitoring vehicles are equipped with a range of sensors and monitoring devices designed to gather data on ecological and environmental factors. These vehicles are crucial in various fields, including environmental science research, ecological and environmental monitoring projects, disaster response, and emergency management. A key method employed in these vehicles for achieving high-precision positioning is LiDAR (Light Detection and Ranging)-Visual Simultaneous Localization and Mapping (SLAM). However, maintaining high-precision localization in complex scenarios, such as degraded environments or when dynamic objects are present, remains a significant challenge. To address this issue, we integrate both semantic and texture information from LiDAR and cameras to enhance the robustness and efficiency of data registration. Specifically, semantic information simplifies the modeling of scene elements, reducing the reliance on dense point clouds, which can be less efficient. Meanwhile, visual texture information complements LiDAR-Visual localization by providing additional contextual details. By incorporating semantic and texture details from paired images and point clouds, we significantly improve the quality of data association, thereby increasing the success rate of localization. This approach not only enhances the operational capabilities of ecological monitoring vehicles in complex environments but also contributes to improving the overall efficiency and effectiveness of ecological monitoring and environmental protection efforts.
Funding: National Natural Science Foundation of China (Grant Nos. 42293351, 41877239, 51422904, and 51379112).
Abstract: Advanced geological prediction is a crucial means to ensure safety and efficiency in tunnel construction. However, different advanced geological forecasting methods have their own limitations, resulting in poor detection accuracy. Using multiple methods to carry out a comprehensive evaluation can effectively improve the accuracy of advanced geological prediction results. In this study, geological information is combined with the detection results of geophysical methods, including transient electromagnetic, induced polarization, and tunnel seismic prediction, to establish a comprehensive analysis method for adverse geology. First, the possible main adverse geological problems are determined according to the geological information. Subsequently, various physical parameters of the rock mass in front of the tunnel face are derived on the basis of multisource geophysical data. Finally, based on the analysis results of the geological information, a multisource data fusion algorithm is used to determine the type, location, and scale of the adverse geology. Advanced geological prediction results that can provide effective guidance for tunnel construction can then be obtained.
Funding: funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding the redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals state-of-the-art methods on two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
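Word-level fusion, as described above, can be pictured as blending each word embedding with an auxiliary-knowledge vector through a gate. The scalar gate, vector sizes, and values below are simplified placeholders, not the MMAVK architecture, where the gate would itself be computed from the inputs.

```python
# Hedged sketch of gated word-level fusion: each word embedding is blended
# with a visual-knowledge vector; gate=1 keeps only the word, gate=0 only
# the knowledge.

def gated_fuse(word_vec, knowledge_vec, gate):
    """Convex combination of a word embedding and an auxiliary-knowledge vector."""
    assert 0.0 <= gate <= 1.0
    assert len(word_vec) == len(knowledge_vec)
    return [gate * w + (1.0 - gate) * k
            for w, k in zip(word_vec, knowledge_vec)]

word = [0.5, -0.2, 0.1]
knowledge = [0.1, 0.4, 0.0]
fused = gated_fuse(word, knowledge, gate=0.75)
```

Fusing per word, rather than appending one image summary to the whole sentence, lets visual knowledge influence exactly the tokens it is relevant to.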
Funding: funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within knowledge graphs is generally of low quality, and some entities suffer from a missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on facilitating modality interaction and fusion while neglecting the problems of low modality quality and missing modalities. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complement the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results showed that MMCSD can effectively perform MMKGC and achieve state-of-the-art performance.
Funding: supported by the Deanship of Research and Graduate Studies at King Khalid University under Small Research Project grant number RGP1/139/45.
Abstract: Integrating multiple medical imaging techniques, including Magnetic Resonance Imaging (MRI), Computed Tomography, Positron Emission Tomography (PET), and ultrasound, provides a comprehensive view of patient health status. Each of these methods contributes unique diagnostic insights, enhancing the overall assessment of the patient's condition. Nevertheless, the amalgamation of data from multiple modalities presents difficulties due to disparities in resolution, data collection methods, and noise levels. While traditional models like Convolutional Neural Networks (CNNs) excel in single-modality tasks, they struggle to handle multi-modal complexities, lacking the capacity to model global relationships. This research presents a novel approach for examining multi-modal medical imagery using a transformer-based system. The framework employs self-attention and cross-attention mechanisms to synchronize and integrate features across various modalities. Additionally, it shows resilience to variations in noise and image quality, making it adaptable for real-time clinical use. To address the computational hurdles linked to transformer models, particularly in real-time clinical applications in resource-constrained environments, several optimization techniques have been integrated to boost scalability and efficiency. Initially, a streamlined transformer architecture was adopted to minimize the computational load while maintaining model effectiveness. Methods such as model pruning, quantization, and knowledge distillation have been applied to reduce the parameter count and enhance the inference speed. Furthermore, efficient attention mechanisms such as linear or sparse attention were employed to alleviate the substantial memory and processing requirements of traditional self-attention operations. For further deployment optimization, hardware-aware acceleration strategies, including TensorRT and ONNX-based model compression, were implemented to ensure efficient execution on edge devices. These optimizations allow the approach to function effectively in real-time clinical settings, ensuring viability even in environments with limited resources. Future research directions include integrating non-imaging data to facilitate personalized treatment and enhancing computational efficiency for implementation in resource-limited environments. This study highlights the transformative potential of transformer models in multi-modal medical imaging, offering improvements in diagnostic accuracy and patient care outcomes.
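Of the compression techniques listed above, magnitude-based pruning is the simplest to illustrate: weights below a threshold are zeroed, shrinking the effective parameter count so the model runs cheaper on edge hardware. The threshold and weight values below are arbitrary examples, not figures from the study.

```python
# Simple magnitude pruning: zero out weights whose absolute value falls
# below a threshold, producing a sparse layer.

def prune(weights, threshold=0.1):
    """Return the weight list with small-magnitude entries set to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

layer = [0.5, -0.03, 0.12, 0.004, -0.4]
sparse = prune(layer)
nonzero = sum(1 for w in sparse if w != 0.0)   # 3 of 5 weights survive
```

In practice pruning is applied per layer (often iteratively, with fine-tuning between rounds), and the resulting sparsity only pays off at inference time when the runtime or hardware can exploit it.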