Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dim...Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dimensional healthcare data,encompassing genomic,transcriptomic,and other omics profiles,as well as radiological imaging and histopathological slides,makes this approach increasingly important because,when examined separately,these data sources only offer a fragmented picture of intricate disease processes.Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling,more robust disease characterization,and improved treatment decision-making.This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis.We classify and examine important application domains,such as(1)radiology,where automated report generation and lesion detection are facilitated by image-text integration;(2)histopathology,where fusion models improve tumor classification and grading;and(3)multi-omics,where molecular subtypes and latent biomarkers are revealed through cross-modal learning.We provide an overview of representative research,methodological advancements,and clinical consequences for each domain.Additionally,we critically analyzed the fundamental issues preventing wider adoption,including computational complexity(particularly in training scalable,multi-branch networks),data heterogeneity(resulting from modality-specific noise,resolution variations,and inconsistent annotations),and the challenge of maintaining significant cross-modal correlations during fusion.These problems impede interpretability,which is crucial for clinical trust and use,in addition to performance and generalizability.Lastly,we outline important areas for future research,including the development of standardized protocols for harmonizing data,the creation of lightweight and interpretable fusion architectures,the integration of real-time clinical decision support systems,and the promotion of cooperation for federated multimodal learning.Our goal is to provide researchers and clinicians with a concise overview of the field’s present state,enduring constraints,and exciting directions for further research through this review.展开更多
Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate...Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer.In this paper,we propose a VQA system intended to answer yes/no questions about real-world images,in Arabic.To support a robust VQA system,we work in two directions:(1)Using deep neural networks to semantically represent the given image and question in a fine-grainedmanner,namely ResNet-152 and Gated Recurrent Units(GRU).(2)Studying the role of the utilizedmultimodal bilinear pooling fusion technique in the trade-o.between the model complexity and the overall model performance.Some fusion techniques could significantly increase the model complexity,which seriously limits their applicability for VQA models.So far,there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions.Hence,a comparative analysis is conducted between eight bilinear pooling fusion techniques,in terms of their ability to reduce themodel complexity and improve themodel performance in this case of VQA systems.Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance,until reaching the best performance of 89.25%.Further,experiments have proven that the number of answers in the developed VQA system is a critical factor that a.ects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity.The Multimodal Local Perception Bilinear Pooling(MLPB)technique has shown the best balance between the model complexity and its performance,for VQA systems designed to answer yes/no questions.展开更多
Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces ...Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces significant data scarcity challenges,and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting issues.In addition,it is a challenge to understand the underlying relationship between text and images in the hateful memes.To address these issues,we propose a multimodal hateful memes classification model named LABF,which is based on low-rank adapter layers and bidirectional gated feature fusion.Firstly,low-rank adapter layers are adopted to learn the feature representation of the new dataset.This is achieved by introducing a small number of additional parameters while retaining prior knowledge of the CLIP model,which effectively alleviates the overfitting phenomenon.Secondly,a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion.Experimental results show that the method significantly outperforms existing methods on two public datasets,verifying its effectiveness and robustness.展开更多
Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,a...Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,and improve the reliability and safety of the system.Artificial intelligence(AI),referring to the simulation of human intelligence in machines that are programmed to think and learn like humans,represents a pivotal frontier in modern scientific research.With the continuous development and promotion of AI technology in Sensor 4.0 age,multimodal sensor fusion is becoming more and more intelligent and automated,and is expected to go further in the future.With this context,this review article takes a comprehensive look at the recent progress on AI-enhanced multimodal sensors and their integrated devices and systems.Based on the concept and principle of sensor technologies and AI algorithms,the theoretical underpinnings,technological breakthroughs,and pragmatic applications of AI-enhanced multimodal sensors in various fields such as robotics,healthcare,and environmental monitoring are highlighted.Through a comparative study of the dual/tri-modal sensors with and without using AI technologies(especially machine learning and deep learning),AI-enhanced multimodal sensors highlight the potential of AI to improve sensor performance,data processing,and decision-making capabilities.Furthermore,the review analyzes the challenges and opportunities afforded by AI-enhanced multimodal sensors,and offers a prospective outlook on the forthcoming advancements.展开更多
Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.There...Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.Therefore,it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations.This paper introduces a new algorithm,called thunderstorm wind gust identification network(TGNet).It leverages multimodal feature fusion to fuse the temporal and spatial features of thunderstorm wind gust events.The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations,which is aimed at distinguishing thunderstorm wind gusts from those caused by synoptic-scale systems or typhoons.Then,the encoder,structured upon the U-shaped network(U-Net)and incorporating recurrent residual convolutional blocks(R2U-Net),is employed to extract the corresponding spatial convective characteristics of satellite,radar,and lightning observations.Finally,by using the multimodal deep fusion module based on multi-head cross-attention,the temporal features of wind speed at each automatic weather station are incorporated into the spatial features to obtain 10-minutely classification of thunderstorm wind gusts.TGNet products have high accuracy,with a critical success index reaching 0.77.Compared with those of U-Net and R2U-Net,the false alarm rate of TGNet products decreases by 31.28%and 24.15%,respectively.The new algorithm provides grid products of thunderstorm wind gusts with a spatial resolution of 0.01°,updated every 10minutes.The results are finer and more accurate,thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.展开更多
Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires a...Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires and state-of-the-art flexible sensors are still interferential for being attached to the body.Herein,we develop a flexible-integrated multimodal sensing patch based on hydrogel and its application in unconstraint sleep monitoring.The patch comprises a bottom hydrogel-based dualmode pressure–temperature sensing layer and a top electrospun nanofiber-based non-contact detection layer as one integrated device.The hydrogel as core substrate exhibits strong toughness and water retention,and the multimodal sensing of temperature,pressure,and non-contact proximity is realized based on different sensing mechanisms with no crosstalk interference.The multimodal sensing function is verified in a simulated real-world scenario by a robotic hand grasping objects to validate its practicability.Multiple multimodal sensing patches integrated on different locations of a pillow are assembled for intelligent sleep monitoring.Versatile human–pillow interaction information as well as their evolution over time are acquired and analyzed by a one-dimensional convolutional neural network.Track of head movement and recognition of bad patterns that may lead to poor sleep are achieved,which provides a promising approach for sleep monitoring.展开更多
Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimo...Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimodal Aspect-oriented Sentiment Classification(MASC).Currently,most existing models for JMASA only perform text and image feature encoding from a basic level,but often neglect the in-depth analysis of unimodal intrinsic features,which may lead to the low accuracy of aspect term extraction and the poor ability of sentiment prediction due to the insufficient learning of intra-modal features.Given this problem,we propose a Text-Image Feature Fine-grained Learning(TIFFL)model for JMASA.First,we construct an enhanced adjacency matrix of word dependencies and adopt graph convolutional network to learn the syntactic structure features for text,which addresses the context interference problem of identifying different aspect terms.Then,the adjective-noun pairs extracted from image are introduced to enable the semantic representation of visual features more intuitive,which addresses the ambiguous semantic extraction problem during image feature learning.Thereby,the model performance of aspect term extraction and sentiment polarity prediction can be further optimized and enhanced.Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA,MATE and MASC,thus validating the effectiveness of our proposed methods.展开更多
Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion...Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion between millimeter-wave radar(with an accuracy of±0.1°)and an RGB-D camera is achieved through multisensor spatiotemporal registration technology,and a dataset suitable for guide dog robots is constructed.For the application scenario of edge-end guide dog robots,a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is innovatively adopted,achieving a comprehensive recognition accuracy of 95.8% in complex scenarios,which is 2.2% higher than that of the benchmark YOLOv11 network.The system supports navigation on complex terrains such as stairs(25 cm steps)and slopes(35°gradient),and the response time to sudden disturbances is shortened to 100 ms.Actual tests show that the navigation success rate reaches 95% in eight types of scenarios,the user satisfaction score is 4.8/5.0,and the cost is 50% lower than that of traditional guide dogs.展开更多
A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehens...A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehensive understanding of LLM architectural performance,strengths,and weaknesses.This gap precludes finding the appropriate models for task-specific applications and limits awareness of emerging LLM optimization and deployment strategies.In this research,50 studies on 25+LLMs,including GPT-3,GPT-4,Claude 3.5,DeepKet,and hybrid multimodal frameworks like ContextDET and GeoRSCLIP,are thoroughly reviewed.We propose LLM application taxonomy by grouping techniques by task focus—healthcare,chemistry,sentiment analysis,agent-based simulations,and multimodal integration.Advanced methods like parameter-efficient tuning(LoRA),quantumenhanced embeddings(DeepKet),retrieval-augmented generation(RAG),and safety-focused models(GalaxyGPT)are evaluated for dataset requirements,computational efficiency,and performance measures.Frameworks for ethical issues,data limited hallucinations,and KDGI-enhanced fine-tuning like Woodpecker’s post-remedy corrections are highlighted.The investigation’s scope,mad,and methods are described,but the primary results are not.The work reveals that domain-specialized fine-tuned LLMs employing RAG and quantum-enhanced embeddings performbetter for context-heavy applications.In medical text normalization,ChatGPT-4 outperforms previous models,while two multimodal frameworks,GeoRSCLIP,increase remote sensing.Parameter-efficient tuning technologies like LoRA have minimal computing cost and similar performance,demonstrating the necessity for adaptive models in multiple domains.To discover the optimum domain-specific models,explain domain-specific fine-tuning,and present quantum andmultimodal LLMs to address scalability and cross-domain issues.The framework helps academics and practitioners identify,adapt,and innovate LLMs for different purposes.This work advances the field of efficient,interpretable,and ethical LLM application research.展开更多
In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstac...In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstacle-surmounting capabilities.To achieve this,the robot employs a novel combination of a jumping module and a crawling module.The jumping module features improved energy storage capacity and an active clutch.Within the constraints of structural robustness,the jumping module maximizes the explosive power of the linear spring by utilizing the mechanical advantage of a closed-loop mechanism and controls the energy flow of the jumping module through an active clutch mechanism.Furthermore,inspired by the limb movements of tortoises during crawling and self-righting,a single-degree-of-freedom spatial four-bar crawling mechanism was designed to enable crawling,steering,and resetting functions.To demonstrate its practicality,the integrated jumping-crawling robot was tested in a laboratory environment for functions such as jumping,crawling,self-resetting,and steering.Experimental results confirmed the feasibility of the proposed integrated jumping-crawling robot.展开更多
With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existi...With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existing studies focus on the negative impacts of spoof stickers,while paying insufficient attention to their positive functions.From the perspective of multimodal metaphor,this paper uses methods such as virtual ethnography and image-text analysis to clarify the connotation of stickers,understand the evolution of their digital dissemination forms,and explore the multiple functions of subcultural stickers in the social interactions between teachers and students.Young students use stickers to convey emotions and information.Their expressive function,social function,and cultural metaphor function progress in a progressive manner.This not only shapes students’values but also promotes self-expression and teacher-student interaction.It also reminds teachers to correct students’negative thoughts by using stickers,achieving the effect of“cultivating and influencing people through culture.”展开更多
Inverse Synthetic Aperture Radar(ISAR)images of complex targets have a low Signal-to-Noise Ratio(SNR)and contain fuzzy edges and large differences in scattering intensity,which limits the recognition performance of IS...Inverse Synthetic Aperture Radar(ISAR)images of complex targets have a low Signal-to-Noise Ratio(SNR)and contain fuzzy edges and large differences in scattering intensity,which limits the recognition performance of ISAR systems.Also,data scarcity poses a greater challenge to the accurate recognition of components.To address the issues of component recognition in complex ISAR targets,this paper adopts semantic segmentation and proposes a few-shot semantic segmentation framework fusing multimodal features.The scarcity of available data is mitigated by using a two-branch scattering feature encoding structure.Then,the high-resolution features are obtained by fusing the ISAR image texture features and scattering quantization information of complex-valued echoes,thereby achieving significantly higher structural adaptability.Meanwhile,the scattering trait enhancement module and the statistical quantification module are designed.The edge texture is enhanced based on the scatter quantization property,which alleviates the segmentation challenge of edge blurring under low SNR conditions.The coupling of query/support samples is enhanced through four-dimensional convolution.Additionally,to overcome fusion challenges caused by information differences,multimodal feature fusion is guided by equilibrium comprehension loss.In this way,the performance potential of the fusion framework is fully unleashed,and the decision risk is effectively reduced.Experiments demonstrate the great advantages of the proposed framework in multimodal feature fusion,and it still exhibits great component segmentation capability under low SNR/edge blurring conditions.展开更多
BACKGROUND Esophageal neuroendocrine carcinoma(NEC),a rare and aggressive malignancy with a poor prognosis,is often diagnosed at an advanced stage.The optimal treatment strategy for locally advanced and recurrent esop...BACKGROUND Esophageal neuroendocrine carcinoma(NEC),a rare and aggressive malignancy with a poor prognosis,is often diagnosed at an advanced stage.The optimal treatment strategy for locally advanced and recurrent esophageal NEC remains unclear,and conversion surgery has only been reported for a few cases.Herein,we present the case of a 66-year-old male with locally advanced esophageal NEC initially diagnosed as squamous cell carcinoma.CASE SUMMARY The patient underwent induction chemotherapy with docetaxel,cisplatin,and 5-fluorouracil,followed by conversion surgery,including subtotal esophagectomy,three-field lymph node dissection,and distal pancreatectomy with splenectomy,due to infiltration of the pancreas by the No.11p lymph node.Postoperative pathological findings revealed a large cell-type NEC without a squamous cell carcinoma component,suspected to be a mixed neuroendocrine/non-neuroendocrine neoplasm.Hepatic metastasis was diagnosed within one month of surgery.Despite the administration of four courses of irinotecan+cisplatin chemotherapy,the treatment effect was considered a‘progressive disease’.After a multidisciplinary discussion,the patient underwent partial liver resection,followed by second-line chemotherapy with amrubicin.The patient achieved three-year survival with no new recurrence.CONCLUSION This case highlights the potential of multimodal treatment for long-term survival in advanced esophageal NEC.展开更多
To address the inherent trade-off between mechanical strength and repair efficiency in conventional microcapsule-based self-healing technologies,this study presents an eggshell-inspired approach for fabricating high-l...To address the inherent trade-off between mechanical strength and repair efficiency in conventional microcapsule-based self-healing technologies,this study presents an eggshell-inspired approach for fabricating high-load rigid porous microcapsules(HLRPMs)through subcritical water etching.By optimizing the subcritical water treatment parameters(OH−concentration:0.031 mol/L,tem-perature:240°C,duration:1.5 h),nanoscale through-holes were generated on hollow glass microspheres(shell thickness≈700 nm).The subsequent gradient pressure infiltration of flaxseed oil enabled a record-high core content of 88.2%.Systematic investigations demonstrated that incorporating 3 wt%HLRPMs into epoxy resin composites preserved excellent dielectric properties(breakdown strength≥30 kV/mm)and enhanced tensile strength by 7.52%.In addressing multimodal damage,the system achieved a 95.5%filling efficiency for mechanical scratches,a 97.0%reduction in frictional damage depth,and a 96.2%recovery of insulation following electrical treeing.This biomimetic microcapsule system concurrently improved self-healing capability and matrix performance,offering a promising strategy for the development of next-generation smart insulating materials.展开更多
Objective:To explore the clinical effect of multimodal nursing intervention on postoperative pain management in patients undergoing gastrointestinal surgery.Methods:A total of 120 patients who underwent gastrointestin...Objective:To explore the clinical effect of multimodal nursing intervention on postoperative pain management in patients undergoing gastrointestinal surgery.Methods:A total of 120 patients who underwent gastrointestinal surgery in our hospital from January 2023 to January 2024 were selected as the research subjects.They were randomly divided into the intervention group and the control group,with 60 cases in each group.The control group received routine postoperative care,while the intervention group received multimodal pain care intervention.The postoperative pain scores,the rate of using analgesic drugs,postoperative recovery indicators,and nursing satisfaction were compared between the two groups.Results:At 24 hours,48 hours,and 72 hours after surgery,the VAS pain scores of the intervention group were significantly lower than those of the control group(p<0.05);the rate of using analgesic drugs in the intervention group(25.0%)was significantly lower than that in the control group(48.3%)(p<0.05);the first defecation time,first ambulation time,and hospital stay of the intervention group were shorter than those of the control group(p<0.05);the nursing satisfaction of the intervention group(96.7%)was significantly higher than that of the control group(80.0%)(p<0.05).Conclusion:Multimodal pain care intervention can effectively relieve postoperative pain in patients undergoing gastrointestinal surgery,reduce the use of analgesic drugs,promote postoperative recovery,and improve nursing satisfaction.展开更多
Benefiting from the widespread potential applications in the era of the Internet of Thing and metaverse,triboelectric and piezoelectric nanogenerators(TENG&PENG)have attracted considerably increasing attention.The...Benefiting from the widespread potential applications in the era of the Internet of Thing and metaverse,triboelectric and piezoelectric nanogenerators(TENG&PENG)have attracted considerably increasing attention.Their outstanding characteristics,such as self-powered ability,high output performance,integration compatibility,cost-effectiveness,simple configurations,and versatile operation modes,could effectively expand the lifetime of vastly distributed wearable,implantable,and environmental devices,eventually achieving self-sustainable,maintenance-free,and reliable systems.However,current triboelectric/piezoelectric based active(i.e.self-powered)sensors still encounter serious bottlenecks in continuous monitoring and multimodal applications due to their intrinsic limitations of monomodal kinetic response and discontinuous transient output.This work systematically summarizes and evaluates the recent research endeavors to address the above challenges,with detailed discussions on the challenge origins,designing strategies,device performance,and corresponding diverse applications.Finally,conclusions and outlook regarding the research gap in self-powered continuous multimodal monitoring systems are provided,proposing the necessity of future research development in this field.展开更多
Osteosarcoma is the most common primary bone tumor with high malignancy.It is particularly necessary to achieve rapid and accurate diagnosis in its intraoperative examination and early diagnosis.Accordingly,the multim...Osteosarcoma is the most common primary bone tumor with high malignancy.It is particularly necessary to achieve rapid and accurate diagnosis in its intraoperative examination and early diagnosis.Accordingly,the multimodal microscopic imaging diagnosis system constructed by bright field,spontaneous fluorescence and polarized light microscopic imaging was used to study the pathological mechanism of osteosarcoma from the tissue microenvironment level and achieve rapid and accurate diagnosis.First,the multimodal microscopic images of normal and osteosarcoma tissue slices were collected to characterize the overall morphology of the tissue microenvironment of the samples,the arrangement structure of collagen fibers and the content and distribution of endogenous fluorescent substances.Second,based on the correlation and complementarity of the feature information contained in the three single-mode images,combined with convolutional neural network(CNN)and image fusion methods,a multimodal intelligent diagnosis model was constructed to effectively improve the information utilization and diagnosis accuracy.The accuracy and true positivity of the multimodal diagnostic model were significantly improved to 0.8495 and 0.9412,respectively,compared to those of the single-modal models.Besides,the difference of tissue microenvironments before and after cancerization can be used as a basis for cancer diagnosis,and the information extraction and intelligent diagnosis of osteosarcoma tissue can be achieved by using multimodal microscopic imaging technology combined with deep learning,which significantly promoted the application of tissue microenvironment in pathological examination.This diagnostic system relies on its advantages of simple operation,high efficiency and accuracy and high cost-effectiveness,and has enormous clinical application potential and research significance.展开更多
Neuromorphic devices,inspired by the intricate architecture of the human brain,have garnered recognition for their prodigious computational speed and sophisticated parallel computing capabilities.Vision,the primary mo...Neuromorphic devices,inspired by the intricate architecture of the human brain,have garnered recognition for their prodigious computational speed and sophisticated parallel computing capabilities.Vision,the primary mode of external information acquisition in living organisms,has garnered substantial scholarly interest.Notwithstanding numerous studies simulating the retina through optical synapses,their applications remain circumscribed to single-mode perception.Moreover,the pivotal role of temperature,a fundamental regulator of biological activities,has regrettably been relegated to the periphery.To address these limitations,we proffer a neuromorphic device endowed with multimodal perception,grounded in the principles of light-modulated semiconductors.This device seamlessly accomplishes dynamic hybrid visual and thermal multimodal perception,featuring temperature-dependent paired pulse facilitation properties and adaptive storage.Crucially,our meticulous examination of transfer curves,capacitance–voltage(C–V)tests,and noise measurements provides insights into interface and bulk defects,elucidating the physical mechanisms underlying adaptive storage and other functionalities.Additionally,the device demonstrates a variety of synaptic functionalities,including filtering properties,Ebbinghaus curves,and memory applications in image recognition.Surprisingly,the digital recognition rate achieves a remarkable value of 98.8%.展开更多
Multimodal ultrasonic vibration(UV)assisted micro-forming has been widely investigated for its advantages of further reducing forming loads and improving forming quality.However,the influence mechanism of different UV...Multimodal ultrasonic vibration(UV)assisted micro-forming has been widely investigated for its advantages of further reducing forming loads and improving forming quality.However,the influence mechanism of different UV modes on microstructure evolution and mechanical properties was still unclear.Mul-timodal UV assisted micro-compression tests on T2 copper with different grains and sample sizes were conducted in this study.The microstructure evolution for different UV modes was observed by EBSD.The results showed that the true stress reduction caused by UV was increased sequentially with tool ultrasonic vibration(TV),mold ultrasonic vibration(MV)and compound ultrasonic vibration(CV).The region of grain deformation was shifted along the direction of UV,and the MV promoted the uniform distribution of deformation stress.The grain refinement,fiber streamline density,grain deformation and rotation degree were further enhanced under CV,due to the synergistic effect of TV and MV.Additionally,a coupled theoretical model considering both acoustic softening effect and size effect was proposed for describing the mechanical properties.And a physical model of dislocation motion in different UV modes was developed for describing the microstructure evolution.The maximum error between the theoretical and experimental results was only 2.39%.This study provides a theoretical basis for the optimization of UV assisted micro-forming process.展开更多
文摘Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dimensional healthcare data,encompassing genomic,transcriptomic,and other omics profiles,as well as radiological imaging and histopathological slides,makes this approach increasingly important because,when examined separately,these data sources only offer a fragmented picture of intricate disease processes.Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling,more robust disease characterization,and improved treatment decision-making.This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis.We classify and examine important application domains,such as(1)radiology,where automated report generation and lesion detection are facilitated by image-text integration;(2)histopathology,where fusion models improve tumor classification and grading;and(3)multi-omics,where molecular subtypes and latent biomarkers are revealed through cross-modal learning.We provide an overview of representative research,methodological advancements,and clinical consequences for each domain.Additionally,we critically analyzed the fundamental issues preventing wider adoption,including computational complexity(particularly in training scalable,multi-branch networks),data heterogeneity(resulting from modality-specific noise,resolution variations,and inconsistent annotations),and the challenge of maintaining significant cross-modal correlations during fusion.These problems impede interpretability,which is crucial for clinical trust and use,in addition to performance and generalizability.Lastly,we outline important areas for future research,including the development of standardized protocols for harmonizing data,the creation of lightweight and interpretable fusion architectures,the integration of real-time clinical decision support systems,and the promotion of cooperation for federated multimodal learning.Our goal is to provide researchers and clinicians with a concise overview of the field’s present state,enduring constraints,and exciting directions for further research through this review.
文摘Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer.In this paper,we propose a VQA system intended to answer yes/no questions about real-world images,in Arabic.To support a robust VQA system,we work in two directions:(1)Using deep neural networks to semantically represent the given image and question in a fine-grainedmanner,namely ResNet-152 and Gated Recurrent Units(GRU).(2)Studying the role of the utilizedmultimodal bilinear pooling fusion technique in the trade-o.between the model complexity and the overall model performance.Some fusion techniques could significantly increase the model complexity,which seriously limits their applicability for VQA models.So far,there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions.Hence,a comparative analysis is conducted between eight bilinear pooling fusion techniques,in terms of their ability to reduce themodel complexity and improve themodel performance in this case of VQA systems.Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance,until reaching the best performance of 89.25%.Further,experiments have proven that the number of answers in the developed VQA system is a critical factor that a.ects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity.The Multimodal Local Perception Bilinear Pooling(MLPB)technique has shown the best balance between the model complexity and its performance,for VQA systems designed to answer yes/no questions.
基金supported by the Funding for Research on the Evolution of Cyberbullying Incidents and Intervention Strategies(24BSH033)Discipline Innovation and Talent Introduction Bases in Higher Education Institutions(B20087).
文摘Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces significant data scarcity challenges,and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting issues.In addition,it is a challenge to understand the underlying relationship between text and images in the hateful memes.To address these issues,we propose a multimodal hateful memes classification model named LABF,which is based on low-rank adapter layers and bidirectional gated feature fusion.Firstly,low-rank adapter layers are adopted to learn the feature representation of the new dataset.This is achieved by introducing a small number of additional parameters while retaining prior knowledge of the CLIP model,which effectively alleviates the overfitting phenomenon.Secondly,a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion.Experimental results show that the method significantly outperforms existing methods on two public datasets,verifying its effectiveness and robustness.
基金supported by the National Natural Science Foundation of China(No.62404111)Natural Science Foundation of Jiangsu Province(No.BK20240635)+2 种基金Natural Science Foundation of the Jiangsu Higher Education Institutions of China(No.24KJB510025)Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications(No.NY223157 and NY223156)Opening Project of Advanced Inte-grated Circuit Package and Testing Research Center of Jiangsu Province(No.NTIKFJJ202303).
文摘Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,and improve the reliability and safety of the system.Artificial intelligence(AI),referring to the simulation of human intelligence in machines that are programmed to think and learn like humans,represents a pivotal frontier in modern scientific research.With the continuous development and promotion of AI technology in Sensor 4.0 age,multimodal sensor fusion is becoming more and more intelligent and automated,and is expected to go further in the future.With this context,this review article takes a comprehensive look at the recent progress on AI-enhanced multimodal sensors and their integrated devices and systems.Based on the concept and principle of sensor technologies and AI algorithms,the theoretical underpinnings,technological breakthroughs,and pragmatic applications of AI-enhanced multimodal sensors in various fields such as robotics,healthcare,and environmental monitoring are highlighted.Through a comparative study of the dual/tri-modal sensors with and without using AI technologies(especially machine learning and deep learning),AI-enhanced multimodal sensors highlight the potential of AI to improve sensor performance,data processing,and decision-making capabilities.Furthermore,the review analyzes the challenges and opportunities afforded by AI-enhanced multimodal sensors,and offers a prospective outlook on the forthcoming advancements.
基金supported by the National Key Research and Development Program of China(Grant No.2022YFC3004104)the National Natural Science Foundation of China(Grant No.U2342204)+4 种基金the Innovation and Development Program of the China Meteorological Administration(Grant No.CXFZ2024J001)the Open Research Project of the Key Open Laboratory of Hydrology and Meteorology of the China Meteorological Administration(Grant No.23SWQXZ010)the Science and Technology Plan Project of Zhejiang Province(Grant No.2022C03150)the Open Research Fund Project of Anyang National Climate Observatory(Grant No.AYNCOF202401)the Open Bidding for Selecting the Best Candidates Program(Grant No.CMAJBGS202318)。
文摘Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.Therefore,it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations.This paper introduces a new algorithm,called thunderstorm wind gust identification network(TGNet).It leverages multimodal feature fusion to fuse the temporal and spatial features of thunderstorm wind gust events.The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations,which is aimed at distinguishing thunderstorm wind gusts from those caused by synoptic-scale systems or typhoons.Then,the encoder,structured upon the U-shaped network(U-Net)and incorporating recurrent residual convolutional blocks(R2U-Net),is employed to extract the corresponding spatial convective characteristics of satellite,radar,and lightning observations.Finally,by using the multimodal deep fusion module based on multi-head cross-attention,the temporal features of wind speed at each automatic weather station are incorporated into the spatial features to obtain 10-minutely classification of thunderstorm wind gusts.TGNet products have high accuracy,with a critical success index reaching 0.77.Compared with those of U-Net and R2U-Net,the false alarm rate of TGNet products decreases by 31.28%and 24.15%,respectively.The new algorithm provides grid products of thunderstorm wind gusts with a spatial resolution of 0.01°,updated every 10minutes.The results are finer and more accurate,thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.
基金supported by the National Key Research and Development Program of China under Grant(2024YFE0100400)Taishan Scholars Project Special Funds(tsqn202312035)+2 种基金the open research foundation of State Key Laboratory of Integrated Chips and Systems,the Tianjin Science and Technology Plan Project(No.22JCZDJC00630)the Higher Education Institution Science and Technology Research Project of Hebei Province(No.JZX2024024)Jinan City-University Integrated Development Strategy Project under Grant(JNSX2023017).
文摘Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires and state-of-the-art flexible sensors are still interferential for being attached to the body.Herein,we develop a flexible-integrated multimodal sensing patch based on hydrogel and its application in unconstraint sleep monitoring.The patch comprises a bottom hydrogel-based dualmode pressure–temperature sensing layer and a top electrospun nanofiber-based non-contact detection layer as one integrated device.The hydrogel as core substrate exhibits strong toughness and water retention,and the multimodal sensing of temperature,pressure,and non-contact proximity is realized based on different sensing mechanisms with no crosstalk interference.The multimodal sensing function is verified in a simulated real-world scenario by a robotic hand grasping objects to validate its practicability.Multiple multimodal sensing patches integrated on different locations of a pillow are assembled for intelligent sleep monitoring.Versatile human–pillow interaction information as well as their evolution over time are acquired and analyzed by a one-dimensional convolutional neural network.Track of head movement and recognition of bad patterns that may lead to poor sleep are achieved,which provides a promising approach for sleep monitoring.
基金supported by the Science and Technology Project of Henan Province(No.222102210081).
文摘Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimodal Aspect-oriented Sentiment Classification(MASC).Currently,most existing models for JMASA only perform text and image feature encoding from a basic level,but often neglect the in-depth analysis of unimodal intrinsic features,which may lead to the low accuracy of aspect term extraction and the poor ability of sentiment prediction due to the insufficient learning of intra-modal features.Given this problem,we propose a Text-Image Feature Fine-grained Learning(TIFFL)model for JMASA.First,we construct an enhanced adjacency matrix of word dependencies and adopt graph convolutional network to learn the syntactic structure features for text,which addresses the context interference problem of identifying different aspect terms.Then,the adjective-noun pairs extracted from image are introduced to enable the semantic representation of visual features more intuitive,which addresses the ambiguous semantic extraction problem during image feature learning.Thereby,the model performance of aspect term extraction and sentiment polarity prediction can be further optimized and enhanced.Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA,MATE and MASC,thus validating the effectiveness of our proposed methods.
文摘Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion between millimeter-wave radar(with an accuracy of±0.1°)and an RGB-D camera is achieved through multisensor spatiotemporal registration technology,and a dataset suitable for guide dog robots is constructed.For the application scenario of edge-end guide dog robots,a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is innovatively adopted,achieving a comprehensive recognition accuracy of 95.8% in complex scenarios,which is 2.2% higher than that of the benchmark YOLOv11 network.The system supports navigation on complex terrains such as stairs(25 cm steps)and slopes(35°gradient),and the response time to sudden disturbances is shortened to 100 ms.Actual tests show that the navigation success rate reaches 95% in eight types of scenarios,the user satisfaction score is 4.8/5.0,and the cost is 50% lower than that of traditional guide dogs.
文摘A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehensive understanding of LLM architectural performance,strengths,and weaknesses.This gap precludes finding the appropriate models for task-specific applications and limits awareness of emerging LLM optimization and deployment strategies.In this research,50 studies on 25+LLMs,including GPT-3,GPT-4,Claude 3.5,DeepKet,and hybrid multimodal frameworks like ContextDET and GeoRSCLIP,are thoroughly reviewed.We propose LLM application taxonomy by grouping techniques by task focus—healthcare,chemistry,sentiment analysis,agent-based simulations,and multimodal integration.Advanced methods like parameter-efficient tuning(LoRA),quantumenhanced embeddings(DeepKet),retrieval-augmented generation(RAG),and safety-focused models(GalaxyGPT)are evaluated for dataset requirements,computational efficiency,and performance measures.Frameworks for ethical issues,data limited hallucinations,and KDGI-enhanced fine-tuning like Woodpecker’s post-remedy corrections are highlighted.The investigation’s scope,mad,and methods are described,but the primary results are not.The work reveals that domain-specialized fine-tuned LLMs employing RAG and quantum-enhanced embeddings performbetter for context-heavy applications.In medical text normalization,ChatGPT-4 outperforms previous models,while two multimodal frameworks,GeoRSCLIP,increase remote sensing.Parameter-efficient tuning technologies like LoRA have minimal computing cost and similar performance,demonstrating the necessity for adaptive models in multiple domains.To discover the optimum domain-specific models,explain domain-specific fine-tuning,and present quantum andmultimodal LLMs to address scalability and cross-domain issues.The framework helps academics and practitioners identify,adapt,and innovate LLMs for different purposes.This work advances the field of efficient,interpretable,and ethical LLM application research.
基金supported by the National Natural Science Foundation of China(Nos.51375383).
文摘In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstacle-surmounting capabilities.To achieve this,the robot employs a novel combination of a jumping module and a crawling module.The jumping module features improved energy storage capacity and an active clutch.Within the constraints of structural robustness,the jumping module maximizes the explosive power of the linear spring by utilizing the mechanical advantage of a closed-loop mechanism and controls the energy flow of the jumping module through an active clutch mechanism.Furthermore,inspired by the limb movements of tortoises during crawling and self-righting,a single-degree-of-freedom spatial four-bar crawling mechanism was designed to enable crawling,steering,and resetting functions.To demonstrate its practicality,the integrated jumping-crawling robot was tested in a laboratory environment for functions such as jumping,crawling,self-resetting,and steering.Experimental results confirmed the feasibility of the proposed integrated jumping-crawling robot.
文摘With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existing studies focus on the negative impacts of spoof stickers,while paying insufficient attention to their positive functions.From the perspective of multimodal metaphor,this paper uses methods such as virtual ethnography and image-text analysis to clarify the connotation of stickers,understand the evolution of their digital dissemination forms,and explore the multiple functions of subcultural stickers in the social interactions between teachers and students.Young students use stickers to convey emotions and information.Their expressive function,social function,and cultural metaphor function progress in a progressive manner.This not only shapes students’values but also promotes self-expression and teacher-student interaction.It also reminds teachers to correct students’negative thoughts by using stickers,achieving the effect of“cultivating and influencing people through culture.”
文摘Inverse Synthetic Aperture Radar(ISAR)images of complex targets have a low Signal-to-Noise Ratio(SNR)and contain fuzzy edges and large differences in scattering intensity,which limits the recognition performance of ISAR systems.Also,data scarcity poses a greater challenge to the accurate recognition of components.To address the issues of component recognition in complex ISAR targets,this paper adopts semantic segmentation and proposes a few-shot semantic segmentation framework fusing multimodal features.The scarcity of available data is mitigated by using a two-branch scattering feature encoding structure.Then,the high-resolution features are obtained by fusing the ISAR image texture features and scattering quantization information of complex-valued echoes,thereby achieving significantly higher structural adaptability.Meanwhile,the scattering trait enhancement module and the statistical quantification module are designed.The edge texture is enhanced based on the scatter quantization property,which alleviates the segmentation challenge of edge blurring under low SNR conditions.The coupling of query/support samples is enhanced through four-dimensional convolution.Additionally,to overcome fusion challenges caused by information differences,multimodal feature fusion is guided by equilibrium comprehension loss.In this way,the performance potential of the fusion framework is fully unleashed,and the decision risk is effectively reduced.Experiments demonstrate the great advantages of the proposed framework in multimodal feature fusion,and it still exhibits great component segmentation capability under low SNR/edge blurring conditions.
文摘BACKGROUND Esophageal neuroendocrine carcinoma(NEC),a rare and aggressive malignancy with a poor prognosis,is often diagnosed at an advanced stage.The optimal treatment strategy for locally advanced and recurrent esophageal NEC remains unclear,and conversion surgery has only been reported for a few cases.Herein,we present the case of a 66-year-old male with locally advanced esophageal NEC initially diagnosed as squamous cell carcinoma.CASE SUMMARY The patient underwent induction chemotherapy with docetaxel,cisplatin,and 5-fluorouracil,followed by conversion surgery,including subtotal esophagectomy,three-field lymph node dissection,and distal pancreatectomy with splenectomy,due to infiltration of the pancreas by the No.11p lymph node.Postoperative pathological findings revealed a large cell-type NEC without a squamous cell carcinoma component,suspected to be a mixed neuroendocrine/non-neuroendocrine neoplasm.Hepatic metastasis was diagnosed within one month of surgery.Despite the administration of four courses of irinotecan+cisplatin chemotherapy,the treatment effect was considered a‘progressive disease’.After a multidisciplinary discussion,the patient underwent partial liver resection,followed by second-line chemotherapy with amrubicin.The patient achieved three-year survival with no new recurrence.CONCLUSION This case highlights the potential of multimodal treatment for long-term survival in advanced esophageal NEC.
基金supported by the National Natural Science Foundation of China(Nos.52377133 and 52077014)the Youth Talent Support Program of Chongqing(CQYC2021058945)the General Program of the Natural Science Foundation of Chongqing Municipality(CSTB2022NSCQ-MSX0444).
文摘To address the inherent trade-off between mechanical strength and repair efficiency in conventional microcapsule-based self-healing technologies,this study presents an eggshell-inspired approach for fabricating high-load rigid porous microcapsules(HLRPMs)through subcritical water etching.By optimizing the subcritical water treatment parameters(OH−concentration:0.031 mol/L,tem-perature:240°C,duration:1.5 h),nanoscale through-holes were generated on hollow glass microspheres(shell thickness≈700 nm).The subsequent gradient pressure infiltration of flaxseed oil enabled a record-high core content of 88.2%.Systematic investigations demonstrated that incorporating 3 wt%HLRPMs into epoxy resin composites preserved excellent dielectric properties(breakdown strength≥30 kV/mm)and enhanced tensile strength by 7.52%.In addressing multimodal damage,the system achieved a 95.5%filling efficiency for mechanical scratches,a 97.0%reduction in frictional damage depth,and a 96.2%recovery of insulation following electrical treeing.This biomimetic microcapsule system concurrently improved self-healing capability and matrix performance,offering a promising strategy for the development of next-generation smart insulating materials.
文摘Objective:To explore the clinical effect of multimodal nursing intervention on postoperative pain management in patients undergoing gastrointestinal surgery.Methods:A total of 120 patients who underwent gastrointestinal surgery in our hospital from January 2023 to January 2024 were selected as the research subjects.They were randomly divided into the intervention group and the control group,with 60 cases in each group.The control group received routine postoperative care,while the intervention group received multimodal pain care intervention.The postoperative pain scores,the rate of using analgesic drugs,postoperative recovery indicators,and nursing satisfaction were compared between the two groups.Results:At 24 hours,48 hours,and 72 hours after surgery,the VAS pain scores of the intervention group were significantly lower than those of the control group(p<0.05);the rate of using analgesic drugs in the intervention group(25.0%)was significantly lower than that in the control group(48.3%)(p<0.05);the first defecation time,first ambulation time,and hospital stay of the intervention group were shorter than those of the control group(p<0.05);the nursing satisfaction of the intervention group(96.7%)was significantly higher than that of the control group(80.0%)(p<0.05).Conclusion:Multimodal pain care intervention can effectively relieve postoperative pain in patients undergoing gastrointestinal surgery,reduce the use of analgesic drugs,promote postoperative recovery,and improve nursing satisfaction.
基金supported by the National Key R&D Program of China(Grant Nos.2022YFB3603403,2021YFB3600502)the National Natural Science Foundation of China(Grant Nos.62075040,62301150)+3 种基金the Southeast University Interdisciplinary Research Program for Young Scholars(2024FGC1007)the Start-up Research Fund of Southeast University(RF1028623164)the Nanjing Science and Technology Innovation Project for Returned Overseas Talent(4206002302)the Fundamental Research Funds for the Central Universities(2242024K40015).
文摘Benefiting from the widespread potential applications in the era of the Internet of Thing and metaverse,triboelectric and piezoelectric nanogenerators(TENG&PENG)have attracted considerably increasing attention.Their outstanding characteristics,such as self-powered ability,high output performance,integration compatibility,cost-effectiveness,simple configurations,and versatile operation modes,could effectively expand the lifetime of vastly distributed wearable,implantable,and environmental devices,eventually achieving self-sustainable,maintenance-free,and reliable systems.However,current triboelectric/piezoelectric based active(i.e.self-powered)sensors still encounter serious bottlenecks in continuous monitoring and multimodal applications due to their intrinsic limitations of monomodal kinetic response and discontinuous transient output.This work systematically summarizes and evaluates the recent research endeavors to address the above challenges,with detailed discussions on the challenge origins,designing strategies,device performance,and corresponding diverse applications.Finally,conclusions and outlook regarding the research gap in self-powered continuous multimodal monitoring systems are provided,proposing the necessity of future research development in this field.
基金the National Natural Science Foundation of China(62375127,82272664)Hunan Provincial Natural Science Foundation of China(2022JJ30843)+5 种基金the Science and Technology Development Fund Guided by Central Govern-ment(2021Szvup169)the Scientic Research Program of Hunan Provincial Health Commission(B202304077077)the Fundamental Research Funds for the Central Universities(NS2022035)Prospective Layout Special Fund of Nanjing University of Aero-nautics and Astronautics(ILA-22022)Graduate Research and Innovation Program of Nanjing University of Aeronautics and Astronautics(xcxjh20220328)Experimental Technology Research and Development Project of NUAA(No.SYJS202303Z)for the grant。
文摘Osteosarcoma is the most common primary bone tumor with high malignancy.It is particularly necessary to achieve rapid and accurate diagnosis in its intraoperative examination and early diagnosis.Accordingly,the multimodal microscopic imaging diagnosis system constructed by bright field,spontaneous fluorescence and polarized light microscopic imaging was used to study the pathological mechanism of osteosarcoma from the tissue microenvironment level and achieve rapid and accurate diagnosis.First,the multimodal microscopic images of normal and osteosarcoma tissue slices were collected to characterize the overall morphology of the tissue microenvironment of the samples,the arrangement structure of collagen fibers and the content and distribution of endogenous fluorescent substances.Second,based on the correlation and complementarity of the feature information contained in the three single-mode images,combined with convolutional neural network(CNN)and image fusion methods,a multimodal intelligent diagnosis model was constructed to effectively improve the information utilization and diagnosis accuracy.The accuracy and true positivity of the multimodal diagnostic model were significantly improved to 0.8495 and 0.9412,respectively,compared to those of the single-modal models.Besides,the difference of tissue microenvironments before and after cancerization can be used as a basis for cancer diagnosis,and the information extraction and intelligent diagnosis of osteosarcoma tissue can be achieved by using multimodal microscopic imaging technology combined with deep learning,which significantly promoted the application of tissue microenvironment in pathological examination.This diagnostic system relies on its advantages of simple operation,high efficiency and accuracy and high cost-effectiveness,and has enormous clinical application potential and research significance.
基金the financial support given by National Natural Science Foundation of China(52227808,62202285)the National Science Foundation for Distinguished Young Scholars of China(51725505)+1 种基金the Development Fund for Shanghai Talents(No.2021003)Shanghai Collaborative Innovation Center of Intelligent Perception Chip Technology。
文摘Neuromorphic devices,inspired by the intricate architecture of the human brain,have garnered recognition for their prodigious computational speed and sophisticated parallel computing capabilities.Vision,the primary mode of external information acquisition in living organisms,has garnered substantial scholarly interest.Notwithstanding numerous studies simulating the retina through optical synapses,their applications remain circumscribed to single-mode perception.Moreover,the pivotal role of temperature,a fundamental regulator of biological activities,has regrettably been relegated to the periphery.To address these limitations,we proffer a neuromorphic device endowed with multimodal perception,grounded in the principles of light-modulated semiconductors.This device seamlessly accomplishes dynamic hybrid visual and thermal multimodal perception,featuring temperature-dependent paired pulse facilitation properties and adaptive storage.Crucially,our meticulous examination of transfer curves,capacitance–voltage(C–V)tests,and noise measurements provides insights into interface and bulk defects,elucidating the physical mechanisms underlying adaptive storage and other functionalities.Additionally,the device demonstrates a variety of synaptic functionalities,including filtering properties,Ebbinghaus curves,and memory applications in image recognition.Surprisingly,the digital recognition rate achieves a remarkable value of 98.8%.
基金supported by the National Key Research and Development Program(No.2022YFB4602502)the Guangdong Basic and Applied Basic Research Foundation(No.2021A1515011991)+2 种基金the Key Research and Development Program Fund of Hubei Province(No.2022BAA057)the state Key Lab-oratory of Solidification Processing in NPU(No.SKLSP202325)the China Scholarship Council Visiting PhD Program(No.202306410136).
文摘Multimodal ultrasonic vibration(UV)assisted micro-forming has been widely investigated for its advantages of further reducing forming loads and improving forming quality.However,the influence mechanism of different UV modes on microstructure evolution and mechanical properties was still unclear.Mul-timodal UV assisted micro-compression tests on T2 copper with different grains and sample sizes were conducted in this study.The microstructure evolution for different UV modes was observed by EBSD.The results showed that the true stress reduction caused by UV was increased sequentially with tool ultrasonic vibration(TV),mold ultrasonic vibration(MV)and compound ultrasonic vibration(CV).The region of grain deformation was shifted along the direction of UV,and the MV promoted the uniform distribution of deformation stress.The grain refinement,fiber streamline density,grain deformation and rotation degree were further enhanced under CV,due to the synergistic effect of TV and MV.Additionally,a coupled theoretical model considering both acoustic softening effect and size effect was proposed for describing the mechanical properties.And a physical model of dislocation motion in different UV modes was developed for describing the microstructure evolution.The maximum error between the theoretical and experimental results was only 2.39%.This study provides a theoretical basis for the optimization of UV assisted micro-forming process.