Rock discontinuities control the mechanical behavior of rock and significantly influence the stability of rock masses. However, existing discontinuity mapping algorithms are susceptible to noise, and their results cannot be fed back to users in a timely manner. To address this issue, we propose a human-machine interaction (HMI) method for discontinuity mapping. Users can help the algorithm identify noise, judge results in real time, and adjust parameters. A regular cube was selected to illustrate the workflow: (1) a point cloud was acquired using remote sensing; (2) the HMI method was employed to select reference points and angle thresholds to detect discontinuity groups; (3) individual discontinuities were extracted from each group using a density-based clustering algorithm; and (4) the orientation of each discontinuity was measured with a plane-fitting algorithm. The method was applied to a well-studied highway road cut and a complex natural slope. The computed results are consistent with field measurements, and the average error in dip direction and dip angle for both cases was less than 3. Finally, the computational time of the proposed method was compared with that of two other popular algorithms; it was lower by a factor of tens, which demonstrates high computational efficiency. This method offers geologists and geological engineers a new way to map rock structures rapidly and accurately under heavy noise or unclear features.
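Step (4) of the workflow above, measuring orientation from a fitted plane, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes coordinates with x pointing east, y north, and z up, and it fits z = a*x + b*y + c by ordinary least squares, which cannot represent vertical planes. The function names `fit_plane` and `orientation` and the sample points are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations of the least-squares problem.
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set (collinear in plan view or a vertical plane)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def orientation(a, b):
    """Return (dip direction, dip angle) in degrees for the plane z = a*x + b*y + c.

    Assumes x = east, y = north, z = up. The upward normal is (-a, -b, 1);
    its horizontal projection points in the direction of steepest descent.
    """
    dip = math.degrees(math.atan(math.hypot(a, b)))
    dip_direction = math.degrees(math.atan2(-a, -b)) % 360.0
    return dip_direction, dip

# Hypothetical points on a plane that dips 45 degrees toward the east.
pts = [(0, 0, 2), (1, 0, 1), (0, 1, 2), (1, 1, 1), (2, 1, 0)]
a, b, c = fit_plane(pts)
dip_direction, dip = orientation(a, b)
```

For steep or vertical discontinuities, a fit based on the principal directions of the point cluster would be a more robust choice than this height-field fit.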
With the development of globalization, intercultural communicative competence has become one of the core qualities of modern college students. Because college English teaching is an important platform for cultivating students’ language skills and cultural literacy, innovation in its teaching mode is essential. Accordingly, this paper discusses methods for effectively cultivating students’ intercultural communicative competence in college English teaching from the perspective of the multimodal interactive teaching mode, with the aim of providing a reference for improving the quality of college English teaching and students’ comprehensive quality.
Hydrogel-based triboelectric nanogenerators (TENGs) have promising application prospects in wearable electronic devices. However, low performance, poor stability, insufficient recyclability, and inferior self-healing seriously hinder their development. Herein, we report a robust route to a liquid metal (LM)/polyvinyl alcohol (PVA) hydrogel-based TENG (LP-TENG). Owing to the intrinsically liquid nature of the conductive LM within the flexible PVA hydrogel, the as-prepared LP-TENG exhibited comprehensive advantages in adaptability, biocompatibility, electrical performance, stability, recyclability, and range of applications, which were unattainable by traditional systems. Concretely, the LP-TENG delivered an open-circuit voltage of 250 V, a short-circuit current of 4 μA, and a transferred charge of 120 nC with high stability, outperforming most advanced TENG systems. The LP-TENG was successfully employed in a variety of multifunctional applications, including human motion detection, handwriting recognition, energy collection, message transmission, and human-machine interaction. This work presents significant prospects for crafting advanced materials and devices in the fields of wearable electronics, flexible skin, and smart robots.
As the Internet of Things advances, gesture recognition is emerging as a prominent domain in human-machine interaction (HMI). However, interactive wearables based on conductive hydrogels for individuals with single-arm functionality or disabilities remain underexplored. Here, we devised a wearable one-handed keyboard with gesture recognition, employing machine learning algorithms and hydrogel-based mechanical sensors to boost productivity. The PCG (PAM/CMC/rGO) hydrogels are composed of polyacrylamide (PAM), sodium carboxymethyl cellulose (CMC), and reduced graphene oxide (rGO), and they function as strain sensors, pressure sensors, and electrode material. The PAM chains provide the gel’s elasticity through covalent cross-linking, while the biocompatible CMC improves the dispersion of rGO and enhances electromechanical properties. Integrating rGO sheets into the polymer matrix facilitates cross-linking and generates supplementary conductive pathways, thereby improving the gel system’s elasticity, sensitivity, and durability. Our hydrogel sensors exhibit high strain sensitivity (gauge factor (GF) = 8.18, 395.6%-551.96%) and superior pressure sensing capabilities (sensitivity (S) = 0.3116 kPa^(-1), 0-9.82 kPa). Furthermore, we developed a wearable keyboard with up to 98.13% accuracy using convolutional neural networks and a custom data acquisition system. This study establishes the groundwork for creating multifunctional gel sensors for intelligent machines, wearable devices, and brain-computer interfaces.
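For readers unfamiliar with the gauge factor quoted above, the following minimal sketch shows the standard definition GF = (ΔR/R0)/ε for a resistive strain sensor. The abstract does not state which electrical signal was measured, so treating it as resistance is an assumption, and the function name `gauge_factor` and the numbers are hypothetical rather than data from the paper.

```python
def gauge_factor(r0, r, strain):
    """Gauge factor GF = (delta R / R0) / strain.

    r0:     resistance of the unstrained sensor (ohms)
    r:      resistance under strain (ohms)
    strain: applied strain as a fraction (1.0 means 100% elongation)
    """
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    if strain == 0:
        raise ValueError("strain must be non-zero")
    return ((r - r0) / r0) / strain

# Hypothetical reading: resistance rises from 100 to 150 ohms at 50% strain.
gf = gauge_factor(100.0, 150.0, 0.5)
```

A gauge factor is normally reported together with the strain range over which the response is approximately linear, which is presumably what the percentage range in the abstract denotes.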
With the popularization of social media, stickers have become an important tool for young students to express themselves and resist mainstream culture, owing to their unique visual and emotional expressiveness. Most existing studies focus on the negative impacts of spoof stickers and pay insufficient attention to their positive functions. From the perspective of multimodal metaphor, this paper uses methods such as virtual ethnography and image-text analysis to clarify the connotation of stickers, trace the evolution of their digital dissemination forms, and explore the multiple functions of subcultural stickers in social interactions between teachers and students. Young students use stickers to convey emotions and information. Their expressive function, social function, and cultural metaphor function build on one another progressively. This not only shapes students’ values but also promotes self-expression and teacher-student interaction. It also reminds teachers that stickers can be used to correct students’ negative thoughts, achieving the effect of “cultivating and influencing people through culture.”
To address the limited environmental perception and poor terrain adaptability of traditional guide devices, this paper proposes an intelligent guide system based on a quadruped robot platform. Data fusion between millimeter-wave radar (with an accuracy of ±0.1°) and an RGB-D camera is achieved through multisensor spatiotemporal registration, and a dataset suitable for guide dog robots is constructed. For edge deployment on guide dog robots, a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is adopted, achieving a comprehensive recognition accuracy of 95.8% in complex scenarios, which is 2.2% higher than that of the baseline YOLOv11 network. The system supports navigation on complex terrain such as stairs (25 cm steps) and slopes (35° gradient), and the response time to sudden disturbances is shortened to 100 ms. Field tests show that the navigation success rate reaches 95% across eight types of scenarios, the user satisfaction score is 4.8/5.0, and the cost is 50% lower than that of traditional guide dogs.
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized by spatiotemporal granularity into motion-level, event-level, and story-level methods. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. In recent years, vision-language models (VLMs) have therefore been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of “Factor” to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Electromyography (EMG) is already widely used in human-machine interaction (HMI) applications. Decoding the information in EMG signals robustly and accurately is a key problem that urgently needs a solution. Recently, many EMG pattern recognition tasks have been addressed using deep learning methods. In this paper, we analyze recent papers and present a literature review describing the role that deep learning plays in EMG-based HMI. An overview of typical network structures and processing schemes is provided. Recent progress in typical tasks such as movement classification, joint angle prediction, and force/torque estimation is introduced. New issues, including multimodal sensing, inter-subject/inter-session variability, and robustness to disturbances, are discussed. We attempt to provide a comprehensive analysis of current research by discussing the advantages, challenges, and opportunities brought by deep learning. We hope that deep learning can help eliminate factors that hinder the development of EMG-based HMI systems. Finally, possible future directions are presented to pave the way for future research.
Disentangling the influence of multiple signal components on receivers and elucidating general processes influencing complex signal evolution are difficult tasks. In this study we test mate preferences of female squirrel treefrogs Hyla squirella and female túngara frogs Physalaemus pustulosus for similar combinations of acoustic and visual components of their multimodal courtship signals. In a two-choice playback experiment with squirrel treefrogs, the visual stimulus of a male model significantly increased the attractiveness of a relatively unattractive slow call rate. A previous study demonstrated that faster call rates are more attractive to female squirrel treefrogs, and all else being equal, models of male frogs with large body stripes are more attractive. In a similar experiment with female túngara frogs, the visual stimulus of a robotic frog failed to increase the attractiveness of a relatively unattractive call. Females also showed no preference for the distinct stripe on the robot that males commonly bear on their throat. Thus, features of conspicuous signal components such as body stripes are not universally important, and signal function is likely to differ even among species with similar ecologies and communication systems. Finally, we discuss the putative information content of anuran signals and suggest that the categorization of redundant versus multiple messages may not be sufficient as a general explanation for the evolution of multimodal signaling. Instead of relying on untested assumptions concerning the information content of signals, we discuss the value of initially collecting comparative empirical data sets related to receiver responses.
Background: Augmented reality classrooms have become an interesting research topic in the field of education, but they have some limitations. First, most researchers use cards to operate experiments, and a large number of cards is difficult and inconvenient for users. Second, most users conduct experiments only in the visual modality, and such single-modal interaction greatly reduces the users' sense of real interaction. To solve these problems, we propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on visual and tactile feedback in augmented reality. In addition, we design a Virtual and Real Fusion Interactive Tool Suite (VRFITS) with gesture recognition and intelligent equipment. Methods: The ARGVE method fuses gestures, intelligent equipment, and virtual models. We use a gesture recognition model trained with a convolutional neural network to recognize gestures in AR and to trigger vibration feedback after recognizing a five-finger grasp gesture. We establish a coordinate mapping between real hands and the virtual model to achieve the fusion of gestures and the virtual model. Results: The average accuracy of gesture recognition was 99.04%. We verify and apply VRFITS in the Augmented Reality Chemistry Lab (ARCL), and the overall operation load of ARCL is reduced by 29.42% in comparison with traditional simulated virtual experiments. Conclusions: We achieve real-time fusion of the gesture, virtual model, and intelligent equipment in ARCL. Compared with the NOBOOK virtual simulation experiment, ARCL improves the users' sense of real operation and their interaction efficiency.
The speech recognition rate deteriorates greatly in human-machine interaction when the speaker's speech mixes with a bystander's voice. This paper proposes a time-frequency approach to Blind Source Separation (BSS) for intelligent Human-Machine Interaction (HMI). The main idea of the algorithm is to simultaneously diagonalize the correlation matrices of the pre-whitened signals at different time delays for every frequency bin in the time-frequency domain. The proposed method has two merits: (1) fast convergence; (2) a high signal-to-interference ratio of the separated signals. Numerical evaluations are used to compare the performance of the proposed algorithm with that of two other deconvolution algorithms. An efficient algorithm to resolve permutation ambiguity is also proposed. With properly selected parameters, the proposed algorithm saves more than 10% of computational time and performs well on both simulated convolutive mixtures and speech recorded in a real room.
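The signal-to-interference ratio mentioned above as a quality measure can be illustrated with a minimal sketch. It assumes that a separated output has already been decomposed into a target component and a residual interference component, which is only possible in simulation where the sources are known. The function name `sir_db` and the sample values are hypothetical, and the joint diagonalization step itself is not shown.

```python
import math

def sir_db(target, interference):
    """Signal-to-interference ratio in decibels.

    target:       samples of the desired source found in a separated output
    interference: samples of the residual competing source in the same output
    """
    p_target = sum(v * v for v in target)
    p_interf = sum(v * v for v in interference)
    if p_target == 0:
        raise ValueError("target component has zero energy")
    if p_interf == 0:
        return math.inf  # perfect separation: no residual interference
    return 10.0 * math.log10(p_target / p_interf)

# Hypothetical output: the residual bystander voice has one tenth of the
# target amplitude, i.e. one hundredth of its energy.
ratio = sir_db([1.0, 1.0, 1.0, 1.0], [0.1, 0.1, 0.1, 0.1])
```

An amplitude ratio of 10 corresponds to 20 dB, so a higher value means the bystander's voice has been suppressed more strongly.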
Teleoperation is of great importance in robotics, especially when people cannot be present in the robot workshop. It provides a way for people to control robots remotely using human intelligence. In this paper, a robotic teleoperation system for precise robotic manipulation is established. A data glove and a 7-degree-of-freedom (DOF) force feedback controller are used for remote control interaction. A control system and a monitor system are designed for precise remote manipulation. The monitor system contains an image acquisition system and a human-machine interaction module, and it aims to simulate and detect the robot's running state. In addition, a visual object tracking algorithm is developed to estimate the states of the dynamic system from noisy observations. The established robotic teleoperation system is applied in a series of experiments, and high-precision results are obtained, showing the effectiveness of the physical system.
Biography videos, which are based on the lives of prominent historical figures, aim to describe great men's lives. In this paper, a novel interactive video summarization method for biography videos based on multimodal fusion is proposed. It visualizes the specific features of biography videos and supports interaction with video content by taking advantage of multimodality. In general, the story of a movie progresses through the dialogue of its characters, and the subtitles are produced from that dialogue, so they contain all the information related to the movie. In this paper, JGibbsLDA is applied to extract keywords from subtitles, because a biography video covers different aspects in order to depict a character's whole life. To fuse keywords and key-frames, affinity propagation is adopted to calculate the similarity between each key-frame cluster and the keywords. With this method, a video summary based on multimodal fusion is produced that describes the video content more completely. To reduce the time spent searching for video content of interest and to capture the relationships between main characters, a kind of map is adopted to visualize video content and interact with the video summary. An experiment is conducted to evaluate the video summarization, and the results demonstrate that the system can facilitate the exploration of video content while improving interaction and helping users find events of interest efficiently.
Continuous emotion recognition predicts emotional states from affective information and focuses on the continuous variation of emotion. Fusion of electroencephalography (EEG) and facial expression videos has been used in this field, but current research has limitations, such as hand-engineered features and simple integration approaches. Hence, a new continuous emotion recognition model based on the fusion of EEG and facial expression videos, named the residual multimodal Transformer (RMMT), is proposed. First, ResNet50 and a temporal convolutional network (TCN) are used to extract spatiotemporal features from videos, and the TCN is also applied to the computed EEG frequency power to acquire spatiotemporal features of EEG. Then, a multimodal Transformer is used to fuse the spatiotemporal features from the two modalities. Furthermore, a residual connection is introduced to fuse shallow features with deep features, which experiments verify to be effective for continuous emotion recognition. Inspired by knowledge distillation, the authors incorporate a feature-level loss into the loss function to further enhance network performance. Experimental results show that the RMMT outperforms other methods on the MAHNOB-HCI dataset. Ablation studies on the residual connection and the loss function in the RMMT demonstrate that both are effective.
The fusion of VISI (visual identity system Internet), digital maps, and Web GIS is presented. Interactive design of a Web GIS interface with VISI needs to consider additional new factors. VISI can provide the design principles, elements, and contents for the Web GIS. The Wuhan Bus Search System is designed to confirm the validity and practicability of the fusion.
Background: With an increasing number of vehicles becoming autonomous, intelligent, and connected, attention to the future use of the car human-machine interface (HMI) in these vehicles should become more relevant. Several studies have addressed car HMI but paid less attention to designing and implementing interactive glazing for everyday (autonomous) driving contexts. Methods: Reflecting on the literature, we describe an engineering psychology practice and the design of six novel future user scenarios, which envision the application of a specific set of augmented reality (AR) supported user interactions. Additionally, we conduct evaluations of specific scenarios and experiential prototypes, which reveal that these AR scenarios help the target user groups experience a new type of interaction. The overall evaluation is positive, with valuable assessment results and suggestions. Conclusions: This study may interest applied psychology educators who aspire to teach how AR can be operationalized in a human-centered design process to students with minimal pre-existing expertise or minimal scientific knowledge in engineering psychology.
Surgical robots are designed to provide greater precision and dexterity than manual surgical procedures, and they rely mainly on multimodal sensing technologies that let the surgeon operate the robotic arms and instruments seamlessly. Compared with single-mode sensors, optical and mechanical bi-modal sensors improve the precision, safety, and robustness of human-machine interaction systems. Here, template-guided and pneumatic printing technologies are combined to construct parallel perovskite and graphene structures with both optical and mechanical sensing capabilities. The printed, uniformly crystallized perovskite microstructure exhibits a fast and sensitive photoelectric response, enabling shadow recognition. The combination of graphene and elastic rubber provides good printability for preparing parallel structures near the perovskite arrays for force sensing. Thus, the printed perovskite and graphene structures offer non-contact optical sensing to detect hand position by recognizing shadows between the hand and the sensor, as well as contact mechanical sensing to detect the touch force applied by the hand. This provides a synergistic platform for real-time, multidimensional feedback to improve human-machine interaction.
Population migration data derived from location-based services have often been used to delineate population flows between cities or to construct intercity relationship networks that reveal the complex interaction patterns underlying human activities. Nevertheless, the inherent heterogeneity in multimodal migration big data has been ignored. This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association. First, intercity interactive networks in China were constructed using migration data from Baidu and AutoNavi collected during the same time period. Subsequently, the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall (network) and local (node) perspectives. Furthermore, the precision of these networks at the local scale was corroborated by constructing an intercity network from mobile phone (MP) data. Results indicate that the intercity interactive networks in China, as delineated by Baidu and AutoNavi migration flows, exhibit a high degree of structural equivalence. The correlation coefficient between these two networks is 0.874. Both networks exhibit a pronounced spatial polarization trend and hierarchical structure. This is evident in their distinct core and peripheral structures, as well as in the varying importance and influence of different nodes within the networks. Nevertheless, there are notable differences worthy of attention. The Baidu intercity interactive network exhibits pronounced cross-regional effects, and its high-level interactions are characterized by a “rich-club” phenomenon. The AutoNavi intercity interactive network presents a more significant distance attenuation effect, and its high-level interactions display a gradient distribution pattern. Notably, there is a substantial correlation between the AutoNavi and MP networks at the local scale, evidenced by a high correlation coefficient of 0.954. Furthermore, a “spatial dislocation” phenomenon was observed within the spatial structures at different levels extracted from the Baidu and AutoNavi intercity networks. However, the measured similarity of network spatial structure in three dimensions, namely node location, node size, and local structure, indicates relatively high similarity and consistency between the two networks.
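A network-level correlation coefficient of the kind reported above can be sketched as follows. The abstract does not say how the 0.874 and 0.954 values were computed, so this example simply assumes a Pearson correlation of flow weights over the city pairs present in both networks. The function name `edge_correlation`, the dictionary layout, and the city labels are hypothetical.

```python
import math

def edge_correlation(net_a, net_b):
    """Pearson correlation of edge weights over city pairs shared by two networks.

    Each network is a dict mapping (origin, destination) to a flow weight.
    """
    shared = sorted(set(net_a) & set(net_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared city pairs")
    xs = [net_a[pair] for pair in shared]
    ys = [net_b[pair] for pair in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("edge weights are constant in one network")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flows: the second source reports the same pattern at a
# different scale, plus one pair the first source does not cover.
source_1 = {("A", "B"): 10, ("B", "C"): 20, ("A", "C"): 30}
source_2 = {("A", "B"): 1, ("B", "C"): 2, ("A", "C"): 3, ("C", "D"): 9}
r = edge_correlation(source_1, source_2)
```

Because only shared pairs enter the calculation, a high coefficient says nothing about pairs that one data source covers and the other does not, which is one reason the study also compares node location, node size, and local structure.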
Funding (rock discontinuity mapping study): supported by the National Key R&D Program of China (No. 2023YFC3081200), the National Natural Science Foundation of China (No. 42077264), and the Scientific Research Project of PowerChina Huadong Engineering Corporation Limited (HDEC-2022-0301).
Funding (LP-TENG study): financially supported by the Natural Science Foundation of China (Nos. 22109120, 62104170 and 82202757) and the Zhejiang Provincial Natural Science Foundation of China (Nos. LQ21B030002 and LY23F040001).
Funding (wearable one-handed keyboard study): supported by the China Postdoctoral Science Foundation (No. 2022BG011), the Fundamental Research Funds for Central Universities (No. 2020CDJ-LHZZ-077), the Natural Science Foundation of Chongqing, China (No. cstc2020jcyj-msxmX0397), and the Fundamental Research Funds for Central Universities (No. 00007717).
文摘As the Internet of Things advances,gesture recognition emerges as a prominent domain in human-machine interaction(HMI).However,interactive wearables based on conductive hydrogels for individuals with single-arm functionality or disabilities remain underexplored.Here,we devised a wearable one-handed keyboard with gesture recognition,employing machine learning algorithms and hydrogel-based mechanical sensors to boost productivity.PCG(PAM/CMC/rGO)hydrogels are composed of polyacrylamide(PAM),sodium carboxymethyl cellulose(CMC),and reduced graphene oxide(rGO),which function as a strain,pressure sensor,and electrode material.The PAM chains offer the gel’s elasticity by covalent cross-linking,while the biocompatible CMC improves the dispersion of rGO and promotes electromechanical properties.Integrating rGO sheets into the polymer matrix facilitates cross-linking and generates supple-mentary conductive pathways,thereby augmenting the gel system’s elasticity,sensitivity,and durability.Our hydrogel sensors include high sensitivity(gage factor(GF)=8.18,395.6%-551.96%)and superior pressure sensing capabilities(Sensitivity(S)=0.3116 kPa^(-1),0-9.82 kPa).Furthermore,we developed a wearable keyboard with up to 98.13%accuracy using convolutional neural networks and a custom data acquisition system.This study establishes the groundwork for creating multifunctional gel sensors for intelligent machines,wearable devices,and brain-computer interfaces.
Abstract: With the popularization of social media, stickers have become an important tool for young students to express themselves and resist mainstream culture, owing to their unique visual and emotional expressiveness. Most existing studies focus on the negative impacts of spoof stickers while paying insufficient attention to their positive functions. From the perspective of multimodal metaphor, this paper uses virtual ethnography and image-text analysis to clarify the connotation of stickers, trace the evolution of their digital dissemination forms, and explore the multiple functions of subcultural stickers in social interactions between teachers and students. Young students use stickers to convey emotions and information. The expressive, social, and cultural-metaphor functions of stickers build on one another progressively. This not only shapes students' values but also promotes self-expression and teacher-student interaction. It also suggests that teachers can use stickers to correct students' negative thoughts, achieving the effect of "cultivating and influencing people through culture."
Abstract: To address the limited environmental perception and poor terrain adaptability of traditional guide devices, this paper proposes an intelligent guide system based on a quadruped robot platform. Data from a millimeter-wave radar (with an accuracy of ±0.1°) and an RGB-D camera are fused through multisensor spatiotemporal registration, and a dataset suitable for guide dog robots is constructed. For the edge-deployment scenario of guide dog robots, a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is adopted, achieving a comprehensive recognition accuracy of 95.8% in complex scenarios, which is 2.2% higher than that of the baseline YOLOv11 network. The system supports navigation on complex terrain such as stairs (25 cm steps) and slopes (35° gradient), and the response time to sudden disturbances is shortened to 100 ms. Field tests show that the navigation success rate reaches 95% in eight types of scenarios, the user satisfaction score is 4.8/5.0, and the cost is 50% lower than that of traditional guide dogs.
Funding: Supported by the Zhejiang Provincial Natural Science Foundation of China (No. LQ23F030001), the National Natural Science Foundation of China (No. 62406280), the Autism Research Special Fund of Zhejiang Foundation for Disabled Persons (No. 2023008), the Liaoning Province Higher Education Innovative Talents Program Support Project (No. LR2019058), the Liaoning Province Joint Open Fund for Key Scientific and Technological Innovation Bases (No. 2021-KF-12-05), the Central Guidance on Local Science and Technology Development Fund of Liaoning Province (No. 2023JH6/100100066), the Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, China, and in part by the Open Research Fund of the State Key Laboratory of Cognitive Neuroscience and Learning.
Abstract: Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of "Factor" to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Funding: Supported in part by the National Natural Science Foundation of China (U1813214, 61773369, 61903360), the Self-planned Project of the State Key Laboratory of Robotics (2020-Z12), and the China Postdoctoral Science Foundation funded project (2019M661155).
Abstract: Electromyography (EMG) has already been broadly used in human-machine interaction (HMI) applications. Determining how to decode the information inside EMG signals robustly and accurately is a key problem for which we urgently need a solution. Recently, many EMG pattern recognition tasks have been addressed using deep learning methods. In this paper, we analyze recent papers and present a literature review describing the role that deep learning plays in EMG-based HMI. An overview of typical network structures and processing schemes is provided. Recent progress in typical tasks such as movement classification, joint angle prediction, and force/torque estimation is introduced. New issues, including multimodal sensing, inter-subject/inter-session variability, and robustness toward disturbances, are discussed. We attempt to provide a comprehensive analysis of current research by discussing the advantages, challenges, and opportunities brought by deep learning. We hope that deep learning can aid in eliminating factors that hinder the development of EMG-based HMI systems. Furthermore, possible future directions are presented to pave the way for future research.
Abstract: Disentangling the influence of multiple signal components on receivers and elucidating general processes influencing complex signal evolution are difficult tasks. In this study we test mate preferences of female squirrel treefrogs Hyla squirella and female túngara frogs Physalaemus pustulosus for similar combinations of acoustic and visual components of their multimodal courtship signals. In a two-choice playback experiment with squirrel treefrogs, the visual stimulus of a male model significantly increased the attractiveness of a relatively unattractive slow call rate. A previous study demonstrated that faster call rates are more attractive to female squirrel treefrogs, and all else being equal, models of male frogs with large body stripes are more attractive. In a similar experiment with female túngara frogs, the visual stimulus of a robotic frog failed to increase the attractiveness of a relatively unattractive call. Females also showed no preference for the distinct stripe on the robot that males commonly bear on their throat. Thus, features of conspicuous signal components such as body stripes are not universally important, and signal function is likely to differ even among species with similar ecologies and communication systems. Finally, we discuss the putative information content of anuran signals and suggest that the categorization of redundant versus multiple messages may not be sufficient as a general explanation for the evolution of multimodal signaling. Instead of relying on untested assumptions concerning the information content of signals, we discuss the value of initially collecting comparative empirical data sets related to receiver responses.
Funding: The National Key R&D Program of China (2018YFB1004901) and the Independent Innovation Team Project of Jinan City (2019GXRC013).
Abstract: Background: Augmented reality classrooms have become an interesting research topic in the field of education, but there are some limitations. First, most researchers use cards to operate experiments, and a large number of cards causes difficulty and inconvenience for users. Second, most users conduct experiments only in the visual modality, and such single-modal interaction greatly reduces the users' real sense of interaction. To solve these problems, we propose the Multimodal Interaction Algorithm based on Augmented Reality (ARGEV), which is based on visual and tactile feedback in augmented reality. In addition, we design a Virtual and Real Fusion Interactive Tool Suite (VRFITS) with gesture recognition and intelligent equipment. Methods: The ARGEV method fuses gestures, intelligent equipment, and virtual models. We use a gesture recognition model trained by a convolutional neural network to recognize gestures in AR and to trigger vibration feedback after recognizing a five-finger grasp gesture. We establish a coordinate mapping relationship between real hands and the virtual model to achieve the fusion of gestures and the virtual model. Results: The average accuracy rate of gesture recognition was 99.04%. We verify and apply VRFITS in the Augmented Reality Chemistry Lab (ARCL), and the overall operation load of ARCL is thus reduced by 29.42% in comparison with traditional simulation virtual experiments. Conclusions: We achieve real-time fusion of gestures, virtual models, and intelligent equipment in ARCL. Compared with the NOBOOK virtual simulation experiment, ARCL improves the users' real sense of operation and interaction efficiency.
Abstract: The speech recognition rate deteriorates greatly in human-machine interaction when the speaker's speech mixes with a bystander's voice. This paper proposes a time-frequency approach to Blind Source Separation (BSS) for intelligent Human-Machine Interaction (HMI). The main idea of the algorithm is to simultaneously diagonalize the correlation matrices of the pre-whitened signals at different time delays for every frequency bin in the time-frequency domain. The proposed method has two merits: (1) fast convergence speed; (2) high signal-to-interference ratio of the separated signals. Numerical evaluations are used to compare the performance of the proposed algorithm with two other deconvolution algorithms. An efficient algorithm to resolve permutation ambiguity is also proposed in this paper. The proposed algorithm saves more than 10% of computational time with properly selected parameters and achieves good performance for both simulated convolutive mixtures and real room-recorded speech.
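The pre-whitening and delayed-correlation steps named in this abstract can be sketched as follows. This is a simplified time-domain illustration for an instantaneous mixture, not the authors' per-frequency-bin algorithm; it only builds the set of matrices that a joint diagonalization routine would then process. The function names, the mixing matrix and the delays are assumptions made for the example.

```python
import numpy as np


def prewhiten(x):
    """Center x (channels x samples) and whiten it so its covariance is identity."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)          # cov is symmetric
    w = np.diag(eigvals ** -0.5) @ eigvecs.T        # whitening matrix
    return w @ x, w


def delayed_correlations(z, delays):
    """Symmetrized correlation matrices of z at each time delay (in samples)."""
    n = z.shape[1]
    mats = []
    for tau in delays:
        r = z[:, tau:] @ z[:, : n - tau].T / (n - tau)
        mats.append(0.5 * (r + r.T))
    return mats


# Hypothetical two-channel mixture of two synthetic sources.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 5000))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
z, w = prewhiten(mixing @ sources)
mats = delayed_correlations(z, delays=[1, 2, 3])
```

After whitening, the remaining unknown is an orthogonal rotation, which is why diagonalizing several delayed correlation matrices at once can identify the sources. A frequency-domain variant would repeat these steps on the complex short-time Fourier coefficients of each bin.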
Funding: NSFC-Shenzhen Robotics Research Center Project (No. U2013207) and the Beijing Science and Technology Plan Project (No. Z191100008019008).
Abstract: Teleoperation is of great importance in robotics, especially when people are unavailable in the robot workshop. It provides a way for people to control robots remotely using human intelligence. In this paper, a robotic teleoperation system for precise robotic manipulation is established. A data glove and a 7-degree-of-freedom (DOF) force feedback controller are used for remote control interaction. A control system and a monitor system are designed for remote precise manipulation. The monitor system contains an image acquisition system and a human-machine interaction module, and aims to simulate and detect the robot's running state. In addition, a visual object tracking algorithm is developed to estimate the states of the dynamic system from noisy observations. The established robotic teleoperation system is applied in a series of experiments, and high-precision results are obtained, showing the effectiveness of the physical system.
Funding: Supported by the National Key Research and Development Plan (2016YFB1001200), the Natural Science Foundation of China (U1435220, 61232013), and the Natural Science Research Projects of Universities in Jiangsu Province (16KJA520003).
Abstract: Biography videos, based on the life performances of prominent historical figures, aim to describe the lives of great men. In this paper, a novel interactive video summarization method for biography videos based on multimodal fusion is proposed; it visualizes the specific features of biography videos and supports interaction with video content by exploiting multiple modalities. In general, the story of a movie progresses through the dialogues of its characters, and the subtitles are produced from these dialogues, which contain all the information related to the movie. Because a biography video covers different aspects of a character's whole life, JGibbsLDA is applied to extract keywords from the subtitles. To fuse keywords and key-frames, affinity propagation is adopted to calculate the similarity between each key-frame cluster and the keywords. Through this method, a video summarization based on multimodal fusion is produced that describes the video content more completely. To reduce the time spent searching for video content of interest and to reveal the relationships between main characters, a kind of map is adopted to visualize the video content and interact with the video summarization. An experiment is conducted to evaluate the video summarization, and the results demonstrate that the system facilitates the exploration of video content while improving interaction and helping users find events of interest efficiently.
Funding: State Key Development Program in the 14th Five-Year Plan under Grant No. 2021YFF0900701.
Abstract: Continuous emotion recognition aims to predict emotion states from affective information, with a focus on the continuous variation of emotion. The fusion of electroencephalography (EEG) and facial expression videos has been used in this field, but current research has some limitations, such as hand-engineered features and simple approaches to integration. Hence, a new continuous emotion recognition model based on the fusion of EEG and facial expression videos, named the residual multimodal Transformer (RMMT), is proposed. First, ResNet50 and a temporal convolutional network (TCN) are utilised to extract spatiotemporal features from videos, and the TCN is also applied to the computed EEG frequency power to acquire spatiotemporal features of EEG. Then, a multimodal Transformer is used to fuse the spatiotemporal features from the two modalities. Furthermore, a residual connection is introduced to fuse shallow features with deep features, which experiments verify to be effective for continuous emotion recognition. Inspired by knowledge distillation, the authors incorporate a feature-level loss into the loss function to further enhance network performance. Experimental results show that the RMMT achieves superior performance over other methods on the MAHNOB-HCI dataset. Ablation studies on the residual connection and the loss function in the RMMT demonstrate that both are effective.
Funding: Supported by the National Natural Science Foundation of China (No. 40071071).
Abstract: The fusion of VISI (visual identity system Internet), digital maps and Web GIS is presented. Interactive design of the Web GIS interface with VISI needs to consider more new factors. VISI can provide the design principles, elements and contents for the Web GIS. The design of the Wuhan Bus Search System is carried out to confirm the validity and practicability of the fusion.
Funding: Supported by the 'Automotive Glazing Application in Intelligent Cockpit Human-Machine Interface' project (SKHX2021049), a collaboration between Saint-Gobain Research and Beijing Normal University.
Abstract: Background: With an increasing number of vehicles becoming autonomous, intelligent, and connected, the future usage of car human-machine interfaces (HMIs) in these vehicles deserves more attention. Several studies have addressed car HMI but have paid less attention to designing and implementing interactive glazing for everyday (autonomous) driving contexts. Methods: Reflecting on the literature, we describe an engineering psychology practice and the design of six novel future user scenarios, which envision the application of a specific set of augmented reality (AR)-supported user interactions. Additionally, we conduct evaluations on specific scenarios and experiential prototypes, which reveal that these AR scenarios help the target user groups experience a new type of interaction. The overall evaluation is positive, with valuable assessment results and suggestions. Conclusions: This study may interest applied psychology educators who aspire to teach how AR can be operationalized in a human-centered design process to students with minimal pre-existing expertise or scientific knowledge in engineering psychology.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52222313, 22075296, 52321006, T2394480, and T2394484), the National Key R&D Program of China (Grant Nos. 2023YFE0111500, 2021YFB3200701, and 2022YFB4700804), Beijing National Laboratory for Molecular Sciences (Grant No. BNLMSCXXM-202005), and Beijing Municipal Science & Technology Commission (Grant No. Z231100005923039).
Abstract: Surgical robots are designed to provide enhanced precision and dexterity compared with manual surgical procedures, and they rely mainly on multimodal sensing technologies so that the surgeon can seamlessly operate the robotic arms and instruments. Compared with single-mode sensors, optical and mechanical bi-modal sensors provide improved precision, enhanced safety, and greater robustness for human-machine interaction systems. Here, template-guided and pneumatic printing technologies are combined to construct perovskite and graphene parallel structures with both optical and mechanical sensing capabilities. The printed, uniformly crystallized perovskite microstructure exhibits fast and sensitive photoelectric response characteristics, enabling shadow recognition. The combination of graphene and elastic rubber provides good printability, allowing parallel structures to be prepared near the perovskite arrays for force sensing. Thus, the printed perovskite and graphene structures possess non-contact optical sensing capabilities to detect hand position by recognizing shadows between the hand and the sensor, as well as contact mechanical sensing capabilities to detect the touch force applied by the hand. This provides a synergistic platform for real-time, multidimensional feedback to improve human-machine interaction.
Funding: National Natural Science Foundation of China, No. 42361040.
Abstract: Population migration data derived from location-based services have often been used to delineate population flows between cities or to construct intercity relationship networks that reveal and explore the complex interaction patterns underlying human activities. Nevertheless, the inherent heterogeneity in multimodal migration big data has been ignored. This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association. Initially, intercity interactive networks in China were constructed using migration data from Baidu and AutoNavi collected during the same time period. Subsequently, the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall (network) and local (node) perspectives. Furthermore, the precision of these networks at the local scale was corroborated by constructing an intercity network from mobile phone (MP) data. Results indicate that the intercity interactive networks in China, as delineated by Baidu and AutoNavi migration flows, exhibit a high degree of structural equivalence. The correlation coefficient between these two networks is 0.874. Both networks exhibit a pronounced spatial polarization trend and hierarchical structure. This is evident in their distinct core and peripheral structures, as well as in the varying importance and influence of different nodes within the networks. Nevertheless, there are notable differences worthy of attention. The Baidu intercity interactive network exhibits pronounced cross-regional effects, and its high-level interactions are characterized by a "rich-club" phenomenon. The AutoNavi intercity interactive network presents a more significant distance attenuation effect, and its high-level interactions display a gradient distribution pattern. Notably, there is a substantial correlation between the AutoNavi and MP networks at the local scale, evidenced by a high correlation coefficient of 0.954. Furthermore, a "spatial dislocation" phenomenon was observed within the spatial structures at different levels extracted from the Baidu and AutoNavi intercity networks. However, the measured network spatial structure similarity along three dimensions, namely node location, node size, and local structure, indicates relatively high similarity and consistency between the two networks.
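One simple way to obtain a network-level correlation coefficient like those reported above is to correlate the corresponding edge weights of two flow matrices. The sketch below assumes a plain Pearson correlation over off-diagonal origin-destination entries; the abstract does not say which correlation procedure the authors used, and the function names and toy matrices are hypothetical.

```python
import math


def offdiag(flows):
    """Flatten a square origin-destination matrix, skipping self-flows."""
    n = len(flows)
    return [flows[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def network_correlation(a, b):
    """Correlation between the edge weights of two networks over the same cities."""
    return pearson(offdiag(a), offdiag(b))


# Toy flow matrices for three cities (illustrative values only).
source_a = [[0, 120, 30], [110, 0, 45], [25, 50, 0]]
source_b = [[0, 100, 40], [95, 0, 35], [30, 60, 0]]
print(round(network_correlation(source_a, source_b), 3))
```

Because edge weights in a network are not independent, significance is often assessed with a permutation scheme such as QAP rather than a standard t-test; the coefficient itself is computed the same way.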