Based on the current state of college audio-visual English teaching in China, this article identifies classroom avoidance as a serious problem in such courses. After further analysis, and taking into account the characteristics of college audio-visual English teaching in China, it proposes applying the task-based teaching method to these courses and outlines its steps, with the aim of alleviating students' avoidance behavior.
Zhuang culture, a representative of the native ethnic culture of Guangxi, China, is of great significance to Chinese culture. To promote traditional culture, enrich the teaching content of the College English Audio-Visual Speaking Course, and enhance college students' intercultural communication ability, this paper explores, from a multicultural perspective, classroom practices for integrating indigenous Zhuang cultural elements into the course, providing new perspectives and a reference for multicultural education in foreign languages.
By distinguishing audio-visual interpretation from visual interpretation, this paper shows that the two belong to different categories in both essence and working method, which helps learners avoid confusing them. There are also some misconceptions about how each is taught. The paper therefore explores teaching methods for visual interpretation and audio-visual interpretation, with the aim of making their teaching more reasonable and scientific.
Object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over a DiffServ network with a QoS guarantee is proposed. MPEG-4 AVOs are extracted and classified into groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). The scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that, compared with "best-effort" delivery, the quality of the received video adapts gracefully to the network state. In addition, by allowing the content provider to define the priority of each audio-visual object, the adaptive transmission of object-based scalable video can be customized to the content.
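The priority-to-PHB mapping described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual mapping, so the rule below (four priority levels mapped to the four Assured Forwarding classes, and the scalable layer mapped to drop precedence) is purely an assumption, and the function names `af_dscp` and `phb_for_avo` are hypothetical. Only the DSCP arithmetic for AF code points (8 × class + 2 × drop precedence) follows the standard AF definition.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the DSCP value of Assured Forwarding code point AF<class><drop>."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def phb_for_avo(priority: int, layer: int) -> tuple[str, int]:
    """Map an audio-visual object to a PHB name and DSCP value.

    Assumed convention (not from the paper):
      priority 1 (most important) .. 4 (least important) -> AF class 4 .. 1
      layer 0 (base layer), 1, 2+ (enhancement layers)   -> drop precedence 1 .. 3
    """
    if not 1 <= priority <= 4:
        raise ValueError("priority must be 1..4")
    if layer < 0:
        raise ValueError("layer must be non-negative")
    af_class = 5 - priority
    drop = min(layer, 2) + 1
    return f"AF{af_class}{drop}", af_dscp(af_class, drop)


if __name__ == "__main__":
    # Base layer of the most important object vs. a high enhancement layer
    # of the least important object.
    print(phb_for_avo(1, 0))
    print(phb_for_avo(4, 5))
```

Under this assumed rule, packets marked with a higher drop precedence are the ones a congested DiffServ router would discard first, which matches the selective-discard behavior the abstract describes.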
This paper investigates output-feedback active disturbance rejection control of a valve-controlled cylinder electro-hydraulic servo system. First, a comprehensive nonlinear mathematical model that encompasses both matched and mismatched disturbances is formulated. Because only position information can be measured, a linear extended state observer (ESO) is introduced to estimate the unknown states and matched disturbances, while a dedicated disturbance observer is constructed to estimate the mismatched disturbances. Unlike traditional observer designs, the disturbance observer in this study is designed under the constraint of output feedback. Furthermore, an output-feedback nonlinear controller that leverages these observers is proposed to achieve accurate trajectory tracking. To mitigate the differential explosion problem inherent in the traditional backstepping framework, a finite-time stable command filter is incorporated. In addition, to account for transient filtering errors, a set of error compensation signals is designed to counter their negative impact. Theoretical analysis shows that the proposed control strategy ensures the boundedness of all signals in the closed-loop system. Moreover, when the system is subject only to time-invariant disturbances, asymptotic stability is established. Finally, the algorithm's efficacy is validated through comparative experiments.
Driven by the growing need for sustainable energy solutions and the increasing emphasis on health awareness, self-powered techniques have advanced notably. Triboelectric nanogenerators (TENGs) stand out as devices that exploit triboelectrification and electrostatic induction to generate electricity or electrical signals. To improve the electrical output performance of TENGs and broaden their range of applications, researchers have refined materials, surface morphology, and structural design. Among these, physical morphological modifications play a pivotal role in enhancing the electrical properties of TENGs by increasing the contact surface area, which can be achieved by building micro-/nano-structures on the surface of or inside the friction material. In this review, we summarize the common morphologies of TENGs, categorize them into surface and internal structures, and explain their roles in enhancing the electrical output performance of devices. We also classify the methods used for morphological preparation into physical and chemical approaches, providing a comprehensive survey of the available techniques. Typical applications of TENGs with special morphologies are then presented, grouped into energy harvesting and self-powered sensors. Finally, the challenges and future directions for TENGs are reviewed. The aim of this article is to stimulate further strategies for enhancing the performance of TENGs.
Dear Editor, This letter studies the output consensus problem of heterogeneous linear multiagent systems over directed graphs. A novel adaptive dynamic event-triggered controller is presented that is based only on a feedback combination of the agent's own state and its neighbors' outputs, and that achieves exponential output consensus through intermittent communication. The controller is obtained by solving two linear matrix equations, and Zeno behavior is excluded.
In this paper, we propose an output voltage stabilization scheme for a DC-DC Zeta converter using hybrid control. We model the Zeta converter under continuous conduction mode operation and derive a switching control law that brings the output voltage to the desired level. Because infinitely fast switching would occur at the desired level, we enhance the switching control law by allowing a sizeable output voltage ripple. We derive mathematical models that allow one to choose the desired switching frequency. In practice, the non-ideal properties of the Zeta converter result in a steady-state output voltage error. By analyzing the power loss in the Zeta converter, we propose an improved switching control law that eliminates this steady-state error. The effectiveness of the proposed method is illustrated with simulation results.
In this paper, a dynamic high-gain observer and an output feedback controller are jointly proposed for nonlinear systems with multiple unknown time delays. By constructing Lyapunov-Krasovskii functionals, it is shown that global asymptotic state regulation can be ensured by introducing a single dynamic gain; furthermore, global asymptotic stabilization can be achieved by choosing a sufficiently large static scaling gain when the upper bounds of all system parameters are known. In particular, the output coefficient is allowed to be non-differentiable with an unknown upper bound. This paper proposes a dynamic-gain scaling method based on a generalized Lyapunov matrix inequality, which significantly reduces the computational complexity of the design compared with the classic backstepping method.
Funding: Supported in part by the National Natural Science Foundation of China (61773330).
Abstract: Audio-visual speech recognition (AVSR), which integrates audio and visual modalities to improve recognition performance and robustness in noisy or adverse acoustic conditions, has attracted significant research interest. However, Conformer-based architectures remain computationally expensive because the space and time complexity of their softmax-based attention mechanisms grows quadratically with sequence length. In addition, Conformer-based architectures may not provide sufficient flexibility for modeling local dependencies at different granularities. To mitigate these limitations, this study introduces a novel AVSR framework based on a ReLU-based Sparse and Grouped Conformer (RSG-Conformer) architecture. Specifically, we propose a Global-enhanced Sparse Attention (GSA) module incorporating an efficient context restoration block to recover lost contextual cues. Concurrently, a Grouped-scale Convolution (GSC) module replaces the standard Conformer convolution module, providing adaptive local modeling across varying temporal resolutions. Furthermore, we integrate a Refined Intermediate Contextual CTC (RIC-CTC) supervision strategy. This approach applies progressively increasing loss weights combined with convolution-based context aggregation, thereby further relaxing the conditional independence constraint inherent in standard CTC frameworks. Evaluations on the LRS2 and LRS3 benchmarks validate the efficacy of our approach, with word error rates (WERs) reduced to 1.8% and 1.5%, respectively. These results demonstrate its state-of-the-art performance in AVSR tasks.
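To make the idea of ReLU-based sparse attention concrete, the following is a minimal sketch and not the RSG-Conformer implementation: the abstract does not specify how the GSA module normalizes or restores context, so the row-sum normalization, the `eps` constant, and the function name `relu_attention` are assumptions chosen for illustration. The sketch only shows how replacing softmax with ReLU drives negative scores to exactly zero weight; it still computes every query-key pair and therefore does not by itself remove the quadratic cost.

```python
import math


def relu_attention(q, k, v, eps=1e-9):
    """Single-head attention with ReLU in place of softmax.

    q: list of query vectors, k: list of key vectors, v: list of value vectors
    (plain Python lists of floats). Returns (outputs, weights).
    Normalizing each row by its sum is an assumption of this sketch.
    """
    d = len(q[0])
    outputs, weights = [], []
    for qi in q:
        # Scaled dot-product scores, clipped at zero by ReLU.
        scores = [
            max(0.0, sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
            for kj in k
        ]
        total = sum(scores) + eps  # eps avoids division by zero for all-zero rows
        row = [s / total for s in scores]
        weights.append(row)
        # Weighted sum of value vectors; zero-weight keys contribute nothing.
        outputs.append(
            [sum(row[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [-1.0, 0.0]]
    v = [[2.0, 3.0], [5.0, 7.0]]
    out, w = relu_attention(q, k, v)
    print(w)    # the second key has a negative score, so its weight is exactly 0.0
    print(out)
```

Unlike softmax, which assigns every key a strictly positive weight, the ReLU variant yields exact zeros, which is the sparsity that a sparse-attention design can exploit.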
Funding: A Study on the Teaching Reform of College English Audio-Visual Oral Course Oriented towards the Cultivation of Critical Thinking Ability (2501032339).
Abstract: As globalization advances, English, the common language of international communication, plays an increasingly important role in university education. As a key component of English teaching, the college English audio-visual oral course not only imparts language knowledge and skills but also carries the important task of cultivating students' critical thinking. As one of the core qualities of modern talent, critical thinking is essential to students' in-depth understanding of English, their intercultural communication ability, and their capacity for innovative thinking. This paper explains the significance of cultivating students' critical thinking in college English audio-visual and oral teaching and, drawing on practical teaching experience and current education theory, puts forward a series of innovative teaching strategies for doing so, in order to provide new ideas and practical guidance for improving the quality of college English teaching and developing students' overall competence.
Abstract: This paper proposes a robust control-oriented identification method for errors-in-variables (EIV) systems under output feedback using frequency-response (FR) experimental data. An important relation between such a closed-loop EIV system and its coprime factor (CF) uncertainty description is first derived, from which FR measurements suitable for plant CF identification can be generated. Different factorizations of a given controller in the closed-loop system can be exploited to adjust the right coprime factors (RCFs) of the plant and thereby improve the signal-to-noise ratio of the identification data. Subsequently, a nominal RCF model is estimated via linear matrix inequalities from the applicable FR measurements, and its associated worst-case errors are quantified from a priori and a posteriori information on the underlying system. The resulting RCF perturbation model set is described by the nominal RCF model and its worst-case error bounds. Such a model set, which can be stabilized by the given controller, is ready for robust stabilizing controller redesign and robust performance analysis. Finally, a numerical simulation demonstrates the efficacy of the proposed identification method.
Abstract: To enhance the accuracy of short-term photovoltaic power output prediction and to address issues such as the insufficient spatial resolution of meteorological forecast data and the weak generalization ability of models, this paper proposes a prediction method that integrates spatially downscaled meteorological data with a convolutional neural network (CNN)-iTransformer-long short-term memory (LSTM) model. First, the rime-optimized random forest regression algorithm (RIME-RF) is employed to perform spatial downscaling on numerical weather prediction (NWP) data, thereby improving its local applicability. Second, a CNN-iTransformer-LSTM hybrid prediction model is constructed. This model uses a CNN as a spatial feature extractor to capture local patterns in the meteorological data, an iTransformer to model the global dependencies among multiple variables, and an LSTM to strengthen the learning of short-term temporal dynamics, thereby mining multi-scale features jointly and efficiently. Finally, experiments are conducted using actual data from a photovoltaic power station in Hebei, China, across various seasons and weather conditions. The results show that the proposed model outperforms the comparison models in terms of root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²), maintaining high prediction accuracy and stability even under complex weather conditions such as overcast and rainy days. The downscaling process further improves prediction performance, verifying the effectiveness and practicality of the method.
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFB1001001), the Beijing Natural Science Foundation (No. JQ18017), and the National Natural Science Foundation of China (No. 61976002).
Abstract: Audio-visual learning, which aims to exploit the relationship between the audio and visual modalities, has drawn considerable attention since deep learning began to be used successfully. Researchers tend to leverage these two modalities either to improve performance on tasks previously treated with a single modality or to address new, challenging problems. In this paper, we provide a comprehensive survey of recent developments in audio-visual learning. We divide current audio-visual learning tasks into four subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Abstract: Video data are composed of multimodal information streams, including visual, auditory, and textual streams. This paper therefore describes an approach to story segmentation for news video based on multimodal analysis. The proposed approach detects topic-caption frames and integrates them with silence-clip detection results and shot segmentation results to locate news story boundaries. Integrating audio-visual features with text information overcomes the weakness of approaches that use only image analysis techniques. On test data of 135,400 frames, boundaries between news stories are detected with an accuracy rate of 85.8% and a recall rate of 97.5%. The experimental results show that the approach is valid and robust.
Funding: National Research Foundation, Singapore, under its Medium-Sized Center for Advanced Robotics Technology Innovation (CARTIN), under project WP5 within the Delta-NTU Corporate Lab, with funding support from A*STAR under its IAF-ICP program (Grant no: I2201E0013) and Delta Electronics Inc.
Abstract: In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which can transport harmful payloads or cause significant damage, we present AV-FDTI, an Audio-Visual Fusion system designed for Drone Threat Identification. AV-FDTI fuses audio and omnidirectional camera feature inputs to improve the precision and resilience of drone classification and 3D localization. Specifically, AV-FDTI employs a CRNN to capture temporal dynamics in the audio domain and a pretrained ResNet50 model for image feature extraction. Furthermore, we adopt a mechanism based on visual information entropy and cross-attention to enhance the fusion of visual and audio data. Notably, our system is trained on automated Leica tracking annotations, which provide ground truth data with millimeter-level accuracy. Comprehensive comparative evaluations demonstrate the superiority of our solution over existing systems. To advance this field, we will release this work as open-source code together with a wearable AV-FDTI design, contributing resources to the research community.
Abstract: This paper presents a thorough review of research on audio-visual translation in China and abroad. Reviewing foreign achievements in this field of translation studies sheds light on audio-visual translation practice and research in China. The review of Chinese scholars' audio-visual translation studies is intended to indicate potential directions for development and to provide guidance on aspects that have been neglected. Based on a summary of the relevant studies, possible topics for further research are proposed.
Funding: Supported by the National Natural Science Foundation of China (60905006) and the NSFC-Guangdong Joint Fund (U1035004).
Abstract: Emotion recognition has become an important task in modern human-computer interaction. This paper presents a multilayer boosted HMM (MBHMM) classifier for automatic audio-visual emotion recognition. A modified Baum-Welch algorithm is proposed for component HMM learning, and adaptive boosting (AdaBoost) is used to train ensemble classifiers for different layers (cues). Except for the first layer, the initial weights of the training samples in the current layer are determined by the recognition results of the ensemble classifier in the layer above. Training on the current cue can therefore focus more on the samples that the previous cue found difficult. The MBHMM classifier combines these ensemble classifiers and takes advantage of the complementary information from multiple cues and modalities. Experimental results on audio-visual emotion data collected in Wizard of Oz scenarios and labeled under two types of emotion category sets demonstrate that the approach is effective and promising.
Abstract: On February 10, 2019 (US Central Time), the China National Peking Opera Company (CNPOC) and the Hubei Chime Bells National Chinese Orchestra presented a fantastic audio-visual performance of Peking Opera and Chinese chime bells for an American audience at Buntrock Hall, a world-class venue at Symphony Center.
Funding: This paper is an interim result of the research project: Basic Research Project of Beijing Institute of Graphic Communication: Research on the Artistic, Modern Communication and Publishing of Dian-shi Zhai Pictorial (1884-1898) (Serial Number Eb202008).
Abstract: Mongolian audio-visual works are an important carrier for exploring the true significance of this national culture. This paper argues that Mongolian people in Inner Mongolia continually strengthen their individual identification with the ethnic group as a whole through the influence of film, television, and music, and on this basis continually develop a new culture suited to modern and contemporary life, further enhancing their sense of belonging to the ethnic nation.
Abstract: Based on the current state of college audio-visual English teaching in China, this article points out that classroom avoidance is a serious problem in such courses. After further analysis, and taking into account the characteristics of college audio-visual English teaching in China, it proposes applying the task-based teaching method to these courses and describes the steps involved, with the aim of reducing avoidance among students.
Funding: Supported by the Guangxi University of Chinese Medicine School-Level Education and Teaching Reform and Research Project: Integration and Innovative Practice of Ideological and Political Education and Zhuang Ethnic Culture in College English Audio-Visual Speaking Course (Project No. 2022B073).
Abstract: Zhuang culture, representative of the native ethnic culture of Guangxi, China, is of great significance to Chinese culture. To promote traditional culture, enrich the teaching content of the College English Audio-Visual Speaking Course, and enhance the intercultural communication ability of college students, this paper takes a multicultural perspective and explores classroom practices for integrating indigenous Zhuang cultural elements into the course, providing new perspectives and a reference for multicultural education in foreign language teaching.
Abstract: Distinguishing audio-visual interpretation from visual interpretation makes clear that the two differ in nature and in working method, which helps learners avoid confusing them. There are also some misconceptions about how the two are taught. This paper explores teaching methods for visual interpretation and audio-visual interpretation, with the aim of making their teaching more reasonable and scientific.
Abstract: Object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over a DiffServ network with QoS guarantees is proposed. MPEG-4 AVOs are extracted and classified into groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). The scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that, compared with "best-effort" delivery, the quality of the received video adapts gracefully to the network state. In addition, because the content provider can define the priority of each audio-visual object, the adaptive transmission of object-based scalable video can be customized to the content.
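The mapping from object priority to per-hop behavior can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the actual mapping, so the three-level priority scale, the choice of PHBs, and the names `AvoPacket`, `phb_for`, and `forward` are hypothetical. Only the DSCP code point values themselves are standard.

```python
from dataclasses import dataclass

# Standard DSCP code points. Which AVO class goes to which PHB is an
# illustrative assumption, not the mapping used in the paper.
DSCP = {"EF": 46, "AF21": 18, "AF13": 14, "BE": 0}

# Hypothetical importance order, most important first (dropped last).
PHB_ORDER = ["EF", "AF21", "AF13", "BE"]


@dataclass
class AvoPacket:
    object_id: str
    priority: int  # hypothetical scale: 0 = most important object
    layer: int     # 0 = base layer, >0 = enhancement layer


def phb_for(packet: AvoPacket) -> str:
    """Choose a per-hop behavior from object priority and scalable layer."""
    if packet.priority == 0 and packet.layer == 0:
        return "EF"
    if packet.layer == 0:
        return "AF21"
    if packet.priority <= 1:
        return "AF13"
    return "BE"


def mark(packet: AvoPacket) -> int:
    """Return the DSCP value that would be written into the IP header."""
    return DSCP[phb_for(packet)]


def forward(packets, capacity):
    """Under congestion, keep the `capacity` most important packets.

    Arrival order is preserved among the packets that survive.
    """
    if len(packets) <= capacity:
        return list(packets)
    ranked = sorted(range(len(packets)),
                    key=lambda i: PHB_ORDER.index(phb_for(packets[i])))
    keep = set(ranked[:capacity])
    return [p for i, p in enumerate(packets) if i in keep]
```

With a base-layer packet of the top-priority object, a base-layer packet of a secondary object, and an enhancement-layer packet of a low-priority object, a link with room for two packets drops the enhancement-layer packet first, which is the selective-discard behavior the abstract describes.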
Funding: Supported by the National Key R&D Program of China (No. 2021YFB2011300), the Special Funds Project for the Transformation of Scientific and Technological Achievements of Jiangsu Province, China (No. BA2023039), the National Natural Science Foundation of China (No. 52075262), and the Fundamental Research Funds for the Central Universities, China (No. 30922010706).
Abstract: This paper investigates output feedback active disturbance rejection control of a valve-controlled cylinder electro-hydraulic servo system. First, a comprehensive nonlinear mathematical model that includes both matched and mismatched disturbances is formulated. Because only position information can be measured, a linear Extended State Observer (ESO) is introduced to estimate the unknown states and matched disturbances, while a dedicated disturbance observer is constructed to estimate the mismatched disturbances. Unlike traditional observer designs, the disturbance observer in this study is designed under the constraint of output feedback. Furthermore, an output feedback nonlinear controller that uses these observers is proposed to achieve accurate trajectory tracking. To mitigate the differential explosion problem inherent in the traditional backstepping framework, a finite-time stable command filter is incorporated. To account for transient filtering errors, a set of error compensation signals is designed to counter their negative impact. Theoretical analysis shows that the proposed control strategy ensures the boundedness of all signals in the closed-loop system. Additionally, when the system is subject only to time-invariant disturbances, asymptotic stability is established. Finally, the algorithm's efficacy is validated through comparative experiments.
Funding: Financially supported by the Natural Science Foundation of Guangdong Province (No. 2024A1515010639), the PolyU Postdoc Matching Fund Scheme (No. 1-W327), a PolyU Grant (No. 1-CE0H), the Shenzhen Science and Technology Program (No. ZDSYS20220606100406016), the Shenzhen Key Laboratory of Photonics and Biophotonics (No. ZDSYS20210623092006020), the National Key Laboratory of Green and Long-Life Road Engineering in Extreme Environment (Shenzhen) (No. 868-000003010103), and the National Natural Science Foundation of China (No. 52208272).
Abstract: Driven by the growing need for sustainable energy solutions and the rising emphasis on health awareness, self-powered techniques have advanced notably. Triboelectric nanogenerators (TENGs) stand out as devices that use triboelectrification and electrostatic induction to generate electricity or electrical signals. To improve the electrical output of TENGs and broaden their range of applications, researchers have worked to refine materials, surface morphology, and structural design. Among these, physical morphological modifications play a pivotal role in enhancing the electrical properties of TENGs by increasing the contact surface area, which can be achieved by building micro-/nano-structures on the surface of or inside the friction material. In this review, we summarize the common morphologies of TENGs, categorize them into surface and internal structures, and explain their roles in enhancing the electrical output of devices. We also classify the methods used to prepare these morphologies into physical and chemical approaches, giving a comprehensive survey of the available techniques. Typical applications of TENGs with special morphologies are then presented, divided into energy harvesting and self-powered sensors. Finally, the challenges and future directions for TENGs are outlined. The aim of this article is to stimulate further strategies for enhancing the performance of TENGs.
Funding: Supported by the National Science and Technology Innovation 2030-Major Program (2022ZD0115403), the National Natural Science Foundation of China (61991414), the Chongqing Natural Science Foundation (CSTB2023NSCQJQX0018), and the Beijing Natural Science Foundation (L221005).
Abstract: Dear Editor, this letter studies the output consensus problem of heterogeneous linear multiagent systems over directed graphs. A novel adaptive dynamic event-triggered controller is presented that relies only on a feedback combination of the agent's own state and its neighbors' outputs, and that achieves exponential output consensus through intermittent communication. The controller is obtained by solving two linear matrix equations, and Zeno behavior is excluded.
Abstract: In this paper, we propose an output voltage stabilization method for a DC-DC Zeta converter using hybrid control. We model the Zeta converter under continuous conduction mode operation and derive a switching control law that brings the output voltage to the desired level. Because infinitely fast switching occurs at the desired level, we modify the switching control law to allow a sizeable output voltage ripple. We derive mathematical models that allow one to choose the desired switching frequency. In practice, the non-ideal properties of the Zeta converter result in a steady-state output voltage error. By analyzing the power loss in the Zeta converter, we propose an improved switching control law that eliminates this steady-state error. The effectiveness of the proposed method is illustrated with simulation results.
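The effect of allowing an output voltage ripple can be illustrated with a generic hysteresis switching rule. This is a sketch under stated assumptions, not the control law derived in the paper: the function names, the symmetric band `ripple` around `v_ref`, and the convention that closing the switch raises the output voltage are all hypothetical.

```python
def hysteresis_switch(v_out, v_ref, ripple, switch_on):
    """Return the next switch state for a band of total width `ripple`.

    Generic hysteresis rule (an assumption, not the paper's law): the state
    changes only when the output leaves the band around v_ref.
    """
    half = ripple / 2.0
    if v_out <= v_ref - half:
        return True       # output at or below the band: close the switch
    if v_out >= v_ref + half:
        return False      # output at or above the band: open the switch
    return switch_on      # inside the band: hold the current state


def count_transitions(samples, v_ref, ripple):
    """Count switch state changes over a sequence of output voltage samples."""
    state, transitions = False, 0
    for v in samples:
        nxt = hysteresis_switch(v, v_ref, ripple, state)
        if nxt != state:
            transitions += 1
        state = nxt
    return transitions
```

With a zero-width band the switch toggles on every crossing of the reference, which in the continuous-time limit corresponds to the infinite switching mentioned in the abstract. Widening the band trades ripple for a lower, finite switching frequency.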
Funding: Supported by the Zhejiang Provincial Natural Science Foundation (LY24F030011, LY23F030005) and the National Natural Science Foundation of China (62373131).
Abstract: In this paper, a dynamic high-gain observer and an output feedback controller are proposed together for nonlinear systems with multiple unknown time delays. By constructing Lyapunov-Krasovskii functionals, it is shown that global asymptotic state regulation can be ensured by introducing a single dynamic gain; furthermore, global asymptotic stabilization can be achieved by choosing a sufficiently large static scaling gain when the upper bounds of all system parameters are known. In particular, the output coefficient is allowed to be non-differentiable with an unknown upper bound. The paper proposes a dynamic-gain scaling method based on a generalized Lyapunov matrix inequality, which significantly reduces the computational complexity of the design compared with the classic backstepping method.