Funding: Supported by the National Natural Science Foundation of China (No. 62203476) and the Natural Science Foundation of Shenzhen (No. JCYJ20230807120801002).
Abstract: Extensive research has explored human motion generation, but the generated sequences are influenced by different motion styles. For instance, walking with joy and walking with sorrow produce distinct effects on a character's motion. Because motion capture with styles is difficult, the data available for style research are also limited. To address these problems, we propose ASMNet, an action- and style-conditioned motion generative network. This network ensures that the generated human motion sequences not only comply with the provided action label but also exhibit distinctive stylistic features. To extract motion features from human motion sequences, we design a spatial-temporal extractor. Moreover, we use an adaptive instance normalization (AdaIN) layer to inject style into the target motion. Our results are comparable to state-of-the-art approaches and show a substantial advantage in both quantitative and qualitative evaluations. The code is available at https://github.com/ZongYingLi/ASMNet.git.
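For readers unfamiliar with the style-injection step mentioned above, the following is a minimal sketch of an adaptive instance normalization (AdaIN) layer applied to motion features; the (batch, channels, time) tensor layout and function names are illustrative assumptions, not ASMNet's actual implementation.

```python
# Minimal AdaIN sketch for motion features (hypothetical shapes, not the
# authors' implementation). Content features come from an action-conditioned
# branch; style statistics come from a style reference sequence.
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align the per-channel mean/std of `content` with those of `style`.

    Both tensors are assumed to be (batch, channels, time), e.g. per-joint
    feature channels over the frames of a motion sequence.
    """
    c_mean = content.mean(dim=-1, keepdim=True)
    c_std = content.std(dim=-1, keepdim=True) + eps
    s_mean = style.mean(dim=-1, keepdim=True)
    s_std = style.std(dim=-1, keepdim=True) + eps
    # Normalize the content features, then rescale/shift with style statistics.
    return s_std * (content - c_mean) / c_std + s_mean

# Usage: inject the statistics of a style reference (e.g., a "joyful walk")
# into the target motion's features.
content_feat = torch.randn(8, 64, 120)  # batch of 8, 64 channels, 120 frames
style_feat = torch.randn(8, 64, 120)
stylized = adain(content_feat, style_feat)
```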
Abstract: A human motion generation model can extract structural features from existing human motion capture data, and the generated data can drive animated characters. 3D human motion capture sequences contain complex spatial-temporal structures, and a deep learning model can fully describe the latent semantic structure of human motion. To improve the authenticity of the generated human motion sequences, we propose a multi-task motion generation model that consists of a discriminator and a generator. The discriminator classifies motion sequences into different styles according to their similarity to mean spatial-temporal templates computed from the motion sequences of 17 crucial human joints, each with three degrees of freedom, and the generator then creates target motion sequences in these styles. Unlike traditional related works, our model can handle multiple tasks, such as identifying styles and generating data. In addition, by extracting 17 crucial joints from the 29 human joints, our model avoids data redundancy and improves recognition accuracy. The experimental results show that the discriminator effectively recognizes diverse movements and that the generated data fit the actual data well. The combination of discriminator and generator addresses the low reuse rate of motion data, and the generated motion sequences are better suited to actual movement.
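To make the template-matching idea above concrete, here is a minimal sketch of assigning a motion sequence to the style whose mean spatial-temporal template it is closest to; the 17-joint, three-degree-of-freedom layout follows the abstract, while the Euclidean distance, fixed sequence length, and stand-in data are illustrative assumptions (the paper's discriminator is a learned model, not this literal nearest-template rule).

```python
# Sketch of style classification against mean spatial-temporal templates.
# Shapes follow the abstract (17 joints x 3 DOF); the Euclidean metric and
# fixed sequence length are illustrative assumptions, not the paper's method.
import numpy as np

N_JOINTS, N_DOF, N_FRAMES = 17, 3, 60

def mean_template(sequences: np.ndarray) -> np.ndarray:
    """Average a set of (n, frames, joints, dof) sequences into one template."""
    return sequences.mean(axis=0)

def classify_by_template(motion: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Return the style whose mean template is closest to `motion`."""
    return min(templates, key=lambda s: np.linalg.norm(motion - templates[s]))

# Usage with random stand-in data for two styles.
rng = np.random.default_rng(0)
walks = rng.normal(size=(20, N_FRAMES, N_JOINTS, N_DOF))
runs = rng.normal(loc=0.5, size=(20, N_FRAMES, N_JOINTS, N_DOF))
templates = {"walk": mean_template(walks), "run": mean_template(runs)}
print(classify_by_template(runs[0], templates))  # likely "run"
```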
Funding: Supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20191298 and the National Natural Science Foundation of China under Grant No. 61902110.
Abstract: The correlation between music and human motion has attracted widespread research attention. Although recent studies have successfully generated motion for singers, dancers, and musicians, few have explored motion generation for orchestral conductors. Music-driven conducting motion generation should consider not only the basic music beats, but also mid-level music structures, high-level music semantic expressions, and cues for the different sections of an orchestra (strings, woodwind, etc.). However, most existing conducting motion generation methods rely heavily on human-designed rules, which significantly limits the quality of the generated motion. Therefore, we propose a novel Music Motion Synchronized Generative Adversarial Network (M²S-GAN), which generates motion from automatically learned music representations. More specifically, M²S-GAN is a cross-modal generative network comprising four components: 1) a music encoder that encodes the music signal; 2) a generator that generates conducting motion from the music codes; 3) a motion encoder that encodes the motion; and 4) a discriminator that differentiates real from generated motions. These four components respectively imitate four key aspects of human conductors: understanding music, interpreting music, precision, and elegance. The music and motion encoders are first jointly trained with a self-supervised contrastive loss, which facilitates music-motion synchronization during the subsequent adversarial learning process. To verify the effectiveness of our method, we construct a large-scale dataset, named ConductorMotion100, which consists of an unprecedented 100 hours of conducting motion data. Extensive experiments on ConductorMotion100 demonstrate the effectiveness of M²S-GAN: our approach outperforms various comparison methods both quantitatively and qualitatively. Through visualization, we show that our approach can generate plausible, diverse, and music-synchronized conducting motion.
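As a rough illustration of the self-supervised contrastive pretraining stage described above, the sketch below scores a batch of temporally aligned music/motion embedding pairs with a symmetric InfoNCE-style loss; the embedding size, temperature, and stand-in encoder outputs are assumptions for illustration, not the M²S-GAN specifics.

```python
# Sketch of a symmetric InfoNCE-style contrastive loss that pulls together
# embeddings of aligned music/motion clips and pushes apart mismatched pairs.
# Embedding size and temperature are assumptions, not the paper's values.
import torch
import torch.nn.functional as F

def contrastive_loss(music_emb: torch.Tensor, motion_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """music_emb, motion_emb: (batch, dim); row i of each forms an aligned pair."""
    music_emb = F.normalize(music_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = music_emb @ motion_emb.t() / temperature  # pairwise similarities
    targets = torch.arange(music_emb.size(0))          # matched pairs on the diagonal
    # Symmetric cross-entropy: music-to-motion and motion-to-music retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with stand-in encoder outputs for a batch of 16 aligned clips.
music_emb = torch.randn(16, 128)
motion_emb = torch.randn(16, 128)
print(contrastive_loss(music_emb, motion_emb).item())
```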