The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation...The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation,focusing on their demonstrated potential to enhance production efficiency through automation and personalization.Despite these benefits,it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models.We conduct an in-depth survey of cutting-edge generative AI technologies,encompassing models such as Stable Diffusion and GPT,and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics.Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality,aesthetically compelling anime visual images from textual prompts,alongside discernible progress in the generation of coherent narratives.However,achieving perfect long-form consistency,mitigating artifacts like flickering in video sequences,and enabling fine-grained artistic control remain critical ongoing challenges.Building upon these advancements,research efforts have increasingly pivoted towards the synthesis of higher-dimensional content,such as video and three-dimensional assets,with recent studies demonstrating significant progress in this burgeoning field.Nevertheless,formidable challenges endure amidst these advancements.Foremost among these are the substantial computational exigencies requisite for training and deploying these sophisticated models,particularly pronounced in the realm of high-dimensional generation such as video synthesis.Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and mitigating ethical considerations surrounding bias and the preservation of human creative autonomy.This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries.We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models,the integration of multimodal inputs,and the balanced consideration of ethical implications,particularly regarding bias and the preservation of human creative autonomy,thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.展开更多
To generate realistic three-dimensional animation of virtual character,capturing real facial expression is the primary task.Due to diverse facial expressions and complex background,facial landmarks recognized by exist...To generate realistic three-dimensional animation of virtual character,capturing real facial expression is the primary task.Due to diverse facial expressions and complex background,facial landmarks recognized by existing strategies have the problem of deviations and low accuracy.Therefore,a method for facial expression capture based on two-stage neural network is proposed in this paper which takes advantage of improved multi-task cascaded convolutional networks(MTCNN)and high-resolution network.Firstly,the convolution operation of traditional MTCNN is improved.The face information in the input image is quickly filtered by feature fusion in the first stage and Octave Convolution instead of the original ones is introduced into in the second stage to enhance the feature extraction ability of the network,which further rejects a large number of false candidates.The model outputs more accurate facial candidate windows for better landmarks recognition and locates the faces.Then the images cropped after face detection are input into high-resolution network.Multi-scale feature fusion is realized by parallel connection of multi-resolution streams,and rich high-resolution heatmaps of facial landmarks are obtained.Finally,the changes of facial landmarks recognized are tracked in real-time.The expression parameters are extracted and transmitted to Unity3D engine to drive the virtual character’s face,which can realize facial expression synchronous animation.Extensive experimental results obtained on the WFLW database demonstrate the superiority of the proposed method in terms of accuracy and robustness,especially for diverse expressions and complex background.The method can accurately capture facial expression and generate three-dimensional animation effects,making online entertainment and social interaction more immersive in shared virtual space.展开更多
Background Computer Generated Animations(CGA),when applied to three-dimensional(3D)city models(3DCM),can be used as powerful tools to support urban decision-making.This leads to a new paradigm,based on procedural mode...Background Computer Generated Animations(CGA),when applied to three-dimensional(3D)city models(3DCM),can be used as powerful tools to support urban decision-making.This leads to a new paradigm,based on procedural modeling,that allows the integration of known urban structures.Methods This paper introduces a new workflow for the development of high-quality approximations of urban models in a short time and enables facilities to be imported from other cities into a given city model,following specific generation rules.Results Thus,this workflow provides a very simple approach to observe,study,and simulate the implementation of models already developed in other cities,in a city where they are not yet adopted.Examples of these models include all types of mobility systems and urban infrastructure.Conclusions This allows us to perceive the environmental impact of certain decisions in the real world,as well as to carry out simple simulations to determine the changes that can occur in the flows of people,traffic,and other city activities.展开更多
基金supported by the National Natural Science Foundation of China(Grant No.62202210).
文摘The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation,focusing on their demonstrated potential to enhance production efficiency through automation and personalization.Despite these benefits,it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models.We conduct an in-depth survey of cutting-edge generative AI technologies,encompassing models such as Stable Diffusion and GPT,and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics.Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality,aesthetically compelling anime visual images from textual prompts,alongside discernible progress in the generation of coherent narratives.However,achieving perfect long-form consistency,mitigating artifacts like flickering in video sequences,and enabling fine-grained artistic control remain critical ongoing challenges.Building upon these advancements,research efforts have increasingly pivoted towards the synthesis of higher-dimensional content,such as video and three-dimensional assets,with recent studies demonstrating significant progress in this burgeoning field.Nevertheless,formidable challenges endure amidst these advancements.Foremost among these are the substantial computational exigencies requisite for training and deploying these sophisticated models,particularly pronounced in the realm of high-dimensional generation such as video synthesis.Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and mitigating ethical considerations surrounding bias and the preservation of human creative autonomy.This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries.We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models,the integration of multimodal inputs,and the balanced consideration of ethical implications,particularly regarding bias and the preservation of human creative autonomy,thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
基金This research was funded by College Student Innovation and Entrepreneurship Training Program,grant number 2021055Z and S202110082031the Special Project for Cultivating Scientific and Technological Innovation Ability of College and Middle School Students in Hebei Province,Grant Number 2021H011404.
文摘To generate realistic three-dimensional animation of virtual character,capturing real facial expression is the primary task.Due to diverse facial expressions and complex background,facial landmarks recognized by existing strategies have the problem of deviations and low accuracy.Therefore,a method for facial expression capture based on two-stage neural network is proposed in this paper which takes advantage of improved multi-task cascaded convolutional networks(MTCNN)and high-resolution network.Firstly,the convolution operation of traditional MTCNN is improved.The face information in the input image is quickly filtered by feature fusion in the first stage and Octave Convolution instead of the original ones is introduced into in the second stage to enhance the feature extraction ability of the network,which further rejects a large number of false candidates.The model outputs more accurate facial candidate windows for better landmarks recognition and locates the faces.Then the images cropped after face detection are input into high-resolution network.Multi-scale feature fusion is realized by parallel connection of multi-resolution streams,and rich high-resolution heatmaps of facial landmarks are obtained.Finally,the changes of facial landmarks recognized are tracked in real-time.The expression parameters are extracted and transmitted to Unity3D engine to drive the virtual character’s face,which can realize facial expression synchronous animation.Extensive experimental results obtained on the WFLW database demonstrate the superiority of the proposed method in terms of accuracy and robustness,especially for diverse expressions and complex background.The method can accurately capture facial expression and generate three-dimensional animation effects,making online entertainment and social interaction more immersive in shared virtual space.
基金project"Crowdsourcing Optimized Wireless Sensor Network Deployment(CRoWD)"of Dirección General de Investigaciones of Universidad Santiago de Cali under grant No.613-621119-852.
文摘Background Computer Generated Animations(CGA),when applied to three-dimensional(3D)city models(3DCM),can be used as powerful tools to support urban decision-making.This leads to a new paradigm,based on procedural modeling,that allows the integration of known urban structures.Methods This paper introduces a new workflow for the development of high-quality approximations of urban models in a short time and enables facilities to be imported from other cities into a given city model,following specific generation rules.Results Thus,this workflow provides a very simple approach to observe,study,and simulate the implementation of models already developed in other cities,in a city where they are not yet adopted.Examples of these models include all types of mobility systems and urban infrastructure.Conclusions This allows us to perceive the environmental impact of certain decisions in the real world,as well as to carry out simple simulations to determine the changes that can occur in the flows of people,traffic,and other city activities.