The application of generative artificial intelligence (AI) is bringing about notable changes in anime creation. This paper surveys recent advancements and applications of diffusion and language models in anime generation, focusing on their demonstrated potential to enhance production efficiency through automation and personalization. Despite these benefits, the substantial initial computational investment required to train and deploy these models must be acknowledged. We conduct an in-depth survey of cutting-edge generative AI technologies, encompassing models such as Stable Diffusion and GPT, and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics. The surveyed literature indicates that AI models have reached considerable maturity in synthesizing high-quality, aesthetically compelling anime images from textual prompts, alongside discernible progress in generating coherent narratives. However, achieving long-form consistency, mitigating artifacts such as flickering in video sequences, and enabling fine-grained artistic control remain critical open challenges. Building on these advances, research has increasingly pivoted toward synthesizing higher-dimensional content, such as video and three-dimensional assets, with recent studies demonstrating significant progress in this burgeoning field. Nevertheless, formidable challenges endure. Foremost are the heavy computational demands of training and deploying these sophisticated models, which are particularly pronounced in high-dimensional generation such as video synthesis. Additional persistent hurdles include maintaining spatio-temporal consistency across complex scenes and addressing ethical concerns surrounding bias and the preservation of human creative autonomy. This survey underscores both the transformative potential and the inherent complexities of AI-driven synergy within the creative industries. We posit that future research should focus on the synergistic fusion of diffusion and autoregressive models, the integration of multimodal inputs, and the balanced consideration of ethical implications, particularly regarding bias and human creative autonomy, thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation.
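To make the diffusion-model core of this survey concrete, the following toy NumPy sketch shows the forward noising and a single reverse denoising step that underlie models such as Stable Diffusion. The linear schedule, timestep count, and array sizes are illustrative assumptions, and a real model would replace the "perfect" noise prediction with a trained network's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule (illustrative DDPM-style values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def add_noise(x0, t):
    """Forward process: noise a clean sample x0 to timestep t in one jump."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

def denoise_step(xt, eps_pred, t):
    """One reverse (denoising) step, given the model's noise prediction."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:                       # inject fresh noise except at the last step
        mean += np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

x0 = rng.standard_normal((64, 64, 3))    # stand-in for an image latent
xt, eps = add_noise(x0, t=500)
x_prev = denoise_step(xt, eps, t=500)    # here the true noise plays the predictor
```

At large t the noised sample is dominated by the Gaussian term, which is why sampling can start from pure noise; text conditioning and the denoising network itself are exactly the parts this sketch leaves out.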
HWANG Jenq-Neng received his Ph.D. degree from the University of Southern California, USA. In the summer of 1989, Dr. HWANG joined the Department of Electrical Engineering of the University of Washington in Seattle, USA, where he was promoted to Full Professor in 1999. He served as the Associate Chair for Research from 2003 to 2005 and from 2011 to 2015, and is currently the Associate Chair for Global Affairs and International Development in the EE Department. He has written more than 330 journal papers, conference papers, and book chapters in the areas of machine learning, multimedia signal processing, and multimedia system integration and networking, including an authored textbook, "Multimedia Networking: from Theory to Practice," published by Cambridge University Press. Dr. HWANG has a close working relationship with industry on multimedia signal processing and multimedia networking.
Synthesizing a real-time, high-resolution, lip-synced digital human is a challenging task. Although the Wav2Lip model represents a remarkable advance in real-time lip-sync, its visual clarity is still limited. To address this, we enhanced the Wav2Lip model in this study and trained it on a high-resolution video dataset produced in our laboratory. Experimental results indicate that the improved Wav2Lip model produces digital humans with greater clarity than the original model while maintaining its real-time performance and accurate lip-sync. We implemented the improved Wav2Lip model in a government-service interface application, generating a government digital human. Testing revealed that this government digital human can interact seamlessly with users in real time, delivering clear visuals and synthesized speech that closely resembles a human voice.
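At the heart of Wav2Lip-style inference is aligning audio features to video frames: each frame is driven by a short window of mel-spectrogram columns. The sketch below reproduces only that alignment step; the 80 mel bins, 16-column window, and fps-based stride follow the public Wav2Lip inference code and should be treated as assumptions, not a specification of the improved model described above:

```python
import numpy as np

MEL_STEP_SIZE = 16   # mel columns fed to the generator per video frame
MEL_BINS = 80        # mel filterbank channels (~80 columns per second of audio)

def mel_chunks_for_video(mel, fps):
    """Slice an (80, T) mel spectrogram into one fixed-size chunk per frame.

    Mirrors Wav2Lip-style alignment: the stride in mel columns is the
    number of mel columns per second divided by the video frame rate.
    """
    chunks = []
    mel_idx_multiplier = 80.0 / fps      # mel columns per video frame
    i = 0
    while True:
        start = int(i * mel_idx_multiplier)
        if start + MEL_STEP_SIZE > mel.shape[1]:
            # Not enough audio left for a full window: take the final window.
            chunks.append(mel[:, -MEL_STEP_SIZE:])
            break
        chunks.append(mel[:, start:start + MEL_STEP_SIZE])
        i += 1
    return chunks

# Two seconds of fake mel features, paired with 25 fps video.
mel = np.random.default_rng(0).standard_normal((MEL_BINS, 160))
chunks = mel_chunks_for_video(mel, fps=25)
```

Each chunk is then paired with (a masked crop of) the corresponding face frame and passed to the generator, which is the part a higher-resolution variant would retrain.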
This survey paper investigates how personalized learning offered by Large Language Models (LLMs) could transform educational experiences. We explore Knowledge Editing Techniques (KME), which help LLMs maintain current knowledge and are essential for providing accurate, up-to-date information. The datasets analyzed in this article are intended to evaluate LLM performance on educational tasks such as error correction and question answering. We acknowledge the limitations of LLMs while highlighting their fundamental educational capabilities in writing, math, programming, and reasoning. We also examine two promising system architectures for LLM-based education: a Mixture-of-Experts (MoE) framework and a unified LLM approach. The MoE approach uses specialized LLMs, one per subject, under the direction of a central controller. We further discuss the use of LLMs for individualized feedback and their potential for content creation, including videos, quizzes, and plans. Finally, we discuss the difficulties of incorporating LLMs into educational systems and potential solutions, highlighting the importance of factual accuracy, reducing bias, and fostering critical-thinking skills. The purpose of this survey is to show the promise of LLMs as well as the issues that must still be resolved to facilitate their responsible and successful integration into the educational ecosystem.
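The MoE architecture described above amounts to a central controller that scores each question against the subjects it knows and dispatches to the matching expert. A minimal sketch follows; the keyword routing, subject list, and expert stubs (standing in for subject-specialized LLMs) are all illustrative, not a description of any deployed system:

```python
# Toy Mixture-of-Experts controller for LLM-based education:
# route a student question to a subject-specialized "expert".

EXPERT_KEYWORDS = {
    "math": {"equation", "integral", "derivative", "algebra", "solve"},
    "programming": {"python", "function", "bug", "compile", "loop"},
    "writing": {"essay", "grammar", "paragraph", "thesis", "revise"},
}

# Stubs standing in for fine-tuned subject LLMs.
def math_expert(q):        return f"[math expert] {q}"
def programming_expert(q): return f"[programming expert] {q}"
def writing_expert(q):     return f"[writing expert] {q}"
def general_expert(q):     return f"[general expert] {q}"

EXPERTS = {
    "math": math_expert,
    "programming": programming_expert,
    "writing": writing_expert,
}

def route(question):
    """Central controller: score subjects by keyword overlap and dispatch."""
    words = set(question.lower().split())
    scores = {s: len(words & kw) for s, kw in EXPERT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:            # nothing matched: fall back to a generalist
        return general_expert(question)
    return EXPERTS[best](question)
```

A production controller would itself be a classifier or an LLM, but the design trade-off is the same one the survey raises: specialized experts per subject versus a single unified model.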
Several approaches for fast generation of digital holograms of a three-dimensional (3D) object are discussed. Among them, the novel look-up table (N-LUT) method is analyzed: by employing a new concept of principal fringe patterns, it dramatically reduces the number of pre-calculated fringe patterns required to compute digital holograms of a 3D object, considerably alleviating the computational complexity and the huge memory requirements of the conventional ray-tracing and look-up-table methods. Meanwhile, because 3D video sequences contain a large amount of temporally and spatially redundant data in their inter- and intra-frames, the computation time of 3D video holograms can also be reduced simply by removing these redundant data. Thus, computational methods for generating 3D video holograms through the combined use of the N-LUT method and data-compression algorithms are also presented and discussed. Experimental results reveal that this approach achieves a great reduction in the computation time of 3D video holograms.
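The N-LUT idea can be sketched in a few lines: pre-compute one principal fringe pattern (PFP) per depth plane, then build the hologram by shifting and scaling that pattern to each object point instead of evaluating the fringe equation point by point. The wavelength, pixel pitch, cosine Fresnel-zone form, and periodic shifting below are illustrative simplifications, not the paper's parameters:

```python
import numpy as np

WAVELENGTH = 633e-9   # red laser, metres (illustrative)
PITCH = 10e-6         # hologram pixel pitch, metres (illustrative)
N = 256               # hologram is N x N pixels

def principal_fringe_pattern(z):
    """Pre-compute the fringe of a unit-amplitude point at depth z,
    centred on the hologram: a cosine Fresnel-zone pattern."""
    coords = (np.arange(N) - N // 2) * PITCH
    x, y = np.meshgrid(coords, coords)
    return np.cos(np.pi * (x**2 + y**2) / (WAVELENGTH * z))

def hologram(points, luts):
    """N-LUT-style synthesis: for each object point (row, col, z, amp),
    shift that depth's PFP to the point and accumulate, instead of
    re-evaluating the fringe equation per point as ray tracing does.
    np.roll shifts periodically; a real implementation would crop."""
    h = np.zeros((N, N))
    for (row, col, z, amp) in points:
        h += amp * np.roll(luts[z], (row - N // 2, col - N // 2), axis=(0, 1))
    return h

depths = [0.1, 0.2]                                   # two depth planes, metres
luts = {z: principal_fringe_pattern(z) for z in depths}
pts = [(100, 80, 0.1, 1.0), (150, 200, 0.2, 0.5)]
holo = hologram(pts, luts)
```

The memory saving is visible in the structure: the table holds one pattern per depth plane rather than one per object point, which is the alleviation of the conventional LUT's storage problem that the abstract describes.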
Funding: supported by the National Natural Science Foundation of China (Grant No. 62202210).
Funding: sponsored by the Collaborative Education Projects Between Industry and Academia of the Ministry of Education (Grant No. 230801065261444) and the Humanities and Social Sciences Pre-Research Fund Project of Zhejiang University of Technology (Grant No. SKY-ZX-20220207).
Funding: supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2009-C1090-0902-0018).