The rapid development of short video platforms poses new challenges for traditional recommendation systems. Recommender systems typically depend on two types of user behavior feedback to construct user interest profiles: explicit feedback (interactive behavior), which significantly influences users’ short-term interests, and implicit feedback (viewing time), which substantially affects their long-term interests. However, previous models fail to distinguish between these two feedback types, leading them to predict only the overall preferences of users from extensive historical behavior sequences. Consequently, they cannot differentiate between users’ long-term and short-term interests, resulting in low accuracy in describing users’ interest states and predicting the evolution of their interests. This paper introduces a video recommendation model called CAT-MF Rec (Cross-Attention Transformer-Mixed Feedback Recommendation), designed to differentiate between explicit and implicit user feedback within the DIEN (Deep Interest Evolution Network) framework. This study emphasizes the separate learning of the two types of behavioral feedback, effectively integrating them through a cross-attention mechanism. Additionally, it leverages the long-sequence dependence capabilities of the Transformer to accurately construct user interest profiles and predict the evolution of user interests. Experimental results indicate that CAT-MF Rec significantly outperforms existing recommendation methods across various performance indicators. This advancement offers new theoretical and practical insights for the development of video recommendation, particularly in addressing complex and dynamic user behavior patterns.
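The fusion step described above can be sketched briefly. The snippet below is a minimal, hypothetical illustration of fusing two separately learned feedback sequences with cross-attention; the module name, dimensions, and attention configuration are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class FeedbackCrossAttention(nn.Module):
    """Fuse explicit-feedback and implicit-feedback sequences via cross-attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, explicit_seq, implicit_seq):
        # explicit-behavior embeddings query the implicit (viewing-time) sequence
        fused, _ = self.attn(query=explicit_seq, key=implicit_seq, value=implicit_seq)
        return fused

# toy usage: a batch of 2 users, 10 behaviors each, 64-dim embeddings
explicit = torch.randn(2, 10, 64)
implicit = torch.randn(2, 10, 64)
print(FeedbackCrossAttention()(explicit, implicit).shape)  # torch.Size([2, 10, 64])
```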
Internal learning-based video inpainting methods have shown promising results by exploiting the intrinsic properties of the video to fill in the missing region without external dataset supervision. However, existing internal learning-based video inpainting methods produce inconsistent structures or blurry textures due to insufficient utilisation of motion priors within the video sequence. In this paper, the authors propose a new internal learning-based video inpainting model called the appearance consistency and motion coherence network (ACMC-Net), which can not only learn the recurrence of an appearance prior but can also capture a motion coherence prior to improve the quality of the inpainting results. In ACMC-Net, a transformer-based appearance network is developed to capture global context information within the video frame to represent appearance consistency accurately. Additionally, a novel motion coherence learning scheme is proposed to learn the motion prior in a video sequence effectively. Finally, the learnt internal appearance consistency and motion coherence are implicitly propagated to the missing regions to achieve high-quality inpainting. Extensive experiments conducted on the DAVIS dataset show that the proposed model obtains superior performance in terms of quantitative measurements and produces more visually plausible results compared with state-of-the-art methods.
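One common way to express a motion coherence prior (a generic formulation, not necessarily ACMC-Net's) is to backward-warp frame t along the estimated optical flow and penalize the photometric difference with frame t+1; the tensors below are placeholders:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B,C,H,W) by `flow` (B,2,H,W) using grid_sample."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(frame)  # (1,2,H,W)
    coords = grid + flow
    x = 2 * coords[:, 0] / (w - 1) - 1          # normalize to [-1, 1] for grid_sample
    y = 2 * coords[:, 1] / (h - 1) - 1
    return F.grid_sample(frame, torch.stack((x, y), dim=-1), align_corners=True)

def motion_coherence_loss(frame_t, frame_t1, backward_flow):
    """Frame t+1 should match frame t sampled along the backward flow (t+1 -> t)."""
    return F.l1_loss(warp(frame_t, backward_flow), frame_t1)
```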
Airway management plays a crucial role in providing adequate oxygenation and ventilation to patients during various medical procedures and emergencies. When patients have a limited mouth opening due to factors such as trauma, inflammation, or anatomical abnormalities, airway management becomes challenging. A commonly utilized method to overcome this challenge is the use of video laryngoscopy (VL), which employs a specialized device equipped with a camera and a light source to allow a clear view of the larynx and vocal cords. VL overcomes the limitations of direct laryngoscopy in patients with limited mouth opening, enabling better visualization and successful intubation. Various types of VL blades are available. We devised a novel flangeless video laryngoscope for use in patients with a limited mouth opening and then tested it on a manikin.
Semantic segmentation is a core task in computer vision that allows AI models to interact with and understand their surrounding environment. Similarly to how humans subconsciously segment scenes, this ability is crucial for scene understanding. However, a challenge many semantic learning models face is the lack of data. Existing video datasets are limited to short, low-resolution videos that are not representative of real-world examples. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset that features hour-long, high-resolution, real-world data from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary semantic model OpenSeeD on our own custom dataset and discuss future implications.
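Evaluations of this kind typically report mean intersection-over-union (mIoU). The helper below is a hedged sketch with made-up label maps; it is not tied to the paper's evaluation code:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU between integer label maps `pred` and `gt`."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:               # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 5, (512, 512))   # toy prediction and ground truth
gt = np.random.randint(0, 5, (512, 512))
print(f"mIoU: {mean_iou(pred, gt, num_classes=5):.3f}")
```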
Objective: The purpose of this study was to evaluate health education using videos and leaflets for preconception care (PCC) awareness among adolescent females up to six months after the health education. Methods: The subjects were female university students living in the Kinki area. A longitudinal survey was conducted on 67 members in the intervention group, who received the health education, and 52 members in the control group, who did not receive the health education. The primary outcome measures were knowledge of PCC and the subscales of the Health Promotion Lifestyle Profile. Surveys were conducted before, after, and six months after the intervention in the intervention group, and an initial survey and survey six months later were conducted in the control group. Cochran’s Q test, Bonferroni’s multiple comparison test, and McNemar’s test were used to analyze the knowledge of PCC data. The Health Awareness, Nutrition, and Stress Management subscales of the Health Promotion Lifestyle Profile were analyzed by paired t-test, and comparisons between the intervention and control groups were performed using two-way repeated measures analysis of variance. Results: In the intervention group of 67 people, the number of subjects who answered “correct” for five of the nine items concerning knowledge of PCC increased immediately after the health education (P = 0.006) but decreased for five items from immediately after the health education to six months later (P = 0.043). In addition, the number of respondents who answered “correct” for “low birth weight infants and future lifestyle-related diseases” (P = 0.016) increased after six months compared with before the health education. For the 52 subjects in the control group, there was no change in the number of subjects who answered “correct” for eight out of the nine items after six months. There was also no increase in scores for the Health Promotion Lifestyle Profile after six months for either the intervention or control group. Conclusion: Providing health education about PCC using videos and leaflets to adolescent females was shown to enhance the knowledge of PCC immediately after the education.
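For paired before/after “correct vs. incorrect” responses like those analyzed here, McNemar’s test applies to a 2x2 table of response changes. A minimal sketch with illustrative counts (not the study’s data):

```python
from statsmodels.stats.contingency_tables import mcnemar

# rows: before education (correct / incorrect); columns: after education
table = [[30, 5],    # 30 stayed correct, 5 became incorrect
         [20, 12]]   # 20 became correct, 12 stayed incorrect
result = mcnemar(table, exact=True)   # exact binomial version, safe for small cells
print(f"statistic={result.statistic}, p-value={result.pvalue:.3f}")
```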
Multimedia semantic communication has been receiving increasing attention due to its significant enhancement of communication efficiency. Semantic coding, which is oriented towards extracting and encoding the key semantics of video for transmission, is a key aspect of the multimedia semantic communication framework. In this paper, we propose a low-bitrate facial video semantic coding method based on the temporal continuity of video semantics. At the sender’s end, we selectively transmit facial keypoints and deformation information, allocating distinct bitrates to different keypoints across frames. Compression techniques involving sampling and quantization are employed to reduce the bitrate while retaining key facial semantic information. At the receiver’s end, a GAN-based generative network is utilized for reconstruction, effectively mitigating the block artifacts and buffering problems present in traditional codec algorithms at low bitrates. The performance of the proposed approach is validated on multiple datasets, such as VoxCeleb and TalkingHead-1kH, employing metrics such as LPIPS, DISTS, and AKD for assessment. Experimental results demonstrate significant advantages over traditional codec methods, achieving up to approximately 10-fold bitrate reduction in prolonged, stable head pose scenarios across diverse conversational video settings.
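The sampling-and-quantization step can be illustrated with a uniform keypoint quantizer; the bit width, coordinate range, and keypoint count below are assumptions, not the paper's settings:

```python
import numpy as np

def quantize_keypoints(kps, bits=8, lo=-1.0, hi=1.0):
    """Uniformly quantize normalized keypoint coordinates to `bits`-bit symbols."""
    levels = 2 ** bits - 1
    return np.round((np.clip(kps, lo, hi) - lo) / (hi - lo) * levels).astype(np.uint16)

def dequantize_keypoints(q, bits=8, lo=-1.0, hi=1.0):
    levels = 2 ** bits - 1
    return q.astype(np.float64) / levels * (hi - lo) + lo

kps = np.random.uniform(-1, 1, (10, 2))       # 10 facial keypoints, (x, y) each
q = quantize_keypoints(kps)
print("max reconstruction error:", np.abs(dequantize_keypoints(q) - kps).max())
# at 8 bits, 10 keypoints cost 10 * 2 * 8 = 160 bits per frame before entropy coding
```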
The application of short videos in agricultural scenarios has become a new form of productive force driving agricultural development, injecting new vitality and opportunities into traditional agriculture. These videos leverage the unique expressive logic of the platform by adopting a small entry point and prioritizing dissemination rate. They are strategically planned in terms of content, visuals, and interaction to cater to users’ needs for relaxation, knowledge acquisition, social sharing, agricultural product marketing, and talent display. Through careful design, full creativity, rich emotion, and the creation of distinct character personalities, these videos deliver positive, entertaining, informative, and opinion-driven agricultural content. The production and operation of agricultural short videos can be effectively optimized by analyzing the characteristics of both popular and less popular videos and by utilizing smart tools and trending topics.
Objectives: Short video addiction has emerged as a significant public health issue in recent years, with a growing trend toward severity. However, research on the causes and impacts of short video addiction remains limited, and understanding of the variable “TikTok brain” is still in its infancy. Therefore, based on the Stimulus-Organism-Behavior-Consequence (SOBC) framework, we proposed six research hypotheses and constructed a model to explore the relationships between short video usage intensity, TikTok brain, short video addiction, and decreased attention control. Methods: Given that students are considered a high-risk group for excessive short video use, we recruited 1086 valid participants from Chinese student users, including 609 males (56.1%) and 477 females (43.9%), with an average age of 19.84 years, to test the hypotheses. Results: (1) short video usage intensity was positively related to short video addiction, TikTok brain, and decreased attention control; (2) TikTok brain was positively related to short video addiction and decreased attention control; and (3) short video addiction was positively related to decreased attention control. Conclusions: These findings suggest that although excessive use of short video applications brings negative consequences, users still spend significant amounts of time on these platforms, indicating a need for strict self-regulation of usage time.
Objective: The objective of this study is to determine the effect of a nurse-led instructional video (NLIV) on anxiety, satisfaction, and recovery among mothers admitted for cesarean section (CS). Materials and Methods: A quasi-experimental design was carried out on mothers scheduled for CS. Eighty participants were selected by a purposive sampling technique and divided (40 participants in each group) into an experimental group and a control group. The NLIV was shown to the experimental group, and routine care was provided for the control group. The modified hospital anxiety scale (HADS), a scale for measuring maternal satisfaction in cesarean birth, and the obstetric quality of recovery following cesarean delivery scale were used to assess anxiety, satisfaction, and recovery. Results: Both the experimental and control groups showed significant reductions in anxiety by the first postintervention day (P < 0.001), with the experimental group experiencing a greater mean reduction (mean difference [MD] = 4.37) than the control group (MD = 3.35), but the intergroup difference was not statistically significant (P > 0.05). The experimental group reported significantly higher satisfaction scores (175.55 ± 9.42) on the 3rd postoperative day compared to the control group (151.93 ± 14.89; P < 0.001). Similarly, the experimental group’s recovery scores (79.90 ± 6.24) were considerably higher than those of the control group (62.45 ± 15.18; P < 0.001). On the 3rd postintervention day, satisfaction was significantly associated with age (P < 0.001), and recovery with gravidity (P < 0.05). Conclusions: The NLIV can be used in the preoperative period to reduce anxiety related to CS and to improve satisfaction and recovery after CS.
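Between-group comparisons of the kind reported here (mean ± SD for experimental vs. control) are commonly made with an independent-samples t-test, though the abstract does not name the exact test used. The scores below are simulated from the reported means, not the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
experimental = rng.normal(79.9, 6.2, 40)   # simulated recovery scores, n = 40
control = rng.normal(62.5, 15.2, 40)

# Welch's t-test: does not assume equal variances between groups
t, p = stats.ttest_ind(experimental, control, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4g}")
```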
Video classification is an important task in video understanding and plays a pivotal role in intelligent monitoring of information content. Most existing methods do not consider the multimodal nature of the video, and their modality fusion approaches tend to be too simple, often neglecting modality alignment before fusion. This research introduces a novel dual-stream multimodal alignment and fusion network named DMAFNet for classifying short videos. The network uses two unimodal encoder modules to extract features within modalities and exploits a multimodal encoder module to learn the interaction between modalities. To solve the modality alignment problem, contrastive learning is introduced between the two unimodal encoder modules. Additionally, masked language modeling (MLM) and video-text matching (VTM) auxiliary tasks are introduced to improve the interaction between video frames and the text modality through backpropagation of their loss functions. Diverse experiments prove the efficiency of DMAFNet in multimodal video classification tasks. Compared with two other mainstream baselines, DMAFNet achieves the best results on the 2022 WeChat Big Data Challenge dataset.
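The modality-alignment step is commonly implemented as a symmetric InfoNCE contrastive loss over matched video/text embeddings. The sketch below assumes this standard formulation and is not DMAFNet's actual code:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (video, text) pairs attract, mismatched pairs repel."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(v.size(0))         # embedding i matches embedding i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```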
The Double Take column looks at a single topic from an African and Chinese perspective. This month, we explore how we can cope with the influence of short videos.
High-speed imaging is crucial for understanding the transient dynamics of the world, but conventional frame-by-frame video acquisition is limited by specialized hardware and substantial data storage requirements. We introduce “SpeedShot,” a computational imaging framework for efficient high-speed video imaging. SpeedShot features a low-speed dual-camera setup that simultaneously captures two temporally coded snapshots. Cross-referencing these two snapshots extracts a multiplexed temporal gradient image, producing a compact, multi-frame motion representation for video reconstruction. Recognizing the unique temporal-only modulation model, we propose an explicable motion-guided scale-recurrent transformer for video decoding. It exploits cross-scale error maps to bolster the cycle consistency between predicted and observed data. Evaluations on both simulated datasets and real imaging setups demonstrate SpeedShot’s effectiveness in video-rate up-conversion, with pronounced improvement over video frame interpolation and deblurring methods. The proposed framework is compatible with commercial low-speed cameras, offering a versatile low-bandwidth alternative for video-related applications such as video surveillance and sports analysis.
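The dual-snapshot measurement idea can be simulated in a few lines: two cameras integrate the same scene under different per-frame temporal codes, and differencing the snapshots yields a multiplexed temporal-gradient image. The complementary binary codes here are an assumption for illustration:

```python
import numpy as np

T, H, W = 8, 64, 64
video = np.random.rand(T, H, W)               # stand-in high-speed scene

codes_a = np.random.randint(0, 2, (T, 1, 1))  # per-frame binary code, camera A
codes_b = 1 - codes_a                         # complementary code, camera B

snapshot_a = (codes_a * video).sum(axis=0)    # temporally coded exposure A
snapshot_b = (codes_b * video).sum(axis=0)    # temporally coded exposure B

# cross-referencing the snapshots gives sum_t (2*c_t - 1) * x_t: a signed,
# multiplexed temporal-gradient image usable as a compact motion representation
gradient_image = snapshot_a - snapshot_b
print(gradient_image.shape)                   # (64, 64)
```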
Dance video is an innovative form that integrates dance art with imaging technology, enriching the expression of dance through techniques such as photography, videography, and special effects. This paper explores the definition, current development, artistic expression, social impact, and future trends of dance video. Through narrative construction, visual impact, emotional resonance, technological innovation, and cultural expression, dance video enhances the narrative and visual appeal of dance. In the future, dance video will focus more on the integration of virtual reality (VR) and augmented reality (AR) technologies. However, it also faces challenges such as rapid technological updates, maintaining artistic originality, and balancing commercial interests.
It has been over a decade since the first coded aperture video compressive sensing (CS) system was reported. The underlying principle of this technology is to employ a high-frequency modulator in the optical path to modulate a recorded high-speed scene within one integration time. The superimposed image captured in this manner is modulated and compressed, since multiple modulation patterns are imposed. Reconstruction algorithms are then utilized to recover the desired high-speed scene. One leading advantage of video CS is that a single captured measurement can be used to reconstruct a multi-frame video, thereby enabling a low-speed camera to capture high-speed scenes. Inspired by this, a number of variants of video CS systems have been built, mainly using different modulation devices. Meanwhile, in order to obtain high-quality reconstructed videos, many algorithms have been developed, from optimization-based iterative algorithms to deep-learning-based ones. Recently, emerging deep learning methods have become dominant due to their high-speed inference and high-quality reconstruction, highlighting the possibility of deploying video CS in practical applications. Toward this end, this paper reviews the progress that has been achieved in video CS during the past decade. We further analyze the efforts that need to be made, in terms of both hardware and algorithms, to enable real applications. Research gaps are put forward and future directions are summarized to help researchers and engineers working on this topic.
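The sensing model behind coded-aperture video CS is y = Σ_t M_t ⊙ x_t: each high-speed frame is multiplied by its modulation pattern and the products are summed within one exposure. A toy sketch with random masks and frames:

```python
import numpy as np

T, H, W = 8, 32, 32
scene = np.random.rand(T, H, W)              # high-speed frames x_t
masks = np.random.randint(0, 2, (T, H, W))   # per-frame modulation patterns M_t

measurement = (masks * scene).sum(axis=0)    # single coded snapshot y

# a crude starting point used by many iterative solvers: mask-normalized average
init = measurement / np.maximum(masks.sum(axis=0), 1)
print(measurement.shape, init.shape)         # (32, 32) (32, 32)
```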
Blockchain-based audiovisual transmission systems were built to create a distributed and flexible smart transport system (STS). Such a system lets customers, video creators, and service providers connect directly with each other. Blockchain-based STS devices need substantial computing power to transcode video feeds of different quality and formats into the versions and structures that meet the needs of different users. On the other hand, existing blockchains cannot support live streaming because they take too long to process transactions and lack sufficient computing power. The large amounts of video data being sent and analyzed put too much stress on vehicular networks. This paper proposes a video surveillance method to improve the data performance of the blockchain system and lower the latency across the multiple-access edge computing (MEC) system. A framework for the integration of MEC and blockchain for video surveillance in autonomous vehicles (IMEC-BVS) is proposed. To deal with this problem, the joint optimization problem is formulated as a Markov decision process (MDP) and solved using the actor-critic asynchronous advantage (ACAA) method with deep reinforcement learning. Simulation results show that the suggested method converges quickly and improves the performance of MEC and blockchain when used together for video surveillance in self-driving cars compared with other methods.
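An advantage actor-critic update of the general family named here fits in a few lines; the network shapes and loss coefficients below are illustrative and unrelated to the IMEC-BVS implementation:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.pi = nn.Linear(64, n_actions)    # policy head
        self.v = nn.Linear(64, 1)             # value head

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h).squeeze(-1)

def a2c_update(model, optimizer, obs, actions, returns):
    """One advantage actor-critic step on a rollout batch."""
    dist, values = model(obs)
    advantage = returns - values.detach()     # A(s, a) = R - V(s)
    policy_loss = -(dist.log_prob(actions) * advantage).mean()
    value_loss = (returns - values).pow(2).mean()
    loss = policy_loss + 0.5 * value_loss - 0.01 * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```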
In the wave of internet culture, short videos have become an indispensable medium for social communication. The metaphorical hot words contained within them are a unique linguistic phenomenon that leads topics and focuses attention, greatly enriching the expressive layers and rhetorical charm of short videos and significantly enhancing a video’s thematic orientation and emotional identification. This research aims to explore the relationship between the use of metaphorical internet buzzwords in short videos and their thematic and emotional orientation. The study adopts a combination of qualitative and quantitative methods, taking 10 videos with over 10,000 likes posted by a well-known blogger on Xiaohongshu in 2024 as the research object, transcribing the text, forming research corpora, and conducting multi-dimensional cognitive analysis on them. The study shows that about half of the short videos contain metaphorical hot words. Different types of metaphorical hot words can trigger different emotional reactions from fans; humorous metaphorical hot words in particular can stimulate fans’ emotional identification and resonance. In addition, in terms of fan participation, videos using metaphorical hot words tend to attract more fan attention than those that do not: these videos not only attract more fans to watch and like but also trigger more comments and sharing. In summary, short videos cleverly use metaphors to create internet hot words, significantly enhancing a video’s thematic guidance and emotional resonance, manifested in creating popular topics, clarifying guiding themes, enhancing content attractiveness, and stimulating strong emotional identification, thereby promoting interactive behaviors such as likes and shares. These findings provide a reference for research in related fields such as metaphor, communication studies, and sociology.
Personal video recorders (PVRs) have altered the way users consume television (TV) content by allowing users to record programs and watch them at their convenience, overcoming the constraints of live broadcasting. However, standalone PVRs are limited by their individual storage capacities, restricting the number of programs they can store. While online catch-up TV services such as Hulu and Netflix mitigate this limitation by offering on-demand access to broadcast programs shortly after their initial broadcast, they require substantial storage and network resources, leading to significant infrastructural costs for service providers. To address these challenges, we propose a collaborative TV content recording system that leverages distributed PVRs, combining their storage into a virtual shared pool without additional costs. Our system aims to support all concurrent playback requests without service interruption while ensuring program availability comparable to that of local devices. The main contributions of our proposed system are fourfold. First, by sharing storage and upload bandwidth among PVRs, our system significantly expands the overall recording capacity and enables simultaneous recording of multiple programs without the physical constraints of standalone devices. Second, by utilizing erasure coding efficiently, our system reduces the storage space required for each program, allowing more programs to be recorded compared to traditional replication. Third, we propose an adaptive redundancy scheme to control the degree of redundancy of each program based on its evolving playback demand, ensuring high-quality playback by providing sufficient bandwidth for popular programs. Finally, we introduce a contribution-based incentive policy that encourages PVRs to actively participate by contributing resources, while discouraging excessive consumption of the combined storage pool. Through extensive experiments, we demonstrate the effectiveness of our proposed collaborative TV program recording system in terms of storage efficiency and performance.
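The storage saving from erasure coding over replication is simple arithmetic: a (k, m) code stores k data blocks plus m parity blocks, tolerating any m losses at an overhead of (k + m) / k, versus r-fold overhead for r-way replication. A sketch with hypothetical parameters:

```python
def storage_overhead(k: int, m: int) -> float:
    """Storage multiplier for a (k, m) erasure code: k data + m parity blocks."""
    return (k + m) / k

# a 1 GB program: 3-way replication tolerates 2 lost copies at 3x storage,
# while a (6, 3) erasure code tolerates 3 lost blocks at only 1.5x storage
replicated = 3 * 1.0
erasure_coded = storage_overhead(6, 3) * 1.0
print(f"replication: {replicated} GB, erasure coded: {erasure_coded} GB")
```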
With the rapid development of artificial intelligence and Internet of Things technologies, video action recognition technology is widely applied in various scenarios, such as personal life and industrial production. However, while enjoying the convenience brought by this technology, it is crucial to effectively protect the privacy of users’ video data. Therefore, this paper proposes a video action recognition method based on personalized federated learning and spatiotemporal features. Under the framework of federated learning, a video action recognition method leveraging spatiotemporal features is designed. For the local spatiotemporal features of the video, a new differential information extraction scheme is proposed to extract differential features with a single RGB frame as the center, and a spatial-temporal module based on local information is designed to improve the effectiveness of local feature extraction; for the global temporal features, a method of extracting action rhythm features using differential technology is proposed, and a time module based on global information is designed. Different translational strides are used in the module to obtain bidirectional differential features under different action rhythms. Additionally, to address user data privacy issues, the method divides model parameters into local private parameters and public parameters based on the structure of the video action recognition model. This approach enhances model training performance and ensures the security of video data. The experimental results show that under personalized federated learning conditions, an average accuracy of 97.792% was achieved on the UCF-101 dataset under a non-independent and identically distributed (non-IID) data partition. This research provides technical support for privacy protection in video action recognition.
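The private/public parameter split can be sketched as federated averaging restricted to the shared subset of each client's state dict; the prefix-based naming rule below is an assumption, not the paper's scheme:

```python
import torch

def aggregate_public(client_states, public_prefixes=("backbone.",)):
    """FedAvg over parameters whose names carry a public prefix; all other
    parameters stay client-private and are never uploaded to the server."""
    public_keys = [k for k in client_states[0] if k.startswith(public_prefixes)]
    return {k: torch.stack([s[k].float() for s in client_states]).mean(0)
            for k in public_keys}

# toy: two clients share a backbone weight and keep their `head.` weights private
c1 = {"backbone.w": torch.ones(2, 2), "head.w": torch.zeros(2)}
c2 = {"backbone.w": 3 * torch.ones(2, 2), "head.w": torch.ones(2)}
print(aggregate_public([c1, c2]))   # {'backbone.w': tensor filled with 2.0}
```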
Background: Vestibular assessments, such as the video ocular counter roll (vOCR) test, offer valuable insights into the interactions between age, otolith function, and vestibular performance. Objective: To analyze the relation between age and vOCR gains as a potential marker of age-related otolith degeneration. Methods: A total of 107 participants underwent vOCR testing. Descriptive statistics and simple linear regression analyses were conducted to explore the association between age and vOCR gains. Results were presented using regression coefficients, 95% confidence intervals, p-values, and R-squared values. Results: In the overall sample, statistically significant associations were observed between age and vOCR gains in both ears. For the right ear, vOCR gains decreased with increasing age (coefficient: -0.03; 95% CI: -0.05 to -0.01; p < 0.001; R² = 0.08), while the left ear showed a slightly stronger association (coefficient: -0.04; 95% CI: -0.07 to -0.02; p < 0.001; R² = 0.12). These findings indicate a moderate age-related decline in otolith-mediated vestibular responses. Conclusion: vOCR gains appear to decline with age, reflecting potential age-related otolith degeneration. These results support the clinical value of vOCR as a non-invasive method to assess vestibular function and its changes across the lifespan.
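The per-ear analyses are simple linear regressions of vOCR gain on age; scipy reproduces the coefficient / p-value / R² pattern on synthetic data (the numbers below are simulated, not the study's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age = rng.uniform(20, 80, 107)                            # n = 107 participants
vocr_gain = 5.0 - 0.04 * age + rng.normal(0, 1.0, 107)    # synthetic age-related decline

res = stats.linregress(age, vocr_gain)
print(f"slope = {res.slope:.3f}, p = {res.pvalue:.2e}, R^2 = {res.rvalue**2:.2f}")
```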