We propose a rate-distortion (RD) optimized strategy for frame dropping and scheduling of multi-user conversational and streaming videos. We consider a scenario where conversational and streaming videos share the forwarding resources at a network node. Two buffers are set up on the node to temporarily store the packets of these two types of video applications. Streaming video uses a large buffer because its delay constraint is moderate, whereas conversational video uses a very small buffer to ensure that the forwarding delay of every packet is limited. A scheduler located behind these two buffers dynamically assigns transmission slots on the outgoing link to the two buffers. Rate-distortion side information is used to perform RD-optimized frame dropping in case of node overload. The data rate on the outgoing link is shared between the conversational and the streaming videos based either on the fullness of the two associated buffers or on the mean incoming rates of the respective videos. Simulation results show that the proposed RD-optimized frame dropping and scheduling approach provides significant performance improvements over the popular priority-based random dropping (PRD) technique.
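The two mechanisms described in this abstract can be illustrated with a minimal sketch. It assumes a simple greedy rule (drop the frames with the lowest distortion per bit first) and a proportional split of the link rate by buffer fullness. The names `Frame`, `share_rate_by_fullness`, and `rd_drop` are hypothetical, and the sketch ignores decoding dependencies between frames; it is not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    size_bits: int     # size of the frame in bits
    distortion: float  # distortion added if the frame is dropped (RD side information)


def share_rate_by_fullness(link_rate, conv_fullness, stream_fullness):
    """Split the outgoing link rate in proportion to the two buffer fullness values."""
    total = conv_fullness + stream_fullness
    if total == 0:
        # Both buffers empty: fall back to an even split (an assumption of this sketch).
        return link_rate / 2, link_rate / 2
    conv_rate = link_rate * conv_fullness / total
    return conv_rate, link_rate - conv_rate


def rd_drop(frames, budget_bits):
    """Greedily drop the frames with the least distortion per bit until the rest fits."""
    kept = list(frames)
    used = sum(f.size_bits for f in kept)
    for f in sorted(frames, key=lambda f: f.distortion / f.size_bits):
        if used <= budget_bits:
            break
        kept.remove(f)
        used -= f.size_bits
    return kept


frames = [Frame(1000, 5.0), Frame(1000, 0.5), Frame(2000, 8.0)]
kept = rd_drop(frames, budget_bits=3000)
```

With a 3000-bit budget, only the frame whose loss costs the least distortion per bit is dropped, and the remaining frames keep their original order.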
With the increasing popularity of solid-state lighting devices, Visible Light Communication (VLC) is globally recognized as an advanced and promising technology for short-range, high-speed, large-capacity wireless data transmission. In this paper, we propose a prototype of a real-time audio and video broadcast system using inexpensive, commercially available light-emitting diode (LED) lamps. Experimental results show that real-time, high-quality audio and video transmission over a maximum distance of 3 m can be achieved through proper layout of the LED sources and improved light concentration. A lighting model of the room environment is designed and simulated; it indicates a close relationship between the layout of the light sources and the distribution of illuminance.
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of “Factor” to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
The scalable extension of H.264/AVC, known as scalable video coding or SVC, is currently the main focus of the Joint Video Team’s work. In its present working draft, the higher-level syntax of SVC follows the design principles of H.264/AVC. Self-contained network abstraction layer units (NAL units) form natural entities for packetization. The SVC specification is by no means finalized, but work towards an optimized RTP payload format has already started. RFC 3984, the RTP payload specification for H.264/AVC, has been taken as a starting point, but it quickly became clear that the scalable features of SVC require adaptation in at least the areas of capability/operation point signaling and documentation of the extended NAL unit header. This paper first gives an overview of the history of scalable video coding, and then reviews the video coding layer (VCL) and NAL of the latest SVC draft specification. Finally, it discusses different aspects of the draft SVC RTP payload format, including the design criteria, use cases, signaling, and payload structure.
The 3rd Generation Partnership Project (3GPP) has defined the protocols and codecs for implementing media streaming services over packet-switched 3G mobile networks. The specification is based on IETF RFCs on audio/video transport. It also adds new features to achieve better adaptation to the mobile network environment. In this paper, we propose an algorithm for handover detection and fast buffer refill that is based on the existing feedback and signaling mechanisms. The proposed algorithm refills the receiver buffer at a faster pace during a limited time frame after a hard handover is detected in order to achieve higher video quality.
Providing services on demand is a major factor driving the increasing development of software-defined networks. However, such a network must support all currently popular applications before it can attain widespread deployment. Multiple Description Coding (MDC) video, a popular application in current networks, should be reasonably supported in this novel network virtualization environment. In this paper, we address this issue by assigning MDC video applications to virtual networks with an efficient centralized algorithm (CAMDV). Since the problem is NP-hard, we design an algorithm that effectively balances user satisfaction and network resource cost. Previous work simply builds a global multicast tree for each description to connect all the destination nodes using a breadth-first search strategy or a shortest path tree algorithm, but those methods achieve neither an optimal balance nor a high level of user satisfaction. By introducing a hierarchical clustering scheme, our algorithm decomposes the whole mapping procedure into multicast tree construction and multipath description distribution. A series of simulation experiments shows that our centralized algorithm balances user satisfaction and average mapping cost better than its rivals.
Under the guidance of the “technical value theory”, which takes both the natural and social attributes of technology into consideration, fault in indirect copyright infringement by video sharing websites takes two forms: “intention” and “negligence”. As an objective criterion for identifying negligence, the duty of care is the natural extension of the security obligation in cyberspace; for a video sharing website, the obligation to foresee infringements is the main content of the duty of care; and the degree of the duty of care hinges on different factors. As for the form of liability, a video sharing website faces excessive costs in debt recovery after assuming complementary liability, and joint and several liability is thus alienated into an aggravated responsibility. However, according to the causative potency between the fault of a video sharing website and the infringement results, several (shared) liability can avoid overburdening the video sharing website and distribute the risk of inadequate compensation based on the principle of fairness.
For rate control (RC) of hierarchical structure coding, an independent rate-quantization (R-Q) model is proposed based on mean absolute differences (MADs) in different temporal levels (TLs). In the proposed R-Q model, a novel MAD model is developed according to the hierarchical structure. The experimental results demonstrate that, across various sequences, the proposed algorithm outperforms the H.264 reference model JM14.2 in terms of average peak signal-to-noise ratio (PSNR) and quality smoothness.
BACKGROUND Video capsule endoscopy (VCE) is a noninvasive technique used to examine small bowel abnormalities in both adults and children. However, manual review of VCE images is time-consuming and labor-intensive, making it crucial to develop deep learning methods to assist in image analysis. AIM To employ deep learning models for the automatic classification of small bowel lesions using pediatric VCE images. METHODS We retrospectively analyzed VCE images from 162 pediatric patients who underwent VCE between January 2021 and December 2023 at the Children's Hospital of Nanjing Medical University. A total of 2298 high-resolution images were extracted, including normal mucosa and lesions (erosions/erythema, ulcers, and polyps). The images were split into training and test datasets in a 4:1 ratio. Four deep learning models (DenseNet121, Visual Geometry Group-16, ResNet50, and vision transformer) were trained using 5-fold cross-validation, with hyperparameters adjusted for optimal classification performance. The models were evaluated based on accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AU-ROC). Lesion visualization was performed using gradient-weighted class activation mapping. RESULTS Abdominal pain was the most common indication for VCE, accounting for 62% of cases, followed by diarrhea, vomiting, and gastrointestinal bleeding. Abnormal lesions were detected in 93 children, with 38 diagnosed with inflammatory bowel disease. Among the deep learning models, DenseNet121 and ResNet50 demonstrated excellent classification performance, achieving accuracies of 90.6% [95% confidence interval (CI): 89.2-92.0] and 90.5% (95% CI: 89.9-91.2), respectively. The AU-ROC values for these models were 93.7% (95% CI: 92.9-94.5) for DenseNet121 and 93.4% (95% CI: 93.1-93.8) for ResNet50. CONCLUSION The deep learning-based diagnostic tool developed in this study effectively classified lesions in pediatric VCE images, contributing to more accurate diagnoses and increased diagnostic efficiency.
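For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below computes accuracy, precision, recall, and F1-score from the confusion counts of a single class. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1-score for one class.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative,
    and true-negative counts for that class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not from the study).
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

For a multi-class task such as the four image categories above, these per-class values would typically be averaged across classes; which averaging scheme the study used is not stated in the abstract.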
Short videos on social media have rapidly emerged as a powerful marketing tool for shaping consumer behavior. This comparative study investigates the impact of short videos on the purchasing behavior of young consumers (aged 18-35) in Hanoi and Taipei, China. Quantitative methods, including surveys and experimental design, were employed in both cities, with a sample size of 200 respondents per location. Key influencing factors, including video content, product information, celebrity endorsement, viewer interaction, and perceived value, were systematically analyzed. The findings highlight both commonalities and contextual differences in how short videos influence purchasing behavior. This study offers practical implications for businesses and marketers targeting young consumers in Vietnam and Taiwan, China.
In 2022, cervical cancer accounted for approximately 662,301 new cases worldwide, representing 6.9% of all cancers diagnosed in women. Furthermore, it was the fourth leading cause of cancer-related deaths among women [1]. In China, human papillomavirus (HPV) vaccination is not included in the National Immunization Program, creating marked urban-rural disparities: only 5.7% of rural children are vaccinated [2]. Local publicly funded initiatives have increased vaccination uptake in some cities (e.g., the Shenzhen pilot; Jinan first-dose coverage >90% among eligible girls) [3,4].
Herein, we describe a case of robotic duodenum-preserving pancreatic head resection (DPPHR) performed on a 21-month-old male infant (weight: 13 kg; body mass index: 18.87 kg/m²) with focal nesidioblastosis, expanding the scope of minimally invasive pediatric hepatobiliary surgery (Fig. 1; Video S1). Preoperative positron emission tomography-computed tomography revealed a 23×13 mm neuroendocrine lesion in the pancreatic head, which caused refractory hypoglycemia and necessitated surgical intervention.
Because the current high efficiency video coding (HEVC) standard does not consider the characteristics of human vision, this paper proposes a perceptual video coding algorithm based on the just noticeable distortion (JND) model. The adjusted JND model is incorporated into the transform and quantization process of HEVC to remove more visual redundancy while maintaining compatibility. First, we design JND models in the pixel domain and in the transform domain. The pixel-domain model gives the JND threshold more intuitively for each pixel, while the transform-domain model introduces the contrast sensitivity function, making the threshold estimation more precise. Second, the proposed JND model is embedded in the HEVC coding framework. For the transform skip mode (TSM) in HEVC, we adopt the existing pixel-domain model known as the nonlinear additivity model for masking (NAMM). For the non-transform-skip mode (non-TSM), we use the transform-domain JND model to further reduce visual redundancy. The simulation results show that, at the same subjective visual quality, the algorithm saves more bitrate.
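How a JND threshold can remove visual redundancy before quantization may be easier to see in a small sketch. It assumes one common rule: a coefficient whose magnitude does not exceed its JND threshold is set to zero, and a larger coefficient is shrunk toward zero by the threshold. The function `apply_jnd` and the sample values are hypothetical; the abstract does not state the exact rule the authors use.

```python
def apply_jnd(coeffs, thresholds):
    """Suppress the part of each coefficient that lies below its JND threshold.

    coeffs: residual or transform coefficients (one block, flattened)
    thresholds: non-negative JND threshold for each coefficient
    """
    out = []
    for c, t in zip(coeffs, thresholds):
        if abs(c) <= t:
            out.append(0)      # change is assumed imperceptible: drop it entirely
        elif c > 0:
            out.append(c - t)  # shrink positive coefficient toward zero
        else:
            out.append(c + t)  # shrink negative coefficient toward zero
    return out


filtered = apply_jnd([10, -3, 2, -8], [4, 4, 4, 4])
```

Fewer and smaller nonzero coefficients reach the entropy coder, which is where the bitrate saving would come from under this assumption.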
Although compressive measurements save data storage and bandwidth, they are difficult to use directly for target tracking and classification without pixel reconstruction. This is because the Gaussian random matrix destroys the target location information in the original video frames. This paper summarizes our research effort on target tracking and classification directly in the compressive measurement domain. We focus on one particular type of compressive measurement, pixel subsampling, in which the original pixels in video frames are randomly subsampled. Even in such a special compressive sensing setting, conventional trackers do not perform satisfactorily. We propose a deep learning approach that integrates YOLO (You Only Look Once) and ResNet (residual network) for multiple target tracking and classification. YOLO is used for multiple target tracking and ResNet for target classification. Extensive experiments using short-wave infrared (SWIR), mid-wave infrared (MWIR), and long-wave infrared (LWIR) videos demonstrate the efficacy of the proposed approach even though the training data are very scarce.
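The pixel-subsampling measurement described here can be sketched in a few lines. The sketch assumes a frame stored as a list of rows, a fixed fraction of pixels kept at random, and missing pixels set to zero; the function `subsample_pixels` and these conventions are illustrative assumptions, not details from the paper.

```python
import random


def subsample_pixels(frame, keep_fraction, seed=0):
    """Keep a random subset of pixels in place and zero out the rest."""
    rng = random.Random(seed)
    height, width = len(frame), len(frame[0])
    n_keep = round(keep_fraction * height * width)
    # Choose which flat pixel indices survive the subsampling.
    kept = set(rng.sample(range(height * width), n_keep))
    return [
        [frame[r][c] if r * width + c in kept else 0 for c in range(width)]
        for r in range(height)
    ]


# A 4x4 test frame with nonzero values 1..16.
frame = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
measured = subsample_pixels(frame, keep_fraction=0.25)
```

Unlike a Gaussian random measurement matrix, every surviving pixel stays at its original position, which is why location information remains available to a detector operating on the measurements.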
We are interested in providing Video-on-Demand (VoD) streaming service to a large population of clients using a peer-to-peer (P2P) approach. Given the asynchronous demands from multiple clients, the continuously changing buffered contents, and the continuous video display requirement, collaborating with potential partners to obtain the data needed for future content delivery is both important and challenging. In this paper, we develop a novel scheduling algorithm based on deadline-aware network coding (DNC) to fully exploit the network resources for efficient VoD service. DNC generalizes the existing network coding (NC) paradigm, an elegant solution for ubiquitous data distribution. With deadline awareness, DNC improves the network throughput while, with high probability, avoiding missed playback deadlines, which are a major deficiency of conventional NC. Extensive simulation results demonstrate that DNC achieves high streaming continuity even under tight network conditions.
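As background for the network coding paradigm that DNC builds on, the sketch below shows plain network coding over GF(2), where a coded block is the XOR of selected source blocks. It does not model deadline awareness or the authors' scheduling algorithm; the function names and the sample blocks are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks, coeffs):
    """Combine source blocks over GF(2): XOR together the blocks whose coefficient is 1."""
    coded = bytes(len(blocks[0]))  # all-zero block of the same length
    for block, c in zip(blocks, coeffs):
        if c:
            coded = xor_blocks(coded, block)
    return coded


b1, b2 = b"abcd", b"wxyz"
mixed = encode([b1, b2], [1, 1])
# A peer that already holds b1 recovers b2 from the coded block.
recovered = xor_blocks(mixed, b1)
```

A single coded block is useful to any peer missing either source block, which is the property that makes coding attractive for data distribution among many peers.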
Real-time video transport over the wireless Internet faces many challenges due to the heterogeneous environment, which includes wireline and wireless networks. A robust network condition classification algorithm using multiple end-to-end metrics and a Support Vector Machine (SVM) is proposed to classify different network events and model the transition pattern of network conditions. End-to-end Quality-of-Service (QoS) mechanisms such as congestion control, error control, and power control can benefit from the network condition information and react appropriately to different network situations. The proposed algorithm uses an SVM as a classifier over end-to-end metrics such as end-to-end delay, delay jitter, throughput, and packet loss rate for UDP traffic with TCP-friendly Rate Control (TFRC), which is used for video transport. The algorithm is also flexible enough to classify different numbers of states representing different levels of network events, such as wireline congestion and wireless channel loss. Simulation results using network simulator 2 (ns2) show the effectiveness of the proposed scheme.
A P2P approach is proposed to extend the ability of Video on Demand systems to serve more users. In the proposed system, users share the media data they have obtained with each other, so the media server is no longer the only source of data. The load on the media server is thereby greatly reduced, the overall system capacity increases, and more users can be served. The P2P streaming system introduces efficient searching, dynamic monitoring of data transfer, and initial buffering to maintain high playback quality. Its provider selection policy helps to reduce the load on the underlying network by avoiding remote data transfer.
Semantic segmentation is a core task in computer vision that allows AI models to interact with and understand their surrounding environment. Much as humans subconsciously segment scenes, this ability is crucial for scene understanding. However, a challenge many semantic learning models face is the lack of data. Existing video datasets are limited to short, low-resolution videos that are not representative of real-world examples. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset that features hour-long, high-resolution, real-world data from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary semantic model OpenSeeD on our custom dataset and discuss future implications.
The devastating effects of wildland fire are an unsolved problem, resulting in human losses and the destruction of natural and economic resources. Convolutional neural networks (CNNs) have been shown to perform very well in object classification, and they can perform feature extraction and classification within the same architecture. In this paper, we propose a CNN for identifying fire in videos. A deep learning-based method for video fire detection is proposed to extract a powerful feature representation of fire. Tested on real video sequences, the proposed approach achieves better classification performance than some relevant conventional video-based fire detection methods and indicates that using a CNN to detect fire in videos is efficient. To balance efficiency and accuracy, the model is fine-tuned considering the nature of the target problem and the fire data. Experimental results on benchmark fire datasets reveal the effectiveness of the proposed framework and validate its suitability for fire detection in closed-circuit television surveillance systems compared with state-of-the-art methods.
Funding (RD-optimized frame dropping and scheduling of multi-user video): Project (No. STE1093/1-1) supported by the German Research Foundation, Germany.
Funding (video action recognition review): supported by the Zhejiang Provincial Natural Science Foundation of China (No. LQ23F030001), the National Natural Science Foundation of China (No. 62406280), the Autism Research Special Fund of Zhejiang Foundation for Disabled Persons (No. 2023008), the Liaoning Province Higher Education Innovative Talents Program Support Project (No. LR2019058), the Liaoning Province Joint Open Fund for Key Scientific and Technological Innovation Bases (No. 2021-KF-12-05), the Central Guidance on Local Science and Technology Development Fund of Liaoning Province (No. 2023JH6/100100066), the Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, China, and in part by the Open Research Fund of the State Key Laboratory of Cognitive Neuroscience and Learning.
Funding (MDC video mapping in virtual networks): supported by the National Basic Research Program of China (2012CB315903), the National Science and Technology Support Program (2014BAH24F01), the Program for Key Science and Technology Innovation Team of Zhejiang Province (2011R50010-21, 2013TD20), the 863 Program of China (2015AA016103), the National Natural Science Foundation of China (61379118), and the Fundamental Research Funds for the Central Universities.
Funding (indirect copyright infringement by video sharing websites): Marxism Theoretical Research and Construction Project and National Social Science Fund Project “Research on Intellectual Property Protection and Innovative Development” (2016MZD022).
Funding (rate control of hierarchical structure coding): National Natural Science Foundations of China (No. 60972035, No. 61074009); Natural Science Foundation Program of Shanghai, China (No. 10ZR1432800).
文摘For rate control (RC) of hierarchical structure coding, an independent rate-quantization (R-Q) model was proposed based on mean absolute differences (MADs) in different temporal levels (TLs). In the proposed R-Q model, a novel MAD model was developed according to the hierarchical structure. The experimental results demonstrate that the proposed algorithm provides better performance, in terms of average peak signal-to-noise ratio (PSNR) and quality smoothness, than the H.264 reference model, JM14.2, under various sequences.
Abstract: BACKGROUND: Video capsule endoscopy (VCE) is a noninvasive technique used to examine small bowel abnormalities in both adults and children. However, manual review of VCE images is time-consuming and labor-intensive, making it crucial to develop deep learning methods to assist in image analysis. AIM: To employ deep learning models for the automatic classification of small bowel lesions using pediatric VCE images. METHODS: We retrospectively analyzed VCE images from 162 pediatric patients who underwent VCE between January 2021 and December 2023 at the Children's Hospital of Nanjing Medical University. A total of 2298 high-resolution images were extracted, including normal mucosa and lesions (erosions/erythema, ulcers, and polyps). The images were split into training and test datasets in a 4:1 ratio. Four deep learning models (DenseNet121, Visual Geometry Group-16, ResNet50, and vision transformer) were trained using 5-fold cross-validation, with hyperparameters adjusted for optimal classification performance. The models were evaluated based on accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AU-ROC). Lesion visualization was performed using gradient-weighted class activation mapping. RESULTS: Abdominal pain was the most common indication for VCE, accounting for 62% of cases, followed by diarrhea, vomiting, and gastrointestinal bleeding. Abnormal lesions were detected in 93 children, with 38 diagnosed with inflammatory bowel disease. Among the deep learning models, DenseNet121 and ResNet50 demonstrated excellent classification performance, achieving accuracies of 90.6% [95% confidence interval (CI): 89.2-92.0] and 90.5% (95% CI: 89.9-91.2), respectively. The AU-ROC values for these models were 93.7% (95% CI: 92.9-94.5) for DenseNet121 and 93.4% (95% CI: 93.1-93.8) for ResNet50. CONCLUSION: The deep learning-based diagnostic tool developed in this study effectively classified lesions in pediatric VCE images, contributing to more accurate diagnoses and increased diagnostic efficiency.
Abstract: Short videos on social media have rapidly emerged as a powerful marketing tool for shaping consumer behavior. This comparative study investigates the impact of short videos on the purchasing behavior of young consumers (aged 18-35) in Hanoi and Taipei, China. Quantitative methods, including surveys and an experimental design, were employed in both cities, with a sample size of 200 respondents per location. Key influencing factors, including video content, product information, celebrity endorsement, viewer interaction, and perceived value, were systematically analyzed. The findings highlight both commonalities and contextual differences in how short videos influence purchasing behavior. This study offers practical implications for businesses and marketers targeting young consumers in Vietnam and Taiwan, China.
Funding: Supported by grants from the China Association for Science and Technology (KXYJS2024012); the Gates Foundation (INV-006373 and INV-023808); Capital’s Funds for Health Improvement and Research (2024-2-30117); and Beijing Municipal Health Commission’s Funds for the High-qualified Public Health Professionals Development Project (Discipline Core-03-36).
Abstract: In 2022, cervical cancer accounted for approximately 662,301 new cases worldwide, representing 6.9% of all cancers diagnosed in women. Furthermore, it was the fourth leading cause of cancer-related deaths among women [1]. In China, human papillomavirus (HPV) vaccination is not included in the National Immunization Program, thus creating marked urban-rural disparities: only 5.7% of rural children are vaccinated [2]. Local publicly funded initiatives have increased vaccination uptake in some cities (e.g., Shenzhen pilot; Jinan first-dose coverage >90% among eligible girls) [3,4].
Abstract: Herein, we describe a case of robotic duodenum-preserving pancreatic head resection (DPPHR) performed on a 21-month-old male infant (weight: 13 kg; body mass index: 18.87 kg/m²) with focal nesidioblastosis, expanding the scope of minimally invasive pediatric hepatobiliary surgery (Fig. 1; Video S1). Preoperative positron emission tomography-computed tomography revealed a 23×13 mm neuroendocrine lesion in the pancreatic head, which caused refractory hypoglycemia and necessitated surgical intervention.
Abstract: Because the current High Efficiency Video Coding (HEVC) standard does not consider the characteristics of human vision, this paper proposes a perceptual video coding algorithm based on the just noticeable distortion (JND) model. The adjusted JND model is incorporated into the transform and quantization process of HEVC to remove more visual redundancy while maintaining compatibility. First, we design JND models in the pixel domain and in the transform domain. The pixel-domain model gives the JND threshold more intuitively at the pixel level, while the transform-domain model introduces the contrast sensitivity function, making the threshold estimation more precise. Second, the proposed JND model is embedded in the HEVC video coding framework. For the transform skip mode (TSM) in HEVC, we adopt the existing pixel-domain model, the nonlinear additivity model for masking (NAMM). For the non-transform skip mode (non-TSM) in HEVC, we use the transform-domain JND model to further reduce visual redundancy. The simulation results show that, at the same subjective visual quality, the algorithm saves more bitrate.
Abstract: Although compressive measurements save data storage and bandwidth, they are difficult to use directly for target tracking and classification without pixel reconstruction. This is because the Gaussian random matrix destroys the target location information in the original video frames. This paper summarizes our research effort on target tracking and classification directly in the compressive measurement domain. We focus on one particular type of compressive measurement using pixel subsampling. That is, original pixels in video frames are randomly subsampled. Even in such a special compressive sensing setting, conventional trackers do not work in a satisfactory manner. We propose a deep learning approach that integrates YOLO (You Only Look Once) and ResNet (residual network) for multiple target tracking and classification. YOLO is used for multiple target tracking and ResNet for target classification. Extensive experiments using short-wave infrared (SWIR), mid-wave infrared (MWIR), and long-wave infrared (LWIR) videos demonstrated the efficacy of the proposed approach even though the training data are very scarce.
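The pixel-subsampling measurement described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `subsample_pixels`, the `keep_ratio` and `seed` parameters, and the choice to zero-fill dropped pixels are all assumptions made for illustration.

```python
import random


def subsample_pixels(frame, keep_ratio, seed=None):
    """Randomly keep a fraction of the pixels in a 2-D frame.

    frame      -- list of rows, each a list of pixel values
    keep_ratio -- fraction of pixels to keep, between 0 and 1 (assumed parameter)
    seed       -- optional seed so the same mask can be reproduced

    Returns (measured, mask). Dropped pixels are zero-filled, which is an
    illustrative assumption rather than something stated in the abstract.
    """
    rng = random.Random(seed)
    height, width = len(frame), len(frame[0])
    total = height * width
    n_keep = round(total * keep_ratio)

    # Choose which flat pixel indices survive the subsampling.
    kept = set(rng.sample(range(total), n_keep))

    mask = [[(r * width + c) in kept for c in range(width)]
            for r in range(height)]
    measured = [[frame[r][c] if mask[r][c] else 0 for c in range(width)]
                for r in range(height)]
    return measured, mask


# Usage: keep half of the pixels of a small 4x4 test frame.
frame = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
measured, mask = subsample_pixels(frame, keep_ratio=0.5, seed=0)
```

Unlike a Gaussian random measurement matrix, the kept pixels stay at their original coordinates, which is why location information survives in this kind of measurement.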
Funding: Project (No. DAG05/06.EG05) supported by the Research Grants Council (RGC) of Hong Kong, China
Abstract: We are interested in providing Video-on-Demand (VoD) streaming service to a large population of clients using a peer-to-peer (P2P) approach. Given the asynchronous demands from multiple clients, the continuously changing buffered contents, and the continuous video display requirement, how to collaborate with potential partners to obtain the expected data for future content delivery is both important and challenging. In this paper, we develop a novel scheduling algorithm based on deadline-aware network coding (DNC) to fully exploit network resources for efficient VoD service. DNC generalizes the existing network coding (NC) paradigm, an elegant solution for ubiquitous data distribution. With deadline awareness, DNC improves network throughput while, with high probability, avoiding missed playback deadlines, which are a major deficiency of conventional NC. Extensive simulation results demonstrated that DNC achieves high streaming continuity even in tight network conditions.
Funding: Project supported by the Croucher Foundation Fellowship from Hong Kong, China
Abstract: Real-time video transport over the wireless Internet faces many challenges due to the heterogeneous environment, which includes both wireline and wireless networks. A robust network condition classification algorithm using multiple end-to-end metrics and a Support Vector Machine (SVM) is proposed to classify different network events and model the transition pattern of network conditions. End-to-end Quality-of-Service (QoS) mechanisms such as congestion control, error control, and power control can benefit from the network condition information and react to different network situations appropriately. The proposed network condition classification algorithm uses an SVM as a classifier to cluster different end-to-end metrics such as end-to-end delay, delay jitter, throughput, and packet loss rate for UDP traffic with TCP-Friendly Rate Control (TFRC), which is used for video transport. The algorithm is also flexible enough to classify different numbers of states representing different levels of network events such as wireline congestion and wireless channel loss. Simulation results using network simulator 2 (ns2) showed the effectiveness of the proposed scheme.
Abstract: A P2P approach is proposed to extend the ability of Video-on-Demand systems to serve more users. In the proposed system, users share the media data they have obtained with each other, so the media server is no longer the only source of data. The load on the media server is thereby greatly alleviated, the overall system capacity increases, and more users can be served. The P2P streaming system introduces efficient searching, dynamic monitoring of data transfer, and initial buffering to maintain a high quality of playback. Its provider selection policy helps to reduce the load on the underlying network by avoiding remote data transfer.
Abstract: Semantic segmentation is a core task in computer vision that allows AI models to interact with and understand their surrounding environment. Similarly to how humans subconsciously segment scenes, this ability is crucial for scene understanding. However, a challenge many semantic learning models face is the lack of data. Existing video datasets are limited to short, low-resolution videos that are not representative of real-world examples. Thus, one of our key contributions is a customized semantic segmentation version of the Walking Tours Dataset that features hour-long, high-resolution, real-world data from tours of different cities. Additionally, we evaluate the performance of the open-vocabulary semantic model OpenSeeD on our custom dataset and discuss future implications.
Funding: National Natural Science Foundation of China (No. 61573095); Natural Science Foundation of Shanghai, China (No. 6ZR1446700)
Abstract: The devastating effects of wildland fire are an unsolved problem, resulting in human losses and the destruction of natural and economic resources. Convolutional neural networks (CNNs) have been shown to perform very well in object classification, and they can perform feature extraction and classification within the same architecture. In this paper, we propose a CNN for identifying fire in videos. A deep-domain-based method for video fire detection is proposed to extract a powerful feature representation of fire. In tests on real video sequences, the proposed approach achieves better classification performance than some relevant conventional video-based fire detection methods and indicates that using a CNN to detect fire in videos is efficient. To balance efficiency and accuracy, the model is fine-tuned considering the nature of the target problem and the fire data. Experimental results on benchmark fire datasets reveal the effectiveness of the proposed framework and validate its suitability for fire detection in closed-circuit television surveillance systems compared to state-of-the-art methods.