期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
DenseCL:A simple framework for self-supervised dense visual pre-training 被引量:1
1
作者 Xinlong Wang Rufeng Zhang +1 位作者 Chunhua Shen Tao Kong 《Visual Informatics》 EI 2023年第1期30-40,共11页
Self-supervised learning aims to learn a universal feature representation without labels.To date,most existing self-supervised learning methods are designed and optimized for image classification.These pre-trained mod... Self-supervised learning aims to learn a universal feature representation without labels.To date,most existing self-supervised learning methods are designed and optimized for image classification.These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction.To fill this gap,we aim to design an effective,dense self-supervised learning framework that directly works at the level of pixels(or local features)by taking into account the correspondence between local features.Specifically,we present dense contrastive learning(DenseCL),which implements self-supervised learning by optimizing a pairwise contrastive(dis)similarity loss at the pixel level between two views of input images.Compared to the supervised ImageNet pre-training and other self-supervised learning methods,our self-supervised DenseCL pretraining demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection,semantic segmentation and instance segmentation.Specifically,our approach significantly outperforms the strong MoCo-v2 by 2.0%AP on PASCAL VOC object detection,1.1%AP on COCO object detection,0.9%AP on COCO instance segmentation,3.0%mIoU on PASCAL VOC semantic segmentation and 1.8%mIoU on Cityscapes semantic segmentation.The improvements are up to 3.5%AP and 8.8%mIoU over MoCo-v2,and 6.1%AP and 6.1%mIoU over supervised counterpart with frozen-backbone evaluation protocol. 展开更多
关键词 Self-supervised learning Visual pre-training Dense prediction tasks
原文传递
Dance2MIDI:Dance-driven multi-instrument music generation
2
作者 Bo Han Yuheng Li +2 位作者 Yixuan Shen Yi Ren Feilin Han 《Computational Visual Media》 SCIE EI CSCD 2024年第4期791-802,共12页
Dance-driven music generation aims to generate musical pieces conditioned on dance videos.Previous works focus on monophonic or raw audio generation,while the multi-instrument scenario is under-explored.The challenges... Dance-driven music generation aims to generate musical pieces conditioned on dance videos.Previous works focus on monophonic or raw audio generation,while the multi-instrument scenario is under-explored.The challenges associated with dancedriven multi-instrument music(MIDI)generation are twofold:(i)lack of a publicly available multi-instrument MIDI and video paired dataset and(ii)the weak correlation between music and video.To tackle these challenges,we have built the first multi-instrument MIDI and dance paired dataset(D2MIDI).Based on this dataset,we introduce a multi-instrument MIDI generation framework(Dance2MIDI)conditioned on dance video.Firstly,to capture the relationship between dance and music,we employ a graph convolutional network to encode the dance motion.This allows us to extract features related to dance movement and dance style.Secondly,to generate a harmonious rhythm,we utilize a transformer model to decode the drum track sequence,leveraging a cross-attention mechanism.Thirdly,we model the task of generating the remaining tracks based on the drum track as a sequence understanding and completion task.A BERTlike model is employed to comprehend the context of the entire music piece through self-supervised learning.We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance. 展开更多
关键词 video understanding music generation symbolic music cross-modal learning self-supervision
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部