Funding: Supported in part by the National Natural Science Foundation of China (62472287, 62072316, 62172363, U21B2023), the Natural Science Foundation of Shenzhen City (JCYJ20250604181519025), the Department of Education of Guangdong Province Innovation Team (2022KCXTD025), the Shenzhen Science and Technology Program (KQTD20210811090044003), and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
Abstract: With the rising ubiquity of digital touch devices and sketch-based interfaces, freehand sketching has become an essential mode of visual communication. Nevertheless, interpreting these often ambiguous and sparse sketches poses challenges for computers. This paper presents Sketchformer++, a hierarchical transformer architecture for the neural representation of vector sketches. It treats a vector sketch as a three-level structure: sketch level, stroke level, and segment level. Three self-attention modules are adopted in the network architecture, corresponding to the sketch hierarchy. The semantics of sketches are aggregated from local to global levels, resulting in neural representations of sketches. Extensive experiments show that Sketchformer++ helps to achieve superior performance in various downstream tasks, including sketch reconstruction, sketch recognition, sketch semantic segmentation, and sketch retrieval, demonstrating its robustness and effectiveness as a means of sketch representation. Code is available at https://github.com/BHR7/SketchformerPlus.
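The local-to-global aggregation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a segment is a short run of 2D points within a stroke, that each level applies self-attention followed by mean pooling, and that the attention is parameter-free (no learned query, key, or value projections). The function names `self_attention`, `encode_level`, and `encode_sketch` are hypothetical and chosen only for illustration.

```python
import math

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption: queries, keys, and values are the raw tokens themselves,
    so there are no learned weights (a real model would learn projections).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of vectors into one vector."""
    d = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]

def encode_level(tokens):
    """One hierarchy level: attend over the tokens, then pool to one vector."""
    return mean_pool(self_attention(tokens))

def encode_sketch(sketch):
    """Aggregate points -> segments -> strokes -> one sketch vector."""
    stroke_vecs = []
    for stroke in sketch:
        # Segment level: attend over the points inside each segment.
        segment_vecs = [encode_level(segment) for segment in stroke]
        # Stroke level: attend over the segment vectors of this stroke.
        stroke_vecs.append(encode_level(segment_vecs))
    # Sketch level: attend over all stroke vectors.
    return encode_level(stroke_vecs)

# A toy vector sketch: 2 strokes, each a list of segments, each a list of (x, y) points.
sketch = [
    [[(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (1.0, 1.0)]],
    [[(0.0, 1.0), (0.5, 1.5), (1.0, 1.0)]],
]
embedding = encode_sketch(sketch)
```

Because every step here is a weighted average, the resulting `embedding` stays inside the bounding box of the input points. A trained model would replace the identity projections with learned ones and use a much higher embedding dimension.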