Siamese tracking algorithms usually use convolutional neural networks (CNNs) as feature extractors owing to their capability of extracting deep discriminative features. However, the convolution kernels in CNNs have limited receptive fields, making it difficult to capture global feature dependencies, which are important for object detection, especially when the target undergoes large-scale variations or movement. In view of this, we develop a novel network called effective convolution mixed Transformer Siamese network (SiamCMT) for visual tracking, which integrates CNN-based and Transformer-based architectures to capture both local information and long-range dependencies. Specifically, we design a Transformer-based module named lightweight multi-head attention (LWMHA), which can be flexibly embedded into stage-wise CNNs to improve the network’s representation ability. Additionally, we introduce a stage-wise feature aggregation mechanism that integrates features learned from multiple stages. By leveraging both location and semantic information, this mechanism helps SiamCMT locate the target more accurately. Moreover, to distinguish the contribution of different channels, a channel-wise attention mechanism is introduced to enhance the important channels and suppress the others. Extensive experiments on seven challenging benchmarks, i.e., OTB2015, UAV123, GOT10K, LaSOT, DTB70, UAVTrack112_L, and VOT2018, demonstrate the effectiveness of the proposed algorithm. In particular, the proposed method outperforms the baseline by 3.5% and 3.1% in terms of precision and success rates, respectively, with a real-time speed of 59.77 FPS on UAV123.
Knowledge graphs convey precise semantic information that can be effectively interpreted by neural networks, and generating descriptive text based on these graphs places significant emphasis on content consistency. However, knowledge graphs cannot provide additional linguistic features such as paragraph structure and expressive modes, making it challenging to ensure content coherence when generating text that spans multiple sentences. This lack of coherence can further compromise the overall consistency of the content within a paragraph. In this work, we generate scientific abstracts from knowledge graphs, with a focus on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text and incorporates sentence labels to guide text structure and diverse expressions. We then implement a Siamese network to complement and concretize the entities and relations based on paragraph structure by accomplishing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform position distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, prevents the generation of irrelevant content, and achieves coherent and non-redundant adjacent sentences, even with a shared knowledge graph.
This paper proposes a new approach to counter cyberattacks that use increasingly diverse malware. Traditional signature detection methods that utilize static and dynamic features face limitations due to the continuous evolution and diversity of new malware. Recently, machine learning-based malware detection techniques, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), have gained attention. While these methods demonstrate high performance by leveraging static and dynamic features, they are limited in detecting new malware or variants because they learn the characteristics of existing malware. To overcome these limitations, malware detection techniques employing One-Shot Learning and Few-Shot Learning have been introduced. Building on this, the Siamese network, which can learn effectively from a small number of samples and predicts based on similarity rather than on learned characteristics of the input data, enables the detection of new malware or variants. We propose a dual Siamese network-based detection framework that utilizes grayscale byte images converted from malware binary data, and opcode frequency-based images generated by extracting opcodes and converting them into 2-gram frequencies. The proposed framework integrates two independent Siamese network models, one learning from byte images and the other from opcode frequency-based images. The detection models, trained separately on the two kinds of images, apply the L1 distance measure to their output vectors, calculate the similarity, and then apply a different weight to each model. Our proposed framework achieved malware detection accuracies of 95.9% and 99.83% in experiments using different malware datasets. The experimental results demonstrate that our malware detection model can effectively detect malware by utilizing two different types of features and employing the dual Siamese network-based model.
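The two ingredients named in the malware abstract above, opcode 2-gram frequencies and a weighted fusion of L1-distance-based similarities from two models, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 1/(1+d) mapping from distance to similarity, and the equal default weights are hypothetical and are not taken from the paper, and the embedding vectors stand in for the outputs of the two trained Siamese branches.

```python
from collections import Counter


def opcode_bigram_frequencies(opcodes):
    """Relative frequency of each consecutive opcode pair (2-gram)."""
    bigrams = list(zip(opcodes, opcodes[1:]))
    if not bigrams:
        return {}
    counts = Counter(bigrams)
    return {pair: n / len(bigrams) for pair, n in counts.items()}


def l1_distance(u, v):
    """Sum of absolute element-wise differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity(u, v):
    """Assumed mapping from L1 distance to a score in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + l1_distance(u, v))


def fused_similarity(byte_pair, opcode_pair, w_byte=0.5, w_opcode=0.5):
    """Weighted sum of the similarities from the byte-image and opcode-image models.

    Each *_pair is (embedding_of_sample_a, embedding_of_sample_b); the weights
    are hypothetical and would be tuned on validation data.
    """
    return w_byte * similarity(*byte_pair) + w_opcode * similarity(*opcode_pair)
```

A sample would then be flagged as similar to a known malware reference when the fused score exceeds a chosen threshold; the threshold is likewise a design choice not stated in the abstract.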
Target tracking has a wide range of applications in intelligent transportation, real-time monitoring, human-computer interaction and other areas. However, during tracking the target is prone to deformation, occlusion, loss, scale variation, background clutter, illumination variation, etc., which pose great challenges to accurate and real-time tracking. Tracking based on Siamese networks promotes the application of deep learning in the field of target tracking, ensuring both accuracy and real-time performance. However, because Siamese networks are trained offline, they have difficulty dealing with fast motion, serious occlusion, loss and deformation of the target during tracking. It is therefore very helpful to improve the performance of Siamese networks by quickly learning new features of the target and updating the target position online in time. The broad learning system (BLS) has a simple network structure, high learning efficiency, and strong feature learning ability. Addressing the problems of Siamese networks and exploiting the characteristics of BLS, a target tracking method based on BLS is proposed. The method combines offline training with fast online learning of new features; it not only adopts the powerful feature representation ability of deep learning, but also uses the BLS for re-learning and re-detection. The broad re-learning information is used for re-detection when serious occlusion or similar problems occur during tracking, so as to change the selection of the Siamese network’s search area, solve the problem that the search range cannot keep up with the fast motion of the target, and improve adaptability. Experimental results show that the proposed method achieves good results on three challenging datasets and improves the performance of the basic algorithm in difficult scenarios.
In recent years, with the development of natural language processing (NLP) technologies, security analysts have begun to apply NLP directly to assembly code disassembled from binary executables in order to examine binary similarity, and have achieved great progress. However, we found that existing frameworks often ignore the complex internal structure of instructions and do not fully consider the long-term dependencies of instructions. In this paper, we propose firmVulSeeker, a vulnerability search tool for embedded firmware images based on BERT and a Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context from a very large unlabeled binary corpus. Then, a fine-tuning mode based on a Siamese network is constructed to guide training and match semantically similar functions using the knowledge learned in the first stage. Finally, it uses a function embedding generated by the fine-tuned model to search the targeted corpus and find the most similar function, which is then manually checked to confirm whether it is a real vulnerability. We evaluate the accuracy, robustness, scalability and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions, and can successfully find more real vulnerabilities in real-world firmware than other tools.
Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seals are commonly inspected manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive owing to human errors, inconsistent placement, and varying completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories a challenging task. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. Firstly, the seal image is pre-processed by an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Secondly, the similarity between input seal image pairs is measured by a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in sectors where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
Recent change detection (CD) methods focus on the extraction of deep change semantic features. However, existing methods overlook fine-grained features and have a poor ability to capture long-range space-time information, which leads to missed micro changes and over-smoothed edges of change types. In this paper, a transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed, which precisely recognizes small changes and the fine edge details of the changes. The SCD model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) with multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with unprecedented time series and change types in complex scenarios. Compared with three Convolutional Neural Network-based, one attention-based, and two transformer-based networks, experimental results demonstrate that Pyramid-SCDFormer stably outperforms the existing state-of-the-art CD models and obtains an improvement in MIoU/F1 of 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes with a proportion of less than 1%, the proposed model improves the MIoU by 7.17-19.53% on the Landsat-SCD dataset. The recognition performance for small-scale changes and fine edges of change types is greatly improved.
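The MIoU and F1 figures quoted in the change detection abstract above follow standard definitions, which the short sketch below computes from a confusion matrix and from raw counts. The function names and the row/column convention are choices made here for illustration; the abstract does not say how the authors' evaluation code is organized.

```python
def mean_iou(conf):
    """Mean intersection-over-union over classes.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes that never occur are skipped.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                          # true c, predicted other
        fp = sum(conf[r][c] for r in range(n)) - tp     # predicted c, true other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Because IoU is computed per class before averaging, a class covering under 1% of the pixels weighs as much as a dominant class, which is why gains on rare change classes show up clearly in MIoU.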
Single object tracking based on deep learning has achieved advanced performance in many computer vision applications. However, existing trackers have certain limitations owing to deformation, occlusion, movement and other conditions. We propose a Siamese attentional dense network called SiamADN, trained in an end-to-end offline manner and aimed especially at unmanned aerial vehicle (UAV) tracking. First, it applies a dense network to reduce the vanishing-gradient problem, which strengthens feature transfer. Second, a channel attention mechanism is incorporated into the DenseNet structure in order to focus on the possible key regions. An advanced corner detection network is introduced to improve the subsequent tracking process. Extensive experiments are carried out on four main tracking benchmarks: OTB-2015, UAV123, LaSOT and VOT. The accuracy rate on UAV123 is 78.9%, and the running speed is 32 frames per second (FPS), which demonstrates its efficiency in practical applications.
Onboard visual object tracking in unmanned aerial vehicles (UAVs) has attracted much interest due to its versatility. Meanwhile, owing to their high precision, Siamese networks are becoming a hot spot in visual object tracking. However, most Siamese trackers fail to balance tracking accuracy and time within the limited onboard computational resources of UAVs. To meet the tracking precision and real-time requirements, this paper proposes a Siamese dense pixel-level network for UAV object tracking named SiamDPL. Specifically, the Siamese network extracts features of the search region and the template region through a parameter-shared backbone network, then performs correlation matching to obtain the candidate region with high similarity. To improve the matching of template and search features, this paper designs a dense pixel-level feature fusion module that enhances the matching ability by pixel-wise correlation and enriches the feature diversity by dense connection. An attention module composed of self-attention and channel attention is introduced to learn global context information and selectively emphasize the target feature region in the spatial and channel dimensions. In addition, a target localization module is designed to improve target location accuracy. Compared with other advanced trackers, experiments on two public benchmarks, UAV123@10fps and UAV20L from the unmanned air vehicle123 (UAV123) dataset, show that SiamDPL can achieve superior performance and low complexity with a running speed of 100.1 fps on an NVIDIA TITAN RTX.
Feed intake is an important indicator of the production performance and disease risk of dairy cows, and it can also be used to evaluate the utilization rate of pasture feed. To achieve automatic and non-contact measurement of feed intake, this paper proposes a method for measuring the feed intake of cows based on computer vision technology with a Siamese network and depth images. An automated data acquisition system was first designed to collect depth images of feed piles, and a dataset with 24150 samples was constructed. A deep learning model based on the Siamese network was then constructed and trained on the collected data to implement non-contact measurement of feed intake for dairy cows. The experimental results show that the mean absolute error (MAE) and the root mean square error (RMSE) of this method are 0.100 kg and 0.128 kg, respectively, in the range of 0-8.2 kg, which outperforms existing works. This work provides a new idea and technology for the intelligent measurement of dairy cow feed intake.
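For reference, the two error metrics reported in the feed intake abstract above (MAE and RMSE, both in kilograms) are defined as in the following sketch. The function and argument names are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

RMSE is never smaller than MAE on the same data, and the gap between them grows as the errors become less uniform.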
Label assignment refers to determining positive/negative labels for each sample to supervise the training process. Existing Siamese-based trackers primarily use fixed label assignment strategies according to human prior knowledge; thus, they can be sensitive to predefined hyperparameters and fail to fit the spatial and scale variations of samples. In this study, we first develop a novel dynamic label assignment (DLA) module to handle diverse data distributions and adaptively distinguish the foreground from the background based on the statistical characteristics of the target in visual object tracking. The core of the DLA module is a two-step selection mechanism. The first step selects candidate samples according to the Euclidean distance between training samples and the ground truth, and the second step selects positive/negative samples based on the mean and standard deviation of the candidate samples. The proposed approach is general-purpose and can be easily integrated into anchor-based and anchor-free trackers for optimal sample-label matching. According to extensive experimental findings, Siamese-based trackers with DLA modules can refine target locations and outperform baseline trackers on OTB100, VOT2019, UAV123 and LaSOT. In particular, DLA-SiamRPN++ improves SiamRPN++ by 1% AUC and DLA-SiamCAR improves SiamCAR by 2.5% AUC on OTB100. Furthermore, hyper-parameter analysis experiments show that the DLA module hardly increases spatio-temporal complexity; the proposed approach maintains the same speed as the original tracker without additional overhead.
This paper presents an off-line handwritten signature verification system based on the Siamese network, in which a hybrid architecture is used. The Residual neural Network (ResNet) is used to realize a powerful feature extraction model such that Writer Independent (WI) features can be effectively learned. A single-layer Siamese Neural Network (NN) is used to realize a Writer Dependent (WD) classifier such that the storage space can be minimized. To reduce the impact of the high intraclass variability of signatures and ensure that the Siamese network can learn more effectively, we propose a method of selecting a reference signature as one of the inputs for the Siamese network. To take full advantage of the reference signature, we modify the conventional contrastive loss function to enhance the accuracy. By using the proposed techniques, the accuracy of the system can be increased by 5.9%. On the GPDS signature dataset, the proposed system achieves an accuracy of 94.61%, which is better than the accuracy achieved by the current state-of-the-art work.
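For context, the conventional contrastive loss that the signature verification abstract above says is modified can be written as in the sketch below. This shows only the commonly used baseline form, not the authors' modified version, which the abstract does not specify. The convention that `same=True` marks a genuine pair, the default margin, and the omission of the usual 1/2 scaling factor are choices made here.

```python
import math


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Conventional contrastive loss for one pair of embeddings.

    Genuine pairs (same=True) are pulled together by penalizing their
    squared distance; other pairs are pushed apart until their distance
    reaches `margin`, after which they contribute no loss.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance between the embeddings
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

For example, a forged signature whose embedding already lies farther than the margin from the reference contributes zero loss, so training concentrates on the hard pairs.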
Palmprint identification has been used over the last two decades in many biometric systems. Handling high-dimensional data with many uncorrelated and duplicated features remains difficult due to several computational complexity issues. This paper presents an interactive authentication approach based on deep learning and feature selection that supports Palmprint authentication. The proposed model has two stages of learning: the first stage transfers a VGG-16 model pre-trained on ImageNet to a Palmprint-specific feature extraction model. The second stage uses the VGG-16 Palmprint feature extractor within a Siamese network to learn Palmprint similarity. The proposed model achieves robust and reliable end-to-end Palmprint authentication by extracting convolutional features using the VGG-16 Palmprint model and measuring the similarity of two input Palmprints using the Siamese network. The second stage uses the CASIA dataset to train and test the Siamese network. The suggested model outperforms comparable deep learning-based studies, achieving an accuracy and EER of 91.8% and 0.082%, respectively, on the CASIA left-hand images and an accuracy and EER of 91.7% and 0.084, respectively, on the CASIA right-hand images.
In response to the problem of traditional methods ignoring audio modality tampering, this study aims to explore an effective deep forgery video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method is lip-audio matching detection based on the Siamese neural network, combined with MFCC (Mel Frequency Cepstrum Coefficient) feature extraction with band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network structure design. Firstly, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC features. Then, these features are processed separately through the two branches of the Siamese network. Finally, the model is trained and optimized through fully connected layers and loss functions. The experimental results show that the testing accuracy of the model on the LRW (Lip Reading in the Wild) dataset reaches 92.3%, the recall rate is 94.3%, and the F1 score is 93.3%, significantly better than the results of CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory) models. In the validation of multi-resolution image streams, the highest accuracy of dual-resolution image streams reaches 94%. Band-pass filters can effectively improve the signal-to-noise ratio of deep forgery video detection when processing different types of audio signals. The real-time processing performance of the model is also excellent, and it achieves an average score of up to 5 in a user study. These data demonstrate that the proposed method can effectively fuse visual and audio information in deep forgery video detection and accurately identify inconsistencies between video and audio, thus verifying the effectiveness of lip-audio modality fusion in improving detection performance.
A person’s eye gaze can effectively express that person’s intentions. Thus, gaze estimation is an important approach in intelligent manufacturing for analyzing a person’s intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, it is very difficult to construct a person-independent model that can estimate an accurate gaze direction for every person due to individual differences. In this paper, we hypothesize that the difference in the appearance of each of a person’s eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes’ appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework, which has two identical branches. A multi-stream architecture is fed into each branch of the SNNet. Both branches of the DEANet, which share the same weights, extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person’s gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data when training a person-specific model is effectively avoided. A reference grid strategy is also proposed to select a few references as some of the DEANet’s inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms the state-of-the-art methods.
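The inference step implied by the gaze abstract above, estimating a new person's gaze from a few calibrated reference patches plus predicted gaze differences, might look like the sketch below. It is an illustration under assumptions: `predict_diff` is a hypothetical stand-in for the trained differential network, gaze is represented as a (yaw, pitch) pair, and simple averaging over the references is a choice made here rather than the paper's stated rule.

```python
def estimate_gaze(predict_diff, query_patch, references):
    """Estimate the gaze of `query_patch` from calibrated references.

    predict_diff(query, ref) -> (d_yaw, d_pitch): assumed to return
        gaze(query) - gaze(ref) for two eye patches of the same person.
    references: non-empty list of (ref_patch, (ref_yaw, ref_pitch)) pairs
        whose gaze is known from calibration.
    """
    yaws, pitches = [], []
    for ref_patch, (ref_yaw, ref_pitch) in references:
        d_yaw, d_pitch = predict_diff(query_patch, ref_patch)
        # Each reference yields one independent estimate of the query gaze.
        yaws.append(ref_yaw + d_yaw)
        pitches.append(ref_pitch + d_pitch)
    n = len(references)
    return sum(yaws) / n, sum(pitches) / n
```

Averaging over several references reduces the effect of an error in any single predicted difference; a reference selection strategy such as the reference grid mentioned in the abstract would decide which calibrated patches are passed in.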
Evaluation of calligraphic copies is the core of Chinese calligraphy appreciation and inheritance. However, previous aesthetic evaluation studies often focused on photos and paintings, with few attempts on Chinese calligraphy. To solve this problem, a Siamese regression aesthetic fusion method, named SRAFE, is proposed for Chinese calligraphy based on the combination of calligraphy aesthetics and deep learning. First, a dataset termed Evaluated Chinese Calligraphy Copies (E3C) is constructed for aesthetic evaluation. Second, 12 hand-crafted aesthetic features based on the shape, structure, and stroke of calligraphy are designed. Then, the Siamese regression network (SRN) is designed to extract the deep aesthetic representation of calligraphy. Finally, the SRAFE method is built by fusing the deep aesthetic features with the hand-crafted aesthetic features. Experimental results show that the scores given by SRAFE are similar to the aesthetic evaluation labels of E3C, proving the effectiveness of the authors’ method.
Neuropsychological tests, such as the Rey-Osterrieth complex figure (ROCF) test, help detect mild cognitive impairment (MCI) in adults by assessing cognitive abilities such as planning, organization, and memory. Furthermore, they are inexpensive and minimally invasive, making them excellent tools for early screening. In this paper, we propose the use of image analysis models to characterize the relationship between an individual’s ROCF drawing and their cognitive state. This task is usually framed as a classification problem and is solved using deep learning models, due to their success in the last decade. In order to achieve good performance, these models need to be trained with a large number of examples. Given that our data availability is limited, we alternatively treat our task as a similarity learning problem, performing pairwise ROCF drawing comparisons to define groups that represent different cognitive states. This way of working could lead to better data utilization and improved model performance. To solve the similarity learning problem, we propose a Siamese neural network (SNN) that exploits the distances of arbitrary ROCF drawings to the ideal representation of the ROCF. Our proposal is compared against various deep learning models designed for classification using a public dataset of 528 ROCF copy drawings, which are associated with either healthy individuals or those with MCI. Quantitative results are derived from a scheme involving multiple rounds of evaluation, employing both a dedicated test set and 14-fold cross-validation. Our SNN proposal demonstrates superior validation performance, and test results comparable to those of the classification-based deep learning models.
Colorectal cancer, a malignant lesion of the intestines, significantly affects human health and life, emphasizing the necessity of early detection and treatment. Accurate segmentation of colorectal cancer regions directly impacts subsequent staging, treatment methods, and prognostic outcomes. While colonoscopy is an effective method for detecting colorectal cancer, its data collection approach can cause patient discomfort. To address this, current research utilizes Computed Tomography (CT) imaging; however, conventional CT images only capture transient states, lacking sufficient representational capability to precisely locate colorectal cancer. This study utilizes enhanced CT images, constructing a deep feature network from the arterial, portal venous, and delay phases to simulate the physician’s diagnostic process and achieve accurate cancer segmentation. The innovations include: 1) utilizing portal venous phase CT images to introduce a context-aware multi-scale aggregation module for preliminary shape extraction of colorectal cancer; 2) building an image sequence based on the arterial and delay phases, transforming the cancer segmentation issue into an anomaly detection problem, establishing a pixel-pairing strategy, and proposing a colorectal cancer segmentation algorithm using a Siamese network. Experiments with enhanced CT data from 84 clinical cases of colorectal cancer demonstrated an Area Overlap Measure of 0.90, significantly better than Fully Convolutional Networks (FCNs) at 0.20. Future research will explore the relationship between conventional and enhanced CT to further reduce segmentation time and improve accuracy.
Funding: supported by the National Natural Science Foundation of China (Grant No. 62033007) and the Major Fundamental Research Program of Shandong Province (Grant No. ZR2023ZD37).
文摘Siamese tracking algorithms usually take convolutional neural networks(CNNs)as feature extractors owing to their capability of extracting deep discriminative features.However,the convolution kernels in CNNs have limited receptive fields,making it difficult to capture global feature dependencies which is important for object detection,especially when the target undergoes large-scale variations or movement.In view of this,we develop a novel network called effective convolution mixed Transformer Siamese network(SiamCMT)for visual tracking,which integrates CNN-based and Transformer-based architectures to capture both local information and long-range dependencies.Specifically,we design a Transformer-based module named lightweight multi-head attention(LWMHA)which can be flexibly embedded into stage-wise CNNs and improve the network’s representation ability.Additionally,we introduce a stage-wise feature aggregation mechanism which integrates features learned from multiple stages.By leveraging both location and semantic information,this mechanism helps the SiamCMT to better locate and find the target.Moreover,to distinguish the contribution of different channels,a channel-wise attention mechanism is introduced to enhance the important channels and suppress the others.Extensive experiments on seven challenging benchmarks,i.e.,OTB2015,UAV123,GOT10K,LaSOT,DTB70,UAVTrack112_L,and VOT2018,demonstrate the effectiveness of the proposed algorithm.Specially,the proposed method outperforms the baseline by 3.5%and 3.1%in terms of precision and success rates with a real-time speed of 59.77 FPS on UAV123.
Abstract: Knowledge graphs convey precise semantic information that can be effectively interpreted by neural networks, and generating descriptive text based on these graphs places significant emphasis on content consistency. However, knowledge graphs are inadequate for providing additional linguistic features such as paragraph structure and expressive modes, making it challenging to ensure content coherence when generating text that spans multiple sentences. This lack of coherence can further compromise the overall consistency of the content within a paragraph. In this work, we present the generation of scientific abstracts by leveraging knowledge graphs, with a focus on enhancing both content consistency and coherence. In particular, we construct the ACL Abstract Graph Dataset (ACL-AGD), which pairs knowledge graphs with text and incorporates sentence labels to guide text structure and diverse expressions. We then implement a Siamese network to complement and concretize the entities and relations based on paragraph structure by accomplishing two tasks: graph-to-text generation and entity alignment. Extensive experiments demonstrate that the logical paragraphs generated by our method exhibit entities with a uniform position distribution and appropriate frequency. In terms of content, our method accurately represents the information encoded in the knowledge graph, prevents the generation of irrelevant content, and achieves coherent and non-redundant adjacent sentences, even with a shared knowledge graph.
Abstract: This paper proposes a new approach to counter cyberattacks that use the increasingly diverse malware found in cyber security. Traditional signature detection methods that utilize static and dynamic features face limitations due to the continuous evolution and diversity of new malware. Recently, machine learning-based malware detection techniques, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), have gained attention. While these methods demonstrate high performance by leveraging static and dynamic features, they are limited in detecting new malware or variants because they learn based on the characteristics of existing malware. To overcome these limitations, malware detection techniques employing One-Shot Learning and Few-Shot Learning have been introduced. Based on this, the Siamese Network, which can effectively learn from a small number of samples and perform predictions based on similarity rather than learning the characteristics of the input data, enables the detection of new malware or variants. We propose a dual Siamese network-based detection framework that utilizes byte images, converted from malware binary data to grayscale, and opcode frequency-based images, generated after extracting opcodes and converting them into 2-gram frequencies. The proposed framework integrates two independent Siamese network models, one learning from byte images and the other from opcode frequency-based images. The detection models, trained separately on the two kinds of images, apply the L1 distance measure to the output vectors the models generate, calculate the similarity, and then apply different weights to each model. Our proposed framework achieved malware detection accuracies of 95.9% and 99.83% in experiments using different malware datasets. The experimental results demonstrate that our malware detection model can effectively detect malware by utilizing two different types of features and employing the dual Siamese network-based model.
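The scoring step described in this abstract (an L1 distance between the output vectors of each Siamese model, converted to a similarity and combined with per-model weights) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `exp(-d)` mapping from distance to similarity, the default weights, and all function names are assumptions introduced here.

```python
import math


def l1_distance(u, v):
    """Sum of absolute element-wise differences between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity(u, v):
    """Map an L1 distance into (0, 1]; exp(-d) is an assumed mapping."""
    return math.exp(-l1_distance(u, v))


def fused_score(byte_pair, opcode_pair, w_byte=0.5, w_opcode=0.5):
    """Weighted sum of the byte-image and opcode-image branch similarities.

    Each pair holds the two embedding vectors produced by one Siamese model.
    The weights are hypothetical; the abstract only says they differ per model.
    """
    s_byte = similarity(*byte_pair)
    s_opcode = similarity(*opcode_pair)
    return w_byte * s_byte + w_opcode * s_opcode
```

A score near 1 means both branches consider the query sample close to the reference sample; a decision threshold on the fused score would have to be chosen on validation data.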
Funding: Supported in part by the National Natural Science Foundation of China (under Grant Nos. 51939001, 61976033, U1813203, 61803064, and 61751202), the Natural Foundation Guidance Plan Project of Liaoning (2019‐ZD‐0151), the Science & Technology Innovation Funds of Dalian (under Grant No. 2018J11CY022), the Fundamental Research Funds for the Central Universities (under Grant No. 3132019345), and the Dalian High‐level Talents Innovation Support Program (Young Science and Technology Star Project) (under Grant No. 2021RQ067).
Abstract: Target tracking has a wide range of applications in intelligent transportation, real‐time monitoring, human‐computer interaction, and other areas. However, during tracking, the target is prone to deformation, occlusion, loss, scale variation, background clutter, illumination variation, etc., which makes accurate and real‐time tracking very challenging. Tracking based on Siamese networks promotes the application of deep learning in the field of target tracking, ensuring both accuracy and real‐time performance. However, because these networks are trained offline, it is difficult for them to deal with fast motion, serious occlusion, loss, and deformation of the target during tracking. Therefore, it is very helpful to improve the performance of Siamese networks by quickly learning new features of the target and updating the target position online in a timely manner. The broad learning system (BLS) has a simple network structure, high learning efficiency, and strong feature learning ability. Addressing the problems of Siamese networks and exploiting the characteristics of BLS, a target tracking method based on BLS is proposed. The method combines offline training with fast online learning of new features, which not only adopts the powerful feature representation ability of deep learning but also uses the BLS for re‐learning and re‐detection. The broad re‐learning information is used for re‐detection when serious occlusion or similar conditions occur during tracking, so as to change the selection of the Siamese network's search area, solve the problem that the search range cannot keep up with the fast motion of the target, and improve adaptability. Experimental results show that the proposed method achieves good results on three challenging datasets and improves the performance of the basic algorithm in difficult scenarios.
Abstract: In recent years, with the development of natural language processing (NLP) technologies, security analysts have begun to apply NLP directly to assembly code disassembled from binary executables in order to examine binary similarity, and have achieved great progress. However, we found that the existing frameworks often ignore the complex internal structure of instructions and do not fully consider the long-term dependencies of instructions. In this paper, we propose firmVulSeeker, a vulnerability search tool for embedded firmware images based on BERT and a Siamese network. It first builds a BERT MLM task to observe and learn the semantics of different instructions in their context in a very large unlabeled binary corpus. Then, a fine-tuning mode based on a Siamese network is constructed to guide the training and matching of semantically similar functions using the knowledge learned in the first stage. Finally, it uses a function embedding generated from the fine-tuned model to search the targeted corpus and find the most similar function, which is then manually checked to confirm whether it is a real vulnerability. We evaluate the accuracy, robustness, scalability, and vulnerability search capability of firmVulSeeker. Results show that it can greatly improve the accuracy of matching semantically similar functions and can successfully find more real vulnerabilities in real-world firmware than other tools.
Funding: The National Natural Science Foundation of China (Grant No. 62172132), the Public Welfare Technology Research Project of Zhejiang Province (Grant No. LGF21F020014), and the Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security of Zhejiang Police College (Grant No. 2021DSJSYS002).
Abstract: Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seal inspection is commonly audited manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive due to human errors, inconsistent placement, and varying completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. Firstly, the seal image is pre-processed by applying an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Secondly, the similarity between input seal image pairs is measured by a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in sectors where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
Funding: Supported by the National Key Research and Development Program of China [Grant number 2017YFB0504203] and the Xinjiang Production and Construction Corps Science and Technology Project [Grant number 2017DB005].
Abstract: Recent change detection (CD) methods focus on the extraction of deep change semantic features. However, existing methods overlook fine-grained features and have a poor ability to capture long-range space–time information, which leads to missed micro changes and over-smoothed edges of change types. In this paper, a promising transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed, which precisely recognizes small changes and the fine edge details of changes. The SCD model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) with multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with unprecedented time series and change types in complex scenarios. In comparisons with three Convolutional Neural Network-based, one attention-based, and two transformer-based networks, experimental results demonstrate that Pyramid-SCDFormer stably outperforms the existing state-of-the-art CD models and obtains improvements in MIoU/F1 of 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes with a proportion of less than 1%, the proposed model improves the MIoU by 7.17–19.53% on the Landsat-SCD dataset. The recognition performance for small-scale changes and fine edges of change types is greatly improved.
Funding: Supported by the Zhejiang Key Laboratory of General Aviation Operation Technology (No. JDGA2020-7), the National Natural Science Foundation of China (No. 62173237), the Natural Science Foundation of Liaoning Province (No. 2019-MS-251), the Talent Project of Revitalization Liaoning Province (No. XLYC1907022), the Key R&D Projects of Liaoning Province (No. 2020JH2/10100045), and the High-Level Innovation Talent Project of Shenyang (No. RC190030).
Abstract: Single object tracking based on deep learning has achieved advanced performance in many computer vision applications. However, existing trackers have certain limitations under deformation, occlusion, movement, and other conditions. We propose a Siamese attentional dense network called SiamADN, trained in an end-to-end offline manner and aimed especially at unmanned aerial vehicle (UAV) tracking. First, it applies a dense network to reduce the vanishing-gradient problem, which strengthens feature transfer. Second, a channel attention mechanism is incorporated into the DenseNet structure in order to focus on the possible key regions. An advanced corner detection network is introduced to improve the subsequent tracking process. Extensive experiments are carried out on four main tracking benchmarks, namely OTB-2015, UAV123, LaSOT, and VOT. The accuracy rate on UAV123 is 78.9%, and the running speed is 32 frames per second (FPS), which demonstrates its efficiency in practical applications.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 52072408), author Y.C.
Abstract: Onboard visual object tracking in unmanned aerial vehicles (UAVs) has attracted much interest due to its versatility. Meanwhile, due to their high precision, Siamese networks are becoming a hot spot in visual object tracking. However, most Siamese trackers fail to balance tracking accuracy and time within the limited onboard computational resources of UAVs. To meet the tracking precision and real-time requirements, this paper proposes a Siamese dense pixel-level network for UAV object tracking named SiamDPL. Specifically, the Siamese network extracts features of the search region and the template region through a parameter-shared backbone network, then performs correlation matching to obtain the candidate region with high similarity. To improve the matching of template and search features, this paper designs a dense pixel-level feature fusion module that enhances the matching ability by pixel-wise correlation and enriches the feature diversity by dense connection. An attention module composed of self-attention and channel attention is introduced to learn global context information and selectively emphasize the target feature region in the spatial and channel dimensions. In addition, a target localization module is designed to improve target location accuracy. Compared with other advanced trackers, experiments on two public benchmarks, UAV123@10fps and UAV20L from the unmanned air vehicle123 (UAV123) dataset, show that SiamDPL can achieve superior performance and low complexity with a running speed of 100.1 fps on an NVIDIA TITAN RTX.
Funding: This work was supported in part by the National Natural Science Foundation of China (Grant No. 32072788, 31902210), the National Key Research and Development Program of China (Grant No. 2019YFE0125600), the Postdoctoral Research Start-up Fund of Heilongjiang Province (Grant No. LBH-Q21062), and the Earmarked Fund for CARS36.
Abstract: Feed intake is an important indicator of the production performance and disease risk of dairy cows, and it can also be used to evaluate the utilization rate of pasture feed. To achieve automatic and non-contact measurement of feed intake, this paper proposes a method for measuring the feed intake of cows based on computer vision technology with a Siamese network and depth images. An automated data acquisition system was first designed to collect depth images of feed piles, and a dataset with 24150 samples was constructed. A deep learning model based on the Siamese network was then constructed and trained on the collected data to implement non-contact measurement of feed intake for dairy cows. The experimental results show that the mean absolute error (MAE) and the root mean square error (RMSE) of this method are 0.100 kg and 0.128 kg, respectively, in the range of 0-8.2 kg, which outperforms existing works. This work provides a new idea and technology for the intelligent measurement of dairy cow feed intake.
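For readers comparing the two error figures reported in this abstract, the standard definitions of MAE and RMSE can be written out as below. This is a generic sketch of the metrics only; the function names and the sample values used to exercise them are illustrative assumptions and are not taken from the paper's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |prediction - truth|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the average squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)
```

RMSE is never smaller than MAE on the same data, and a gap between them indicates that a few larger errors dominate, which is consistent with the reported pair of 0.100 kg and 0.128 kg.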
Funding: The support of the National Natural Science Foundation of China (Grant No. 52127809, author Z.W, http://www.nsfc.gov.cn/; No. 51625501, author Z.W, http://www.nsfc.gov.cn/) is greatly appreciated.
Abstract: Label assignment refers to determining positive/negative labels for each sample to supervise the training process. Existing Siamese-based trackers primarily use fixed label assignment strategies according to human prior knowledge; thus, they can be sensitive to predefined hyperparameters and fail to fit the spatial and scale variations of samples. In this study, we first develop a novel dynamic label assignment (DLA) module to handle the diverse data distributions and adaptively distinguish the foreground from the background based on the statistical characteristics of the target in visual object tracking. The core of the DLA module is a two-step selection mechanism. The first step selects candidate samples according to the Euclidean distance between training samples and the ground truth, and the second step selects positive/negative samples based on the mean and standard deviation of the candidate samples. The proposed approach is general-purpose and can be easily integrated into anchor-based and anchor-free trackers for optimal sample-label matching. According to extensive experimental findings, Siamese-based trackers with DLA modules can refine target locations and outperform baseline trackers on OTB100, VOT2019, UAV123, and LaSOT. In particular, DLA-SiamRPN++ improves SiamRPN++ by 1% AUC and DLA-SiamCAR improves SiamCAR by 2.5% AUC on OTB100. Furthermore, hyper-parameter analysis experiments show that the DLA module hardly increases spatio-temporal complexity; the proposed approach maintains the same speed as the original tracker without additional overhead.
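The two-step selection described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' DLA module: the abstract does not say which per-sample quantity the mean and standard deviation are computed over, so a generic quality score (for example, IoU with the ground-truth box) is assumed, as are the parameter `k`, the `mean + std` threshold, and every name in the code.

```python
import math
import statistics


def assign_labels(points, scores, gt_center, k=9):
    """Return (labels, threshold) for a list of training samples.

    points    : (x, y) sample locations
    scores    : assumed per-sample quality scores, e.g. IoU with the ground truth
    gt_center : (x, y) centre of the ground-truth box
    k         : assumed number of candidates kept in step 1
    """
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    if not points:
        return [], 0.0

    # Step 1: keep the k samples closest to the ground truth (Euclidean distance).
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], gt_center))
    candidates = order[:k]

    # Step 2: adaptive threshold from the candidates' mean and standard deviation.
    cand_scores = [scores[i] for i in candidates]
    threshold = statistics.fmean(cand_scores) + statistics.pstdev(cand_scores)

    # Candidates at or above the threshold are positive (1); all others negative (0).
    labels = [0] * len(points)
    for i in candidates:
        if scores[i] >= threshold:
            labels[i] = 1
    return labels, threshold
```

Because the threshold is recomputed from each target's own candidates, no fixed IoU cut-off has to be tuned by hand, which matches the motivation given in the abstract.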
Abstract: This paper presents an off-line handwritten signature verification system based on the Siamese network, in which a hybrid architecture is used. The Residual neural Network (ResNet) is used to realize a powerful feature extraction model such that Writer Independent (WI) features can be effectively learned. A single-layer Siamese Neural Network (NN) is used to realize a Writer Dependent (WD) classifier such that the storage space can be minimized. To reduce the impact of the high intraclass variability of signatures and ensure that the Siamese network can learn more effectively, we propose a method of selecting a reference signature as one of the inputs to the Siamese network. To take full advantage of the reference signature, we modify the conventional contrastive loss function to enhance the accuracy. By using the proposed techniques, the accuracy of the system can be increased by 5.9%. On the GPDS signature dataset, the proposed system achieves an accuracy of 94.61%, which is better than the accuracy achieved by the current state-of-the-art work.
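As background for the loss mentioned in this abstract, the conventional contrastive loss for a pair of embeddings can be sketched as below. This shows only the widely used baseline form, not the modified loss the paper proposes, whose details are not given here. The label convention (1 for a genuine/similar pair, 0 for a dissimilar pair), the default margin, and the function names are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.dist(u, v)


def contrastive_loss(distance, same, margin=1.0):
    """Conventional contrastive loss for one pair.

    same = 1 pulls the pair together (penalizes distance squared);
    same = 0 pushes the pair apart until the distance reaches the margin.
    """
    pull = same * distance ** 2
    push = (1 - same) * max(margin - distance, 0.0) ** 2
    return pull + push
```

For example, a dissimilar pair that is already farther apart than the margin contributes zero loss, so training effort is spent on pairs that are still confusable.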
Funding: This work was funded by the Deanship of Scientific Research at Jouf University under Grant No. (DSR-2022-RG-0104).
Abstract: Palmprint identification has been used over the last two decades in many biometric systems. Handling high-dimensional data with many uncorrelated and duplicated features remains difficult due to several computational complexity issues. This paper presents an interactive authentication approach based on deep learning and feature selection that supports Palmprint authentication. The proposed model has two stages of learning; the first stage transfers a VGG-16 model pre-trained on ImageNet to specific features based on the extraction model. The second stage uses the VGG-16 Palmprint feature extraction in the Siamese network to learn Palmprint similarity. The proposed model achieves robust and reliable end-to-end Palmprint authentication by extracting the convolutional features using VGG-16 Palmprint and computing the similarity of two input Palmprints using the Siamese network. The second stage uses the CASIA dataset to train and test the Siamese network. The suggested model outperforms comparable studies based on the deep learning approach, achieving accuracy and EER of 91.8% and 0.082%, respectively, on the CASIA left-hand images and accuracy and EER of 91.7% and 0.084, respectively, on the CASIA right-hand images.
Abstract: In response to the problem of traditional methods ignoring audio modality tampering, this study aims to explore an effective deep forgery video detection technique that improves detection precision and reliability by fusing lip images and audio signals. The main method used is lip-audio matching detection technology based on the Siamese neural network, combined with MFCC (Mel Frequency Cepstrum Coefficient) feature extraction with band-pass filters, an improved dual-branch Siamese network structure, and a two-stream network structure design. Firstly, the video stream is preprocessed to extract lip images, and the audio stream is preprocessed to extract MFCC features. Then, these features are processed separately through the two branches of the Siamese network. Finally, the model is trained and optimized through fully connected layers and loss functions. The experimental results show that the testing accuracy of the model in this study on the LRW (Lip Reading in the Wild) dataset reaches 92.3%, the recall rate is 94.3%, and the F1 score is 93.3%, significantly better than the results of CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory) models. In the validation of multi-resolution image streams, the highest accuracy of dual-resolution image streams reaches 94%. Band-pass filters can effectively improve the signal-to-noise ratio of deep forgery video detection when processing different types of audio signals. The real-time processing performance of the model is also excellent, and it achieves an average score of up to 5 in user research. These data demonstrate that the method proposed in this study can effectively fuse visual and audio information in deep forgery video detection and accurately identify inconsistencies between video and audio, thus verifying the effectiveness of lip-audio modality fusion technology in improving detection performance.
Funding: Supported by the Science and Technology Support Project of Sichuan Science and Technology Department (2018SZ0357) and China Scholarship.
Abstract: A person’s eye gaze can effectively express that person’s intentions. Thus, gaze estimation is an important approach in intelligent manufacturing for analyzing a person’s intentions. Many gaze estimation methods regress the direction of the gaze by analyzing images of the eyes, also known as eye patches. However, it is very difficult to construct a person-independent model that can estimate an accurate gaze direction for every person due to individual differences. In this paper, we hypothesize that the difference in the appearance of each of a person’s eyes is related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes’ appearances network (DEANet) is trained on public datasets to predict the gaze differences of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework, which has two identical branches. A multi-stream architecture is fed into each branch of the SNNet. Both branches of the DEANet, which share the same weights, extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model is trained, a new person’s gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data when training a person-specific model is effectively avoided. A reference grid strategy is also proposed to select a few references as some of the DEANet’s inputs directly based on the estimation values, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms state-of-the-art methods.
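One way to read the inference step in this abstract is that each calibrated reference patch contributes its known gaze plus the predicted gaze difference to the query, and the contributions are then combined. The sketch below assumes simple averaging and a (yaw, pitch) representation in degrees; both are assumptions made for illustration, as are the function and argument names, and the abstract does not confirm how DEANet actually combines references.

```python
def estimate_gaze(reference_gazes, predicted_diffs):
    """Estimate a query gaze from calibrated references.

    reference_gazes : known (yaw, pitch) of each calibrated eye patch
    predicted_diffs : assumed network outputs, gaze(query) - gaze(reference),
                      one (yaw, pitch) difference per reference
    """
    if not reference_gazes or len(reference_gazes) != len(predicted_diffs):
        raise ValueError("need one predicted difference per reference")
    n = len(reference_gazes)
    # Each reference votes for reference gaze + predicted difference; average the votes.
    yaw = sum(r[0] + d[0] for r, d in zip(reference_gazes, predicted_diffs)) / n
    pitch = sum(r[1] + d[1] for r, d in zip(reference_gazes, predicted_diffs)) / n
    return (yaw, pitch)
```

Averaging over several references reduces the effect of a single poorly predicted difference, which is one plausible reason why selecting good references matters for accuracy.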
Abstract: Evaluation of calligraphic copies is the core of Chinese calligraphy appreciation and inheritance. However, previous aesthetic evaluation studies often focussed on photos and paintings, with few attempts on Chinese calligraphy. To solve this problem, a Siamese regression aesthetic fusion method, named SRAFE, is proposed for Chinese calligraphy based on the combination of calligraphy aesthetics and deep learning. First, a dataset termed Evaluated Chinese Calligraphy Copies (E3C) is constructed for aesthetic evaluation. Second, 12 hand‐crafted aesthetic features based on the shape, structure, and stroke of calligraphy are designed. Then, the Siamese regression network (SRN) is designed to extract the deep aesthetic representation of calligraphy. Finally, the SRAFE method is built by fusing the deep aesthetic features with the hand‐crafted aesthetic features. Experimental results show that the scores given by SRAFE are similar to the aesthetic evaluation labels of E3C, proving the effectiveness of the authors’ method.
Abstract: Neuropsychological tests, such as the Rey-Osterrieth complex figure (ROCF) test, help detect mild cognitive impairment (MCI) in adults by assessing cognitive abilities such as planning, organization, and memory. Furthermore, they are inexpensive and minimally invasive, making them excellent tools for early screening. In this paper, we propose the use of image analysis models to characterize the relationship between an individual’s ROCF drawing and their cognitive state. This task is usually framed as a classification problem and solved using deep learning models, owing to their success over the last decade. To achieve good performance, these models need to be trained with a large number of examples. Given that our data availability is limited, we instead treat our task as a similarity learning problem, performing pairwise ROCF drawing comparisons to define groups that represent different cognitive states. This way of working could lead to better data utilization and improved model performance. To solve the similarity learning problem, we propose a Siamese neural network (SNN) that exploits the distances of arbitrary ROCF drawings to the ideal representation of the ROCF. Our proposal is compared against various deep learning models designed for classification using a public dataset of 528 ROCF copy drawings, which are associated with either healthy individuals or those with MCI. Quantitative results are derived from a scheme involving multiple rounds of evaluation, employing both a dedicated test set and 14-fold cross-validation. Our SNN proposal demonstrates superior validation performance and test results comparable to those of the classification-based deep learning models.
Funding: This work is supported by the Natural Science Foundation of China (No. 82372035), National Transportation Preparedness Projects (No. ZYZZYJ), Light of West China (No. XAB2022YN10), and the China Postdoctoral Science Foundation (No. 2023M740760).
Abstract: Colorectal cancer, a malignant lesion of the intestines, significantly affects human health and life, emphasizing the necessity of early detection and treatment. Accurate segmentation of colorectal cancer regions directly impacts subsequent staging, treatment methods, and prognostic outcomes. While colonoscopy is an effective method for detecting colorectal cancer, its data collection approach can cause patient discomfort. To address this, current research utilizes Computed Tomography (CT) imaging; however, conventional CT images only capture transient states, lacking sufficient representational capability to precisely locate colorectal cancer. This study utilizes enhanced CT images, constructing a deep feature network from the arterial, portal venous, and delay phases to simulate the physician’s diagnostic process and achieve accurate cancer segmentation. The innovations include: 1) utilizing portal venous phase CT images to introduce a context-aware multi-scale aggregation module for preliminary shape extraction of colorectal cancer; 2) building an image sequence based on the arterial and delay phases, transforming the cancer segmentation issue into an anomaly detection problem, establishing a pixel-pairing strategy, and proposing a colorectal cancer segmentation algorithm using a Siamese network. Experiments with enhanced CT data from 84 clinical cases of colorectal cancer demonstrated an Area Overlap Measure of 0.90, significantly better than Fully Convolutional Networks (FCNs) at 0.20. Future research will explore the relationship between conventional and enhanced CT to further reduce segmentation time and improve accuracy.