Image inpainting refers to synthesizing missing content in an image based on known information to restore occluded or damaged regions, which is a typical manifestation of this trend. With the increasing complexity of image inpainting tasks and the growth of data scale, existing deep learning methods still have limitations. For example, they lack the ability to capture long-range dependencies, and their performance in handling multi-scale image structures is suboptimal. To address these problems, the paper proposes an image inpainting method based on a parallel dual-branch learnable Transformer network. The encoder of the proposed generator consists of a dual-branch parallel structure with stacked CNN blocks and Transformer blocks, aiming to extract global and local feature information from images. Furthermore, a dual-branch fusion module is adopted to combine the features obtained from both branches. Additionally, a gated full-scale skip connection module is proposed to further enhance the coherence of the inpainting results and alleviate information loss. Finally, experimental results on three public datasets demonstrate the superior performance of the proposed method.
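The abstract above does not give the fusion or gating equations, so the following is only a minimal sketch of one common way to gate two feature streams: a sigmoid gate blends a local (CNN-branch) feature with a global (Transformer-branch) feature element by element. The function names and the idea of passing gate logits in directly are assumptions for illustration; in a real network the logits would come from learned layers.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(local_feat, global_feat, gate_logits):
    """Blend two equal-length feature vectors with a per-element gate.

    Hypothetical helper: gate g = sigmoid(logit); output = g*local + (1-g)*global.
    """
    if not (len(local_feat) == len(global_feat) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for loc, glo, logit in zip(local_feat, global_feat, gate_logits):
        g = sigmoid(logit)
        fused.append(g * loc + (1.0 - g) * glo)
    return fused


if __name__ == "__main__":
    # A zero logit weights both branches equally.
    print(gated_fuse([1.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

A gate of this form lets the network pass through whichever branch is more informative at each position instead of summing both unconditionally.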
Sign language fills the communication gap for people with hearing and speech impairments. It includes two visual modalities: manual gestures, consisting of hand movements, and non-manual gestures, incorporating body movements of the head, facial expressions, eyes, shoulder shrugging, etc. Previously, both kinds of gesture have been detected separately; this may give better accuracy, but much communicative information is lost. A proper sign language mechanism is needed to detect manual and non-manual gestures together to convey the appropriate detailed message to others. Our proposed system, the Sign Language Action Transformer Network (SLATN), localizes hand, body, and facial gestures in video sequences. We employ a Transformer-style architecture as a "base network" to extract features from the spatiotemporal domain. The model automatically learns to track individual persons and their action context across multiple frames. Furthermore, a "head network" emphasizes hand movement and facial expression simultaneously, which is often crucial to understanding sign language, using its attention mechanism to create tight bounding boxes around classified gestures. The model is then compared with traditional activity recognition methods. It not only works faster but also achieves better accuracy. The model achieves an overall testing accuracy of 82.66% with a considerable computational performance of 94.13 Giga-Floating Point Operations per Second (G-FLOPS). Another contribution is a newly created dataset of Pakistan Sign Language for Manual and Non-Manual (PkSLMNM) gestures.
Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seals are commonly inspected manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive owing to human error, inconsistent placement, and varying completeness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. First, the seal image is pre-processed by an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Second, the similarity between input seal image pairs is measured by a similarity comparison module based on the Siamese network. Finally, the results are compared with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in industries where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
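The final stage described above, comparing a seal against stored templates, can be sketched as nearest-template matching over embeddings. This is an illustrative sketch only: the abstract does not state which similarity measure or threshold is used, so cosine similarity, the `threshold` value, and the names `classify_seal` and `templates` are assumptions, and the embeddings are assumed to come from an already trained Siamese branch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_seal(embedding, templates, threshold=0.8):
    """Return (label, score) of the most similar template, or (None, score)
    when no template reaches the (assumed) similarity threshold."""
    best_label, best_score = None, -1.0
    for label, template_embedding in templates.items():
        score = cosine_similarity(embedding, template_embedding)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real ones would be much higher-dimensional.
    templates = {"company": [1.0, 0.0], "bank": [0.0, 1.0]}
    print(classify_seal([0.9, 0.1], templates))
```

One appeal of this design is that supporting a new seal type only requires adding a template entry, which speaks to the fixed-category limitation the abstract mentions.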
Reliable electricity infrastructure is critical for modern society, highlighting the importance of securing the stability of fundamental power electronic systems. However, as such systems frequently involve high-current and high-voltage conditions, there is a greater likelihood of failures. Consequently, anomaly detection for power electronic systems is of great significance, and it is a task that properly designed neural networks can handle well, as proven in various scenarios. Transformer-like networks are promising for such applications, yet because their structure was initially designed for different tasks, features extracted by the early layers are often lost, decreasing detection performance. Also, such data-driven methods typically require sufficient anomalous data for training, which can be difficult to obtain in practice. Therefore, to improve feature utilization while achieving efficient unsupervised learning, a novel model, the Densely-connected Decoder Transformer (DDformer), is proposed in this paper for unsupervised anomaly detection of power electronic systems. First, efficient label-free training is achieved based on the concept of an autoencoder with recursive-free output. An encoder-decoder structure with a densely connected decoder is then adopted, merging features from all encoder layers to avoid possible loss of mined features while reducing training difficulty. Both simulation and real-world experiments are conducted to validate the capabilities of DDformer, and the average FDR surpasses that of baseline models, reaching 89.39%, 93.91%, and 95.98% in the different experimental setups, respectively.
With the continuous expansion and increasing complexity of power systems, binary classification for transient stability assessment can no longer meet the safety requirements of power system control and regulation. Therefore, this paper proposes a multi-class transient stability assessment model based on an improved Transformer. The model is designed with a dual-tower encoder structure: one encoder focuses on the time dependency of the data, while the other focuses on the dynamic correlations between variables. Feature extraction is conducted from both the time and variable perspectives to ensure the completeness of the feature extraction process, thereby enhancing the accuracy of multi-class evaluation in power systems. Additionally, this paper introduces a hybrid sampling strategy based on sample boundaries, which addresses the issue of sample imbalance by increasing the number of boundary samples in the minority class and reducing the number of non-boundary samples in the majority class. Considering the frequent changes in power grid topology and operation modes, this paper proposes a two-stage updating scheme based on self-supervised learning. In the first stage, self-supervised learning is employed to mine structural information from unlabeled data in the target domain, enhancing the model's generalization capability in new scenarios. In the second stage, a sample screening mechanism is used to select key samples, which are labeled through long-term simulation techniques for fine-tuning the model parameters. This allows for rapid model updates without relying on many labeled samples. The proposed model and update scheme have been simulated and verified on two node systems, the IEEE New England 10-machine 39-bus system and the IEEE 47-machine 140-bus system, demonstrating their effectiveness and reliability.
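The boundary-based hybrid sampling idea described above can be illustrated with a small sketch. The abstract does not define how boundary samples are identified or how many are added or removed, so everything specific here is an assumption: a sample counts as a "boundary" sample when its nearest neighbour has a different label, minority boundary samples are duplicated once, and only every `keep_every`-th non-boundary majority sample is kept. The function names are hypothetical, and at least two samples are assumed.

```python
def nearest_other_index(i, points):
    """Index of the sample closest to points[i] (squared Euclidean), excluding i."""
    best_index, best_dist = None, float("inf")
    for j, candidate in enumerate(points):
        if j == i:
            continue
        dist = sum((a - b) ** 2 for a, b in zip(points[i], candidate))
        if dist < best_dist:
            best_index, best_dist = j, dist
    return best_index


def hybrid_resample(points, labels, minority, keep_every=2):
    """Oversample minority boundary samples and thin out majority non-boundary ones."""
    out_points, out_labels = [], []
    seen_non_boundary = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        # Assumed boundary test: nearest neighbour belongs to another class.
        is_boundary = labels[nearest_other_index(i, points)] != label
        if label == minority:
            copies = 2 if is_boundary else 1
        elif is_boundary:
            copies = 1
        else:
            copies = 1 if seen_non_boundary % keep_every == 0 else 0
            seen_non_boundary += 1
        out_points.extend([point] * copies)
        out_labels.extend([label] * copies)
    return out_points, out_labels


if __name__ == "__main__":
    pts = [[0.0], [1.0], [2.0], [3.0], [4.0], [4.4], [6.0]]
    lbl = [0, 0, 0, 0, 0, 1, 1]
    print(hybrid_resample(pts, lbl, minority=1))
```

In practice the duplicated minority samples would more likely be synthesized (for example by interpolation) than copied; plain duplication is used here only to keep the sketch short.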
To tackle the problem of inaccurate short-term bus load prediction, especially during holidays, a Transformer-based scheme with tailored architectural enhancements is proposed. First, the input data are clustered to reduce complexity and capture inherent characteristics more effectively. Gated residual connections are then employed to selectively propagate salient features across layers, while an attention mechanism focuses on identifying prominent patterns in multivariate time-series data. Finally, a pre-trained structure is incorporated to reduce computational complexity. Experimental results based on extensive data show that the proposed scheme improves prediction accuracy over the comparison algorithms by at least 32.00% consistently across all buses evaluated, and that it fits holiday load curves particularly well. Meanwhile, the pre-trained structure reduces the training time of the proposed algorithm by more than 65.75%. The proposed scheme can efficiently predict bus loads while enhancing robustness for holiday predictions, making it better adapted to real-world prediction scenarios.
Internal learning-based video inpainting methods have shown promising results by exploiting the intrinsic properties of the video to fill in the missing region without external dataset supervision. However, existing internal learning-based video inpainting methods can produce inconsistent structures or blurry textures due to insufficient utilisation of motion priors within the video sequence. In this paper, the authors propose a new internal learning-based video inpainting model called the appearance consistency and motion coherence network (ACMC-Net), which can not only learn the recurrence of the appearance prior but also capture the motion coherence prior to improve the quality of the inpainting results. In ACMC-Net, a transformer-based appearance network is developed to capture global context information within the video frame and so represent appearance consistency accurately. Additionally, a novel motion coherence learning scheme is proposed to learn the motion prior in a video sequence effectively. Finally, the learnt internal appearance consistency and motion coherence are implicitly propagated to the missing regions to complete the inpainting. Extensive experiments conducted on the DAVIS dataset show that the proposed model obtains superior performance in terms of quantitative measurements and produces more visually plausible results than state-of-the-art methods.
In view of the weak ability of convolutional neural networks to explicitly learn spatial invariance, and the probabilistic loss of discriminative features caused by occlusion and background interference in pedestrian re-identification tasks, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the problem of misaligned pedestrian spatial features. The network is then divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. The global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results are better than those of the comparison methods, which verifies the effectiveness of the proposed method.
Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight mobile vision transformer (MobileViT) network with the convolutional block attention module (CBAM) mechanism and a new regression loss function. This method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. In addition, experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Poisson inverse problems arise in biomedical imaging, fluorescence microscopy, and related fields. Since the observed measurements are degraded by a linear operator and further corrupted by Poisson noise, recovering an approximation of the original image is difficult. Motivated by the decoupling scheme and the variance-stabilizing transformation (VST) strategy, we propose a transformed convolutional neural network (CNN) method to restore the observed image. In the network, the convolutional layers play the role of a linear inverse filter and of the distribution transformation simultaneously. Furthermore, there is no batch normalization (BN) layer in the residual block of the network, which is devoted to tackling the non-Gaussian recovery procedure. The proposed method is compared with state-of-the-art Poisson deblurring algorithms, and the experimental results show the effectiveness of the method.
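For readers unfamiliar with variance-stabilizing transformations, a minimal sketch of the classical Anscombe transform follows. The abstract does not say which VST motivates the method (and the proposed network is described as learning the distribution transformation in its convolutional layers), so the Anscombe transform is shown here only as a well-known example. It maps Poisson-distributed counts to values whose variance is approximately 1 for sufficiently large means.

```python
import math


def anscombe(count: float) -> float:
    """Anscombe transform: 2 * sqrt(x + 3/8) for a non-negative Poisson count x."""
    if count < 0:
        raise ValueError("Poisson counts must be non-negative")
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


def inverse_anscombe(value: float) -> float:
    """Simple algebraic inverse, (y/2)^2 - 3/8.

    Note: this inverse is biased at low counts; it is used here only to
    show that the forward transform is invertible.
    """
    return (value / 2.0) ** 2 - 3.0 / 8.0


if __name__ == "__main__":
    for x in (0.0, 1.0, 10.0, 100.0):
        y = anscombe(x)
        print(x, y, inverse_anscombe(y))
```

After such a transform the noise can be treated as roughly Gaussian with unit variance, which is what allows Gaussian-oriented restoration tools to be applied to Poisson data.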
Background: In this study, we propose view interpolation networks to reproduce changes in the brightness of an object's surface depending on the viewing direction, which is important for reproducing the material appearance of a real object. Method: We used an original and a modified version of U-Net for image transformation. The networks were trained to generate images from the intermediate viewpoints of four cameras placed at the corners of a square. We conducted an experiment using three different combinations of methods and training data formats. Result: We determined that inputting the coordinates of the viewpoints together with the four camera images and using images from random viewpoints as the training data produces the best results.
According to the research on the theoretical basis in Part I of the paper, spanning tree algorithms for solving the maximum independent set problem in both even and odd networks are developed in this part, Part II. The algorithms first transform the general network into a pair sets network, and then decompose the pair sets network into a series of pair subsets by using the characteristics of the maximum flow passing through the pair sets network. For an even network, the algorithm requires only one round of transformation and decomposition; the maximum independent set is obtained without any iteration, and the time complexity of the algorithm is within the bound of O(V^3). For an odd network, however, the algorithm consists of two stages. In the first stage, the general odd network is transformed and decomposed into pseudo-negative envelope graphs and generalized reverse pseudo-negative envelope graphs, which are alternately distributed; the algorithm then turns to the second stage, searching for negative envelope graphs within the pseudo-negative envelope graphs only. Each time a negative envelope graph is found, the pair sets network is renewed by iteration at once, and the algorithm returns to the first stage. The two stages thus form a cycle that continues until the optimum is reached. Two methods, the adjusting search and the picking-off search, are specially developed to deal with the problems arising from the odd network. They link up with each other and are embedded together in the algorithm. Analysis indicates that the time complexity of this algorithm is within the bound of O(V^5).
The structure and characteristics of a connected network are analyzed, and a special kind of sub-network that can optimize the iteration processes is discovered. The sufficient and necessary conditions for obtaining the maximum independent set are then deduced. It is found that the neighborhood of this sub-network possesses similar characteristics, but the two can never be incorporated together. In particular, it is identified that the network can be divided into two parts in a certain way, and that both parts can then be transformed into a pair sets network, in which the special sub-networks and their neighborhoods are alternately distributed throughout the entire pair sets network. By use of this characteristic, a sufficiently decomposed network is obtained without losing any solutions. All of the above prepares the ground for developing a much better algorithm with a polynomial time bound for an odd network in the application research part of this subject.
Training a neural network to recognize targets requires many samples. These samples are usually obtained in a non-systematic way, which can miss or overemphasize some target information. To improve this situation, a new method based on a virtual model and invariant moments is proposed to generate training samples. The method is composed of the following steps: use a computer and simulation software to build a virtual model of the target object and then simulate the environment, lighting conditions, camera parameters, etc.; rotate the model by spin and nutation of inclination to obtain an image sequence from the virtual camera; preprocess each image and convert it into a binary image; calculate the invariant moments for each image to obtain a sequence of vectors. This sequence of vectors, which is proved to be complete, becomes the training samples together with the target outputs. The simulation results show that the proposed method can be used to recognize real targets and effectively improve the accuracy of target recognition when the sampling interval is short enough and the simulated circumstances are close enough to the real ones.
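The moment-calculation step above can be illustrated with a small sketch. The abstract does not say which invariant moments are used; the code below assumes Hu-style moments and computes only the first one, phi1 = eta20 + eta02, from a binary image given as a list of rows. The function names are hypothetical. The example checks numerically that the value is unchanged by a 90-degree rotation and by a translation of the shape.

```python
def hu_phi1(image):
    """First Hu invariant moment (eta20 + eta02) of a binary image (rows of 0/1)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value
            m10 += x * value
            m01 += y * value
    if m00 == 0:
        raise ValueError("image contains no foreground pixels")
    cx, cy = m10 / m00, m01 / m00  # centroid gives translation invariance
    mu20 = mu02 = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            mu20 += (x - cx) ** 2 * value
            mu02 += (y - cy) ** 2 * value
    # Normalising by m00^2 gives scale invariance for second-order moments.
    return (mu20 + mu02) / (m00 ** 2)


def rotate90(image):
    """Rotate a rectangular image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


if __name__ == "__main__":
    shape = [
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 0],
    ]
    print(hu_phi1(shape), hu_phi1(rotate90(shape)))
```

A full feature vector would stack several such moments per image; the sequence of those vectors over the rotated views is what the abstract describes as the training set.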
The process of turning forest area into non-forest land is known as deforestation or forest degradation. Reforestation as a fraction of deforestation is extremely low. For improved qualitative and quantitative classification, we used a Sentinel-1 dataset of the State of Pará, Brazil, to monitor deforestation closely and precisely between June 2019 and June 2023. This research aimed to find a suitable classification model, called Satellite Imaging analysis by Transpose deep neural transformation network (SIT-net), which uses a mathematical model based on the band math approach and applies a transpose deep neural network to classify deforestation. The main advantage of the proposed model is that it handles SAR images easily. The study concludes that SAR satellites provide high-resolution images that improve deforestation monitoring, and that the proposed model takes less computational time than other techniques.
As the demand for more efficient and adaptable power distribution systems intensifies, especially in rural areas, innovative solutions like the Capacitor-Coupled Substation with a Controllable Network Transformer (CCS-CNT) are becoming increasingly critical. Traditional power distribution networks, often limited by unidirectional flow capabilities and inflexibility, struggle to meet the complex demands of modern energy systems. The CCS-CNT system offers a transformative approach by enabling bidirectional power flow between high-voltage transmission lines and local distribution networks, a feature that is essential for integrating renewable energy sources and ensuring reliable electrification in underserved regions. This paper presents a detailed mathematical representation of power flow within the CCS-CNT system, emphasizing the control of both active and reactive power through the adjustment of voltage levels and phase angles. A control algorithm is developed to dynamically manage power flow, ensuring optimal performance by minimizing losses and maintaining voltage stability across the network. The proposed CCS-CNT system demonstrates significant potential in enhancing the efficiency and reliability of power distribution, making it particularly suited for rural electrification and other applications where traditional methods fall short. The findings underscore the system's capability to adapt to varying operational conditions, offering a robust solution for modern power distribution challenges.
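To make concrete how voltage levels and phase angles control active and reactive power, the sketch below uses the textbook two-bus model of a lossless line with series reactance X: P = V1*V2*sin(delta)/X and, at the sending end, Q = (V1^2 - V1*V2*cos(delta))/X. This is not the CCS-CNT model from the paper, whose equations are not given in the abstract; it is only the standard simplified relation, and the function name and per-unit values are assumptions.

```python
import math


def two_bus_power(v_send, v_recv, delta_rad, x_line):
    """Active and sending-end reactive power over a lossless line.

    v_send, v_recv: voltage magnitudes (per unit)
    delta_rad: phase angle of the sending bus relative to the receiving bus
    x_line: series reactance (per unit), must be positive
    """
    if x_line <= 0:
        raise ValueError("line reactance must be positive")
    p = v_send * v_recv * math.sin(delta_rad) / x_line
    q = (v_send ** 2 - v_send * v_recv * math.cos(delta_rad)) / x_line
    return p, q


if __name__ == "__main__":
    # Reversing the sign of the phase angle reverses the direction of active power.
    print(two_bus_power(1.0, 1.0, math.radians(10), 0.5))
    print(two_bus_power(1.0, 1.0, math.radians(-10), 0.5))
    # Raising the sending-end voltage magnitude mainly changes reactive power.
    print(two_bus_power(1.05, 1.0, 0.0, 0.5))
```

In this simplified picture the phase angle is the main lever for active power (including its direction, which is what bidirectional flow requires) and the voltage magnitude difference is the main lever for reactive power.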
Recent change detection (CD) methods focus on the extraction of deep change semantic features. However, existing methods overlook fine-grained features and have a poor ability to capture long-range space–time information, which causes micro changes to be missed and the edges of change types to be smoothed. In this paper, a promising transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed, which precisely recognizes small changes and the fine edge details of changes. The SCD model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) with multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with unprecedented time series and change types in complex scenarios. Compared with three Convolutional Neural Network-based, one attention-based, and two transformer-based networks, experimental results demonstrate that Pyramid-SCDFormer stably outperforms the existing state-of-the-art CD models and obtains improvements in MIoU/F1 of 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes with a proportion of less than 1%, the proposed model improves the MIoU by 7.17–19.53% on the Landsat-SCD dataset. The recognition performance for small-scale changes and fine edges of change types is greatly improved.
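The MIoU and F1 figures quoted above are standard segmentation metrics. As a reminder of how they are computed, here is a minimal sketch for the binary case (change versus no change) on flattened masks; it is not the paper's evaluation code, and the function names are illustrative. MIoU here is the mean of the IoU of the change class and the IoU of the no-change class.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def f1_score(pred, truth):
    """F1 of the change class: 2*TP / (2*TP + FP + FN); 0.0 if undefined."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(pred, truth):
    """Mean IoU over the change and no-change classes; 0.0 for an empty class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    iou_change = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    iou_no_change = tn / (tn + fp + fn) if (tn + fp + fn) else 0.0
    return (iou_change + iou_no_change) / 2.0


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(f1_score(predicted, reference), mean_iou(predicted, reference))
```

Because IoU penalizes every false positive and false negative against a small true-positive count, rare change classes (such as those under 1% of pixels) tend to have low IoU, which is why gains on them are reported separately.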
With the advancement of computer vision techniques in surveillance systems, there is a growing need for more proficient, intelligent, and sustainable facial expression and age recognition. The main purpose of this study is to develop an accurate facial expression and age recognition system that is capable of error-free recognition of human expression and age in both indoor and outdoor environments. The proposed system first takes an input image, pre-processes it, and then detects faces in the entire image. After that, landmark localization helps in the formation of a synthetic face mask prediction. A novel set of features is extracted and passed to a classifier for the accurate classification of expressions and age group. The proposed system is tested on two benchmark datasets, namely, the Gallagher collection person dataset and the Images of Groups dataset. The system achieved remarkable results on these benchmark datasets in terms of recognition accuracy and computational time. The proposed system would also be applicable in different consumer application domains such as online business negotiations, consumer behavior analysis, E-learning environments, and emotion robotics.
This paper develops a trustworthy deep learning model that considers electricity demand (G) and local climate conditions. The model utilises a Multi-Head Self-Attention Transformer (TNET) to capture critical information from 𝐻, to attain reliable predictions with local climate (rainfall, radiation, humidity, evaporation, and maximum and minimum temperatures) data from Energex substations in Queensland, Australia. The TNET model is then evaluated against other deep learning models (Long Short-Term Memory (LSTM), Bidirectional LSTM (BILSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Deep Neural Network (DNN)) based on robust model assessment metrics. The Kernel Density Estimation method is used to generate the prediction interval (PI) of electricity demand forecasts and to derive probability metrics, and the results show that the developed TNET model is accurate for all the substations. The study concludes that the proposed TNET model is a reliable electricity demand predictive tool that has high accuracy and low predictive errors and could be employed as a stratagem by demand modellers and energy policy-makers who wish to incorporate climatic factors into electricity demand patterns and develop national energy market insights and analysis systems.
Aspect category detection is a challenging subtask of aspect-based sentiment analysis, which categorizes a review sentence into a set of predefined aspect categories. Most existing methods regard aspect category detection as a flat classification problem. However, aspect categories are inter-related, and they are usually organized in a hierarchical tree structure. To leverage this structural information, this paper proposes a hierarchical multi-label classification model to detect aspect categories and uses a graph-enhanced transformer network to integrate label dependency information into prediction features. Experiments have been conducted on four widely used benchmark datasets, showing that the proposed model outperforms all strong baselines.
Funding (image inpainting paper): supported by the Scientific Research Fund of Hunan Provincial Natural Science Foundation under Grant 20231J60257; Hunan Provincial Engineering Research Center for Intelligent Rehabilitation Robotics and Assistive Equipment under Grant 2025SH501; Inha University and Design of a Conflict Detection and Validation Tool under Grant HX2024123.
Funding: the National Natural Science Foundation of China (Grant No. 62172132); the Public Welfare Technology Research Project of Zhejiang Province (Grant No. LGF21F020014); the Opening Project of Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security of Zhejiang Police College (Grant No. 2021DSJSYS002).
Abstract: Seal authentication is an important task for verifying the authenticity of stamped seals used in various domains to protect legal documents from tampering and counterfeiting. Stamped seal inspection is commonly audited manually to ensure document authenticity. However, manual assessment of seal images is tedious and labor-intensive due to human errors, inconsistent placement, and incompleteness of the seal. Traditional image recognition systems are inadequate for identifying seal types accurately, necessitating a neural network-based method for seal image recognition. However, neural network-based classification algorithms, such as Residual Networks (ResNet) and Visual Geometry Group with 16 layers (VGG16), yield suboptimal recognition rates on stamp datasets. Additionally, the fixed training data categories make handling new categories challenging. This paper proposes a multi-stage seal recognition algorithm based on a Siamese network to overcome these limitations. Firstly, the seal image is pre-processed by applying an image rotation correction module based on the Histogram of Oriented Gradients (HOG). Secondly, the similarity between input seal image pairs is measured by a similarity comparison module based on the Siamese network. Finally, we compare the results with the pre-stored standard seal template images in the database to obtain the seal type. To evaluate the performance of the proposed method, we further create a new seal image dataset that contains two subsets with 210,000 valid labeled pairs in total. The proposed work has practical significance in sectors where automatic seal authentication is essential, such as the legal, financial, and governmental sectors, where automatic seal recognition can enhance document security and streamline validation processes. Furthermore, the experimental results show that the proposed multi-stage method for seal image recognition outperforms state-of-the-art methods on the two established datasets.
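The final template-comparison step described in this abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the short vectors stand in for embeddings produced by a Siamese branch, and the names `cosine_similarity`, `classify_seal`, the template labels, and the 0.8 threshold are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify_seal(query, templates, threshold=0.8):
    """Return the label of the most similar template, or None if none is close enough.

    `templates` maps a seal-type label to a stored embedding (hypothetical data).
    """
    best_label, best_score = None, -1.0
    for label, embedding in templates.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

# Toy template database and two queries (made-up numbers for illustration only).
templates = {
    "company_seal": [0.9, 0.1, 0.0],
    "contract_seal": [0.1, 0.9, 0.2],
}
result = classify_seal([0.85, 0.15, 0.05], templates)
unknown = classify_seal([0.0, 0.0, 1.0], templates)
print(result, unknown)
```

In a template-matching design like this one, supporting a new seal category would only require adding a template entry rather than retraining a fixed-category classifier, which is the limitation the abstract attributes to conventional classification networks.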
Funding: supported in part by the National Natural Science Foundation of China under Grants 62303090 and U2330206; in part by the Postdoctoral Science Foundation of China under Grant 2023M740516; in part by the Natural Science Foundation of Sichuan Province under Grant 2024NSFSC1480; and in part by the New Cornerstone Science Foundation through the XPLORER PRIZE.
Abstract: Reliable electricity infrastructure is critical for modern society, highlighting the importance of securing the stability of fundamental power electronic systems. However, as such systems frequently involve high-current and high-voltage conditions, there is a greater likelihood of failures. Consequently, anomaly detection for power electronic systems holds great significance, and it is a task that properly designed neural networks can undertake well, as proven in various scenarios. Transformer-like networks are promising for such applications, yet because their structure was initially designed for different tasks, features extracted by the early layers are often lost, decreasing detection performance. Also, such data-driven methods typically require sufficient anomalous data for training, which can be difficult to obtain in practice. Therefore, to improve feature utilization while achieving efficient unsupervised learning, a novel model, the Densely-connected Decoder Transformer (DDformer), is proposed in this paper for unsupervised anomaly detection of power electronic systems. First, efficient label-free training is achieved based on the concept of an autoencoder with recursive-free output. An encoder-decoder structure with a densely-connected decoder is then adopted, merging features from all encoder layers to avoid possible loss of mined features while reducing training difficulty. Both simulation and real-world experiments are conducted to validate the capabilities of DDformer, and the average FDR surpasses that of the baseline models, reaching 89.39%, 93.91%, and 95.98% in the different experiment setups, respectively.
Funding: the National Natural Science Foundation of China (5227-7084).
Abstract: With the continuous expansion and increasing complexity of power systems, binary classification for transient stability assessment can no longer meet the safety requirements of power system control and regulation. Therefore, this paper proposes a multi-class transient stability assessment model based on an improved Transformer. The model is designed with a dual-tower encoder structure: one encoder focuses on the time dependency of the data, while the other focuses on the dynamic correlations between variables. Feature extraction is conducted from both the time and variable perspectives to ensure the completeness of the feature extraction process, thereby enhancing the accuracy of multi-class evaluation in power systems. Additionally, this paper introduces a hybrid sampling strategy based on sample boundaries, which addresses the issue of sample imbalance by increasing the number of boundary samples in the minority class and reducing the number of non-boundary samples in the majority class. Considering the frequent changes in power grid topology or operation modes, this paper proposes a two-stage updating scheme based on self-supervised learning. In the first stage, self-supervised learning is employed to mine the structural information from unlabeled data in the target domain, enhancing the model's generalization capability in new scenarios. In the second stage, a sample screening mechanism is used to select key samples, which are labeled through long-term simulation techniques for fine-tuning the model parameters. This allows for rapid model updates without relying on many labeled samples. The proposed model and update scheme have been simulated and verified on two node systems, the IEEE New England 10-machine 39-bus system and the IEEE 47-machine 140-bus system, demonstrating their effectiveness and reliability.
Abstract: To tackle the problem of inaccurate short-term bus load prediction, especially during holidays, a Transformer-based scheme with tailored architectural enhancements is proposed. First, the input data are clustered to reduce complexity and capture inherent characteristics more effectively. Gated residual connections are then employed to selectively propagate salient features across layers, while an attention mechanism focuses on identifying prominent patterns in multivariate time-series data. Ultimately, a pre-trained structure is incorporated to reduce computational complexity. Experimental results based on extensive data show that the proposed scheme achieves improved prediction accuracy over comparative algorithms by at least 32.00% consistently across all buses evaluated, and the fitting effect of holiday load curves is outstanding. Meanwhile, the pre-trained structure drastically reduces the training time of the proposed algorithm by more than 65.75%. The proposed scheme can efficiently predict bus load results while enhancing robustness for holiday predictions, making it better adapted to real-world prediction scenarios.
Funding: Shenzhen Science and Technology Programme, Grant/Award Number: JCYJ20230807120800001; 2023 Shenzhen sustainable supporting funds for colleges and universities, Grant/Award Number: 20231121165240001; Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Grant/Award Number: 2024B1212010006.
Abstract: Internal learning-based video inpainting methods have shown promising results by exploiting the intrinsic properties of the video to fill in the missing region without external dataset supervision. However, existing internal learning-based video inpainting methods can produce inconsistent structures or blurry textures due to insufficient utilisation of motion priors within the video sequence. In this paper, the authors propose a new internal learning-based video inpainting model called the appearance consistency and motion coherence network (ACMC-Net), which can not only learn the recurrence of the appearance prior but also capture the motion coherence prior to improve the quality of the inpainting results. In ACMC-Net, a transformer-based appearance network is developed to capture global context information within the video frame for representing appearance consistency accurately. Additionally, a novel motion coherence learning scheme is proposed to learn the motion prior in a video sequence effectively. Finally, the learnt internal appearance consistency and motion coherence are implicitly propagated to the missing regions to complete the inpainting. Extensive experiments conducted on the DAVIS dataset show that the proposed model achieves superior performance in terms of quantitative measurements and produces more visually plausible results than state-of-the-art methods.
Funding: the Foshan Science and Technology Innovation Team Project (No. FS0AA-KJ919-4402-0060); the National Natural Science Foundation of China (No. 62263018).
Abstract: In view of the weak ability of convolutional neural networks to explicitly learn spatial invariance and the probabilistic loss of discriminative features caused by occlusion and background interference in pedestrian re-identification tasks, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the problem of misalignment of pedestrian spatial features. Then the network is divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. Among them, the global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses an overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses a differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results are better than those of the comparative methods, which verifies the effectiveness of the proposed method.
Funding: supported by the National Natural Science Foundation of China (Nos. 62373215, 62373219 and 62073193); the Natural Science Foundation of Shandong Province (No. ZR2023MF100); the Key Projects of the Ministry of Industry and Information Technology (No. TC220H057-2022); the Independently Developed Instrument Funds of Shandong University (No. zy20240201).
Abstract: Current you only look once (YOLO)-based models face the challenge of excessive parameters and computational complexity in printed circuit board (PCB) defect detection. To solve this problem, we propose a new method that combines the lightweight mobile vision transformer (MobileViT) network with the convolutional block attention module (CBAM) mechanism and a new regression loss function. This method needs fewer computational resources, making it more suitable for embedded edge detection devices. Meanwhile, the new loss function improves the positioning accuracy of the bounding box and enhances the robustness of the model. In addition, experiments on public datasets demonstrate that the improved model achieves an average accuracy of 87.9% across six typical defect detection tasks while reducing computational costs by nearly 90%. It significantly reduces the model's computational requirements while maintaining accuracy, ensuring reliable performance for edge deployment.
Funding: the National Natural Science Foundation of China (No. 61661031).
Abstract: Poisson inverse problems arise in biomedical imaging, fluorescence microscopy, and other fields. Since the observed measurements are degraded by a linear operator and further corrupted by Poisson noise, recovering an approximation of the original image is difficult. Motivated by the decoupling scheme and the variance-stabilizing transformation (VST) strategy, we propose a transformed convolutional neural network (CNN) method to restore the observed image. In the network, the Conv-layers play the role of a linear inverse filter and the distribution transformation simultaneously. Furthermore, there is no batch normalization (BN) layer in the residual block of the network, which is devoted to handling the non-Gaussian recovery procedure. The proposed method is compared with state-of-the-art Poisson deblurring algorithms, and the experimental results show the effectiveness of the method.
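For readers unfamiliar with variance-stabilizing transformations, the sketch below shows the Anscombe transform, a widely used VST for Poisson data. It is offered only as an illustration of the general idea; the abstract does not state which VST motivated the method, so the choice of Anscombe and the function names `anscombe` and `inverse_anscombe` are assumptions.

```python
import math

def anscombe(x):
    """Anscombe transform: maps Poisson counts to values with variance close to 1."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    """Plain algebraic inverse of the Anscombe transform.

    Note: this simple inverse is known to be biased for very low counts;
    it is used here only to keep the sketch short.
    """
    return (y / 2.0) ** 2 - 3.0 / 8.0

# Round trip on a few hypothetical photon counts.
counts = [0.0, 1.0, 10.0, 250.0]
stabilized = [anscombe(c) for c in counts]
recovered = [inverse_anscombe(s) for s in stabilized]
print(stabilized)
print(recovered)
```

After such a transform the noise is approximately Gaussian with roughly constant variance, which is why VST-based pipelines can reuse Gaussian-oriented restoration tools.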
Abstract: Background: In this study, we propose view interpolation networks to reproduce changes in the brightness of an object's surface depending on the viewing direction, which is important for reproducing the material appearance of a real object. Method: We used an original and a modified version of U-Net for image transformation. The networks were trained to generate images from the intermediate viewpoints of four cameras placed at the corners of a square. We conducted an experiment using three different combinations of methods and training data formats. Result: We determined that inputting the coordinates of the viewpoints together with the four camera images and using images from random viewpoints as the training data produces the best results.
Abstract: Based on the theoretical research in Part Ⅰ of the paper, spanning tree algorithms for finding the maximum independent set in both even and odd networks are developed in this part, Part Ⅱ of the paper. The algorithms first transform the general network into the pair sets network, and then decompose the pair sets network into a series of pair subsets by using the characteristics of the maximum flow passing through the pair sets network. For the even network, the algorithm requires only one transformation and decomposition, the maximum independent set can be obtained without any iteration, and the time complexity of the algorithm is within the bound of O(V^3). For the odd network, however, the algorithm consists of two stages. In the first stage, the general odd network is transformed and decomposed into alternately distributed pseudo-negative envelope graphs and generalized reverse pseudo-negative envelope graphs; the algorithm then turns to the second stage, searching for negative envelope graphs within the pseudo-negative envelope graphs only. Each time a negative envelope graph is found, the pair sets network is renewed by iteration at once, and the algorithm returns to the first stage. The two stages thus form a cycle that continues up to the optimum. Two methods, the adjusting search and the picking-off search, are specially developed to deal with the problems arising from the odd network. They work together harmoniously and are embedded together in the algorithm. Analysis indicates that the time complexity of this algorithm is within the bound of O(V^5).
Abstract: The structure and characteristics of a connected network are analyzed, and a special kind of sub-network, which can optimize the iteration processes, is discovered. Then, the necessary and sufficient conditions for obtaining the maximum independent set are deduced. It is found that the neighborhood of this sub-network possesses similar characteristics, but the two can never be incorporated together. In particular, it is identified that the network can be divided into two parts in a certain way, and both parts can then be transformed into a pair sets network, where the special sub-networks and their neighborhoods appear alternately distributed throughout the entire pair sets network. By using this characteristic, a network that is sufficiently decomposed without losing any solutions is obtained. All of the above prepares the ground for developing a much better algorithm with a polynomial time bound for an odd network in the application research part of this subject.
Funding: Supported by the Ministerial Level Research Foundation (404040401).
Abstract: Training a neural network to recognize targets requires many samples. These samples are usually collected in a non-systematic way, which can miss or overemphasize some target information. To improve this situation, a new method based on a virtual model and invariant moments was proposed to generate training samples. The method was composed of the following steps: use a computer and simulation software to build a virtual model of the target object and then simulate the environment, lighting conditions, camera parameters, etc.; rotate the model by spin and nutation of inclination to obtain an image sequence from a virtual camera; preprocess each image and convert it into a binary image; calculate the invariant moments of each image to obtain a sequence of vectors. This sequence of vectors, which was proved to be complete, became the training samples together with the target outputs. The simulation results showed that the proposed method could be used to recognize real targets and effectively improve the accuracy of target recognition when the sampling interval was short enough and the simulated circumstances were close enough to the real ones.
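The step of turning a binary image into an invariant-moment vector could look roughly like the sketch below. It assumes Hu's moment invariants, which is one common choice; the abstract only says "invariant moments", so that choice, the helper names `hu_phi1_phi2` and `rotate90`, and the toy image are assumptions. Only the first two of Hu's seven invariants are computed to keep the example short.

```python
def hu_phi1_phi2(img):
    """First two Hu moment invariants of a binary image given as a list of rows."""
    m00 = sum(v for row in img for v in row)
    if m00 == 0:
        raise ValueError("image contains no foreground pixels")
    # Centroid of the foreground pixels.
    cx = sum(x * v for row in img for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(img) for v in row) / m00

    def eta(p, q):
        # Central moment, normalized for scale invariance.
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(img) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2

def rotate90(img):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# Toy L-shaped binary image (hypothetical data).
shape = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
original = hu_phi1_phi2(shape)
rotated = hu_phi1_phi2(rotate90(shape))
print(original, rotated)
```

The two tuples agree up to floating-point rounding, which illustrates why such moments are attractive as training features: in-plane rotations of the same silhouette map to the same vector.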
Abstract: Deforestation, or forest degradation, is the process of converting forest area into non-forest land. Reforestation as a fraction of deforestation is extremely low. To improve qualitative and quantitative classification, we used a Sentinel-1 dataset of the State of Para, Brazil, to monitor deforestation closely and precisely between June 2019 and June 2023. This research aimed to find a suitable classification model, called Satellite Imaging analysis by Transpose deep neural transformation network (SIT-net), which uses a mathematical model based on the band math approach and applies a transpose deep neural network to classify deforestation. The main advantage of the proposed model is that it handles SAR images easily. The study concludes that SAR satellites provide high-resolution images that improve deforestation monitoring, and that the proposed model takes less computational time than other techniques.
Abstract: As the demand for more efficient and adaptable power distribution systems intensifies, especially in rural areas, innovative solutions like the Capacitor-Coupled Substation with a Controllable Network Transformer (CCS-CNT) are becoming increasingly critical. Traditional power distribution networks, often limited by unidirectional flow capabilities and inflexibility, struggle to meet the complex demands of modern energy systems. The CCS-CNT system offers a transformative approach by enabling bidirectional power flow between high-voltage transmission lines and local distribution networks, a feature that is essential for integrating renewable energy sources and ensuring reliable electrification in underserved regions. This paper presents a detailed mathematical representation of power flow within the CCS-CNT system, emphasizing the control of both active and reactive power through the adjustment of voltage levels and phase angles. A control algorithm is developed to dynamically manage power flow, ensuring optimal performance by minimizing losses and maintaining voltage stability across the network. The proposed CCS-CNT system demonstrates significant potential in enhancing the efficiency and reliability of power distribution, making it particularly suited for rural electrification and other applications where traditional methods fall short. The findings underscore the system's capability to adapt to varying operational conditions, offering a robust solution for modern power distribution challenges.
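How voltage magnitudes and phase angles steer active and reactive power can be illustrated with the textbook two-bus, lossless-line relations. This is a generic sketch and not the CCS-CNT model from the paper: the function name `line_power_flow`, the per-unit values, and the purely reactive line are all assumptions made for illustration.

```python
import math

def line_power_flow(v1, v2, delta, x):
    """Sending-end active and reactive power over a lossless line.

    v1, v2 : voltage magnitudes at the sending and receiving ends (per unit)
    delta  : phase angle of the sending end relative to the receiving end (radians)
    x      : series reactance of the line (per unit)
    """
    p = v1 * v2 * math.sin(delta) / x
    q = (v1 ** 2 - v1 * v2 * math.cos(delta)) / x
    return p, q

# Hypothetical operating points: the sign of the angle sets the direction of active power.
p_forward, q_forward = line_power_flow(1.0, 1.0, math.pi / 6, 0.5)
p_reverse, q_reverse = line_power_flow(1.0, 1.0, -math.pi / 6, 0.5)
# Raising the sending-end voltage mainly increases the reactive power sent.
_, q_boosted = line_power_flow(1.05, 1.0, math.pi / 6, 0.5)
print(p_forward, p_reverse, q_forward, q_boosted)
```

In these relations, reversing the sign of the angle reverses the active power, which is the basic mechanism behind bidirectional flow, while the voltage magnitude difference chiefly affects reactive power.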
Funding: supported by the National Key Research and Development Program of China [Grant number 2017YFB0504203]; Xinjiang Production and Construction Corps Science and Technology Project [Grant number 2017DB005].
Abstract: Recent change detection (CD) methods focus on the extraction of deep change semantic features. However, existing methods overlook fine-grained features and have a poor ability to capture long-range space–time information, which leads to missed micro changes and over-smoothed edges of change types. In this paper, a transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed, which precisely recognizes small changes and fine edge details of the changes. The SCD model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) with multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with unprecedented time series and change types in complex scenarios. Compared with three Convolutional Neural Network-based, one attention-based, and two transformer-based networks, experimental results demonstrate that Pyramid-SCDFormer stably outperforms the existing state-of-the-art CD models and obtains an improvement in MIoU/F1 of 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes with a proportion of less than 1%, the proposed model improves the MIoU by 7.17–19.53% on the Landsat-SCD dataset. The recognition performance for small-scale changes and fine edges of change types is greatly improved.
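The MIoU and F1 figures quoted above are standard segmentation metrics. A minimal sketch of how they are typically computed from a confusion matrix follows; the helper names and the two-class toy matrix are hypothetical, and the paper's exact averaging protocol is not specified in the abstract.

```python
def per_class_iou_f1(confusion):
    """Per-class IoU and F1 from a square confusion matrix.

    confusion[t][p] counts pixels whose true class is t and predicted class is p.
    """
    n = len(confusion)
    ious, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if denom else 0.0)
    return ious, f1s

def mean(values):
    return sum(values) / len(values)

# Toy two-class example: rows are ground truth, columns are predictions.
confusion = [
    [50, 10],
    [5, 35],
]
ious, f1s = per_class_iou_f1(confusion)
miou, mean_f1 = mean(ious), mean(f1s)
print(ious, f1s, miou, mean_f1)
```

Because IoU penalizes false positives and false negatives in the denominator without doubling the true positives, it is never larger than F1 for the same class, which is worth keeping in mind when comparing the paired MIoU/F1 gains.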
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2018R1D1A1A02085645). Also, this work was supported by the Korea Medical Device Development Fund grant funded by the Korean government (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 202012D05-02).
Abstract: With the advancement of computer vision techniques in surveillance systems, there is a growing need for more proficient, intelligent, and sustainable facial expression and age recognition. The main purpose of this study is to develop an accurate facial expression and age recognition system that is capable of error-free recognition of human expression and age in both indoor and outdoor environments. The proposed system first takes an input image, pre-processes it, and then detects faces in the entire image. After that, landmark localization helps in the formation of the synthetic face mask prediction. A novel set of features is extracted and passed to a classifier for the accurate classification of expressions and age group. The proposed system is tested on two benchmark datasets, namely the Gallagher collection person dataset and the Images of Groups dataset. The system achieved remarkable results on these benchmark datasets in terms of recognition accuracy and computational time. The proposed system would also be applicable in different consumer application domains such as online business negotiations, consumer behavior analysis, E-learning environments, and emotion robotics.
Funding: Partial support of this work was through project PID2020-115454GB-C21 of the Spanish Ministry of Science and Innovation (MICINN).
Abstract: This paper develops a trustworthy deep learning model that considers electricity demand (G) and local climate conditions. The model utilises a Multi-Head Self-Attention Transformer (TNET) to capture critical information from 𝐻 and attain reliable predictions with local climate data (rainfall, radiation, humidity, evaporation, and maximum and minimum temperatures) from Energex substations in Queensland, Australia. The TNET model is then evaluated against other deep learning models (Long-Short Term Memory (LSTM), Bidirectional LSTM (BILSTM), Gated Recurrent Unit (GRU), Convolutional Neural Networks (CNN), and Deep Neural Network (DNN)) based on robust model assessment metrics. The Kernel Density Estimation method is used to generate the prediction interval (PI) of electricity demand forecasts and derive probability metrics, and the results show that the developed TNET model is accurate for all the substations. The study concludes that the proposed TNET model is a reliable electricity demand predictive tool that has high accuracy and low predictive errors and could be employed as a stratagem by demand modellers and energy policy-makers who wish to incorporate climatic factors into electricity demand patterns and develop national energy market insights and analysis systems.
Funding: supported by the National Natural Science Foundation of China under Grant No. 62036001.
Abstract: Aspect category detection is a challenging subtask of aspect-based sentiment analysis, which categorizes a review sentence into a set of predefined aspect categories. Most existing methods regard aspect category detection as a flat classification problem. However, aspect categories are inter-related, and they are usually organized in a hierarchical tree structure. To leverage this structural information, this paper proposes a hierarchical multi-label classification model to detect aspect categories and uses a graph-enhanced transformer network to integrate label dependency information into prediction features. Experiments have been conducted on four widely used benchmark datasets, showing that the proposed model outperforms all strong baselines.