Funding: Supported by the National Language Commission for research on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No. ZDI145-70).
Abstract: Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, so they do not meet the practical application requirements of SLRT. However, building a large-scale, diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data collected in the process is not of acceptable quality. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that screening with TIST improves sign language recognition (SLR), we build two datasets, a screened dataset and an unscreened dataset, and use visual alignment constraint (VAC) as the baseline model. The experimental results show that the screened dataset achieves a better (lower) word error rate (WER) than the unscreened dataset.
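The WER metric used to compare the two datasets can be sketched as follows. This is a minimal illustration that assumes the common definition of WER (word-level edit distance divided by the number of reference words); the function name and the example gloss sequences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace-separated tokens (for example, sign glosses).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative gloss sequences only: one substitution and one deletion.
    print(word_error_rate("TODAY WEATHER VERY GOOD", "TODAY RAIN VERY"))
```

Lower values are better, and the value can exceed 1.0 when the hypothesis contains many inserted words.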
Abstract: Existing data mining methods focus mostly on relational databases and structured data, not on complex structured data such as extensible markup language (XML). By converting the XML document type description into relational semantics that record the relations in the XML data, and by using an XML data mining language, the XML data mining system presents a strategy for mining information from XML.
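The general idea of recording XML relations in relational form can be sketched as follows. This is not the paper's mapping, which is derived from the document type description; it is a generic parent/child "edge table" built with the Python standard library, and the table name, column names, and sample document are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<library>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2005</year></book>
</library>
"""


def xml_to_rows(xml_text):
    """Flatten an XML tree into (id, parent_id, tag, text) rows."""
    rows = []

    def visit(elem, parent_id):
        node_id = len(rows) + 1  # ids follow document order
        rows.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)
    return rows


def load_rows(rows):
    """Load the rows into an in-memory SQLite table named `node`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = load_rows(xml_to_rows(SAMPLE))
    # Once the relations are relational, ordinary SQL can mine them,
    # for example by counting how often each element type occurs.
    query = "SELECT tag, COUNT(*) FROM node GROUP BY tag ORDER BY tag"
    for tag, count in conn.execute(query):
        print(tag, count)
```

An edge table keeps the example short; a mapping driven by the document type description, as the abstract describes, would typically produce one table per element type instead.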
Abstract: Using object-oriented design and analysis, a general purpose corrosion data model (GPCDM) and a corrosion data markup language (CDML) are created to meet the increasing demand for multi-source corrosion data integration and sharing. A "corrosion data island" is proposed to model corrosion data in a comprehensive and self-contained way. The island has a tree-like structure with six first-level child nodes that characterize every important aspect of the corrosion data. Each first-level node recursively holds further child nodes as data containers. The data structure inside the island is designed to flatten the learning curve and lower the acceptance barrier of GPCDM and CDML. The role and meaning of the first-level nodes are explained in detail, with carefully chosen examples, in order to review the design goals and requirements proposed in the previous paper. Then the CDML tag structure and the CDML application programming interface (API) are introduced in logical order. Finally, the roles of GPCDM, CDML, and its API in multi-source corrosion data integration and information sharing are highlighted and projected.
Funding: Supported by the National Key R&D Program of China (No. 2022YFE0196100), the Innovation Capacity Enhancement Program Science and Technology Platform Project of Hebei Province (22567623H), and the Hebei University High Level Innovative Talent Research Start-up Funding Project (No. 521000981092).
Abstract: Prompt learning has become crucial for adapting Visual Language Models (VLMs) to downstream tasks. Although existing prompt learning models have made significant strides, they still face two major challenges: (1) too much attention is paid to learning the base classes, which makes it harder to understand novel classes; (2) most methods rely only on the context information provided by the prompt template, which results in limited text features. In this study, we propose a new fine-tuning method for Visual-Language Models called Input-Enhanced Prompt Tuning (IEPT). IEPT improves the generalization of VLMs on downstream tasks by introducing two components: the Data Augmentation Framework (DAF) and the Category Generalization Optimizer (CGO). Specifically, the DAF employs Large Language Models to resolve word ambiguity by obtaining more class label context, and uses simple image augmentation to address limited features by providing more image samples. The CGO prevents overfitting by adding new class names during training. Experiments show that the performance of IEPT across various evaluation suites, covering base-to-novel generalization, domain generalization, and cross-dataset evaluation, is better than or comparable to that of existing methods. Compared to the state-of-the-art method PromptSRC, IEPT achieves an absolute improvement of 0.40% for base classes, 1.56% for novel classes, and 1.04% on the harmonic mean, averaged over 11 datasets. In addition, we present detailed ablation studies that validate the individual contributions of DAF and CGO to the overall performance of IEPT. Our code is available at https://github.com/ayuan0626/IEPT.
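The harmonic mean reported above combines base-class and novel-class accuracy into one score. A minimal sketch, assuming the usual definition H = 2BN / (B + N); the function name is hypothetical and the accuracy values below are illustrative, not results from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy."""
    if base_acc < 0 or novel_acc < 0:
        raise ValueError("accuracies must be non-negative")
    if base_acc + novel_acc == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


if __name__ == "__main__":
    # Illustrative values only: high base accuracy, weaker novel accuracy.
    print(harmonic_mean(90.0, 60.0))
```

Because the harmonic mean is pulled toward the smaller of the two values, a method cannot score well by fitting the base classes alone, which is why it suits the base-versus-novel trade-off described in the abstract.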