Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, so they do not meet the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data gathered in the process is not up to standard. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST effectively improves sign language recognition (SLR), we build two datasets, a screened dataset and an unscreened dataset. The experiments use visual alignment constraint (VAC) as the baseline model. The results show that the screened dataset yields a better (lower) word error rate (WER) than the unscreened dataset.
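The abstract reports results as WER without defining it. As a minimal sketch, assuming the standard definition (word-level edit distance between hypothesis and reference, divided by the reference length), the metric can be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumes whitespace-separated tokens (e.g. sign glosses); this is an
    illustrative choice, not the paper's evaluation script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into nothing costs i deletions
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better, and because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.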
Existing data mining methods mostly focus on relational databases and structured data, not on complex structured data such as extensible markup language (XML). By converting the XML document type description into a relational semantic that records the relations in the XML data, and by using an XML data mining language, the XML data mining system presents a strategy for mining information from XML.
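To illustrate the general idea of recording XML relations in relational form, the following minimal sketch flattens an XML document into parent-child rows using Python's standard `xml.etree.ElementTree`. This is a generic edge-table approach chosen for illustration; the row layout and the function name `xml_to_rows` are assumptions, and it does not reproduce the paper's conversion from document type descriptions.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text: str) -> list:
    """Flatten an XML tree into (node_id, parent_id, tag, text) rows.

    Hypothetical layout: each element becomes one row, and parent_id
    records the parent-child relation (None for the root).
    """
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_id):
        node_id = len(rows)  # ids follow document (pre-order) position
        rows.append((node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows
```

Once the data is in this tabular form, conventional relational mining tools can be applied to the rows.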
Using object-oriented design and analysis, a general purpose corrosion data model (GPCDM) and a corrosion data markup language (CDML) are created to meet the increasing demand for multi-source corrosion data integration and sharing. A "corrosion data island" is proposed to model corrosion data in a comprehensive and self-contained way. The island has a tree-like structure containing six first-level child nodes that characterize every important aspect of the corrosion data. Each first-level node recursively holds further child nodes as data containers. The data structure inside the island is designed to flatten the learning curve and lower the acceptance barrier of GPCDM and CDML. A detailed explanation of the role and meaning of the first-level nodes is presented, with carefully chosen examples, in order to review the design goals and requirements proposed in the previous paper. Then, the CDML tag structure and the CDML application programming interface (API) are introduced in logical order. Finally, the roles of GPCDM, CDML, and its API in multi-source corrosion data integration and information sharing are highlighted and projected.
Funding: Supported by the National Language Commission for research on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No. ZDI145-70).