Funding: Supported by the National Language Commission for research on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No. ZDI145-70).
Abstract: Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, so they do not meet the practical requirements of SLRT. However, building a large-scale, diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data gathered in the process falls short of the required quality. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST improves sign language recognition (SLR), we build two datasets, a screened dataset and an unscreened dataset, and use visual alignment constraint (VAC) as the baseline model. The experimental results show that the screened dataset achieves a better word error rate (WER) than the unscreened dataset.
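The WER metric used for the comparison above can be illustrated with a minimal sketch. It assumes the standard definition (token-level edit distance divided by reference length) and whitespace-separated glosses; the function name `word_error_rate` is hypothetical and this is not the paper's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better; a dataset comparison like the one described would average this score over all test sentences for each trained model.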
Funding: Supported by the National Natural Science Foundation of China (51734004 and 51704069).
Abstract: The real-time energy flow data obtained in industrial production processes are usually of low quality. It is difficult to predict the short-term energy flow profile accurately from these field data, which diminishes the effect of industrial big data and artificial intelligence in industrial energy systems. The real-time data on blast furnace gas (BFG) generation collected at iron and steel sites are also of low quality. To tackle this problem, a three-stage data quality improvement strategy was proposed for predicting BFG generation. In the first stage, the correlation principle was used to test the sample set. In the second stage, the original sample set was rectified and updated. In the third stage, a Kalman filter was employed to eliminate the noise in the updated sample set. The method was verified with an autoregressive integrated moving average model, a back propagation neural network model and a long short-term memory model. The results show that prediction models based on the proposed three-stage data quality improvement method perform well. The long short-term memory model has the best prediction performance, with a mean absolute error of 17.85 m³/min, a mean absolute percentage error of 0.21%, and an R squared of 95.17%.
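The third stage, Kalman filtering to suppress measurement noise, can be illustrated with a minimal scalar sketch. The abstract does not give the state model or the noise settings, so the random-walk model and the `process_var` and `measurement_var` defaults below are assumptions chosen only for illustration.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """Filter a 1-D series with a scalar Kalman filter (random-walk state)."""
    if not measurements:
        return []

    x = float(measurements[0])  # state estimate, initialised at the first sample
    p = measurement_var         # variance of the state estimate
    filtered = [x]

    for z in measurements[1:]:
        # Predict: the state is assumed constant, only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        filtered.append(x)
    return filtered
```

A small `process_var` relative to `measurement_var` yields heavier smoothing; in practice both would be tuned to the sensor and process at hand.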
Funding: Supported by the China Electric Power Research Institute Innovation Fund Program: Research on inexact data correction and association method for D-IoT (5242001900DS).
Abstract: Sustainable development of power and energy systems (PES) can effectively address challenges such as fuel shortage, environmental pollution, climate change and energy security. PES data has distinctive characteristics, including large collection volume, wide coverage, diverse temporal and spatial scales, inconsistent sparsity, multiple structures and low value density. These characteristics raise the requirements for real-time performance and accuracy of data analysis and pose great challenges to the operation analysis and coordinated control of PES. To improve data quality and further support flexible choice of operating mode, safe and efficient coordinated control, and dynamic and orderly fault recovery of sustainable PES, this paper proposes an unscented particle filter (UPF) algorithm that uses an unscented Kalman filter (UKF) to construct the importance density functions and KLD resampling to adjust the particle number dynamically. Simulation results on an 85-node benchmark system show that the UPF algorithm achieves higher estimation accuracy than the traditional particle filter (PF) algorithm and the UKF algorithm.
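The idea of adjusting the particle number through KLD resampling can be illustrated with the commonly used KLD-sampling bound, which relates the required particle count to the number of occupied histogram bins. The abstract does not state which variant or parameters the paper uses, so the function name `kld_particle_count` and the defaults for `epsilon`, `delta`, `n_min` and `n_max` are assumptions.

```python
import math
from statistics import NormalDist


def kld_particle_count(k_bins, epsilon=0.05, delta=0.01, n_min=10, n_max=10000):
    """Particles needed so that, with probability 1 - delta, the KL divergence
    between the sampled and true (binned) posterior stays below epsilon.

    k_bins is the number of histogram bins currently occupied by particles.
    """
    if k_bins < 2:
        return n_min  # the bound is undefined for a single occupied bin

    z = NormalDist().inv_cdf(1.0 - delta)  # upper (1 - delta) normal quantile
    a = 2.0 / (9.0 * (k_bins - 1))
    # Wilson-Hilferty approximation of the chi-square quantile.
    n = (k_bins - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3
    return max(n_min, min(n_max, math.ceil(n)))
```

When the particles spread over many bins (high uncertainty) the bound asks for more particles, and it shrinks again once the estimate concentrates, which is what makes the particle number adaptive.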
Funding: Supported by the Science and Technology Open Cooperation Project of Henan Province (162106000017), the Science and Technology People-benefiting Plan Project of Henan Province (152207110004), and the Puyang Science and Technology Plan Project (150109).
Abstract: Dairy herd improvement (DHI) data from Henan Province were analyzed statistically to establish screening criteria for the data, thereby laying a foundation for the genetic evaluation of dairy cows. Using 2 152 451 test-day records of 155 893 Chinese Holstein dairy cows collected by the Henan Dairy Herd Improvement Center from January 2008 to April 2016, the following were analyzed across years and herd sizes with the MEANS procedure of SAS 9.4: the number of tests during a complete lactation, the test interval during a complete lactation, days in milk (DIM) at the first test-day record, the number of daughters and herds per bull, age at first calving, and the pedigree integrity rate. In addition, the data applicable to genetic evaluation were screened with an SQL program. The results showed that during 2008-2015 the number of cows participating in DHI in Henan Province increased from 7 379 to 93 706; the test-day milk yield increased from 19.91 to 24.05 kg; the somatic cell count decreased from 411.09×10^3 to 277.08×10^3 cells/mL; the percentage of cows with DIM of 5-305 d reached 70.92%; the average number of tests increased from 3.20 to 6.31; the test interval decreased from 70.22 to 33.83 d; cows with an age at first calving of 25 months were the most common group, accounting for 12.57%; bulls with 20 or more daughters distributed across 10 or more farms accounted for 6.05%; the one-generation pedigree integrity rate was 82.54%; and the percentage of data usable for genetic evaluation was 20.67%, which was lower than the results of other similar studies.