Funding: the National Natural Science Foundation of China (Nos. 41804047 and 42111540260); Fundamental Research Funds of the Institute of Geophysics, China Earthquake Administration (No. DQJB19A0114); and the Key Research Program of the Institute of Geology and Geophysics, Chinese Academy of Sciences (No. IGGCAS-201904).
Abstract: In recent years, artificial intelligence technology has exhibited great potential in seismic signal recognition, setting off a new wave of research. Vast amounts of high-quality labeled data are required to develop and apply artificial intelligence in seismology research. In this study, based on the 2013–2020 seismic cataloging reports of the China Earthquake Networks Center, we constructed an artificial intelligence seismological training dataset ("DiTing") with the largest known total time length. Data were recorded using broadband and short-period seismometers. The dataset includes 2,734,748 three-component waveform traces from 787,010 regional seismic events, the corresponding P- and S-phase arrival time labels, and 641,025 P-wave first-motion polarity labels. All waveforms were sampled at 50 Hz and cut to a length of 180 s starting a random number of seconds before the earthquake origin time. Each three-component waveform carries descriptive metadata such as epicentral distance, back azimuth, and signal-to-noise ratios. The magnitudes of seismic events, epicentral distances, P-wave signal-to-noise ratios, and S-wave signal-to-noise ratios range from 0 to 7.7, 0 to 330 km, –0.05 to 5.31 dB, and –0.05 to 4.73 dB, respectively. The dataset compiled in this study can serve as a high-quality benchmark for machine learning model development and data-driven seismological research on earthquake detection, seismic phase picking, first-motion polarity determination, earthquake magnitude prediction, early warning systems, and strong ground-motion prediction. Such research will further promote the development and application of artificial intelligence in seismology.
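The windowing scheme described above (180 s traces at 50 Hz, starting a random number of seconds before the event origin) can be sketched as follows. This is only an illustration: the function and parameter names are made up here, and the actual DiTing file format and tooling are not specified in the abstract.

```python
import numpy as np

def cut_diting_window(trace, origin_idx, sr=50, win_s=180, max_pre_s=30, rng=None):
    """Cut a win_s-second window starting a random number of seconds before
    the event origin, mirroring the DiTing windowing described above.
    trace: (3, n_samples) array; origin_idx: origin-time sample index.
    max_pre_s is an assumed bound, not stated in the abstract."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pre = int(rng.integers(1, max_pre_s + 1))   # random lead time in seconds
    start = max(origin_idx - pre * sr, 0)
    return trace[:, start:start + win_s * sr]

# synthetic 3-component record: 300 s at 50 Hz, origin at t = 100 s
rec = np.zeros((3, 300 * 50))
win = cut_diting_window(rec, origin_idx=100 * 50)
print(win.shape)  # (3, 9000): three components, 180 s * 50 Hz
```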
Funding: Supported by the National Key Research and Development Program of China (2018YFC1507305).
Abstract: Nowcasting of strong convective precipitation and radar-based quantitative precipitation estimation have long been active yet challenging topics in the meteorological sciences. Data-driven machine learning, especially deep learning, provides a new technical approach to the quantitative estimation and forecasting of precipitation. A high-quality, large-sample, labeled training dataset is critical for the successful application of machine learning to a specific field. The present study develops a benchmark dataset for machine learning for minute-scale quantitative precipitation estimation and forecasting (QpefBD), containing 231,978 samples of 3,185 heavy precipitation events that occurred in six provinces of central and eastern China from April to October of 2016–2018. Each sample consists of 8 weather-radar products at 6-min intervals within the time window of the corresponding event and 27 physical quantities at hourly intervals that describe the atmospheric dynamic and thermodynamic conditions. Two data labels, i.e., ground precipitation intensity and areal coverage of heavy precipitation at 6-min intervals, are also included. The present study describes the basic components of the dataset and the data processing, and provides metrics for evaluating model performance on precipitation estimation and forecasting. Based on these metrics, some simple and commonly used methods are applied to evaluate precipitation estimates and forecasts; the results can serve as the benchmark reference for evaluating machine learning models that use this dataset. This paper also gives suggestions and scenarios for the application of QpefBD. We believe that the application of this benchmark dataset will promote interdisciplinary collaboration between the meteorological sciences and artificial intelligence, providing a new way to identify and forecast heavy precipitation.
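The abstract mentions evaluation metrics for precipitation estimates and forecasts without enumerating them. As a generic illustration (not necessarily one of the QpefBD metrics), a categorical skill score widely used for heavy-precipitation verification is the Critical Success Index (threat score):

```python
import numpy as np

def csi(pred, obs, thresh=10.0):
    """Critical Success Index (threat score) for rain-rate fields above a
    threshold (mm/h): hits / (hits + false alarms + misses). A generic
    example; the QpefBD paper defines its own metrics, which may differ."""
    p, o = pred >= thresh, obs >= thresh
    hits = np.sum(p & o)
    false_alarms = np.sum(p & ~o)
    misses = np.sum(~p & o)
    denom = hits + false_alarms + misses
    return hits / denom if denom else np.nan

obs  = np.array([12.0, 3.0, 25.0, 0.0])   # observed rain rates (mm/h)
pred = np.array([15.0, 11.0, 20.0, 0.0])  # estimated rain rates (mm/h)
print(csi(pred, obs))  # 2 hits, 1 false alarm, 0 misses -> 2/3
```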
Abstract: Machine learning is becoming increasingly important in scientific and technological progress due to its ability to create models that describe complex data and generalize well. The wealth of publicly available seismic data nowadays requires automated, fast, and reliable tools to carry out a multitude of tasks, such as the detection of small, local earthquakes in areas with sparse receiver coverage. Such applications of machine learning, however, must be built on a large amount of labeled seismograms, which is neither immediate to obtain nor to compile. In this study we present a large dataset of seismograms recorded along the vertical, north, and east components of 1,487 broad-band or very-broad-band receivers distributed worldwide; it includes 629,095 three-component seismograms generated by 304,878 local earthquakes and labeled as earthquakes (EQ), and 615,847 seismograms labeled as noise (AN). Application of machine learning to this dataset shows that a simple convolutional neural network with 67,939 parameters can discriminate between earthquake and noise single-station recordings, even in regions not represented in the training set. Achieving accuracies of 96.7%, 95.3%, and 93.2% on the training, validation, and test sets, respectively, we show that the large variety of geological and tectonic settings covered by our data supports the generalization capability of the algorithm and makes it applicable to real-time detection of local events. We make the database publicly available, intending to provide the seismological and broader scientific community with a time-series benchmark to be used as a testing ground in signal processing.
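The shape of such a single-station classifier can be sketched with a toy forward pass: a 1-D convolution over the three components, global average pooling, and a linear layer producing two class logits. The layer sizes and weights below are arbitrary; the abstract does not describe the actual 67,939-parameter architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def conv1d_relu(x, w):
    """Valid-mode 1-D convolution with ReLU: x (c_in, n), w (c_out, c_in, k)."""
    c_out, c_in, k = w.shape
    n_out = x.shape[1] - k + 1
    out = np.zeros((c_out, n_out))
    for o in range(c_out):
        for i in range(n_out):
            out[o, i] = np.sum(w[o] * x[:, i:i + k])
    return np.maximum(out, 0.0)

# Randomly initialized toy network: conv -> global average pool -> linear.
w1 = rng.normal(size=(8, 3, 7)) * 0.1  # 8 filters over 3 components
w2 = rng.normal(size=(2, 8)) * 0.1     # logits: [noise, earthquake]

def classify(trace):
    h = conv1d_relu(trace, w1)  # (8, n-6) feature maps
    h = h.mean(axis=1)          # global average pooling -> (8,)
    return w2 @ h               # (2,) class logits

trace = rng.normal(size=(3, 500))  # synthetic 10 s, 3-component trace at 50 Hz
print(classify(trace).shape)       # (2,)
```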
Abstract: Advancements in deep learning have considerably enhanced techniques for Rapid Entire Body Assessment (REBA) pose estimation by leveraging progress in three-dimensional human modeling. This survey provides an extensive overview of recent advancements, particularly emphasizing monocular image-based methodologies and their incorporation into ergonomic risk assessment frameworks. By reviewing literature from 2016 to 2024, this study offers a current and comprehensive analysis of techniques, existing challenges, and emerging trends in three-dimensional human pose estimation. In contrast to traditional reviews organized by learning paradigms, this survey examines how three-dimensional pose estimation is utilized within musculoskeletal disorder (MSD) assessments, focusing on essential advancements, comparative analyses, and ergonomic implications. We extend existing image-based classification schemes by examining state-of-the-art two-dimensional models that enhance monocular three-dimensional prediction accuracy, and we analyze skeleton representations by evaluating joint connectivity and spatial configuration, offering insights into how structural variability influences model robustness. A core contribution of this work is the identification of a critical research gap: the limited exploration of estimating REBA scores directly from single RGB images using monocular three-dimensional pose estimation. Most existing studies depend on depth sensors or sequential inputs, limiting applicability in real-time and resource-constrained environments. Our review highlights this gap and proposes future research directions toward accurate, lightweight, and generalizable models suitable for practical deployment. This survey is a valuable resource for researchers and practitioners in computer vision, ergonomics, and related disciplines, offering a structured understanding of current methodologies and guidance for future innovation in three-dimensional human pose estimation for REBA-based ergonomic risk assessment.
Abstract: Objective: Lie detection identifies whether an individual is lying by analyzing physiological and behavioral cues, and has important applications in criminal investigation and security screening. However, no public Chinese lie-detection dataset currently exists; given linguistic and cultural differences, algorithms developed on English datasets may not transfer well to Chinese contexts. In addition, existing datasets are limited in sample size and do not adequately motivate participants to lie. To address these problems, we constructed the first public Chinese multimodal lie-detection dataset (Southeast University multimodal lie detection dataset, SEUMLD). Methods: The experiment was based on the guilty-knowledge-test paradigm, with mock-crime and mock-interrogation procedures designed to motivate participants to lie. By recording multimodal signals during the mock interrogations, SEUMLD collected three modalities of data, namely video, audio, and electrocardiogram (ECG), from 76 participants who have long lived in a Chinese-language environment, totaling 3,224 dialogue segments. The dataset provides both long-session labels indicating whether a participant lied (coarse-grained annotation) and precise labels for finely segmented portions of each long session (fine-grained annotation). Based on SEUMLD, we designed cross-lingual experiments to verify the influence of linguistic and cultural differences on lying behavior, transfer-learning experiments to evaluate its ability to improve model generalization, and benchmark experiments using classical lie-detection methods. Results: Cross-lingual lie-detection experiments showed significant differences between Chinese and English contexts. Transfer-learning experiments confirmed SEUMLD's strong performance in improving model generalization. Benchmark results show that the best unweighted average recall (UAR) for single-modality coarse-grained and fine-grained detection was 0.7576 and 0.7096, respectively; fusing multimodal information yielded the best performance, with coarse-grained and fine-grained results of 0.8083 and 0.7379, respectively. Conclusion: SEUMLD provides an important data source for studying multimodal lie detection in Chinese contexts and is significant for future research on the lying patterns of native Chinese speakers. The dataset is available at https://aip.seu.edu.cn/2024/1219/c54084a515309/page.htm or https://doi.org/10.57760/sciencedb.22548.
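The unweighted average recall (UAR) reported above is the mean of per-class recalls, which makes it insensitive to class imbalance between truthful and deceptive segments. A minimal implementation, with hypothetical labels for illustration:

```python
import numpy as np

def uar(y_true, y_pred):
    """Unweighted average recall: mean of per-class recalls.
    Equivalent to balanced accuracy for single-label classification."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

y_true = np.array([0, 0, 0, 1, 1])  # 0 = truthful, 1 = lying (toy labels)
y_pred = np.array([0, 0, 1, 1, 0])
print(uar(y_true, y_pred))  # (2/3 + 1/2) / 2 ~ 0.5833
```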
Funding: financially supported by the National Natural Science Foundation of China (Nos. 51988102 and 22173094); the CAS Key Research Program of Frontier Sciences (No. QYZDYSSW-SLH027); the Network and Computing Center, Changchun Institute of Applied Chemistry, for essential support; and the Major Science and Technology Project in Yunnan Province (No. 202002AB080001-1).
Abstract: Bridging the gap between the computation of mechanical properties and the chemical structure of elastomers is a long-standing challenge. To fill this gap, we create a raw dataset and build predictive models for the Young's modulus, tensile strength, and elongation at break of polyurethane elastomers (PUEs). We then construct a benchmark dataset, retaining 50.4% of the samples from the raw dataset, which suffers from an intrinsic diversity problem, through a newly proposed recursive data elimination protocol. The coefficients of determination (R²) of the predictions improve from 0.73–0.78 on the raw dataset to 0.85–0.91 on the benchmark dataset. Fitting stress-strain curves with the machine learning model performs slightly better than one of the well-performing constitutive models (e.g., the Khiêm-Itskov model). This confirms that black-box machine learning models can bridge the gap between the mechanical properties of PUEs and the multiple factors in their chemical structures, composition, processing, and measurement settings, although accurate prediction of full stress-strain curves remains a challenge. We release the raw dataset and the most representative benchmark dataset to date to draw more attention to this long-standing gap problem.
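The abstract does not detail the recursive data elimination protocol, so the following is only a speculative sketch of the general idea: repeatedly fit a model, drop the worst-predicted samples, and stop once R² reaches a target. All names and the stopping rule are assumptions; the actual protocol may differ substantially.

```python
import numpy as np

def recursive_elimination(X, y, fit, r2_target=0.9, drop_frac=0.05, max_iter=20):
    """Speculative recursive data-elimination loop: drop the drop_frac of
    samples with the largest residuals until R^2 >= r2_target."""
    for _ in range(max_iter):
        pred = fit(X, y)
        resid = np.abs(pred - y)
        r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
        if r2 >= r2_target or len(y) < 10:
            break
        keep = resid <= np.quantile(resid, 1 - drop_frac)
        X, y = X[keep], y[keep]
    return X, y

def ols_fit(X, y):
    """Ordinary least squares with intercept; returns in-sample predictions."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# toy linear data with a contaminated 10% of samples
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = 3 * X[:, 0] + rng.normal(0, 0.05, 200)
y[:20] += rng.normal(0, 2.0, 20)  # heavy outliers

Xb, yb = recursive_elimination(X, y, ols_fit)
print(len(yb) < len(y))  # True: low-quality samples were eliminated
```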
Abstract: With the development of artificial intelligence-related technologies such as deep learning, various organizations, including governments, are making efforts to generate and manage big data for use in artificial intelligence. However, big data is often difficult to acquire due to social problems and restrictions such as personal information leakage, and introducing deep learning in fields that lack sufficient training data is problematic. Therefore, this study proposes a mixed contour data augmentation technique, a data augmentation technique using contour images, to address the lack of data. ResNet, a well-known convolutional neural network (CNN) architecture, and CIFAR-10, a benchmark dataset, are used for experimental performance evaluation to demonstrate the merits of the proposed method. To show that substantial performance improvement can be achieved even with a small training dataset, the training dataset is subsampled to 70%, 50%, and 30% for comparative analysis. Applying the mixed contour data augmentation technique improved classification accuracy by up to 4.64% and maintained high accuracy even with a small dataset. These results on benchmark datasets suggest that the mixed contour data augmentation technique can be applied in various fields.
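The abstract does not specify how contour images are produced or mixed, so the following is only an illustration of the general idea: extract a contour (edge) map from an image and blend it back into the original. The gradient-based edge extractor and the blending rule are assumptions, not the paper's method.

```python
import numpy as np

def contour_image(img):
    """Crude edge map via gradient magnitude, normalized to [0, 1]; a
    stand-in for whatever contour extractor the paper actually uses."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag

def mixed_contour_augment(img, alpha=0.5):
    """Speculative 'mixed contour' augmentation: convex blend of the
    original image and its contour map, weighted by alpha."""
    return (1 - alpha) * img.astype(float) + alpha * contour_image(img)

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0  # toy CIFAR-sized grayscale image containing a square
aug = mixed_contour_augment(img)
print(aug.shape)  # (32, 32)
```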