This paper presents a procedure for purifying training data sets (i.e., past occurrences of slope failures) for the inverse estimation of unobserved trigger factors of "different types of simultaneous slope failures". Because pixel-by-pixel observation of trigger factors is difficult, the authors previously proposed an inverse analysis algorithm for trigger factors based on structural equation modeling (SEM). Through a measurement equation, the trigger factor is inversely estimated, and a trigger factor influence (TFI) map can also be produced. As a subsequent task, a purification procedure for the training data set should be constructed to improve the accuracy of the TFI map, which depends on how well the given training data sets represent the different types of slope failures. The proposed procedure resamples the pixels matched between the original groups of past slope failures (i.e., surface slope failures, deep-seated slope failures, and landslides) and three groups obtained by K-means clustering of all pixels corresponding to those slope failures. For all three types of slope failures, improved success rates with the resampled training data sets were confirmed. As a final outcome, the differences between the TFI maps produced with the original and resampled training data sets are delineated on a difference (DIF) map, which is useful for analyzing trigger factor influence in terms of "risky-" and "safe-side assessment" sub-areas with respect to "different types of simultaneous slope failures".
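As a rough sketch of the resampling step described above (not the authors' code), the following clusters all failure pixels with K-means and keeps only pixels whose cluster agrees with their original failure-type group; the array names and the 0/1/2 label encoding are illustrative assumptions.

```python
# Minimal sketch of the purification step, assuming per-pixel feature
# vectors and integer labels 0/1/2 for surface failure, deep-seated
# failure, and landslide. Names are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import KMeans

def purify_training_set(features, failure_type, n_types=3, seed=0):
    """Keep only pixels whose K-means cluster matches their failure-type group."""
    failure_type = np.asarray(failure_type)
    labels = KMeans(n_clusters=n_types, n_init=10, random_state=seed).fit_predict(features)
    keep = np.zeros(len(features), dtype=bool)
    for c in range(n_types):
        in_cluster = labels == c
        # Map this cluster to the failure type that dominates it, then
        # retain only the matched pixels of that type.
        dominant = np.bincount(failure_type[in_cluster]).argmax()
        keep |= in_cluster & (failure_type == dominant)
    return features[keep], failure_type[keep]
```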
Urban railways are a vital means of public transportation in Korea. More than 30% of metropolitan residents use the railways, and this proportion is expected to increase. To enhance safety, the government has mandated the installation of closed-circuit televisions in all carriages by 2024. However, the cameras are still monitored by humans. To address this limitation, we developed a dataset of risk factors and a smart detection system that enables an immediate response to any abnormal behavior and intensive monitoring thereof. We created an innovative learning dataset that takes into account seven unique risk factors specific to Korean railway passengers. Detailed data collection was conducted across the Shinbundang Line of the Incheon Transportation Corporation and the Ui-Shinseol Line. We observed several behavioral characteristics and assigned unique annotations to them. We also considered carriage congestion. Recognition performance was evaluated according to camera placement and number, and the camera installation plan was then optimized. The dataset will find immediate applications in domestic railway operations. The artificial intelligence algorithms will be verified shortly.
In the realm of subsurface flow simulations, deep-learning-based surrogate models have emerged as a promising alternative to traditional simulation methods, especially in addressing complex optimization problems. However, a significant challenge lies in the numerous high-fidelity training simulations needed to construct these deep-learning models, which limits their application to field-scale problems. To overcome this limitation, we introduce a training procedure that leverages transfer learning with multi-fidelity training data to construct surrogate models efficiently. The procedure begins with pre-training the surrogate model using a relatively large amount of data that can be generated efficiently from upscaled coarse-scale models. Subsequently, the model parameters are fine-tuned with a much smaller set of high-fidelity simulation data. For the cases considered in this study, this method leads to about a 75% reduction in total computational cost compared with the traditional training approach, without any sacrifice of prediction accuracy. In addition, a dedicated well-control embedding model is introduced into the traditional U-Net architecture to improve the surrogate model's prediction accuracy, which is shown to be particularly effective for large-scale reservoir models under time-varying well-control parameters. Comprehensive results and analyses are presented for the prediction of well rates and of the pressure and saturation states of a 3D synthetic reservoir system. Finally, the proposed procedure is applied to a field-scale production optimization problem. The trained surrogate model provides excellent generalization during the optimization process, in which the final optimized net present value is much higher than the values spanned by the training data.
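The two-stage recipe can be summarized in a short, hedged sketch; `model`, the data loaders, the epoch counts, and the learning rates below are placeholders, not the paper's settings.

```python
# Schematic of the multi-fidelity training procedure: pre-train on cheap
# coarse-scale data, then fine-tune on scarce high-fidelity data.
import torch

def train_stage(model, loader, epochs, lr):
    """One training stage: Adam + MSE over (inputs, targets) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Stage 1: pre-train on plentiful upscaled coarse-scale simulations.
# model = train_stage(model, coarse_loader, epochs=100, lr=1e-3)
# Stage 2: fine-tune on a small set of high-fidelity runs, at a lower rate.
# model = train_stage(model, fine_loader, epochs=20, lr=1e-4)
```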
There has been a significant increase in the availability of global high-resolution land cover (HRLC) datasets due to growing demand and favorable technological advancements. However, this has brought forth the challenge of collecting reference data with a high level of detail for global extents. While photo-interpretation is considered optimal for collecting quality training data for global HRLC mapping, some producers of existing HRLCs use less trustworthy sources, such as existing land cover at a lower resolution, to reduce costs. This work proposes a methodology to extract the most accurate parts of existing HRLCs in response to the challenge of providing reliable reference data at a low cost. The methodology combines existing HRLCs by intersection, and the output represents a Map Of Land Cover Agreement (MOLCA) that can be utilized for selecting training samples. MOLCA's effectiveness was demonstrated through HRLC map production in Africa, in which it generated 48,000 samples. The best classification test had an overall accuracy of 78%. This level of accuracy is comparable to or better than the accuracy of existing HRLCs obtained from more expensive sources of training data, such as photo-interpretation, highlighting the cost-effectiveness and reliability potential of the developed methodology in supporting global HRLC production.
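A minimal sketch of the intersection step follows, assuming the input maps are co-registered and share a harmonized legend (both assumptions of this illustration, since the abstract does not detail the preprocessing).

```python
# Sketch of a map-of-agreement: a pixel enters MOLCA only where all
# existing HRLC maps assign the same class.
import numpy as np

def build_molca(maps, nodata=255):
    """maps: list of co-registered 2D class rasters with a harmonized legend."""
    stack = np.stack(maps)                     # (n_maps, rows, cols)
    agree = np.all(stack == stack[0], axis=0)  # unanimous agreement per pixel
    return np.where(agree, stack[0], nodata)   # keep agreed class, else no-data
```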
In recent years, blockchain technology has been applied in the educational domain because of its salient advantages, i.e., transparency, decentralization, and immutability. Available systems typically use public blockchain networks such as Ethereum and Bitcoin to store learning results. However, the cost of writing data on these networks is significant, so educational institutions limit the data sent to the target network, typically to only hash codes of the issued certificates. In this paper, we present a system based on a private blockchain network for lifelong learning data authentication and management, named B4E (Blockchain For Education). B4E stores not only certificates but also learners' training data, such as transcripts and educational programs, in order to create a complete record of the lifelong education of each user and to verify the certificates they have obtained. As a result, B4E can address two types of fake certificates, i.e., certificates printed by unlawful organizations and certificates issued by educational institutions for learners who have not met the training requirements. In addition, B4E is designed to allow all participants to easily deploy software packages to manage, share, and check stored information without depending on a single point of access. As such, the system enhances the transparency and reliability of the stored data. Our experiments show that B4E meets the expectations for real-world deployment.
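To make the authentication idea concrete, the toy sketch below hashes a canonicalized learning record, the generic mechanism by which an on-chain digest exposes later tampering; it illustrates one ingredient only, not B4E's actual protocol, and the record fields are invented for the example.

```python
# Toy illustration of hash-based record anchoring; treat as background,
# not as B4E's implementation.
import hashlib
import json

def record_digest(record: dict) -> str:
    """Deterministic SHA-256 digest of a learning record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

certificate = {"learner": "A123", "credential": "BSc Computer Science",
               "issuer": "Example University", "year": 2023}
print(record_digest(certificate))  # digest stored on-chain; any edit changes it
```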
Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injections and training data poisoning attacks are two of the most prominent vulnerabilities in LLMs, which could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities such that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.
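For readers unfamiliar with how CVSS turns a metric vector into a score, the sketch below implements the standard CVSS v3.1 base-score arithmetic for the scope-unchanged case (with simplified rounding); the example vector for a prompt-injection-style issue is an illustrative assumption, not a score from the paper.

```python
# CVSS v3.1 base score, scope unchanged (simplified one-decimal round-up).
import math

W = {"H": 0.56, "L": 0.22, "N": 0.0}             # C/I/A impact weights
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}  # attack vector
AC = {"L": 0.77, "H": 0.44}                       # attack complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}            # privileges (scope unchanged)
UI = {"N": 0.85, "R": 0.62}                       # user interaction

def roundup(x):
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, c, i, a):
    iss = 1 - (1 - W[c]) * (1 - W[i]) * (1 - W[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

# e.g. a network-reachable prompt injection requiring user interaction,
# with high confidentiality and low integrity impact (illustrative vector):
print(base_score("N", "L", "N", "R", "H", "L", "N"))  # -> 7.1
```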
Gas-insulated switchgear (GIS) plays a critical role in ensuring the reliability of power systems, but partial discharge (PD) is a primary cause of failures within GIS equipment. Traditional PD diagnostic methods rely heavily on laboratory data, which differ significantly from data collected under the complex conditions of the field, leading to a marked drop in recognition accuracy when these methods are applied to field PD diagnosis. This study addresses the challenge by integrating field data into the training process, utilising a deep transfer learning approach that combines laboratory and field data to improve diagnostic accuracy for GIS PD. The research collected PD data from laboratory models representing five defect types, together with field data gathered from operational GIS equipment. A deep residual network (ResNet50) was pretrained using laboratory data and fine-tuned with field data through deep transfer learning to optimise the recognition of PD in field conditions. The results show that the proposed model achieves a significantly higher recognition accuracy (93.7%) on field data compared with traditional methods (60%-70%). The integration of deep transfer learning ensures that both low-dimensional general features from laboratory data and high-dimensional specific features from field data are effectively utilised. This research significantly improves the diagnostic accuracy of PD in GIS under field conditions, providing a robust method for defect detection in operational equipment.
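A hedged sketch of the pretrain-then-fine-tune recipe with torchvision's ResNet50 follows; the freezing policy (all but the deepest block and the classifier head), the input encoding of PD signals, and the hyperparameters are assumptions, since the abstract does not specify them.

```python
# Sketch of ResNet50 transfer learning for five PD defect classes.
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=5)
# ... stage 1: pretrain on laboratory PD data ...

# Stage 2: fine-tune on field data, updating only layer4 and the head
# (an assumed freezing policy for illustration).
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```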
In recent years, deep networks have achieved outstanding performance in computer vision, especially in the field of face recognition. The performance of a face recognition model based on a deep network depends on two main, closely related factors: 1) the structure of the deep neural network, and 2) the number and quality of training data. In real applications, illumination change is one of the most important factors that significantly affect the performance of face recognition algorithms. Deep network models can achieve the expected performance only if there is sufficient training data covering various illumination intensities. However, such training data is hard to collect in the real world. In this paper, focusing on the illumination change challenge, we propose a deep network model that takes both visible light images and near-infrared images into account to perform face recognition. Near-infrared images, as we know, are much less sensitive to illumination, while visible light face images contain abundant texture information that is very useful for face recognition. Thus, we design an adaptive score fusion strategy, which incurs hardly any information loss, and use the nearest neighbor algorithm to conduct the final classification. The experimental results demonstrate that the model is very effective in real-world scenarios and performs much better under illumination change than other state-of-the-art models.
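The fusion-then-nearest-neighbor decision can be sketched generically; the paper's adaptive weighting rule is abstracted here into a single weight `w`, an assumption of this illustration.

```python
# Score-level fusion of visible-light and near-infrared similarities,
# followed by a nearest-neighbor decision over the gallery.
import numpy as np

def fuse_and_classify(vis_scores, nir_scores, gallery_ids, w=0.5):
    """vis/nir scores: similarity of one probe to each gallery identity."""
    fused = w * np.asarray(vis_scores) + (1 - w) * np.asarray(nir_scores)
    return gallery_ids[int(np.argmax(fused))]  # nearest neighbor on fused score
```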
In this paper, we propose an efficient fall detection system for enclosed environments based on a single Gaussian model fitted using the maximum likelihood method. Online video clips from two cameras are used to extract the features. After the model is constructed, a threshold is set, and the probability of an incoming sample under the single Gaussian model is compared with that threshold to make a decision. Experimental results show that if a proper threshold is set, a good recognition rate for fall activities can be achieved.
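A minimal sketch of the decision rule, assuming per-frame feature vectors; which activity class the Gaussian is fitted to, and hence the direction of the comparison, is left open here, as in the abstract.

```python
# Single Gaussian model: maximum-likelihood fit plus threshold decision.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(X):
    """MLE fit: sample mean and covariance of training feature vectors."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def decide(x, mean, cov, threshold):
    # Whether low likelihood signals "fall" depends on which activity
    # class the model was fitted to; the comparison is the decision rule.
    return multivariate_normal.pdf(x, mean=mean, cov=cov) < threshold
```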
If clinical research is to be relevant to real-world decision making, it requires interventions that reflect usual practice. Observational studies may provide the best approach to defining this. Siting studies in the student clinics of acupuncture teaching institutions (ATIs) has potential benefits for the institutions as well as for researchers. This is the first such multi-centre study in accredited ATIs.
The purpose of this research paper is to explore how early machine learning models have shown bias in results where no bias should be seen. A prime example is an ML model that favors male applicants over female applicants. While the model is supposed to take into consideration other aspects of the data, it tends to be biased and skew the results one way or another. Therefore, in this paper, we explore how this bias comes about and how it can be fixed. In this research, I have taken different case studies of real-world examples of these biases. For example, an Amazon hiring application that favored male applicants and a loan application that favored Western applicants are both studies that I reference in this paper, exploring each situation in turn. To find out where the bias comes from, I constructed a machine learning model using a dataset found on Kaggle, and I analyze the results of that ML model. The results of the research clarify the reason for the bias in artificial intelligence models: the way a model is trained influences the way its results play out. If a model is trained with a much larger amount of male-applicant data than female-applicant data, it will favor male applicants. Therefore, when presented with new data, it is likely to accept male applications over female ones despite equivalent qualifications. Later in the paper, I dive deeper into the way AI applications work and how they find trends in order to classify things correctly. However, there is a fine line between classification and bias, and making sure bias is rightfully corrected and tested for is important in machine learning today.
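A toy, synthetic reproduction of the mechanism described above: when historical acceptance labels encode favoritism and one group dominates the training pool, a plain classifier reproduces the disparity. All data below is generated for illustration and is unrelated to the Kaggle dataset used in the paper.

```python
# Demonstrating how biased, imbalanced training data yields biased decisions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_male, n_female = 5000, 500                 # heavily skewed training pool
skill = rng.normal(size=n_male + n_female)   # identical skill distributions
gender = np.r_[np.ones(n_male), np.zeros(n_female)]
# Historical labels encode past favoritism toward male applicants.
accepted = skill + 0.8 * gender + rng.normal(0, 0.5, skill.size) > 0.8

model = LogisticRegression().fit(np.c_[skill, gender], accepted)
test_skill = rng.normal(size=2000)
for g, name in [(1.0, "male"), (0.0, "female")]:
    rate = model.predict(np.c_[test_skill, np.full(2000, g)]).mean()
    print(f"{name} acceptance rate: {rate:.2f}")  # male rate comes out higher
```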
Global land cover is one of the fundamental contents of Digital Earth. The Global Mapping project coordinated by the International Steering Committee for Global Mapping has produced a 1-km global land cover dataset: Global Land Cover by National Mapping Organizations. It has 20 land cover classes defined using the Land Cover Classification System. Of them, 14 classes were derived using supervised classification. The remaining six were classified independently: urban, tree open, mangrove, wetland, snow/ice, and water. The primary source data of this land cover mapping were eight periods of 16-day composite 7-band 1-km MODIS data of 2003. Training data for supervised classification were collected using Landsat images, MODIS NDVI seasonal change patterns, Google Earth, Virtual Earth, existing regional maps, and experts' comments. The overall accuracy is 76.5%, and the overall accuracy weighted by the mapped area coverage is 81.2%. The data are available from the Global Mapping project website (http://www.iscgm.org/). The MODIS data used, the land cover training data, and a list of existing regional maps are also available from the CEReS website. This mapping attempt demonstrates that training/validation data accumulation from different mapping projects must be promoted to support future global land cover mapping.
Annotating named entity recognition (NER) training corpora is a costly but necessary process for supervised NER approaches. This paper presents a general framework to generate large-scale NER training data from parallel corpora. In our method, we first employ a high-performance NER system on one side of a bilingual corpus. Then, we project the named entity (NE) labels to the other side according to the word-level alignments. Finally, we propose several strategies to select high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
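The projection step can be sketched in a few lines; the alignment representation as (source index, target index) pairs is an assumed format for illustration, and a real pipeline would also repair broken B/I tag sequences after projection.

```python
# Projecting NE labels from the English side to the Chinese side
# through word-level alignment links.
def project_labels(src_labels, alignments, tgt_len):
    """src_labels: per-token tags such as 'B-PER', 'O';
    alignments: (src_index, tgt_index) pairs from a word aligner."""
    tgt = ["O"] * tgt_len
    for s, t in alignments:
        if src_labels[s] != "O":
            tgt[t] = src_labels[s]  # copy the entity tag across the link
    return tgt

print(project_labels(["B-PER", "I-PER", "O"], [(0, 1), (1, 2), (2, 0)], 3))
# -> ['O', 'B-PER', 'I-PER']
```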
The purpose of this paper is to investigate the k-nearest neighbor classification rule for spatially dependent data. Some spatial mixing conditions are considered, and under such spatial structures, the well-known k-nearest neighbor rule is suggested to classify spatial data. We establish consistency and strong consistency of the classifier under mild assumptions. Our main results extend the consistency result in the i.i.d. case to the spatial case.
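The rule under study is the classical majority-vote k-NN classifier; for reference, a direct rendering (Euclidean distance, ties broken by label order) looks like this.

```python
# Classical k-nearest neighbor rule: majority vote among the k closest
# training observations, here applied to feature vectors at spatial sites.
import numpy as np

def knn_classify(X_train, y_train, x, k=5):
    d = np.linalg.norm(X_train - x, axis=1)        # distances to the query
    nearest = np.argsort(d)[:k]                    # k closest training sites
    return np.bincount(y_train[nearest]).argmax()  # majority vote
```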
The FAIR Guidelines attempt to make digital data Findable, Accessible, Interoperable, and Reusable (FAIR). To prepare FAIR data, a new data science discipline known as data stewardship is emerging and, as the FAIR Guidelines gain more acceptance, an increase in the demand for data stewards is expected. Consequently, there is a need to develop curricula that foster professional skills in data stewardship through effective knowledge communication. There have been a number of initiatives aimed at bridging the gap in FAIR data management training through both formal and informal programmes. This article describes the experience of developing a digital initiative for FAIR data management training under the Digital Innovations and Skills Hub (DISH) project. The FAIR Data Management course offers six short on-demand certificate modules over 12 weeks. The modules are divided into two sets: FAIR data and data science. The core subjects cover elementary topics in data science, regulatory frameworks, FAIR data management, intermediate to advanced topics in FAIR Data Point installation, and FAIR data in the management of healthcare and semantic data. Each week, participants are required to devote 7-8 hours of self-study to the modules, based on the resources provided. Once they have satisfied all requirements, students are certified as FAIR data scientists and qualified to serve as both FAIR data stewards and analysts. It is expected that in-depth and focused curricula development with diverse participants will build a core of FAIR data scientists for Data Competence Centres and encourage the rapid adoption of the FAIR Guidelines for research and development.
This paper aims to increase the accuracy of fault classification in power transformers by introducing a new off-line hybrid model based on a combination of the subset method (C-set), the modified fuzzy C-means algorithm (MFCM), and an optimizable multiclass SVM (MCSVM). The novelty of this paper lies in addressing the problems of outliers, boundary proportions, and unequal data that affect both traditional and intelligent models. Taking into consideration the closeness of dissolved gas analysis (DGA) data, the C-set method is implemented to subset the DGA data samples based on their type of fault within non-overlapping subsets. Then, the MFCM is used to remove outliers from the DGA samples by combining highly similar data for every subset within the same cluster, yielding the optimized training data (OTD) set. It is also used to reduce the dimensionality of the DGA samples and the uncertainty of transformer condition monitoring. After that, the optimized MCSVM is trained using the OTD. The proposed model's diagnosis accuracy is 93.3%. The obtained results indicate that our model significantly improves fault identification accuracy in power transformers when compared with other conventional and intelligent models.
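The pipeline shape (subset by fault type, filter outliers, train an SVM) can be sketched as below; note that the paper's modified fuzzy C-means is replaced here by a crude centroid-distance filter, so this is an analogy of the data flow, not the authors' algorithm.

```python
# Schematic of the subset -> outlier-removal -> SVM pipeline shape.
import numpy as np
from sklearn.svm import SVC

def remove_outliers(X, keep=0.9):
    """Drop the samples farthest from the subset centroid (a crude filter,
    standing in for the paper's modified fuzzy C-means)."""
    d = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return X[d <= np.quantile(d, keep)]

# Per-fault-type subsets -> filtering -> pooled optimized training data (OTD)
# -> a multiclass RBF SVM (one-vs-one by default in scikit-learn):
# X_otd = np.vstack([remove_outliers(X_f) for X_f in fault_subsets])
# clf = SVC(kernel="rbf").fit(X_otd, y_otd)
```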
The abundance of spectral information provided by hyperspectral imagery offers great benefits for many applications. However, processing such high-dimensional data volumes is a challenge because there may be redundant bands owing to the high interband correlation. This study aimed to reduce the possibility of "dimension disaster" in the classification of coastal wetlands using hyperspectral images with limited training samples. The study developed a hyperspectral classification algorithm for coastal wetlands using a combination of subspace partitioning and infinite probabilistic latent graph ranking in a random patch network (the SSP-IPLGR-RPnet model). The SSP-IPLGR-RPnet approach applied SSP techniques and an IPLGR algorithm to reduce the dimensions of the hyperspectral data. The RPnet model overcame the problem of dimension disaster caused by the mismatch between the dimensionality of hyperspectral bands and the small number of training samples. The results showed that the proposed algorithm had a better classification performance and was more robust with limited training data compared with several other state-of-the-art methods. The overall accuracy was nearly 4% higher on average compared with that of the multi-kernel SVM and RF algorithms. Compared with the EMAP, MSTV, ERF, ERW, RMKL and 3D-CNN algorithms, the SSP-IPLGR-RPnet algorithm provided a better classification performance in a shorter time.
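The SSP-IPLGR reduction itself is bespoke, but the general shape of the problem it addresses, reducing many correlated bands before classifying with few samples, can be shown with a generic stand-in; the PCA-plus-SVM pipeline below is an assumed illustration, not the paper's method.

```python
# Generic dimension-reduction-before-classification baseline for
# hyperspectral pixels with limited training samples.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

clf = make_pipeline(PCA(n_components=30), SVC(kernel="rbf"))
# clf.fit(X_train_bands, y_train)  # X: (n_pixels, n_bands) spectra
```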
Diffusion-based models have recently achieved remarkable success in style transfer. However, when training data is scarce, existing methods struggle to effectively balance style and content. In this paper, we propose Style-Aware Diffusion (SAD), a novel method that harnesses efficient low-rank adaptation training techniques. Specifically, we extract latent representations of both style and content using DDIM inversion, formulated as an ordinary differential equation. Then, we use adaptive instance normalization and query-key-value injection to effectively align low-level style features with high-level content semantics. In addition, we propose parameter-efficient adaptation, which mitigates catastrophic forgetting and overfitting by rationally optimizing the weights of the attention layers, ensuring robust and effective performance and achieving a 61.5% relative score increase over the plain model. The proposed method outperforms the high-performance DreamBooth-LoRA model and won the Fourth Jittor Artificial Intelligence Challenge. Our model is implemented using the Jittor framework and is available at https://github.com/liylo/jittor-qwqw-Few_Shot_Style_Transfer.
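Adaptive instance normalization, one ingredient named above, has a standard formulation: content features are renormalized to carry the style features' channel-wise statistics. The sketch below uses PyTorch-style tensors for familiarity (the paper itself uses Jittor) and is not SAD's full pipeline.

```python
# Standard AdaIN: match content feature statistics to style statistics.
import torch

def adain(content, style, eps=1e-5):
    """content, style: feature maps shaped (N, C, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    # Whiten the content statistics, then re-color with the style's.
    return s_std * (content - c_mean) / c_std + s_mean
```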