Using machine learning (ML) to predict lithofacies from sparse suites of well-log data is difficult in laterally and vertically heterogeneous reservoir formations in oil and gas fields. Meandering, braided fluviatile depositional environments tend to form clastic sequences with laterally discontinuous layers due to the continuous shifting of relatively narrow sandstone channels. Three cored wellbores drilled through such a reservoir in a large oil field, with just four recorded well logs available, are used to classify four lithofacies using ML models. To augment the well-log data, six derivative and volatility attributes were calculated from the recorded gamma ray and density logs, providing sixteen log features for the ML models to select from. A novel, multiple-optimizer feature selection technique was developed to identify high-performing feature combinations, with which seven ML models were used to predict lithofacies, assisted by multi-k-fold cross validation. Feature combinations with just seven to nine selected log features achieved an overall ML lithofacies accuracy of 0.87 for the two wells used for training and validation. When the trained ML models were applied to a third well for testing, lithofacies prediction accuracy declined to 0.65 for the best-performing model, an extreme gradient boosting model with seven features. However, that model achieved an accuracy of ~0.76 in predicting the presence of the pay-bearing sandstone and siltstone lithofacies in the test well. A model using only the four recorded well logs predicted the pay-bearing lithofacies with just ~0.6 accuracy. Annotated confusion matrices and feature importance analysis provide additional insight into ML model performance and identify the log attributes that are most influential in enhancing lithofacies prediction.
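The abstract does not define its derivative and volatility attributes, so the following is only a minimal sketch of one plausible reading: a finite-difference derivative of a log curve with respect to depth, and a trailing-window standard deviation as a "volatility" measure. The function names, the window length, and the sample gamma-ray values are all hypothetical and are not taken from the study.

```python
from statistics import pstdev


def depth_derivative(values, depths):
    """Finite-difference derivative of a log curve with respect to depth.

    Uses a central difference for interior samples and a one-sided
    difference at both ends. Requires at least two samples.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / (depths[hi] - depths[lo]))
    return out


def rolling_volatility(values, window=3):
    """Population standard deviation over a trailing window of samples.

    Early samples use however many readings are available so far.
    """
    out = []
    for i in range(len(values)):
        start = max(i - window + 1, 0)
        out.append(pstdev(values[start:i + 1]))
    return out


# Hypothetical gamma-ray readings (API units) at 0.5 m depth steps.
depths = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0]
gamma_ray = [60.0, 62.0, 70.0, 90.0, 88.0]

gr_derivative = depth_derivative(gamma_ray, depths)
gr_volatility = rolling_volatility(gamma_ray, window=3)
```

Each derived list has one value per depth sample, so it can sit alongside the recorded logs as an extra feature column for a feature-selection step to accept or reject.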
Background: Being able to predict with confidence the early onset of type 2 diabetes from a suite of signs and symptoms (features) displayed by potential sufferers is desirable so that treatment can commence promptly. Late or inconclusive diagnosis can result in more serious health consequences for sufferers and higher costs for health care services in the long run. Methods: A novel integrated methodology is proposed involving correlation, statistical analysis, machine learning, multi-K-fold cross-validation, and confusion matrices to provide a reliable classification of diabetes-positive and -negative individuals from a substantial suite of features. The method also identifies the relative influence of each feature on the diabetes diagnosis and highlights the most important ones. Ten statistical and machine learning methods are utilized to conduct the analysis. Results: A published data set involving 520 individuals (Sylhet Diabetes Hospital, Bangladesh) is modeled, revealing that a support vector classifier generates the most accurate early-onset type 2 diabetes status predictions, with just 11 misclassifications (2.1% error). Polydipsia and polyuria are among the most influential features, whereas obesity and age are assigned low weights by the prediction models. Conclusion: The proposed methodology can rapidly predict early-onset type 2 diabetes with high confidence while providing valuable insight into the key influential features involved in such predictions.
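The abstract does not spell out its multi-K-fold procedure. As a rough sketch, assuming it means repeating shuffled K-fold cross-validation for several values of K, the index splitting could look like the code below. The K values (4, 5, 10), the number of repeats, the seeding scheme, and the function names are illustrative assumptions; only the 520 individuals and the 11 misclassifications come from the abstract.

```python
import random


def k_fold_indices(n_samples, k, seed):
    """Yield (train, validation) index lists for one shuffled K-fold pass."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def multi_k_fold(n_samples, k_values=(4, 5, 10), repeats=3):
    """Yield (k, repeat, train, validation) for every K and every repeat.

    The K values and repeat count are assumptions, not the study's settings.
    """
    for k in k_values:
        for repeat in range(repeats):
            seed = 1000 * k + repeat  # distinct shuffle per (K, repeat)
            for train, validation in k_fold_indices(n_samples, k, seed):
                yield k, repeat, train, validation


def error_rate(misclassified, total):
    """Fraction of records that were misclassified."""
    return misclassified / total


# 520 individuals, as in the data set described in the abstract.
splits = list(multi_k_fold(n_samples=520))

# 11 misclassifications out of 520 records, as reported in the abstract.
reported_error = error_rate(11, 520)
```

A classifier would be fitted on each `train` list and scored on the matching `validation` list. Averaging the scores across all values of K and all repeats gives a performance estimate that depends less on any single partition of the data. The reported figure checks out: 11 / 520 is about 0.021, or 2.1%.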