Electronic medical records (EMRs) contain rich biomedical information and have great potential in disease diagnosis and biomedical research. However, EMR information is usually stored as unstructured text, which raises the cost of use and hinders its application. In this work, an effective named entity recognition (NER) method is presented for information extraction from Chinese EMRs. The method uses word-embedding-bootstrapped deep active learning to promote the acquisition of medical information from Chinese EMRs and to release its value. Deep active learning with a bi-directional long short-term memory network followed by a conditional random field (Bi-LSTM+CRF) captures the characteristics of different information from a labeled corpus, and the continuous bag-of-words and skip-gram word embedding models are combined with this model to capture the text features of Chinese EMRs from an unlabeled corpus. The method was evaluated on NER tasks on Chinese EMRs with “medical history” content. Experimental results show that the word-embedding-bootstrapped deep active learning method using an unlabeled medical corpus achieves better performance than other models.
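The abstract does not say which query strategy drives the active learning loop. As a minimal sketch, assuming a least-confidence criterion over the tagger's per-token probabilities (a common choice, not confirmed by the source), sentence selection for annotation might look like the following. The function names, the `pool` dictionary, and the probabilities are all hypothetical.

```python
from typing import Dict, List


def least_confidence(token_probs: List[float]) -> float:
    """Return an uncertainty score for one sentence.

    token_probs holds the model's probability for its predicted tag at each
    token. The score is 1 minus the geometric mean of those probabilities,
    so long sentences are not penalized merely for their length.
    """
    if not token_probs:
        return 0.0
    product = 1.0
    for p in token_probs:
        product *= p
    return 1.0 - product ** (1.0 / len(token_probs))


def select_for_annotation(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Pick the `budget` most uncertain sentence ids from the unlabeled pool."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:budget]


# Hypothetical tagger confidences for three unlabeled sentences.
pool = {
    "s1": [0.99, 0.98, 0.97],
    "s2": [0.60, 0.55, 0.90],
    "s3": [0.80, 0.85, 0.80],
}
selected = select_for_annotation(pool, budget=2)
print(selected)
```

In a full loop, the selected sentences would be labeled, added to the training corpus, and the Bi-LSTM+CRF model retrained before the next selection round.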
In this study, the author investigates advanced machine learning models under two different methodologies to determine the most effective way to predict heart failure and cardiovascular disease in individuals. The first methodology involves a set of classification machine learning algorithms, and the second uses a deep learning algorithm, the multilayer perceptron (MLP). Globally, hospitals are dealing with cases of cardiovascular disease and heart failure, which are major causes of death, not only for overweight individuals but also for those who do not adopt a healthy diet and lifestyle. Heart failure and cardiovascular disease can be caused by many factors, including cardiomyopathy, high blood pressure, coronary heart disease, and heart inflammation [1]. Other factors, such as irregular shocks or stress, can also contribute to heart failure or a heart attack. While these events cannot be predicted, continuous data on patients’ health can help doctors predict heart failure. Therefore, this data-driven research uses machine learning and deep learning techniques to analyze the data and to provide doctors with decision-making tools regarding a person’s likelihood of experiencing heart failure. The author employed advanced data preprocessing and cleaning techniques, and the dataset was tested with both methodologies to determine which machine learning technique produces the best predictions. The first methodology employed supervised classification algorithms, including Naïve Bayes (NB), KNN, logistic regression, and SVM. The second methodology used a deep learning (DL) algorithm, the MLP, which gave the author the flexibility to experiment with different layer sizes and activation functions, such as ReLU, logistic (sigmoid), and tanh. Both methodologies produced models with high accuracy. In the first methodology, the KNN, SVM, AdaBoost, logistic regression, Naïve Bayes, and decision tree algorithms achieved accuracy rates of 86%, 89%, 89%, 81%, 79%, and 99%, respectively. The author explains that the decision tree algorithm is not suitable for this dataset because of overfitting, so it was discarded as a candidate model. The second methodology (the neural network) demonstrated the most stable accuracy, achieving over 87% while adapting well to real-life situations and requiring little computing power. Performance was assessed using a confusion matrix report to demonstrate feasibility. The author concludes that the model’s performance in real-life situations can advance not only medical science but also mathematical concepts. Additionally, the preprocessing approach behind the model can provide value to the data science community. The model can be further developed by applying optimization techniques to handle even larger heart failure datasets, and other neural network algorithms can be tested to explore alternative approaches.
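The abstract reports accuracy and mentions a confusion matrix report without giving the matrix itself. As a minimal sketch of how such a report is derived for a binary outcome (heart failure or not), the following computes the usual metrics from the four cell counts. The function name and the counts are hypothetical and are not taken from the paper.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from confusion matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives, and
    true negatives. Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these metrics on training and held-out data is one way to spot the kind of overfitting the author attributes to the decision tree: a very high training score paired with a much lower held-out score.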
Funding: the Artificial Intelligence Innovation and Development Project of Shanghai Municipal Commission of Economy and Information (No. 2019-RGZN-01081).