Disease Prediction using ML
Comparing different supervised machine learning algorithms for disease prediction
Prof. Pravin Pol
Suraj Darekar
Nachiket Khare
Janhavi Pawashe, Email: janhavipawashe16@vit.edu
Nikita Saner
Dept. of Instrumentation and Control Engineering
Vishwakarma Institute of Technology, Pune
Abstract: Nowadays, people face various diseases because of environmental conditions and their living habits, so predicting disease at an early stage has become an important task. However, accurate prediction on the basis of symptoms alone is difficult for doctors. With the growing volume of data in the medical and healthcare field, accurate analysis of medical data can support early patient care. Using disease data, algorithms can find hidden patterns in large amounts of medical data. Supervised machine learning algorithms have been a dominant method in the data mining field, and disease prediction using health data has recently emerged as a promising application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction.
I. INTRODUCTION
Due to the growth of big data in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease recognition, patient care and community services. When medical data are incomplete, the accuracy of the analysis is reduced. Moreover, certain regional diseases present differently in different regions, which may weaken the prediction of disease outbreaks.
In this project, we apply the Decision Tree, Naive Bayes and Random Forest machine learning algorithms to structured and unstructured data from hospitals. A machine learning algorithm is also used to partition the data. Machine learning algorithms employ a variety of statistical, probabilistic and optimisation methods to learn from past experience and detect useful patterns in large, unstructured and complex datasets. These algorithms have a wide range of applications, including automated text categorisation, network intrusion detection, junk e-mail filtering, credit card fraud detection, customer purchase behaviour detection, manufacturing process optimisation and disease modelling. Most of these applications have been implemented using supervised variants of machine learning algorithms rather than unsupervised ones. In the supervised variant, a prediction model is developed by learning from a labelled dataset, so that the outcome for new, unlabelled examples can be predicted. Such models can enable doctors to provide better healthcare, leading to better outcomes and reduced costs.
II. SCOPE
The scope of this research is primarily the performance analysis of disease prediction approaches using different variants of supervised machine learning algorithms, i.e., a comparison among the following supervised machine learning algorithms:
1) Decision Tree: a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
2) Random Forest: a classification algorithm consisting of many decision trees. It uses bagging and feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.
3) Naïve Bayes: a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
III. ALGORITHMS
A. RANDOM FOREST
It is a supervised learning algorithm based on the ensemble learning technique.
This algorithm is reported to work well even when a large proportion of the data is missing. It can be applied to both classification and regression tasks.
Figure 1: Flow chart for Random Forest
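To make the bagging and feature-randomness mechanism concrete, here is a minimal sketch in Python, assuming scikit-learn and NumPy are available. It builds a small forest by hand from scikit-learn's `DecisionTreeClassifier`: each tree is trained on a bootstrap sample of the rows, each split considers only a random subset of features (`max_features="sqrt"`), and the trees are combined by majority vote. The function names `fit_forest` and `predict_forest`, the 25-tree forest size and the breast-cancer dataset bundled with scikit-learn are illustrative assumptions, not the data or code used in this study.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees CART trees, each on a bootstrap sample (bagging),
    with a random subset of features considered at every split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: n rows drawn with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",  # feature randomness at each split
            random_state=int(rng.integers(1_000_000)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Majority vote ("prediction by committee"); assumes integer labels 0..k-1."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Stand-in medical dataset (assumption): binary malignant/benign labels.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)
forest = fit_forest(X_tr, y_tr)
acc = (predict_forest(forest, X_te) == y_te).mean()
print(f"hand-built forest test accuracy: {acc:.3f}")
```

In practice `sklearn.ensemble.RandomForestClassifier` performs these same steps internally; writing them out only shows where the "uncorrelated forest" comes from: different bootstrap rows and different candidate features per split.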
B. DECISION TREE
It is a supervised, tree-structured classifier in which the internal nodes represent features of the dataset, the branches represent decision rules and the leaf nodes represent outcomes. The decision tree is built using the CART (Classification and Regression Tree) algorithm.
Figure 2: Flow chart for Decision Tree
A decision tree has multiple layers, which makes it complex, and its computational complexity may increase with the number of class labels. It may also suffer from overfitting, which can be mitigated using the random forest method.
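CART grows a tree by choosing, at each node, the split that most reduces impurity; for classification the usual measure is the Gini impurity $G = 1 - \sum_k p_k^2$, where $p_k$ is the fraction of samples of class $k$ in the node. The sketch here, in plain Python, finds the best threshold for a single numeric feature. The helper names `gini` and `best_split` and the toy body-temperature data are hypothetical; a full CART implementation would repeat this search over every feature and then recurse on both child nodes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold'
    split on one feature; threshold is None if no split beats the parent."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity without splitting
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Impurity of the children, weighted by how many samples each receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: body temperature (deg C) and diagnosis.
temperatures = [36.5, 37.0, 38.5, 39.0]
diagnosis = ["healthy", "healthy", "flu", "flu"]
print(best_split(temperatures, diagnosis))  # → (37.75, 0.0)
```

A weighted impurity of 0.0 means both children are pure, so this node would become two leaves; deeper, unpruned trees that keep splitting until every leaf is pure are what lead to the overfitting mentioned above.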
C. NAIVE BAYES
It is a supervised learning algorithm based on Bayes' theorem, used for classification problems:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where
$P(A \mid B)$ is the posterior probability, i.e., the probability of hypothesis A given the observed event B;
$P(B \mid A)$ is the likelihood, i.e., the probability of the evidence B given that hypothesis A is true;
$P(A)$ is the prior probability, i.e., the probability of the hypothesis before observing the evidence;
$P(B)$ is the marginal probability, i.e., the probability of the evidence.
Figure 3: Flow chart for Naive Bayes
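As a worked illustration of Bayes' theorem with the naive independence assumption, the sketch here scores diseases from binary symptom indicators $x_1, \dots, x_n$: the joint score for class $c$ is $P(c)\prod_j P(x_j \mid c)$, and dividing by the sum of these scores over all classes plays the role of the marginal $P(B)$. It is a minimal Bernoulli-style variant with Laplace smoothing (`alpha=1.0`), written in plain Python; the function names, the three symptoms and the six training records are hypothetical toy data, not the study's dataset.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels, alpha=1.0):
    """Estimate priors P(c) and smoothed likelihoods P(symptom_j = 1 | c).

    records: list of tuples of 0/1 symptom indicators (hypothetical format)
    labels:  list of disease names, one per record
    alpha:   Laplace smoothing constant
    """
    class_counts = Counter(labels)
    n_features = len(records[0])
    present = defaultdict(lambda: [0] * n_features)  # present[c][j]: count of symptom j in class c
    for x, c in zip(records, labels):
        for j, v in enumerate(x):
            present[c][j] += v
    priors = {c: k / len(labels) for c, k in class_counts.items()}
    likelihoods = {
        c: [(present[c][j] + alpha) / (class_counts[c] + 2 * alpha) for j in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods


def posterior(x, priors, likelihoods):
    """Return P(c | symptoms) for every class c."""
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, v in enumerate(x):
            p_j = likelihoods[c][j]
            p *= p_j if v else (1 - p_j)  # symptoms treated as independent given c
        joint[c] = p
    evidence = sum(joint.values())  # marginal probability of the observed symptoms
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical toy data: (fever, cough, rash) indicators.
records = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (0, 0, 1)]
labels = ["flu", "flu", "flu", "allergy", "allergy", "allergy"]
priors, likelihoods = train_naive_bayes(records, labels)
post = posterior((1, 1, 0), priors, likelihoods)  # patient with fever and cough, no rash
print({c: round(p, 3) for c, p in post.items()})  # → {'flu': 0.96, 'allergy': 0.04}
```

Smoothing keeps a symptom never seen with a class from forcing that class's posterior to exactly zero, which matters when training data are small.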
IV. OBJECTIVE
1. Accurate Disease Prediction
2. Fast response based on symptoms
V. METHOD
Figure 4: Flow chart of the process
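One way to compare the three algorithms on equal terms is sketched here in Python with scikit-learn; it is an illustrative pipeline under stated assumptions, not necessarily the process in Figure 4. The assumptions are: the breast-cancer dataset bundled with scikit-learn stands in for the hospital data, `GaussianNB` is chosen as the Naive Bayes variant because the features are continuous, and every model is evaluated with the same 5-fold stratified cross-validation. Per-class recall is reported alongside accuracy because, as discussed in Section VI, performance can differ between patient classes. The resulting numbers are unrelated to the accuracies reported in this paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): the study's hospital data are not available here.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),  # CART-style tree
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),  # Gaussian likelihoods for continuous features
}

# Identical stratified folds for every model keep the comparison fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Out-of-fold predictions: each sample is predicted by a model that never saw it.
    y_pred = cross_val_predict(model, X, y, cv=cv)
    results[name] = {
        "accuracy": accuracy_score(y, y_pred),
        "per_class_recall": recall_score(y, y_pred, average=None),
    }
    print(f"{name:14s} accuracy={results[name]['accuracy']:.3f} "
          f"per-class recall={results[name]['per_class_recall'].round(3)}")
```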
VI. CONCLUSION
In earlier research on the same data, the highest accuracy, 95.01%, was obtained using a decision tree algorithm with AdaBoost [15]. On applying machine learning algorithms to the extracted features, we observe that the maximum accuracy (0.964, or 96.4%) is achieved with the Random Forest algorithm. The evaluation metrics show excellent performance of most algorithms in classifying normal patients. For suspicious and pathological patients, Random Forest and ensemble learning give better results than the other machine learning algorithms. Using the Random Forest algorithm, we have significantly improved the classification of suspicious and pathological patients compared with the work done to date.
VII. REFERENCES
1) Combined Benefit of Prediction and Treatment: A Criterion for Evaluating Clinical Prediction Models. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC-/
2) Disease Prediction Using Machine Learning. https://www.irjet.net/archives/V6/i5/IRJET-V6I5977.pdf
3) Designing Disease Prediction Model Using Machine Learning Approach. https://ieeexplore.ieee.org/document/-