Application of Several Machine Learning Algorithms for Multiple Stage Inference Data
Abstract
Historically, machine learning techniques have relied on two-stage data to predict and identify particular events. The outcome of such a prediction is binary: it is either valid or invalid, encoded as one or zero. In other words, the model predicts one of two possible results. This approach suffers from several problems that can yield inaccurate outcomes, including data imbalance, overfitting, and error propagation. This study applies a multiple-stage outcome approach to improve accuracy and overall performance. First, we apply the Multiclass Classification One-vs.-All methodology to data whose outcome spans several stages. Next, we train a range of candidate supervised models with different machine learning algorithms and select the model with the highest accuracy. We evaluate five machine learning algorithms: Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Gradient Tree Boosting (GTB), and Extremely Randomized Trees (ERF). These algorithms are used within our machine learning framework to analyze multi-stage data and determine which technique predicts outcome stages most accurately. A multi-stage outcome narrows down the problem at hand, reduces the potential for errors, and improves the ability to predict and diagnose medical diseases or cyber security threats. A Python-based model was developed to implement the proposed methodology.
The concept in use relies on a binary format, which is supported by empirical evidence and offers two possible outcomes. Our research found that the Logistic Regression and Support Vector Machine algorithms outperformed the other algorithms when a multiple-stage outcome was used. The results were assessed in terms of accuracy, precision, recall, and the F-measure.
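The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the authors' implementation: the synthetic four-stage dataset, the hyperparameters, the train/test split, and the macro averaging of precision, recall, and F-measure are all assumptions made for illustration. Each of the five algorithms is wrapped in a One-vs.-All (one-vs-rest) classifier, scored with the four metrics named above, and the most accurate model is selected.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for multi-stage data: four outcome stages (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five algorithms named in the abstract; hyperparameters are illustrative.
base_models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=0)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GTB": GradientBoostingClassifier(random_state=0),
    "ERF": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in base_models.items():
    # One-vs.-All: one binary classifier is trained per outcome stage.
    clf = OneVsRestClassifier(model).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Macro averaging weights every stage equally (an assumption here).
    precision, recall, f_measure, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})

# Select the model with the highest accuracy, as the abstract describes.
best = max(results, key=lambda name: results[name]["accuracy"])
print("Most accurate model on this synthetic data:", best)
```

The ranking printed by this sketch reflects only the synthetic data and settings chosen here, so it should not be read as reproducing the study's findings.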