A Study on Machine Learning Algorithms with Different Encoding Techniques for Identifying the Right One for Patients' Big Data
Keywords:
Big Data, Encoding Techniques, Healthcare Data, Machine Learning Algorithms, Statistical Metrics
Abstract
In predictive modeling, categorical features often cause problems because most supervised machine learning algorithms accept numerical input rather than categorical attributes. Many encoding techniques are therefore used to convert categorical values into a machine-readable format. In addition, different classification algorithms can perform differently on a big dataset. The goal of this study is therefore to find a learning model that is well suited to a large volume of patients' data. The study also examines which encoding techniques help the trained models achieve high accuracy. We applied several encoding techniques to patients' data, both individually and in combination, to train the models. However, not every pairing of encoding technique and classifier performed well; only some provided better performance. Some models trained with certain encoding techniques did not work at all on the patients' big data. Moreover, training time differed across the machine learning models on the same dataset. This paper should therefore help developers choose reliable machine learning models when designing systems for patients' big data.
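As a rough illustration of what converting categorical values into numerical input means, the sketch below applies two widely used encodings, label encoding and one-hot encoding, to a small made-up column. The abstract does not name the specific techniques the study evaluated, so these two are assumptions chosen for illustration, and the `blood_group` values and helper function names are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    vectors = [[1 if v == cat else 0 for cat in categories] for v in values]
    return vectors, categories


# Hypothetical categorical column (not taken from the study's dataset).
blood_group = ["A", "O", "B", "A", "AB"]

labels, mapping = label_encode(blood_group)
vectors, categories = one_hot_encode(blood_group)

print(labels)      # one integer per row
print(categories)  # column order of the one-hot vectors
print(vectors)     # one 0/1 vector per row
```

Label encoding keeps a single column but imposes an arbitrary order on the categories, whereas one-hot encoding avoids that order at the cost of one column per category, which can become expensive on large datasets with many distinct values.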
License
©2024 Jahangirnagar University Journal of Science. All rights reserved. However, permission is granted to quote from any article of the journal, to photocopy any part or full of an article for education and/or research purpose to individuals, institutions, and libraries with an appropriate citation in the reference and/or customary acknowledgement of the journal.