- Research
- Open access
Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data
BMC Pediatrics volume 25, Article number: 311 (2025)
Abstract
Background
Healthcare practitioners require a robust predictive system to accurately diagnose diseases, especially in young children with conditions such as anemia. Delays in diagnosis and treatment can have severe consequences, potentially leading to serious complications and childhood mortality. By leveraging machine learning methods with extensive datasets, valuable and scientifically sound insights can be generated to address pressing health and healthcare-related challenges.
Objectives
The primary objective of this study was to identify the most effective machine-learning algorithm for predicting anemia among under-five children in Ethiopia.
Methods
The data utilized in this study were sourced from the 2016 Ethiopian Demographic and Health Survey. Six machine-learning models, comprising a classic logistic regression model along with random forest, decision tree, support vector machine, Naïve Bayes, and K-nearest neighbors, were employed to predict factors influencing anemia in children under five. The predictive capacities of each machine-learning model were evaluated using receiver operating characteristic curves and various measures of model accuracy.
Results
The random forest model demonstrated the highest accuracy among the algorithms tested, achieving an overall accuracy of 81.16%. The accuracy rates for the decision tree, support vector machines, Naïve Bayes, K-nearest neighbors, and classical logistic regression models were 68.40%, 59.94%, 53.06%, 69.96%, and 54.79%, respectively.
Conclusion
In general, the random forest algorithm emerged as the preferred model for predicting anemia in children under five. The model exhibited a specificity of 79.26%, sensitivity of 83.07%, positive predictive value of 80.02%, negative predictive value of 82.40%, and an area under the curve of 81.80%.
Background
Anemia is a condition characterized by reduced hemoglobin levels or a decrease in red blood cell count or hematocrit levels, where the hemoglobin content in the blood falls below the standard range. This deficiency can stem from inadequate essential nutrients [1], parasitic infections, significant blood loss, or congenital hemolytic diseases. According to the World Health Organization (WHO), a child under the age of five with a blood hemoglobin level below 110 g per liter (Hgb < 11.0 g/dl) is classified as anemic [2]. Various forms of anemia exist, including sickle cell anemia, iron deficiency anemia, hemolytic anemia, aplastic anemia, inflammation-related anemia, and anemia caused by vitamin deficiencies [3].
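The WHO cutoff quoted above can be written as a small rule. The following is a minimal Python sketch for illustration only; the function names and the sample readings are hypothetical and are not taken from the study.

```python
# WHO cutoff cited in the text: an under-five child with Hgb < 11.0 g/dL
# (110 g/L) is classified as anemic.
WHO_UNDER_FIVE_CUTOFF_G_PER_DL = 11.0


def g_per_l_to_g_per_dl(hgb_g_per_l: float) -> float:
    """Convert hemoglobin from g/L to g/dL (1 L = 10 dL)."""
    return hgb_g_per_l / 10.0


def is_anemic(hgb_g_per_dl: float) -> int:
    """Return 1 if hemoglobin is below the cutoff, else 0 (hypothetical helper)."""
    return int(hgb_g_per_dl < WHO_UNDER_FIVE_CUTOFF_G_PER_DL)


# Hypothetical readings in g/L, not survey data.
readings_g_per_l = [125.0, 110.0, 109.0, 92.0]
labels = [is_anemic(g_per_l_to_g_per_dl(x)) for x in readings_g_per_l]
print(labels)
```

A reading of exactly 110 g/L is not below the cutoff, so it is coded as non-anemic under this rule.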
Machine learning (ML) is a technology that empowers machines to learn from data and past experiences in order to make decisions and predictions [4]. Machine learning algorithms, a component of artificial intelligence, are instrumental in uncovering novel medical insights and introducing fresh perspectives to healthcare professionals. Disease predictions play a crucial part in the realm of data mining [5]. Data mining tasks can be categorized into two main groups. Descriptive data mining aims to characterize the general attributes of a given dataset, while predictive data mining focuses on making predictions based on the insights gleaned from the data, employing supervised and unsupervised machine learning techniques for inference [6, 7].
The utilization of machine learning (ML) is progressively growing within the medical sector. Disease diagnosis and outcome prediction are two specific areas that gain advantages from the implementation of ML [8]. Healthcare facilities store vast and sensitive data that necessitate meticulous handling procedures. Healthcare practitioners rely on robust prediction systems to ensure accurate disease diagnoses, especially in cases involving health issues like anemia in children under five. Timely diagnosis and treatment are essential to avoid severe complications and reduce childhood mortality rates. Machine learning methods are crucial in addressing these pressing health challenges effectively [6].
Multiple research investigations have indicated that anemia poses a significant public health challenge in Ethiopia [9,10,11,12,13,14,15]. Children under five from low-income families face a heightened risk of developing anemia due to iron deficiency, driven by the increased demand for iron during their rapid growth phase. However, anemia can impact individuals across all age groups [16]. Anemia impacted around 1.62 billion individuals worldwide [17]. Globally, about 9.6 million children were severely anemic [18]. In 2017, approximately 293.1 million (47.4%) children under the age of five were anemic globally, with 67.6% of these children residing in Africa [19]. Anemia poses a significant challenge in Sub-Saharan African nations such as Mali (55.8%) [20], Kenya (48.9%) [21], and Tanzania (79.6%) [22]. It is estimated that anemia affected between 36.4% and 61.9% of children under the age of five in sub-Saharan Africa [23]. Similarly, according to data available up to 2021, anemia continues to be a significant public health issue affecting approximately 83.5 million children in Sub-Saharan Africa, with a prevalence rate of 67% [24]. It is a pressing issue in both developed and developing nations [25].
The prevalence of anemia in under-five children varies considerably between countries and between regions. Studies report a prevalence of anemia ranging from 12 to 59% among children under the age of five. As per the findings of the Ethiopian Demographic and Health Survey (EDHS), the prevalence of anemia among children in Ethiopia stands at 57% [26]. The Ethiopian government has established a goal to decrease the prevalence of anemia in children under five years old from 39 to 24% by the year 2020 [27]. Nevertheless, based on the results of the aforementioned studies, Ethiopia remains significantly distant from reaching its goal. One of the tactics to reach the target is the identification and treatment of anemia. On a worldwide scale, anemia has severe repercussions, particularly concerning economic and social progress. Increased illness and mortality are also among its dire outcomes [28]. Anemia among children under five can result in low resistance to infections, reduced cognitive function, and the serious consequences of heart failure, leading to increased morbidity and mortality [29, 30]. Anemia contributes significantly to child mortality rates in many developing nations, such as Ethiopia. Hence, it is crucial to implement comprehensive intervention strategies supported by scientific research to address these critical public health issues.
The risk factors for anemia vary across nations and regions and encompass factors such as nutritional deficiencies, intestinal parasites, HIV infection, malaria, chronic illnesses such as sickle cell disease, and blood-related cancers [24, 31]. Factors contributing to the decline in hemoglobin levels among children include mothers’ lack of awareness about the issue [32], poor iron absorption from diets [33], inadequate nutritional habits, unhealthy eating patterns [34], reduced physical activity [30], and parasitic infections [35]. Anemia among children under five is also linked to factors like low socioeconomic status, family size, ignorance, and illiteracy. Gastrointestinal blood loss caused by intestinal worms and hookworm infections leads to the depletion of iron stores, impacting erythropoietin production [36]. This results in poor absorption and reduced appetite, worsening micronutrient deficiencies and childhood anemia [29].
In the past, the socioeconomic and demographic characteristics linked to anemia in children under the age of five in Ethiopia have been extensively explored using traditional regression models. Numerous studies have investigated the prevalence of anemia and its influencing factors in this age group across different regions of Ethiopia, employing common cross-sectional statistical methods. Both bivariate and multivariate logistic regression analyses have been utilized. However, due to the limitations of cross-sectional statistical approaches, they are unable to establish causal relationships, thereby restricting the exploration of uncommon or innovative patterns. Previous research has often relied on small-scale datasets with a limited number of risk factors, typically confined to a single district or city. Earlier research demonstrates enhanced machine learning (ML) efficacy in predicting childhood anemia and its determinants [37,38,39,40]. Sound decision-making ultimately stems from dependable information acquired through meticulous examination and consolidation of data utilizing various machine-learning methods. Employing machine learning techniques with extensive datasets such as EDHS offers policymakers invaluable and well-founded insights [5, 6]. Detecting anemia in children demands significant resources and poses challenges, especially in rural areas where resources are limited [41]. The application of predictive analytics in healthcare has the potential to substantially alleviate the workload of healthcare providers in terms of patient diagnosis and treatment, leading to a profound transformation in the healthcare systems of both developed and developing countries [6, 42].
Rather than relying on a single algorithm, this study applied several machine learning algorithms to a national dataset and compared their performance, based on accuracy, to determine the most suitable model for early-stage prediction of anemia among under-five children in Ethiopia.
Methods
Study design, setting and period
A cross-sectional study was conducted with children under the age of five in Ethiopia from June 5 to July 25, 2023, utilizing data from the 2016 Ethiopian Demographic and Health Survey.
Study participants’ determination and sampling procedures
The data for this study were derived from the 2016 Ethiopian Demographic and Health Survey. Ethiopia, as a participant in the global Demographic and Health Surveys program, undertook the EDHS 2016 for the fourth time. The Central Statistical Agency (CSA) conducted the survey at the request of the Federal Ministry of Health (FMoH). Information was collected using a standardized and validated questionnaire. Interviewers utilized tablet computers equipped with Bluetooth technology for seamless electronic data transmission during the interviews. The survey sampled over 18,008 households across 624 clusters spanning nine regions and two administrative cities in Ethiopia. Using a two-stage cluster design, the 2016 EDHS sample selection process involved selecting census enumeration areas (EAs) as sampling units in the first stage. Employing a conventional two-tier stratification approach, the population was initially segmented into regions and subsequently into urban and rural areas within each region. The sample included 645 EAs, comprising 202 in urban regions and 443 in rural regions. The second stage of sampling involved households, with the 645 selected enumeration areas undergoing a thorough household listing using equal probability systematic sampling proportional to the EA size. All women aged 15 to 49 who had given birth within the five years preceding the survey were eligible to participate. The study aimed to include a sample of 10,006 children under the age of five. Following the removal of missing data, 9,501 under-five children (Fig. 1) were included in the study from the initial pool, considering the significant presence of missing values across various variables in the EDHS dataset.
Eligibility criteria
The study utilized data pertaining to children under the age of five from the 2016 EDHS dataset. Participants with inadequate data and missing values were not considered in this analysis. Different age groups or populations outside of under-five children in Ethiopia were excluded.
Study variables and operational definitions
The study focused on determining the presence or absence of anemia in children under five years old. Anemia status was indicated by a binary coding system where a value of zero signified the absence of anemia, while a value of one indicated its presence. The identification of anemia in a child required hemoglobin or hematocrit measurements, which were assessed based on symptoms reported by the mothers [1].
Data quality management
The study sourced its data from the secondary data of the 2016 EDHS. The data extraction process adhered closely to the outlined procedures illustrated in Fig. 1. Thorough data cleaning was conducted, with a primary focus on the quality of the fieldwork during the EDHS 2016 data collection phase. Emphasizing appropriate measures to enhance data quality during processing was crucial. Key stages like data entry and editing were pivotal in identifying and rectifying inconsistencies to address missing data. Every essential data pre-processing step was meticulously carried out to ensure data quality.
Data processing
The extracted datasets contained 25 variables and 9,501 instances. Various data preprocessing techniques, including data cleaning, transformation, handling of class imbalances, and feature selection methods, were applied. Not all variables were deemed relevant for developing a predictive model to predict anemia among children under five in Ethiopia. Mode imputation techniques were utilized to address missing values in categorical data. Redundant data was manually removed. Variables with numerous categorical values, such as drinking water source, body mass index, wealth index, parents’ occupation, and fuel type, underwent discretization through binning methods to convert them into discrete values. These variables had multiple distinct values that required transformation for analytical purposes. After completing essential data preprocessing tasks, a dataset comprising 9,501 instances with 16 attributes was selected for subsequent analysis and predictive model development. This dataset was divided into training and testing sets in an 80/20 ratio. To mitigate data loss and tackle the imbalanced class levels within the training data, the Synthetic Minority Oversampling Technique (SMOTE) was employed.
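The preprocessing steps described above (mode imputation, binning of many-valued categories, and an 80/20 split) can be sketched as follows. This is an illustrative Python example using only the standard library; the study itself used R, and the category names, the two-bin grouping, and the random seed here are assumptions made for the sketch.

```python
import random
from collections import Counter


def impute_mode(values):
    """Replace None entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def bin_water_source(source):
    """Collapse detailed categories into two bins (hypothetical category names)."""
    improved = {"piped", "protected well", "borehole"}
    return "improved" if source in improved else "unimproved"


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split the rows 80/20 by default."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Toy column with missing entries; not EDHS records.
water = ["piped", None, "river", "piped", "borehole",
         None, "pond", "piped", "river", "protected well"]
water = impute_mode(water)
binned = [bin_water_source(w) for w in water]
train, test = train_test_split(binned)
print(Counter(binned), len(train), len(test))
```

In practice the split would be applied to whole records (features plus the anemia label) rather than to a single column, and any oversampling would be restricted to the training portion, as the text states.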
SMOTE uses a K-nearest neighbors (KNN) algorithm to generate synthetic samples along the line segments that connect a minority-class point to its selected neighbors [43, 44]. Instead of replicating existing data points, the algorithm measures the distances between feature vectors and their nearest neighbors [43, 44] and generates synthetic data points that differ slightly from the original ones. SMOTE was applied in this way to address the dataset’s imbalance.
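The interpolation idea behind SMOTE can be illustrated with a short sketch. The Python code below is a simplified version that assumes purely numeric features and Euclidean distance; it is not the implementation used in the study, and the function name, the value of `k`, and the toy points are hypothetical. Categorical survey variables would need an encoding or a SMOTE variant designed for them.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority-class points in two dimensions (hypothetical values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority points, it always lies inside the region already spanned by the minority class.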
Predictive model development
Predictive modeling solutions develop a model by analyzing historical and current data to predict future outcomes [45]. In this research, to build a predictive model for predicting anemia in children under five in Ethiopia, various machine-learning algorithms, namely Decision Tree, Random Forest, K-nearest neighbor, Support Vector Machine (SVM), and Naive Bayes, were employed, alongside Logistic Regression for comparison purposes. Grid search was utilized to optimize the hyperparameters of each algorithm. Selecting appropriate hyperparameters is crucial in developing machine learning models as it significantly influences the algorithm’s performance [46,47,48]. Various metrics such as accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and the area under the ROC curve (ROCAUC) were employed to evaluate the performance of each predictive model. Data processing and analysis were conducted utilizing the R programming language and the caret package [49].
We randomly selected 80% of the sample for training, reserving the remaining 20% for 10-fold cross-validation to optimize the model’s parameters. The performance of the model was assessed using test data from this reserved 20% sample. To evaluate how accurately the models predicted instances of anemia and no anemia in children, metrics such as sensitivity, specificity, positive predictive value, and negative predictive value were calculated.
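To make the combination of grid search and 10-fold cross-validation concrete, the sketch below tunes a single hyperparameter (the number of neighbors in a toy one-feature K-nearest neighbor classifier) by mean held-out accuracy across folds. It is written in Python for illustration; the study used R with caret, and the data, the grid, the seeds, and all function names here are hypothetical. The source does not specify how its folds were formed, so the fold construction shown is only one common arrangement.

```python
import random


def k_fold_indices(n, n_folds=10, seed=1):
    """Shuffle 0..n-1 and deal the indices into n_folds nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)


def cv_accuracy(data, k, n_folds=10):
    """Mean held-out accuracy of k-NN across the folds."""
    scores = []
    for held_out in k_fold_indices(len(data), n_folds):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        correct = sum(
            knn_predict(train, data[i][0], k) == data[i][1] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data as (feature, label) pairs; hypothetical values, not EDHS records.
rng = random.Random(7)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(30)]
data += [(rng.gauss(2.5, 1.0), 1) for _ in range(30)]

grid = [1, 3, 5, 7, 9]  # candidate hyperparameter values (odd, so no tied votes)
results = {k: cv_accuracy(data, k) for k in grid}
best_k = max(results, key=results.get)
print(results, best_k)
```

Each candidate value is scored only on observations that were held out while fitting, which is what allows the scores to be compared fairly across the grid.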
The model’s performance in distinguishing between instances with anemia and those without was evaluated using the Area Under the Curve (AUC) and Receiver Operating Characteristic (ROC) curve metrics. ROC curves compare sensitivity to specificity across various thresholds to assess the model’s ability to predict binary outcomes. The AUC, which summarizes the ROC curve, indicates how effectively a classifier can differentiate between different classes [50]. As the Area Under the Curve (AUC) increases, the model’s effectiveness in distinguishing between positive and negative classes also improves [50]. The Mean Decrease in Gini, which indicates the importance of variables for the model, was calculated for each variable in the machine learning models. A graphical representation was utilized to display the best predictive machine learning model, automatically highlighting the top variable categories based on their Mean Decrease in Gini.
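Both quantities described above can be computed by hand on small examples. The Python sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) and the Gini impurity decrease for a single split, which is the building block that random forest implementations accumulate over splits and trees to obtain a Mean Decrease in Gini. The labels, scores, and node contents are hypothetical, and the function names are chosen for this illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(labels):
    """Gini impurity of a binary label list: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical predictions: 1 = anemic, 0 = not anemic.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)

# Hypothetical split of a node with four positives and four negatives.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
decrease = gini_decrease(parent, left=[1, 1, 1, 0], right=[1, 0, 0, 0])
print(auc, decrease)
```

In this toy case 15 of the 16 positive-negative pairs are ranked correctly, giving an AUC of 0.9375, and the split lowers the impurity from 0.5 to 0.375.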
Results
Descriptive results of the socio-demographic characteristics
Out of 9,501 study participants, over half were male (51.1%), and nearly 81% resided in rural areas. Approximately 64% had no formal education. About 37% of participants were classified as the poorest based on the family wealth index. Roughly 60% of mothers were unemployed. The largest portion of respondents (21.1%) fell within the 48 to 59 month age category (Table 1).
Environmental characteristics of respondents
Within the Ethiopian Demographic Health Survey, 55.9% of study participants reported traveling a significant distance to access health services. The survey revealed that approximately 53% lacked access to an improved drinking water source, while nearly 87% did not have an improved toilet facility. Regrettably, around 95% of respondents were found to be without health insurance based on the survey findings (Table 2).
Nutritional and co-morbid characteristics among under-five children
Of all the participants, close to 96% of children were breastfed, and approximately 89.8% (8532 children) had no reported history of diarrhea. However, 59% (5613 children) did not receive Vitamin-A supplements, and the majority, 88.3% (8393 children), did not take intestinal parasite medication in the preceding six months. In terms of nutritional health, around 9.7% (922 individuals) were classified as wasted, and 42.1% (3997 children) were identified as stunted (Table 3).
Based on data from the Ethiopian Demographic Health Survey, the prevalence of anemia among children under the age of five in Ethiopia was approximately 40% (Fig. 2).
Predicting under-five anemia status
Among the six models evaluated, the random forest algorithm demonstrated the highest predictive accuracy for anemia at 81.16%, surpassing the K-nearest neighbor model at 69.96% and the decision tree model at 68.4%. The positive and negative predictive values for the random forest algorithm were 80.02% and 82.40%, respectively. Additionally, the sensitivity and specificity of the random forest model were calculated at 83.07% and 79.26%, respectively. Below are the results for the six machine learning models assessed: Support Vector Machine, Random Forest, K-Nearest Neighbor, Decision Tree, Naive Bayes, and Logistic Regression models (Table 4).
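The five reported metrics are all functions of the same confusion matrix, so they constrain one another. The Python sketch below defines them from raw counts and then derives the predictive values implied by a given sensitivity, specificity, and positive-class share. The 50% class share passed in is an assumption made for this illustration, not a figure reported by the study, and the confusion-matrix counts are hypothetical.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity, and the share of
    positive cases in the evaluated data."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Reported random forest sensitivity and specificity; the 0.5 class share
# is an assumption, not a reported figure.
ppv, npv = predictive_values(0.8307, 0.7926, prevalence=0.5)
print(round(ppv, 4), round(npv, 4))

# Hypothetical counts, only to show the definitions at work.
toy = metrics(tp=80, fp=20, tn=70, fn=30)
print(toy)
```

Under that assumed equal class share, the implied predictive values come out at roughly 80.0% and 82.4%, close to the reported 80.02% and 82.40%. With a different class share the same sensitivity and specificity would imply different predictive values.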
ROC curve for the tested models
In Fig. 3, a graphical depiction of the receiver operating characteristic (ROC) curve is presented. The RF model exhibits the highest Area under the Curve (AUC) value compared to the other five machine learning models analyzed in this study, indicating its superior ability to distinguish between children with anemia and those without.
Discussion
This study provided a concise overview of algorithm comparisons for predicting anemia in children under five in Ethiopia using machine-learning techniques. Six machine learning algorithms, namely Random Forest, Decision Tree, Naive Bayes, KNN, SVM, and Logistic Regression, were evaluated for their predictive capabilities in determining anemia in young children. The Random Forest algorithm exhibited the highest performance metrics, achieving 81.16% accuracy, 83.07% sensitivity, 79.26% specificity, 80.02% positive predictive value, and 82.40% negative predictive value. The RF model demonstrated superior prediction accuracy and AUC values compared to the other models assessed. Random Forest is known for its robust performance and reduced overfitting risks in comparison to individual decision trees. Given an extensive and diverse dataset such as the 2016 DHS data, Random Forest could effectively generalize and provide accurate predictions in this context. A study undertaken in Bangladesh, employing machine-learning algorithms, yielded results similar to those observed in our own investigation [39]. It showed that random forest was the most favorable machine-learning algorithm to predict anemia. This resemblance could stem from the shared application of a sizable, nationally representative demographic health survey dataset in both studies. Similarly, the utilization of six machine-learning algorithms in both studies could offer further rationale for the congruent findings. The agreement might also be attributed to the relatively high quality of demographic and health survey data compared to other institutional and smaller-scale datasets. Moreover, the efficacy of Random Forest in managing intricate data relationships might contribute to the consistency of outcomes.
Given the expansive and varied datasets typically present in national health surveys like the DHS, Random Forest is adept at capturing complex interplays between diverse factors that potentially affect anemia in children under five.
Likewise, another study carried out in Afghanistan, employing machine-learning algorithms, presented a comparable outcome to the current study. It revealed that random forest was the most suitable machine learning algorithm to predict anemia [51]. This resemblance could potentially be attributed to the likeness in sociodemographic traits among the participants in both studies.
Similarly, the results of this research mirrored those of a study conducted in Nepal, where random forest emerged as the top performer with an accuracy of 98.4% [52]. This similarity could be linked to the similar feature selection processes employed in both studies.
Our study also aligns with a machine learning investigation carried out among young girls in Ethiopia [38]. This alignment could be attributed to the use of machine learning algorithms with the extensive, representative 2016 demographic and health survey dataset in both studies.
In contrast, our study presented conflicting conclusions compared to a study conducted in India among children [53].
Strengths and limitations
The extensive and nationally representative EDHS dataset holds significant implications for policymakers beyond individual institutional studies. What sets this research apart is the distinctive comparison of diverse machine-learning algorithms. Hence, we utilized a variety of machine learning algorithms to determine the most effective algorithm for predicting anemia in children under the age of five. This selection process could have been overlooked if only a single statistical analysis technique had been used. We relied on secondary data, which meant that certain clinical indicators of anemia, like pallor of the palm, conjunctiva, and tongue, were not available. Inclusion of these important clinical features could enhance the moderate performance of the machine learning algorithms noted in this study. Consequently, exploring the application of machine learning algorithms for predicting childhood anemia by incorporating these vital clinical variables could be a promising avenue for future research. It is important to note that this study focuses exclusively on the population of children under the age of five. A primary constraint of this study stems from the fact that the diagnosis of anemia in children was based on symptoms reported by mothers rather than on objective hemoglobin or hematocrit measurements. This method introduces the possibility of a considerable number of false negatives within the dataset. While the Random Forest model demonstrates promise in predicting anemia among children under five, it is imperative to acknowledge certain limitations. One significant concern pertains to the absence of external validation. The lack of external validation is a crucial consideration as it affects the generalizability of the model’s findings to other populations or datasets. Without external validation on independent datasets, the model’s performance and predictive power may not be accurately assessed across different settings.
Future studies should prioritize external validation to ensure the robustness and reliability of the model’s predictive capabilities in diverse populations.
Conclusions
In this research, six machine learning algorithms (Random Forest, Decision Tree, Naive Bayes, KNN, SVM, and Logistic Regression) were constructed using homogeneous ensemble machine learning techniques. Twelve experiments were conducted for this study. The Random Forest algorithm demonstrated the highest performance, achieving 81.16% accuracy, 83.07% sensitivity, 79.26% specificity, 80.02% positive predictive value, and 82.40% negative predictive value. Random Forest offers feature importance scores, aiding in identifying the key factors contributing to anemia in children under five in Ethiopia. This information can be crucial for policymakers and healthcare professionals in prioritizing intervention strategies based on the most significant factors.
Data availability
The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request.
Abbreviations
- AUC: Area under the Curve
- KNN: K-Nearest Neighbor
- ML: Machine Learning
- NP: Negative Predictive
- PP: Positive Predictive
- RF: Random Forest
- ROCAUC: Area under the ROC curve
- SMOTE: Synthetic Minority Oversampling Technique
- SVM: Support Vector Machine
References
DeMaeyer E, Adiels-Tegman M. The prevalence of anaemia in the world. World health statistics quarterly. 1985;38(3):302–316.
Zuffo CRK, Osório MM, Taconeli CA, Schmidt ST, Silva BHCd, Almeida CCB. Prevalence and risk factors of anemia in children. Jornal De Pediatria. 2016;92:353–60.
Vieth JT, Lane DR. Anemia. Emerg Med Clin. 2014;32(3):613–28.
Géron A. Hands-on machine learning with Scikit-Learn, Keras, and tensorflow. O’Reilly Media, Inc.; 2022.
Feaster DJ, Pan Y, Nelson M, Sorensen J, Metsch LR. Predicting sexually transmitted infections in sexually transmitted disease clinics in U.S.: A machine learning approach. Drug Alcohol Depend. 2015;156:e67–e.
Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inf Decis Mak. 2019;19(1):281.
Prediction of Diseases in Smart Health Care System using Machine Learning. Int J Recent Technol Eng. 2020;8(5):2534–7.
Sidey-Gibbons JA, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol. 2019;19:1–18.
Alamneh YM, Akalu TY, Shiferaw AA, Atnaf A. Magnitude of anemia and associated factors among children aged 6–59 months at Debre Markos referral hospital, Northwest Ethiopia: a hospital-based cross-sectional study. Ital J Pediatr. 2021;47:1–10.
Aliyo A, Jibril A. Anemia and associated factors among under five year old children who attended bule hora general hospital in West Guji zone, Southern Ethiopia. J Blood Med. 2022:395–406.
Kebede D, Getaneh F, Endalamaw K, Belay T, Fenta A. Prevalence of anemia and its associated factors among under-five age children in Shanan Gibe hospital, Southwest Ethiopia. BMC Pediatr. 2021;21:1–9.
Malako BG, Teshome MS, Belachew T. Anemia and associated factors among children aged 6–23 months in Damot sore district, Wolaita zone, South Ethiopia. BMC Hematol. 2018;18:1–9.
Andersen CT, Tadesse AW, Bromage S, Fekadu H, Hemler EC, Passarelli S, et al. Anemia etiology in Ethiopia: assessment of nutritional, infectious disease, and other risk factors in a population-based cross-sectional survey of women, men, and children. J Nutr. 2022;152(2):501–12.
Enawgaw B, Workineh Y, Tadesse S, Mekuria E, Addisu A, Genetu M. Prevalence of anemia and associated factors among hospitalized children attending the university of Gondar hospital, Northwest Ethiopia. Ejifcc. 2019;30(1):35.
Gebereselassie Y, BirhanSelassie M, Menjetta T, Alemu J, Tsegaye A. Magnitude, severity, and associated factors of anemia among under-five children attending Hawassa university teaching and referral hospital, Hawassa, Southern Ethiopia, 2016. Anemia. 2020;2020(1):7580104.
Yusakanshori M, Purbasari S. Determinant factors affecting the nutritional status of children in regional health center of Gresik. Indian J Public Health Res Dev. 2019;10(11).
Gaston RT, Ramroop S, Habyarimana F. Joint modelling of malaria and anaemia in children less than five years of age in Malawi. Heliyon. 2021;7(5):e06899.
Organization WH. The global prevalence of anemia in 2011. Geneva: PLoS One. 2015;10(12):e0143497.
McLean E, Cogswell M, Egli I, Wojdyla D, de Benoist B. Worldwide prevalence of anaemia, WHO vitamin and mineral nutrition information system, 1993–2005. Public Health Nutr. 2009;12(4):444–54.
Hall A, Roschnik N, Ouattara F, Touré I, Maiga F, Sacko M, et al. A randomised trial in Mali of the effectiveness of weekly iron supplements given by teachers on the haemoglobin concentrations of schoolchildren. Public Health Nutr. 2002;5(3):413–8.
Neumann CG, Bwibo NO, Murphy SP, Sigman M, Whaley S, Allen LH, et al. Animal source foods improve dietary quality, micronutrient status, growth and cognitive function in Kenyan school children: background, study design and baseline findings. J Nutr. 2003;133(11 Suppl 2):s3941–9.
Tatala SR, Kihamia CM, Kyungu LH, Svanberg U. Risk factors for anaemia in schoolchildren in Tanga region, Tanzania. Tanzan J Health Res. 2008;10(4):189–202.
Roberts DJ, Matthews G, Snow RW, Zewotir T, Sartorius B. Investigating the Spatial variation and risk factors of childhood anaemia in four sub-Saharan African countries. BMC Public Health. 2020;20:1–10.
Kassebaum NJ, Jasrasaria R, Naghavi M, Wulf SK, Johns N, Lozano R, et al. A systematic analysis of global anemia burden from 1990 to 2010. Blood J Am Soc Hematol. 2014;123(5):615–24.
Mainasara A, Ibrahim K, Uko E, Jiya N, Erhabor O, Umar A, et al. Prevalence of anaemia among children attending paediatrics department of UDUTH, Sokoto, North-Western Nigeria. IBRR. 2017;7(1):1–10.
Central Statistical Agency [Ethiopia] II. Ethiopia demographic and health survey 2016. Addis Ababa, Ethiopia, Central Statistical Agency [Ethiopia], ICF International; 2017.
ETHIOPIA FDRO. National nutrition program 2016–2020.
De Benoist B, Cogswell M, Egli I, McLean E. Worldwide prevalence of anaemia 1993–2005; WHO Global Database of anaemia. 2008.
Ginzburg YZ, Glassberg J. Inflammation, hemolysis, and erythropoiesis lead to competitive regulation of Hepcidin and possibly systemic iron status in sickle cell disease. EBioMedicine. 2018;34:8–9.
Djokic D, Drakulovic MB, Radojicic Z, Crncevic Radovic L, Rakic L, Kocic S, et al. Risk factors associated with anemia among Serbian school-age children 7–14 years old: results of the first National health survey. Hippokratia. 2010;14(4):252–60.
Bremner KC. Pathogenetic factors in experimental bovine oesophagostomosis. II. Plasma iron, iron-binding capacity, and reticulocyte responses in bled and infected calves. Exp Parasitol. 1969;24(2):184–93.
Alaofè H, Zee J, Dossa R, O’Brien HT. Education and improved iron intakes for treatment of mild iron-deficiency anemia in adolescent girls in Southern Benin. Food Nutr Bull. 2009;30(1):24–36.
Hashizume M, Shimoda T, Sasaki S, Kunii O, Caypil W, Dauletbaev D, et al. Anaemia in relation to low bioavailability of dietary iron among school-aged children in the Aral sea region, Kazakhstan. Int J Food Sci Nutr. 2004;55(1):37–43.
Kikafunda JK, Lukwago FB, Turyashemererwa F. Anaemia and associated factors among under-fives and their mothers in Bushenyi district, Western Uganda. Public Health Nutr. 2009;12(12):2302–8.
Lufungulo Bahati Y, Delanghe J, Bisimwa Balaluka G, Sadiki Kishabongo A, Philippé J. Asymptomatic submicroscopic plasmodium infection is highly prevalent and is associated with anemia in children younger than 5 years in South Kivu/Democratic Republic of congo. Am J Trop Med Hyg. 2020;102(5):1048–55.
Bremner K. Pathogenetic factors in experimental bovine oesophagostomosis: IV. Exudative enteropathy as a cause of hypoproteinemia. Exp Parasitol. 1969;25:382–94.
Kebede Kassaw A, Yimer A, Abey W, Molla TL, Zemariam AB. The application of machine learning approaches to determine the predictors of anemia among under five children in Ethiopia. Sci Rep. 2023;13(1):22919.
Zemariam AB, Yimer A, Abebe GK, Wondie WT, Abate BB, Alamaw AW, et al. Employing supervised machine learning algorithms for classification and prediction of anemia among youth girls in Ethiopia. Sci Rep. 2024;14(1):9080.
Khan JR, Chowdhury S, Islam H, Raheem E. Machine learning algorithms to predict the childhood anemia in Bangladesh. J Data Sci. 2019;17(1):195–218.
Meitei AJ, Saini A, Mohapatra BB, Singh KJ. Predicting child anaemia in the North-Eastern States of India: a machine learning approach. Int J Syst Assur Eng Manage. 2022;13(6):2949–62.
Rajula HSR, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina. 2020;56(9):455.
Bao Y, Medland NA, Fairley CK, Wu J, Shang X, Chow EPF, et al. Predicting the diagnosis of HIV and sexually transmitted infections among men who have sex with men using machine learning approaches. J Infect. 2021;82(1):48–59.
Longadge R, Dongre SJ. Class imbalance problem in data mining review. 2013.
Ribeiro RJnO. SMOTE for Regression. 2015;2013.
Kuhn M, Johnson K. Applied predictive modeling: Springer; 2013.
Healy MJAodic. Statistics from the inside. 15. Multiple regression (1). 1995;73(2):177.
Mantovani RG, Rossi ALD, Alcobaça E, Gertrudes JC, de Junior SB. Carvalho ACPdLFJapa. Rethinking default values: a low cost and efficient strategy to define hyperparameters. 2020.
Ramadhan MM, Sitanggang IS, Nasution FR, Ghifari AJDToCS. Engineering. Parameter tuning in random forest based on grid search method for gender classification based on voice frequency. 2017;10.
Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, et al. Package ‘caret’. 2020;223:7.
Florkowski CMJTCBR. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. 2008;29(Suppl 1):S83.
Zahirzada A, Zaheer N, Shahpoor MA. Machine learning algorithms to predict anemia in children under the age of five years in Afghanistan: A case of Kunduz Province. J Surv Fisheries Sci. 2023;10(4S):752–62.
Dhakal P, Khanal S, Bista R. Prediction of anemia using machine learning algorithms. Int J Comput Sci Inform Technol. 2023;15(1):15–30.
Anand P, Gupta R, Sharma A. June. Prediction of Anaemia among children using Machine Learning Algorithms. no 2020:469– 80.
Acknowledgements
We thank the Ethiopian Central Statistics Agency for supplying the data and an explanation of the dataset.
Funding
This research received no funding.
Author information
Contributions
Conceptualization: Ali Yimer, Abdulaziz Kebede Kassaw, Alemu Birara Zemariam, Endris Mussa, Hassen Ahmed Yesuf, Sada Ahmed, Nurye Sirage, Adem Yesuf. Data curation: Ali Yimer, Abdulaziz Kebede Kassaw, Alemu Birara Zemariam, Endris Mussa, Hassen Ahmed Yesuf, Sada Ahmed, Nurye Sirage, Adem Yesuf. Formal analysis: Abdulaziz Kebede Kassaw. Methodology: Ali Yimer, Abdulaziz Kebede Kassaw, Alemu Birara Zemariam, Endris Mussa, Hassen Ahmed Yesuf, Sada Ahmed, Nurye Sirage, Adem Yesuf. Writing – original draft: Ali Yimer, Abdulaziz Kebede Kassaw, Alemu Birara Zemariam, Endris Mussa, Hassen Ahmed Yesuf, Sada Ahmed, Nurye Sirage, Adem Yesuf. Writing – review & editing: Ali Yimer, Abdulaziz Kebede Kassaw, Alemu Birara Zemariam, Endris Mussa, Hassen Ahmed Yesuf, Sada Ahmed, Nurye Sirage, Adem Yesuf. All authors reviewed the manuscript.
Ethics declarations
Ethical approval and consent to participate
The researchers obtained an approval letter to use the survey data from the USAID DHS Program after registering at https://www.dhsprogram.com/data/dataset_admin/login_main.cfm. Because the study was a secondary analysis of the 2016 EDHS database, it did not require separate ethical approval. After receiving the data, the researchers maintained its confidentiality, privacy, and anonymity. During the original survey, informed consent was obtained from the study participants before data collection began.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yimer, A., Yesuf, H.A., Ahmed, S. et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatr 25, 311 (2025). https://doi.org/10.1186/s12887-025-05659-9
DOI: https://doi.org/10.1186/s12887-025-05659-9