SmartHeLP: Smartphone-based Hemoglobin Level Prediction Using an Artificial Neural Network
Abstract
Blood hemoglobin level (Hgb) measurement has a vital role in the diagnosis, evaluation, and management of numerous diseases. We describe the use of smartphone video imaging and an artificial neural network (ANN) system to estimate Hgb levels non-invasively. We recorded 10-second, 300-frame fingertip videos using a smartphone in 75 adults. Red, green, and blue pixel intensities were estimated for each of 100 area blocks in each frame, and the patterns across the 300 frames were described. An ANN was then used to develop a model that predicts hemoglobin levels from the extracted video features. In our study sample, with patients 20-56 years of age and gold standard hemoglobin levels of 7.6 to 13.5 g/dL, we observed a rank order correlation of 0.93 between model and gold standard hemoglobin levels. Moreover, we identified specific regions of interest in the video images, which reduced the required feature space.
Introduction
Hemoglobin provides the oxygen-carrying capacity of blood, a critical need for optimal body organ function. Changes in hemoglobin, particularly decreases below optimal levels, compromise health and characterize multiple diseases. The fraction of the global population affected is large. For example, just one condition, anemia (low blood hemoglobin level) caused by iron deficiency, is estimated to be present in 1.6 billion people globally and in more than 85% of the populations in Africa and Asia1,2.
The long-established gold standard method for hemoglobin measurement involves laboratory evaluation of venipuncture-obtained blood samples. This is uncomfortable, impractical when repeated measurements are needed in certain patients and for patients without easy access to health facilities, costly, inconvenient because the samples have to be transferred to a laboratory, and time-consuming. A point-of-care tool for inexpensive, accurate, non-invasive, immediate hemoglobin determination would be of major benefit to health systems, care-givers and patients worldwide.
The primary goal of our research is to develop a method for practical, non-invasive, and accurate determination of blood hemoglobin levels. Existing commercial devices, such as Pronto® SpHb® (Masimo Corp.) or NBM 200 (OrSense), show promising accuracy in the usual range of hemoglobin levels but appear less accurate in complex clinical situations, where an immediate non-invasive test result would be very useful3. Simplicity, system cost, portability, and power needs are all important practical issues in defining a widely usable hemoglobin assessment tool.
Recent studies on noninvasive solutions for estimating hemoglobin levels report different approaches in feature collection and image analysis techniques4-14. In addition, there are several smartphone-based noninvasive hemoglobin (Hb) measurement technologies, such as ToucHb15, Masimo-Total Hemoglobin16, and Eyenaemia17. Most of these works suffer from one or more of the following limitations: 1) data analysis and accuracy18; 2) data capturing and feature collection19; 3) affordability and portability; and 4) lack of user-friendliness and addition of external modules19.
The current report describes the development of a smartphone-generated video image system in which large data volume, multiple data features, and identification of critical data features and locations, have defined an efficient and rapid hemoglobin prediction tool with promising accuracy.
Relevant Literature
Smartphone-based point-of-care tools and applications are cost-effective, and smartphone sensing capabilities allow promising health-care services. Mobile health-care tools have been commercialized for remote patient monitoring and/or management in multiple areas4-14,24-28. Multiple tools for noninvasive hemoglobin estimation have been described.
Edward et al. developed a smartphone-based application (HemaApp) using the camera and multiple lighting sources, including infrared (IR) LEDs4. A Nexus 5 smartphone with white, 880nm, and 970nm LED arrays was used to record a series of videos. Average intensity for each channel was calculated using a high-band pass filter, and FFT and SVM regression were applied for each combination of datasets. The best hemoglobin prediction yielded a rank order correlation of 0.82 with gold standard hemoglobin levels.
Anggraeni et al. reported on a non-invasive anemia detection system based on palpebral color observation using a smartphone camera10. Their system captured digital images of the palpebral conjunctiva with an Asus Zenphone 2 Laser smartphone, in ambient lighting without flash, and then color-corrected them against white paper. Red, green, and blue (RGB) intensity values were extracted from the captured data and evaluated using regression analysis. Among the three color intensities, the red intensity showed the strongest relationship, with an R2 of 0.8139 and a correlation coefficient of 0.92.
Collins et al. took conjunctival photographs in ambient light with two consumer cameras, a Panasonic DMC-LX5 digital camera and the internal rear-facing camera of an Apple iPhone 5S smartphone, using an in-frame calibration card11. A conjunctival erythema index (EI) was calculated using the equation: EI = log(Sred) – log(Sgreen). For the detection of anemia, the palpebral conjunctival EI had a sensitivity of 93% and a specificity of 78% with the LX5 camera, and a sensitivity of 93% and a specificity of 66% with the iPhone. EI analysis for recognition of anemia resulted in stronger statistical associations and higher positive likelihood ratios than assessments by three clinicians.
Using a G-Fresnel spectrometer with common smartphones, Edwards et al. built a spectrograph system for quantitative measurement of hemoglobin concentration12. An Android application was used to set up the capture, receive the raw camera data from a microelectronics-based camera controller board, and complete the whole data analysis. Their proposed solution showed a promising response: 9.2% and 8.1% for retrieved absorption and reduced scattering coefficients, respectively, compared to a bench-top DRS system using a traditional spectrometer.
Suner et al. conducted a study of the feasibility of digital photography in hemoglobin measurement13. Images were taken with a Sony digital camera, and a mathematical model was used that showed a correlation between the derivation and evaluation groups of r(117)=0.6. Notably, this model was not tested under differing light conditions or in outdoor environments.
Li et al. proposed a novel Dynamic Spectrum method where a spectrograph was used with a computer to scan the transmission spectrum of fingertip tissue; they presented an average prediction correlation coefficient (R) of 0.8399 in their experiment14.
Methods
After obtaining informed consent, under a protocol approved by the Bangladeshi government, we recorded fingertip videos and collected blood samples by venipuncture for gold standard laboratory hemoglobin measurement by usual methods, from patients and visitors at the Amader Gram Breast Care Clinic in Khulna, Bangladesh. We also collected data on subject age and gender.
Methods of Video Capture
A research coordinator (RC) captured the index fingertip videos using a Google Nexus 4 smartphone, ensuring that normal fingertip pressure was maintained while recording, in accordance with the data collection protocol. The subjects were all believed to have a normal blood oxygenation (SpO2) level. The subjects were seated. The skin tones of all subjects were similar. The smartphone flash poses no health hazard and does not produce excessive heat. The following guidelines were followed while recording a fingertip video.
Guidelines: a) The subject’s right hand and fingers were clean. b) The fingertip video was recorded before taking the blood sample. c) The index finger was preferred, but other fingers were used as dictated by the condition of the tissues. d) The fingertip covered the smartphone camera lens, excluding ambient light. e) The smartphone camera flash was turned on before the video recording. f) The fingertip applied a light touch to the smartphone camera.
Image Processing Techniques
First, we extracted the frames from each fingertip video. The red, green, and blue (RGB) pixels were separated for each frame. We represented each frame as a color map based on the intensity of one pixel color (red, green, or blue) (Figure 1). Afterward, the red, green, and blue channel data were normalized by dividing each pixel intensity by 255. The duration of each video was from 11 to 13 seconds, at 30 frames per second, resulting in a range of 330-390 frames per fingertip video.
Figure 1: Fingertip video capture, video frame extraction, and subdivision of the image frame to generate the input feature matrix. A: A sample image frame extracted from a fingertip video. B: An index finger is placed on the smartphone camera, which records a 10-second video with the flash on. The index finger covers both the camera lens and the flash light. C: Three hundred frames are generated from each video. D-F: Each frame has red, green, and blue (RGB) color pixels. The color maps of the red, green, and blue pixels of an image are presented in D, E, and F, respectively. G: A frame, with red pixels only, is divided into a 10 × 10 block matrix. H: Coding of blocks that are not nearest neighbors (termed the Chessboard dataset).
Image Segmentation (Blocking the Frame)
In image analysis, dividing an image into separate segments is a common approach. For example, image segmentation, which separates each item, is popular in biological system modeling20. Although image segmentation identifies individual visible objects, the image frames in our setting contain changes in pixel intensity that are not visible to the eye. To visualize these pixel value changes, we subdivided each image into multiple regions and color-mapped each region by averaging its pixel intensities. We refer to each region as a "block" in the following text.
For each color channel, an image frame was divided into one hundred blocks, forming a ten by ten block matrix as shown in Figure 1(G). If an image frame had 2000 × 1000 pixels, each block would be 200 × 100 pixels. The average red, green, and blue pixel intensity of each block was calculated. That is, for each of the 100 blocks and for a single color (for example, red), we computed the mean pixel value in every frame. This calculation yielded 300 mean values per block, one for each of the three hundred frames. The color map of a frame, a 10 × 10 block matrix, is presented in Figure 2. The figure shows four different frames (frame numbers 1, 4, 7, and 9) to present the significant changes in color pixels that we observed in different blocks.
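The block averaging described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `block_means`, the nested-list frame representation, and the requirement that the frame size be an exact multiple of the grid are all assumptions made for illustration.

```python
from typing import List

Channel = List[List[int]]  # one color channel of one frame: rows of 0-255 intensities


def block_means(channel: Channel, grid: int = 10) -> List[List[float]]:
    """Return a grid x grid matrix of mean normalized intensities.

    Assumes the frame height and width are exact multiples of `grid`
    (an assumption of this sketch; the text does not say how remainders
    were handled).
    """
    height, width = len(channel), len(channel[0])
    bh, bw = height // grid, width // grid  # block height and width in pixels
    means = []
    for br in range(grid):
        row_means = []
        for bc in range(grid):
            total = 0
            for r in range(br * bh, (br + 1) * bh):
                total += sum(channel[r][bc * bw:(bc + 1) * bw])
            # Divide by 255 to normalize, then by the pixel count to average.
            row_means.append(total / 255 / (bh * bw))
        means.append(row_means)
    return means


# Toy 20 x 20 red channel: left half dark (0), right half saturated (255).
toy = [[0] * 10 + [255] * 10 for _ in range(20)]
m = block_means(toy)
print(m[0][0], m[0][9])  # prints: 0.0 1.0
```

Applying such a function to the same channel of every frame would give, for each block location, the per-frame series of mean values used as features.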
Averaging Block Pixel Information
Since the mean pixel values differed between blocks, the color map showed differences in color intensity. Here, light orange-red signifies a low mean value, and the block color becomes a stronger orange-red with higher mean values. We studied the variation in mean pixel intensity through all frames. Figure 3 shows the average pixel intensity variation for block number 55, as an example. In our analysis, the pixel intensity in this block area changed in each frame of the video. We compared this variation across multiple blocks to understand its characteristics. We observed that adjacent blocks had nearly identical pixel variation across the frame series. For example, blocks 1 and 2 had nearly identical pixel intensity profiles, and differences began to appear in blocks 5 and 7 (Figure 4).
Feature Selection and Dataset Preparation
We analyzed the mean pixel intensity for each block across all frames. We observed random and wide intensity fluctuations in the early frames. Subsequently, such fluctuations were rarely seen. Figure 5 shows the aberrant pixel variations we observed outside the selected box area. Movement while placing the finger on the smartphone camera and while releasing it might cause these irregular pixel intensities. Since the fingertip videos were often a little longer than 10 seconds, we captured more than 300 frames in most videos. Because of these observations, we used frame numbers 101 to 300 (200 frames in total) for our input features.
The mean pixel value (red, green, or blue) of each block is considered the feature of that block. The 200 such features for each block location (one per frame, for the same block location across 200 frames), together with age, gender, and the clinically measured hemoglobin level, are considered one observation. The resultant dataset has the size 75 × 100 × 203 (number of subjects × total number of blocks in each frame × number of features of each block location over 200 frames). Overall, we created three datasets of size 75 × 100 × 203 each, using the red, green, and blue mean pixel intensity as the feature. These datasets are termed the ARB (all red block), AGB (all green block), and ABB (all blue block) datasets, respectively. When each dataset is fed to the ANN (discussed under 'Data Analysis'), the clinically measured hemoglobin value is considered the target value and the rest the input feature matrix.
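To make the 203-value observation concrete, the following Python sketch assembles one row for a single block location. The function name `block_observation`, the column order (block means first, then age, gender, and hemoglobin), and the numeric gender code are assumptions for illustration; the text specifies only which quantities are included.

```python
from typing import List, Sequence


def block_observation(block_series: Sequence[float], age: float,
                      gender: int, hgb: float) -> List[float]:
    """Build one 203-value row for a single block location.

    block_series: per-frame mean intensity of that block, frame 1 first.
    gender: numeric code; the 0/1 encoding used here is hypothetical.
    hgb: clinically measured hemoglobin level (the target value).
    """
    if len(block_series) < 300:
        raise ValueError("need at least 300 frames")
    kept = list(block_series[100:300])  # frames 101..300 (1-based): 200 values
    return kept + [age, gender, hgb]


series = [i / 400 for i in range(1, 361)]  # made-up 360-frame series
row = block_observation(series, age=35, gender=1, hgb=10.8)
print(len(row))  # prints: 203
```

Stacking one such row per block location (100) and per subject (75) gives the 75 × 100 × 203 shape quoted above.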
Data Analysis
To analyze the afore-described dataset, we built a computational model based on an Artificial Neural Network (ANN). ANN is considered a data-based nonlinear statistical modeling tool, in which the system determines the complex relationship between input and output.
Artificial Neural Network
The abundance of large datasets in the health-care sector and the complex, interconnected relationships between biological components have encouraged the scientific research community to adopt Artificial Neural Network (ANN) models. We used an ANN because it works with non-linear boundaries and is efficient at providing a better classification. An ANN can also learn from continual data and update the model, a capability that is not present in other machine learning algorithms such as decision trees. In addition, ANN-generated models are parametric, whereas the Support Vector Machine (SVM) produces a non-parametric model. Among ANNs, the feedforward neural network is the simplest type, in which information moves in one direction only: from input to output21. Here, we used a feedforward network to predict the hemoglobin level from the input feature matrix. Feedforward networks in general have multiple layers: the first layer receives the network input, each subsequent layer is connected to the previous layer, and the final layer gives the network output. We used the built-in network implemented by the Matlab feedforwardnet command, with a single hidden layer of 10 neurons (hiddenSizes = 10). We made three different types of input feature matrices. We randomly selected 70% of the full (ARB, AGB, or ABB) dataset for training the network. The rest of the dataset was subdivided into two sections: testing (15%) and validation (15%). We used the same approach on the reduced datasets Chessboard and CBCD16 (described in the section 'Feature Reduction'). We applied the train command in Matlab to train the network on the input feature matrix. Finally, we estimated the hemoglobin level of the testing set using the trained Matlab network (net).
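As an illustration of what a single-hidden-layer feedforward network computes, the Python sketch below evaluates one forward pass with 10 hidden neurons. It is not the Matlab code used in the study and omits training entirely. The tanh hidden activation and linear output mirror the defaults of Matlab's feedforwardnet (tansig and purelin); the weights are random placeholders, and the input width of 202 (200 block means plus age and gender) is an assumption about how one observation is presented to the network.

```python
import math
import random
from typing import List, Sequence


def forward(x: Sequence[float], w1: List[List[float]], b1: List[float],
            w2: List[float], b2: float) -> float:
    """One forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


rng = random.Random(0)
n_inputs, n_hidden = 202, 10  # assumed input width; 10 hidden neurons as in the text
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
b2 = 10.8  # placeholder output bias, not a fitted value

x = [0.5] * n_inputs  # made-up normalized feature vector
print(forward(x, w1, b1, w2, b2))  # an untrained, meaningless estimate
```

Training would adjust `w1`, `b1`, `w2`, and `b2` so that the output approaches the gold standard hemoglobin level for each training observation.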
We ran the prediction model five times, each with a randomly selected training set, to observe the association between the predicted and the gold standard hemoglobin levels.
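The repeated random 70/15/15 partition can be sketched in Python as follows. The function name `split_indices`, the use of seeds, and the truncation of fractional counts are assumptions of this sketch; the text does not describe how Matlab rounded the subset sizes.

```python
import random
from typing import List, Tuple


def split_indices(n: int, seed: int) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split range(n) into 70% training, 15% validation, 15% test."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(0.70 * n)  # truncation is an assumption of this sketch
    n_val = int(0.15 * n)
    # Whatever remains after training and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Five repetitions, each with a different random split.
for seed in range(5):
    train, val, test = split_indices(75, seed)
    print(seed, len(train), len(val), len(test))
```

Because each run draws a different partition, the resulting performance figures differ from run to run, which is why averaging across the five runs is informative.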
Results and Discussion
We studied 75 patients, with an age range from 20 to 56 years and a median age of 35 years; there were 55 women and 20 men. All subjects had similar skin tone, and none had respiratory complaints to suggest less than normal SpO2 levels. The clinically measured gold standard hemoglobin levels for this sample ranged from 7.6 to 13.5 g/dL, with a mean of 10.8 g/dL. We present the results of the analysis using R2 (goodness-of-fit measure), ANN performance in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE), maximum ANN training time (T) in seconds, sensitivity, and specificity.
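The error measures named above can be written out as a minimal Python sketch. The function names are hypothetical, and the R2 formula shown (the coefficient of determination, 1 minus the ratio of residual to total sum of squares) is an assumption, since the text does not state how R2 was computed.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between gold standard and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between gold standard and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R2)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


gold = [8.0, 10.0, 12.0]  # made-up hemoglobin levels in g/dL
est = [8.5, 10.0, 11.5]   # made-up model estimates
print(mse(gold, est), mae(gold, est), r_squared(gold, est))
```

For these made-up values the residual sum of squares is 0.5 and the total sum of squares is 8, so R2 is 0.9375.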
Table 2a shows the R2 (goodness-of-fit measure), ANN performance in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE), and maximum ANN training time (T) in seconds when we fed the ANN with the ARB, AGB, and ABB datasets. Here, the red pixel-based (ARB) dataset shows markedly better ANN performance. Because of this observation, we analyzed only the red pixel frames for each video in the next section. For each dataset, we randomly mixed the observations for the 75 videos. We ran the ANN on each dataset five times to observe changes in the output. Since the input feature matrix was separated randomly into training and testing sets, we observed different network performance with each combination. Here, we average the results from the five simulations and show them in tabular form.
Table 2:
Results of application of ANN to ARB, AGB, ABB and the reduced (Chessboard and CBCD16) dataset, calculating the association (R2) and ANN performance.

(a) ARB, AGB, and ABB datasets

| Dataset | R2 | MSE | MAE | T (sec) |
|---|---|---|---|---|
| ARB | 0.972 | 0.043 | 0.150 | 90.2 |
| AGB | 0.901 | 0.152 | 0.201 | 57.4 |
| ABB | 0.845 | 0.237 | 0.359 | 38.0 |

(b) Reduced datasets

| Dataset | R2 | MSE | MAE | T (sec) |
|---|---|---|---|---|
| Chessboard | 0.94 | 0.090 | 0.225 | 34.0 |
| CBCD16 | 0.93 | 0.105 | 0.235 | 6.2 |

Important Feature Location
To identify possibly important locations of information in the video image frames, we divided the whole (10 × 10) block area of the frames into multiple sections, each termed a combined block (CB) and consisting of 10 blocks, as shown in Figure 6. Then, we applied each CB as an input data source (feature matrix) for the ANN and calculated the associations (R2) and ANN performance. Table 3a presents the five best results, in which we observed that CB-6, 9, 1, 8, and 2 contained the most prominent features. Then, we investigated which block locations of the image frame provided the best information. Here, we considered all possible combinations of two combined blocks (45 such combinations in total) as a data source (input feature matrix) to identify the important part of the frame. Table 3b summarizes the top 5 combinations of CBs in terms of R2 value. We observed that CB–1 and CB–6 together gave the best result. Studying the frame in which the red pixels are presented as a color map (Figure 7), we observed that the combined blocks CB–1 and CB–6 had the most color variation in a frame. This variation helped the ANN train well and provide reliable performance.
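The count of 45 two-CB combinations follows from choosing 2 of the 10 combined blocks (100 blocks divided into CBs of 10 blocks each). A short Python check, with the CB numbering 1 to 10 assumed for illustration:

```python
from itertools import combinations

cb_ids = range(1, 11)  # CB-1 .. CB-10 (numbering assumed for illustration)
pairs = list(combinations(cb_ids, 2))

print(len(pairs))  # prints: 45
print(pairs[:3])   # prints: [(1, 2), (1, 3), (1, 4)]
```

Each pair contributes 20 block locations to the input feature matrix, twice as many as a single CB.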

Figure 6: Information from ten blocks was combined into a feature matrix defined as a CB (Combined Block).

Figure 7: The color map of the average intensity of each block. The combined blocks CB–1 and CB–6 show large color variation.
Table 3:
Results of application of ANN on each single and double combined block (CB) datasets of red pixels showing associations (R2) and ANN performance.

(a) Single CB

| Dataset | R2 | MSE | MAE | T (sec) |
|---|---|---|---|---|
| CB-6 | 0.941 | 0.079 | 0.2082 | 4.5 |
| CB-9 | 0.938 | 0.103 | 0.235 | 3.2 |
| CB-1 | 0.934 | 0.105 | 0.249 | 9.1 |
| CB-8 | 0.909 | 0.154 | 0.306 | 3.8 |
| CB-2 | 0.885 | 0.184 | 0.314 | 7.7 |

(b) Double CB

| Dataset | R2 | MSE | MAE | T (sec) |
|---|---|---|---|---|
| CB-1&6 | 0.973 | 0.039 | 0.147 | 10.9 |
| CB-6&7 | 0.959 | 0.063 | 0.192 | 9.2 |
| CB-6&9 | 0.958 | 0.062 | 0.191 | 6.7 |
| CB-1&5 | 0.956 | 0.067 | 0.189 | 8.5 |
| CB-2&6 | 0.951 | 0.076 | 0.206 | 7.6 |

Feature Reduction
As noted earlier, adjacent blocks had very similar information in each frame. Figure 4 illustrates this: the pixel intensities are very similar for neighboring blocks. We therefore built the Chessboard dataset, which keeps 50% of the blocks (75 × 50 × 203), as shown in Figure 8. We applied the ANN to this feature matrix; the result is shown in Table 2b. Since combined blocks CB–1 and CB–6 together showed a strong association and ANN performance, we then selected only these combined blocks from the Chessboard dataset. The new dataset, termed CBCD16 (75 × 10 × 203), is also presented in Figure 8. For CBCD16, the ANN yields an R2 very close to the association calculated for all blocks in CB–1 and CB–6, as shown in Table 2b (2nd row).
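A chessboard-style selection that keeps half of the 100 blocks, with no two kept blocks sharing an edge, can be sketched in Python as below. The function name `chessboard_blocks` and the choice of which parity to keep are assumptions; the exact pattern used in the study is the one shown in its figure.

```python
from typing import List, Tuple


def chessboard_blocks(grid: int = 10) -> List[Tuple[int, int]]:
    """Return (row, col) positions of every other block, like one color of a chessboard."""
    return [(r, c) for r in range(grid) for c in range(grid) if (r + c) % 2 == 0]


selected = chessboard_blocks()
print(len(selected))  # prints: 50
print(selected[:5])   # prints: [(0, 0), (0, 2), (0, 4), (0, 6), (0, 8)]
```

Restricting this selection further to the block locations that belong to CB–1 and CB–6 would leave 10 of those 20 locations, matching the 75 × 10 × 203 size of CBCD16.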
Performance Curve
After training the ANN using different datasets, we checked the network performance by evaluating the training record (tr). Figure 9a shows the mean squared error (MSE) for the training, testing, and validation sets when the Chessboard dataset is used. Figure 9a does not suggest any major problems with the training set selected from the Chessboard dataset. We observed that the validation curve closely follows the test curve. This similarity suggests that overfitting did not occur in this analysis. The performance curves for the ANN applied to the CBCD16 dataset are presented in Figure 9b. Again, no overfitting is suggested.
Regression Analysis
The regression plots present the relationship between the hemoglobin levels estimated by the ANN and the gold standard hemoglobin levels. Figure 10a displays the regression plots for the Chessboard dataset for 75 subjects. Figure 10b shows the regression plots for the CBCD16 dataset. The R2 values and the network performance for these two datasets are shown in Table 2b. The R2 value indicates the association between the outputs and targets, and, as hoped for, there is a linear relationship between the estimated and gold standard hemoglobin values.
In this analysis, we specified ten hidden neurons. Since the R2 and the ANN network performance are reliable, we did not increase the number of hidden neurons because more neurons in the hidden layer might cause the problem to be under-characterized.
Sensitivity and Specificity
As previously stated, gold standard hemoglobin levels ranged from 7.6 g/dL to 13.5 g/dL. We made two groups by hemoglobin level for sensitivity and specificity analysis. According to the WHO definition of anemia, the cutoff value is 13 g/dL for men and 12 g/dL for women22. Since we had a small sample, we instead considered the 36 subjects with hemoglobin levels above 10.8 g/dL as a non-anemic group. The rest of the subjects, with values under this threshold, were treated as an anemic group. The CBCD16 dataset (size = 75 × 10 × 203) was used to train and test the ANN for this analysis. We calculated sensitivity and specificity on the test (30%) portion of CBCD16. We applied the ANN five times on a randomly selected 70% of CBCD16 and obtained the estimated hemoglobin values for the test set. Based on the estimated hemoglobin levels, we calculated the sensitivity and specificity for each simulation and then averaged them. We found an average of 94% sensitivity and 96% specificity for the test set.
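The sensitivity and specificity calculation for the 10.8 g/dL threshold can be sketched in Python as follows. The function name is hypothetical, "anemic" is taken as the positive class, and a level exactly equal to the threshold is treated as anemic; both choices are assumptions, as the text does not spell them out.

```python
from typing import Sequence, Tuple

THRESHOLD = 10.8  # g/dL, the study's cutoff between the two groups


def sensitivity_specificity(gold: Sequence[float], est: Sequence[float],
                            threshold: float = THRESHOLD) -> Tuple[float, float]:
    """Return (sensitivity, specificity) with 'anemic' as the positive class.

    Assumes both groups are present in `gold`; otherwise a division by
    zero occurs.
    """
    tp = fn = tn = fp = 0
    for g, e in zip(gold, est):
        actual_pos = g <= threshold  # anemic by gold standard
        pred_pos = e <= threshold    # anemic by model estimate
        if actual_pos and pred_pos:
            tp += 1
        elif actual_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


gold = [8.0, 9.5, 10.5, 11.0, 12.0, 13.0]  # made-up gold standard levels
est = [8.4, 9.9, 11.0, 11.3, 10.6, 12.7]   # made-up model estimates
sens, spec = sensitivity_specificity(gold, est)
print(sens, spec)
```

In this made-up example, two of three anemic subjects and two of three non-anemic subjects are classified correctly, so both values are 2/3.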
We then determined the percentage of correctly classified hemoglobin values, using the following criterion: if an estimated hemoglobin level was within 0.5 g/dL of the target hemoglobin value, we treated the estimate as adequately classified. We ran this simulation five times on a randomly selected (30%) testing dataset. We observed that the model properly classified 92% of the testing data.
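The tolerance criterion can be expressed as a short Python sketch. The function name `fraction_within` is hypothetical, and counting an estimate that differs by exactly 0.5 as correct is an assumption.

```python
from typing import Sequence


def fraction_within(gold: Sequence[float], est: Sequence[float],
                    tol: float = 0.5) -> float:
    """Fraction of estimates within `tol` of the target value (inclusive)."""
    hits = sum(1 for g, e in zip(gold, est) if abs(g - e) <= tol)
    return hits / len(gold)


gold = [8.0, 10.0, 12.0, 13.0]   # made-up gold standard levels
est = [8.25, 10.5, 12.75, 13.0]  # made-up estimates; the third misses by 0.75
print(fraction_within(gold, est))  # prints: 0.75
```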
Conclusion
In the presented work, we report on a smartphone-based solution for hemoglobin level assessment in which no additional equipment was used during the data collection process. We employed a fingertip video, since the fingertip is an easy-to-image body site. We chose random features for the training, testing, and validation processes, and we observed strong associations and network performance. We were able to reduce the dataset by fifty percent and obtain similar results. We identified the critical feature locations in a video frame, which allowed us to discard further unnecessary features. Our system and observations suggest that such clinically useful non-invasive hemoglobin assessment systems are close to becoming a reality.
In the future, the important feature locations, as well as the reduced features, can be used as input for a dual-resolution (DR) Convolutional Neural Network (CNN). We are developing a cloud-based hemoglobin estimation system that uses fingertip video captured with a smartphone application (app), with which users will be able to capture fingertip videos by themselves. The app will detect fingertip pressure and movement so that the video is recorded properly. Users will be able to upload their data from the smartphone to a cloud server, where the image and data analysis will be done. The cloud server will then send the estimated hemoglobin level back to the mobile application.
Table 1:
Summary of features of selected reports on non-invasive hemoglobin measurement.