Development of an artificial intelligence‐based diagnostic model for Alzheimer's disease

Abstract
Introduction: The diagnosis of Alzheimer's disease (AD) is sometimes difficult for nonspecialists, resulting in misdiagnosis. A missed diagnosis can lead to improper management and poor outcomes. Moreover, nonspecialists lack a simple, highly accurate diagnostic model for AD.
Methods: Data from 7932 patients who visited our memory clinic between 2009 and 2021 were randomly assigned to a training dataset (6000 patients) and a test dataset (1932 patients) and introduced into the artificial intelligence (AI)‐based AD diagnostic model that we had developed.
Results: The AI‐based AD diagnostic model used age, sex, Hasegawa's Dementia Scale‐Revised, the Mini‐Mental State Examination, the educational level, and the voxel‐based specific regional analysis system for Alzheimer's disease (VSRAD) score. It had a sensitivity, specificity, and c‐statistic of 0.954, 0.453, and 0.819, respectively. The other AI‐based model, which did not use the VSRAD score, had a sensitivity, specificity, and c‐statistic of 0.940, 0.504, and 0.817, respectively.
Discussion: We created an AD diagnostic model with high sensitivity for AD diagnosis using only data acquired in daily clinical practice. By using these AI‐based models, nonspecialists could reduce missed diagnoses and contribute to the appropriate use of medical resources.

and (9) speech fluency. The total score is 30 points, with a higher score indicating better cognitive ability. The cutoff for dementia is 20 points. Many studies have reported that the sensitivity of the HDS-R (93%) is higher than that of the MMSE (82.8%), making it more suitable for AD screening. 8 The educational level was divided into eight groups: elementary school, junior high school, high school, vocational school, junior college, university, graduate school, and others, such as schools for persons with hearing or communication impairments. After developing the model using the seven variables described above, we developed an AI-based model without the VSRAD score, considering the low availability of the VSRAD score in rural areas.
We used the eXtreme Gradient Boosting (XGB) framework, Prediction One (Sony Network Communications, Inc.), to develop an AI-based AD diagnostic model using the variables described above from the training dataset. We chose Prediction One because it automatically performs preprocessing such as missing value imputation and variable normalization, does not require manual hyperparameter tuning, and can be easily used by non-AI engineers. After z-normalization of each variable, we used XGB to classify the five diagnoses. Following previous reports, generalizability was ensured by fivefold cross-validation. 9,10 Tuning was performed to increase the sensitivity. Precision, recall (same formula as sensitivity), F-values, sensitivity, specificity, and c-statistics were used for model evaluation in the test dataset. The weight of each variable was investigated to determine how it contributed to the model reaching a diagnosis. We then performed backward variable selection to seek the best prediction model with a small number of variables and high performance. The features were removed in order of decreasing weight. We also developed a prediction model using variables chosen by stepwise selection. To compare the usefulness of the models, we investigated the c-statistics (area under the curve; AUC) of the AI-based models, HDS-R, MMSE, VSRAD scores, and age for AD diagnosis.
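The two preprocessing and validation steps named above, z-normalization and fivefold cross-validation, can be sketched in a few lines. This is an illustrative sketch only, not the internal implementation of Prediction One, which is not described in the text; the function names `z_normalize` and `k_fold_indices`, the random seed, and the use of the population standard deviation are assumptions made for the example.

```python
import random
import statistics


def z_normalize(values):
    """Rescale one variable to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used; the source
    does not say which estimator the software applies.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Return (train_indices, validation_indices) pairs for k-fold CV.

    Each sample appears in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, validation))
    return splits


# Hypothetical usage with the training set size reported in the text.
splits = k_fold_indices(6000)
```

With 6000 training patients, each of the five splits holds out 1200 patients for validation and fits on the remaining 4800. In a hand-written pipeline, the normalization mean and standard deviation would normally be computed on the training part of each split only, so that no information leaks from the validation fold.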
The results are shown as mean ± standard deviation. We performed a t test for numerical variables and a chi-square test for categorical variables and compared AUCs. The analysis was performed with SPSS Statistics 28.0.0 (IBM) and R software (Version 4.1.2). Significance was set at two-tailed P < 0.05.
The ethics committee approved the study design. Informed consent was obtained in the form of opt-out. All methods were performed according to the relevant guidelines and regulations of the Declaration of Helsinki.

| AI-based diagnostic model
The confusion matrices for the training and test data are shown in Tables 2 and 3. Table 2 shows the performance of Model 1 (consisting of seven variables), and Table 3 shows that of Model 2 (consisting of six variables exclusive of the VSRAD score).
Regarding Model 1, using seven variables in the training data, the precision, recall, F-value, sensitivity, and specificity for AD were 0.7466, 0.9403, 0.8320, 0.9403, and 0.5568, respectively. These values in the test data were 0.7256, 0.9544, 0.8240, 0.9544, and 0.4531, respectively. The weights that contributed to diagnostic accuracy are listed in Table 4. The MMSE, HDS-R, and VSRAD scores contributed to AD diagnosis with high accuracy.
Regarding Model 2, using six variables exclusive of the VSRAD score in the training data, the precision, recall, F-value, sensitivity, and specificity for AD were 0.7382, 0.9357, 0.8253, 0.9357, and 0.5479, respectively. For the test data, these were 0.7419, 0.9407, 0.8230, 0.9407, and 0.5039, respectively. The MMSE, the HDS-R, and age contributed to the diagnosis of AD with high accuracy (Table 4).
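For readers less familiar with these measures, the following sketch shows how precision, recall (sensitivity), specificity, and the F-value follow from the four cells of a binary confusion matrix (AD versus non-AD). The function name `binary_metrics` and the cell counts are hypothetical and chosen only for illustration; they are not the counts in Tables 2 and 3.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute evaluation metrics from confusion-matrix cell counts.

    tp: AD cases predicted as AD        fn: AD cases predicted as non-AD
    fp: non-AD cases predicted as AD    tn: non-AD cases predicted as non-AD
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to sensitivity
    specificity = tn / (tn + fp)
    f_value = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f_value": f_value,
    }


# Hypothetical counts, not taken from the study tables.
metrics = binary_metrics(tp=90, fp=30, fn=10, tn=30)
```

With these illustrative counts, the precision is 0.75, the recall is 0.90, and the specificity is 0.50, a pattern similar in shape to the reported models: most AD cases are detected, at the cost of many non-AD cases being flagged.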
The AUCs of the AI-based model with the VSRAD score, the AI-based model without the VSRAD score, the MMSE, the HDS-R, the VSRAD score, and age were 0.819, 0.817, 0.785, 0.801, 0.717, and 0.363, respectively (Figure 1). We compared the AUCs of the AI-based models with and without the VSRAD score to that of the HDS-R alone. The P values were 0.0222 (AI with VSRAD score versus HDS-R), 0.0253 (AI without VSRAD score versus HDS-R), and 0.567 (AI with VSRAD score versus AI without VSRAD score). Therefore, the values of the AI-based models were higher than those of the other scales and tests.
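The AUC (c-statistic) can be read as the probability that a randomly chosen AD patient receives a higher score than a randomly chosen non-AD patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `pairwise_auc` and the example scores are hypothetical; the study itself used SPSS and R, and how score direction was handled for each scale is not stated in the text.

```python
def pairwise_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 0.5 if tied,
    and 0 otherwise. Quadratic in sample size, so suited to small examples.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model outputs for AD and non-AD patients.
ad_scores = [0.9, 0.8, 0.6, 0.4]
non_ad_scores = [0.7, 0.3, 0.2]
example_auc = pairwise_auc(ad_scores, non_ad_scores)
```

Under this definition, an AUC below 0.5 means that the score tends to be lower in the positive group, and reversing the direction of the score gives 1 minus the original AUC.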
We performed backward and stepwise variable selection to seek the best prediction model with a small number of variables and high performance. Table S1 shows the AUCs of the models created with variable selection. The model with all variables had the largest AUC.

in Japan, equivalent to the MMSE. 7 The results of the current study matched those of previous HDS-R studies on brain atrophy, which suggests a relationship between HDS-R results and hippocampal atrophy. 13 A recently shortened HDS-R was reported to be similar to the full HDS-R. 8 In this study, the AD diagnostic tool used seven items. These items matched the specialists' clinical abilities based on their years of experience. Moreover, the HDS-R, MMSE, VSRAD score, age, and educational level contributed to AD diagnosis in the same order. First, the HDS-R lacks cutoffs modified by the participants' age and educational level. Older age and lower educational levels usually increase the probability of developing AD. A previous study provided an appropriate cutoff, which was adjusted by age and educational level. 14 Our study might compensate for this weak point of the daily use of the HDS-R. Our results match those of a recent report describing the risk of low education. 15 Second, the VSRAD score showed an association with a clinical diagnosis of dementia. Therefore, most doctors use the VSRAD score to confirm a clinical diagnosis. The VSRAD score reflects age-related atrophy of the brain during its developmental process. 6 Our study may indirectly ensure this process.

TABLE 2 Confusion matrix for training and test data using seven variables

TABLE 1 Patient characteristics
Moreover, this diagnostic model can be used in other clinical settings. Our tool relies on common items that can be obtained in routine clinical practice. Our AI-based model can enable patients living in rural areas, where there are no dementia specialists and no access to head imaging, to be screened accurately so that the next steps can be decided. A significant problem in dementia management is the imbalance between the increasing number of patients and the limited number of dementia specialists. 5 It is not realistic to expect that all patients suspected of having dementia by their primary care doctors will be referred to dementia specialists. Access to dementia specialists is difficult for people living in remote areas and on islands. Additionally, patients with dementia who visited primary care doctors were reported to have a greater need for postdiagnostic support than those who visited specialists. 16 Although we tried to build an AI-based prediction model that, like eye-tracking technology, 17 detects cases converting from MCI to AD, our database was insufficient to develop such a model. We therefore developed the AD diagnostic tool to address the current unmet needs. In the future, AI diagnostic support may enhance the quality of dementia care and could be widely used as a communication tool between nonspecialists and specialists. Uniting specialists and family physicians is an efficacious option to address these problems. 18 In Japan, having "dementia support doctors" who support both primary care doctors and dementia specialists has been encouraged as an alternative to increasing the number of dementia specialists alone. 19 To stabilize this movement, a diagnostic tool that can be used in the primary care setting is warranted. However, highly accurate AD diagnostic tools have not been reported to date.
To address this need for a highly accurate AD diagnostic tool, we developed our AI-based AD diagnostic model. This model has the potential to solve the problem of limited access to dementia experts in a population where the number of dementia patients is increasing. Furthermore, AI-based support can be used as a screening tool in telemedicine, which may suggest an appropriate
TABLE 3 Confusion matrix for training and test data using six variables without VSRAD
This study has several limitations. First, we used no biomarkers to determine the pathological changes of AD. Although a pathologically accurate analysis is ideal, we thought that practical usefulness was more critical than pathologic confirmation of AD. Second, all participants were from a single hospital. This diagnostic model can be a reliable diagnostic tool, provided that it is validated in other cohorts.
Third, cranial imaging is required in some cases to exclude secondary dementia. However, clinicians, even nonspecialists in dementia, can usually identify these cases through appropriate history taking and physical examination. Fourth, clinicians should diagnose early-onset AD using a combination of methods rather than relying solely upon our diagnostic tools. Patients with early-onset AD often maintain their scores on neuropsychological tests.

| CONCLUSION
This study created an AI-based tool to diagnose AD in situations with limited resources, bearing in mind that a simple diagnostic tool for MCI is warranted in the future to detect cognitive decline at an earlier stage.
We created an AD diagnostic model with high precision and recall (sensitivity) using only the items acquired during a consultation, which nonspecialists can implement quickly into their daily clinical practice as a practical diagnostic. Through the use of this diagnostic model, an AD diagnosis by nonspecialists may reduce misdiagnosis and contribute to the appropriate and timely use of medical resources.

AUTHOR CONTRIBUTIONS
Kazuki Fujita wrote the main manuscript text and edited the manu-

ACKNOWLEDGMENTS
None.

FUNDING INFORMATION
The authors have no sources of funding to declare.

CONFLICT OF INTEREST
The authors report no conflicts of interest for this study.

DATA AVAILABILITY STATEMENT
The data supporting this study's findings are available from the corresponding author upon reasonable request.

ETHICS APPROVAL
Ethics Committee issued approval 2021-11-1.

PATIENT CONSENT
Informed consent was obtained in the form of opt-out.