ACSAuto: semi-automatic assessment of human vastus lateralis and rectus femoris cross-sectional area in ultrasound images

Open-access scripts to perform muscle anatomical cross-sectional area (ACSA) evaluation in ultrasound images are currently unavailable. This study presents a novel semi-automatic ImageJ script (named “ACSAuto”) for quantifying the ACSA of lower limb muscles. We compared manual ACSA measurements from 180 ultrasound scans of vastus lateralis (VL) and rectus femoris (RF) muscles to measurements assessed by the ACSAuto script. We investigated inter- and intra-investigator reliability of the script. Consecutive-pairwise intra-class correlations (ICC) and standard error of measurement (SEM) with 95% compatibility interval were calculated. Bland–Altman analyses were employed to test the agreement between measurements. Comparing manual and ACSAuto measurements, ICCs ranged from 0.96 to 0.999 and SEMs from 0.12 to 0.96 cm² (1.2–5.9%), and mean bias was smaller than 0.5 cm² (4.3%). Inter-investigator comparison revealed ICCs, SEMs and mean bias ranging from 0.85 to 0.999, 0.07 to 1.16 cm² (0.9–7.6%) and −0.16 to 0.66 cm² (−0.6 to 3.2%). Intra-investigator comparison revealed ICCs, SEMs and mean bias between 0.883–0.998, 0.07–0.93 cm² (1.1–7.6%) and −0.80 to 0.15 cm² (−3.4 to 1.8%). Image quality needs to be high for efficient and accurate ACSAuto analyses. Taken together, the ACSAuto script represents a reliable tool to measure RF and VL ACSA, is comparable to manual analysis and can reduce the time needed to evaluate ultrasound images.


Methods
Program description. The ACSAuto script is based on the openly accessible Simple Muscle Architecture Analysis (SMA) macro developed by Seynnes and Cronin 14. Our script consists of a single macro written in the ImageJ 1.x macro language and runs in FIJI 23. Two additional plugins (Canny Edge Detector 25 and Ridge detection 26) are needed to run the script. The plugins can be installed the same way as the ACSAuto script (see supplementary material 2 'Installation guide'). Once the script is launched, a user interface opens where all relevant analysis parameters, such as image depth or muscle type, can be adjusted. When hovering the cursor over a parameter, a short description is displayed. Either single images or whole folders containing image files can be processed. If a whole folder is processed (batch mode), input and output directories must be specified. So far, images can be processed in five different modalities. The "rectus femoris" and "vastus lateralis" modalities are designed for extended-field-of-view (EFOV) pictures containing only the respective muscle. The modalities "quad RF" and "quad VL" are designed for EFOV pictures containing both RF and VL muscles, but only one muscle is measured. The modality "Quadriceps" measures the RF and VL ACSA in EFOV pictures containing both muscles. The script contains three different options to specify the outline-finder starting points (see section "Technical details"). With the "Manual" option, the user manually specifies one (RF) or three (VL) outline-finder starting points within the image. The "Fixed Pixels" option specifies the outline-finder starting points using hardcoded coordinates. With the "Automatic" option, outline-finder starting points are estimated based on image width and height. Image scaling is mandatory and possible in two ways. If "automatic" is selected, the picture will be scaled automatically. If "manual" is selected, a line equal in length to the scanning depth needs to be drawn into the picture.
The scanning depth must be specified. The analysis script is designed for ultrasound pictures displaying the medial muscle border on the left and the lateral border on the bottom middle or right (Fig. 1a). If this is not the case, the flipping options should be used; otherwise the "Fixed Pixels" and "Automatic" outline-finder strategies as well as the outline-finding process might fail. After all analysis parameters are specified, a dialogue window appears asking for pre-processing settings (see section "Technical details"). Default values are modality- and muscle-dependent and are based on our sample pictures. During the analysis, three further dialogue windows appear. First, the user is asked to select and delete artefacts in the image. Second, if "Manual" outline-finder starting points were selected, the points must now be placed within the muscle. Last, the user is asked to adjust the region of interest, and the selected muscle outline can be corrected manually. The analysis results are displayed and accessible once an image is processed.
Instructional material. An instructional video for the program can be found here: https:// www. youtu be.
For a detailed written instruction on how to use the program see supplementary material 3.
Technical details. The active image needs to be automatically or manually scaled. For automatic scaling, the active image is duplicated, thresholded, a mask created by automatic particle counting is subtracted and convolution filtering is applied. Then the ridge-detection 26 plug-in searches for elongated objects of specific length within the image, for example a scaling line (Fig. 1b). Automatic scaling is only possible if some sort of scaling line is present in the image. The detected length range of automatic scaling is hardcoded based on our sample images.
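The role of the scaling step can be illustrated with a minimal Python sketch. It is not taken from the ACSAuto macro: the function names and the numeric values are hypothetical, and it assumes that the detected (or manually drawn) scaling line spans exactly the specified scanning depth.

```python
def pixels_per_cm(line_length_px: float, scan_depth_cm: float) -> float:
    """Image scale in pixels per cm.

    Assumption: the scaling line spans exactly the scanning depth.
    """
    if line_length_px <= 0 or scan_depth_cm <= 0:
        raise ValueError("line length and scanning depth must be positive")
    return line_length_px / scan_depth_cm


def area_px_to_cm2(area_px: float, scale_px_per_cm: float) -> float:
    """Convert an area from square pixels to cm^2.

    Areas scale with the square of the length scale.
    """
    return area_px / scale_px_per_cm ** 2


# Hypothetical example: a 500-pixel scaling line at 5 cm scanning depth.
scale = pixels_per_cm(line_length_px=500.0, scan_depth_cm=5.0)
acsa_cm2 = area_px_to_cm2(area_px=85000.0, scale_px_per_cm=scale)
print(scale, acsa_cm2)
```

Because the area depends on the square of the scale, a small error in the detected line length has a proportionally larger effect on the resulting ACSA, which is one reason a correct scaling line matters.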
In the first phase of the analysis process, the image is pre-processed using the previously specified parameters (Fig. 1c). "Min Length Fac" describes a cut-off value relative to image width. If the length of an object is below this value, the object is removed from the image during pre-processing. The value of "Tubeness sigma" is used in the Tubeness 27 plugin and determines whether less or more "tube-like" objects in the image are detected and enhanced. This is used to detect the muscle aponeuroses 14 (Fig. 1d). The "Gaussian sigma" value is used for smoothing the image and is applied in a convolution filter during pre-processing. The pre-processing steps are similar to the ones suggested by Seynnes and Cronin 14. For further information, the reader is referred to this article. In the second analysis phase, a custom-written function searches for the aponeuroses and measures the ACSA of the muscle. The function uses the defined outline-finder starting points and performs a scan along the line between the points. The scanning beams are oriented vertically, horizontally or circularly, depending on the location of the outline-finder starting point and the selected muscle. If a pixel value is above the contrast threshold, the beam breaks and the coordinates of the pixel are saved. Then, the saved pixel coordinates are sorted clockwise to avoid overlap and connected by generating a polygon (Fig. 1e). This step is optional and is only executed when ticked. The area of the polygon is measured, representing the ACSA of the muscle (Fig. 1f).
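The final two steps of the outline finder (clockwise sorting of the saved coordinates and measurement of the polygon area) can be sketched as follows. This Python example is an illustration only and not the ImageJ macro code: the function names are hypothetical, the points are sorted by their angle around the centroid (which assumes that the outline is visible from the centroid without self-intersection), and the area is computed with the shoelace formula.

```python
import math


def sort_clockwise(points):
    """Sort (x, y) pixel coordinates by angle around their centroid.

    In image coordinates (y axis pointing down), increasing atan2
    angles run clockwise on screen.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def polygon_area(points):
    """Area of a simple polygon (shoelace formula), in squared pixel units."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical beam hits: the corners of a 4 x 3 rectangle in arbitrary order.
hits = [(0, 0), (4, 3), (4, 0), (0, 3)]
outline = sort_clockwise(hits)
area = polygon_area(outline)
print(outline, area)
```

Without the sorting step, connecting the unordered hits would produce a self-crossing polygon with a wrong area, which is the overlap the sorting is meant to avoid.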

Data collection. The data used in this study were collected from 60 adolescent and adult high-level soccer players of both sexes [n = 46 males and 14 females, 17.8 years (14–25)]. One hundred and eighty ultrasound images (three per individual) were analysed. We used B-mode EFOV ultrasonography (ACUSON Juniper, SIEMENS Healthineers, Erlangen, Germany) with a 5.6 cm linear-array probe (6.2–13.3 MHz, 12L3, Acuson 12L3) to assess the ACSA of RF and VL. We acquired pictures at rest while the participants lay in a supine position. A guide was mounted to the leg in order to keep the same transversal path 19. We took pictures of both muscles at 33 and 50% of the distance between the trochanter major and the lateral femur condyle. We acquired scans of both muscles either in the same picture or in two separate pictures, to test different modalities of the ACSAuto script (see section "Program description"). Because of regional differences in muscle size and shape, 90 pictures from the proximal (33% of femur length) and 90 pictures from the mid (50% of femur length) region were included in the analysis. Therefore, 60 images were analysed per outline-finder starting point option. We measured ACSA of the RF and VL in all images using the ACSAuto script. We used the automatic scaling option during all measurements. Manual measurement of the ACSA of RF and VL was performed by an experienced investigator (investigator1) in FIJI 23 and served as comparison. Manual measurement consisted of digitising the ACSA of each muscle using the polygon tool. To test inter- and intra-investigator reliability, investigator1 and investigator2 evaluated a subsample of 30 pictures from the mid region for every modality and outline-finder starting point option. Furthermore, we conducted "Freerun" trials on all pictures, in which the suggested muscle outline was not manually corrected.
We did this to test whether manual correction of the outline decreases the error of the program. To investigate whether manual and automatic scaling options of the ACSAuto program yield similar results, we compared a subsample of 30 manually scaled pictures from the mid region of the RF to the respective automatically scaled images. The study was approved by the regional ethics committee (Ethikkommission Zentral- und Nordwestschweiz; Project ID: 2017-02148) and complied with the Declaration of Helsinki. Participants gave written informed consent prior to the start of the study after receiving all relevant study information. If participants were under 18, a parent or legal guardian gave written informed consent prior to the study after receiving all relevant information.
Statistics. All statistical analyses were performed in R software 28,29 (Base, BlandAltmanLeh and irr packages) and in an Excel spreadsheet 28. We compared ACSAuto measurements to manual measurements for all modalities and outline-finder starting point options. For this purpose, we calculated consecutive-pairwise intra-class correlations (ICC) and standard error of measurement (SEM) with 95% compatibility intervals (CI). Bland-Altman analysis 30 was used to test the agreement between the two analysis methods. Limits of agreement were set to ±1.96 standard deviations (SD). The standardized mean bias was calculated according to Hopkins 28, with 0.1, 0.3, 0.6, 1.0 and 2.0 being the thresholds for small, moderate, large, very large and extremely large errors. Additionally, we examined inter- and intra-investigator reliability for a subsample of 90 pictures from the mid region by calculating ICCs and SEMs with 95% CI. We calculated minimal detectable change (MDC) as SEM × 1.96 × √2. We applied Bland-Altman analysis to test the agreement between measurements. We categorized standardized mean bias using thresholds of 0.1, 0.3, 0.6, 1.0 and 2.0 for extremely high, very high, high, moderate and low reliability 28.
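Two of the quantities described above can be illustrated with a short Python sketch. It is not the R code used in this study: the function names and the example values are hypothetical, and the SEM is taken as a given input because its computation is not detailed here. The sketch shows the Bland-Altman bias with limits of agreement at ±1.96 SD of the paired differences, and the MDC as SEM × 1.96 × √2.

```python
import math
import statistics


def bland_altman(method_a, method_b):
    """Return mean bias and lower/upper limits of agreement (bias ± 1.96 SD)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def minimal_detectable_change(sem):
    """MDC = SEM x 1.96 x sqrt(2)."""
    return sem * 1.96 * math.sqrt(2)


# Hypothetical ACSA values in cm^2 (not data from this study).
manual = [10.0, 12.0, 14.0, 16.0]
script = [9.8, 12.1, 13.7, 16.0]

bias, lower, upper = bland_altman(script, manual)
mdc = minimal_detectable_change(0.5)  # hypothetical SEM of 0.5 cm^2
print(round(bias, 3), round(lower, 3), round(upper, 3), round(mdc, 3))
```

A negative bias in this sketch means that the script values are, on average, smaller than the manual values.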

Results
Comparison of area measurement between ACSAuto and manual analysis. ICCs, SEMs, mean bias and standardized mean bias with 95% CI for all muscles and modes comparing manual to ACSAuto measurements are shown in Table 1 and Fig. 2. The analysis of "rectus femoris" and "vastus lateralis" across all outline-finder options showed the highest ICCs and lowest SEMs, ranging from 0.982 to 0.998 and 0.10 to 0.74 cm², respectively. Mean bias and standardized mean bias ranged from −0.34 to 0.37 cm² and −0.12 to 0.17 SDs, resulting in small measurement errors. For the "quad RF" and "quad VL" modalities, ICCs were between 0.948 and 0.996 and SEMs between 0.29 and 0.96 cm². Mean bias and standardized mean bias were between −0.21 and −0.08 cm² and between −0.11 and 0.03 SDs, resulting in small measurement errors. For the "Quadriceps" modality, ICCs for both muscles ranged from 0.947 to 0.996, SEMs from 0.27 to 0.94 cm², mean bias from −0.46 to 0.06 cm² and standardized mean bias from −0.17 to 0.07 SDs, resulting in small errors. The "Freerun" modality resulted in ICCs from −0.045 to 0.683 and SEMs from 1.84 to 5.23 cm². Mean bias ranged from −1.66 to 3.20 cm² and standardized mean bias from −0.66 to 1.57, demonstrating very large errors. Overall, RF measurements displayed higher ICCs and lower SEMs, mean bias and standardized mean bias compared to VL measurements (Table 1; Fig. 2). We found no obvious differences in ICC, SEM, mean bias and standardized mean bias between the "Manual", "Fixed Pixels" and "Automatic" options to define outline-finder starting points (Table 1). As shown in Table 1, we found negligible differences when comparing manual (ManRF) to automatic scaling options of the ACSAuto program. However, most analyses using the ACSAuto plugin yielded slightly smaller values compared to manual analysis (Table 1). Except for the "Quadriceps RF" modality using "Manual" outline-finder starting points, measured bias was not proportional to averaged values.
This indicates that most manual and ACSAuto analyses agree equally throughout the measurement range (Fig. 2).
Reliability of ACSAuto program. ICCs, SEMs, MDCs, mean bias and standardized mean bias with 95% CI for inter-investigator comparisons are shown in Table 2 and Fig. 3. Inter-investigator comparison revealed ICCs, SEMs and MDCs ranging from 0.85 to 0.999, 0.07 to 1.16 cm² and 0.22 to 2.4 cm², respectively. Mean bias ranged from −0.16 to 0.66 cm² with standardized mean bias ranging from −0.16 to 0.33, showing extremely high to high reliability. ICCs, SEMs, MDCs and standardized mean bias with 95% CI for intra-investigator comparisons are shown in Table 3. Results of Bland-Altman analysis can be seen in Fig. 4 and Table 3 […] all outline-finder starting point options (see Tables 2, 3). Only the "Quadriceps RF" modality using "Manual" outline-finder starting points raised concerns about homoscedasticity, because differences seem to be proportional to the average value for inter- and intra-investigator comparisons.

Discussion
We investigated the comparability and reliability of a novel semi-automatic tool to measure ACSA in EFOV ultrasound images of the RF and VL. Our results demonstrate very good agreement and small errors between manual and ACSAuto analysis, with mean bias and standardized mean bias smaller than 0.5 cm² (4.3%) and 0.2 SDs, respectively. Inter- and intra-investigator agreement was very good, showing high reliability, with mean bias and standardized mean bias smaller than 1.0 cm² (3.4%) and 0.4 SDs, respectively. RF analyses yielded better results compared to VL analyses across all modalities and outline-finder options.
The "rectus femoris" and "vastus lateralis" modes were found to have the highest agreement and ICCs and lowest SEMs compared to manual analysis. Yet, standardized mean bias was slightly higher compared to the other modalities. Usually, image quality increases when images are acquired separately. As the length of the EFOV image in total is shorter, the muscle is displayed proportionally larger. Images with muscles displayed proportionally larger enable the user to better recognize the borders between muscle tissue and aponeurosis, increasing the accuracy of ACSA measurement. We implemented a zoom function in the script counteracting this issue. "Quadriceps", "quad RF" and "quad VL" modalities showed slightly lower agreement and ICCs and higher SEMs when compared to manual measurement, whereas standardized mean bias was slightly lower than for "rectus femoris" and "vastus lateralis" modalities. Because sufficient contrast between muscle tissue and aponeurosis is more difficult to maintain during single-sweep images, the outline-finding process might fail and manual correction is needed. In a practical setting, however, a reduced number of images to acquire would be beneficial. Therefore, the contrast between muscle and aponeurosis tissue must be ample and aponeuroses clearly visible. This leads to improved detection of aponeuroses and outlines, thereby reducing the amount, complexity and time of manual outline correction. Comparing the "Freerun" to manual evaluation resulted in low agreement and large errors between measurements. We observed mean bias and standardized mean bias up to 3.2 cm² and 1.17 SDs, respectively. Low ICCs and high SEMs demonstrate the necessity of manual correction. This is important because outline-finding is dependent on image quality and therefore the expertise of the operator. Images of low quality will require more manual correction of the automatic outline-finding, thus increasing the influence of subjective interpretation. As stated by Seynnes and Cronin 14, the detection of aponeuroses relies heavily on homogeneity of grey values and sufficient contrast. This might explain why RF measurements showed better agreement and reliability, as high image quality is easier to achieve due to the shape of the muscle.

Table 1. Intra-class correlation (ICC), standard error of measurement (SEM) and standardized mean bias with 95% compatibility interval. Mean bias with limits of agreement set to ±1.96 standard deviations (SD). Values for SEM and mean bias are displayed in cm² with standardized mean bias being displayed in SDs. All values calculated for m. rectus femoris (RF) and m. vastus lateralis (VL) comparing ACSAuto to manual measurements. "Quadriceps" (Qa), "Quad" (Q) and separate image modes were used. "Manual" (M), "Fixed pixels" (F) and "Automatic" (A) outline-finder starting points. Freerun (fre) and manual scaling (ManRF) trials were compared as well.

Figure 2.
The differences between measurements are plotted against measurement means. Dotted and solid lines illustrate 95% limits of agreement and bias. During the "Quadriceps" mode both muscles are analysed in one image. During the "Quad" mode, only one muscle is evaluated per image even though both muscles are displayed. During RF and VL modalities, both muscles were analysed in separate pictures. "Freerun" describes a trial where the suggested outlines were not manually corrected.

Table 2. Intra-class correlation (ICC), standard error of measurement (SEM) and standardized mean difference with 95% compatibility interval. Mean bias with limits of agreement set to ±1.96 standard deviations and minimal detectable change (MDC). Values are displayed in cm² with standardized mean bias being displayed in SDs. All values calculated for m. rectus femoris (RF) and m. vastus lateralis (VL) comparing investigator1 and investigator2. "Manual" (M), "Fixed pixels" (F) and "Automatic" (A) outline-finder starting points. "Quadriceps" (Qa), "Quad" (Q) and separate image modes were used.
The ACSAuto script seems to be reliable between and within investigators. Inter- and intra-investigator comparison revealed very good agreement and high to extremely high reliability for all modalities and muscles. Conversely, "rectus femoris" and "vastus lateralis" modalities showed highest mean bias and standardized mean bias, but lowest SEM. The "rectus femoris" modality resulted in lowest MDCs for RF between and within investigators. The "Quad VL" modality yielded lowest MDCs within investigators, whereas the "vastus lateralis" modality resulted in lowest MDCs between investigators.
For the comparison of ACSAuto measurement to manual measurement as well as for inter- and intra-investigator reliability, measured bias was not proportional to averaged values, except for the "Quadriceps RF" modality using "Manual" outline-finder starting points. This could be due to inferior image quality at the proximal scanning site for RF images. ACSA of the RF is larger at this site, leading to mean bias increases proportional to measurement means.
Recent randomized controlled trials reported RF ACSA adaptations between −0.2 and 1.7 cm² (−2.9 and 18.5%) for six to fourteen weeks of training 18,20,21. Reported VL ACSA increases ranged from 1.2 to 5.0 cm² (7.4–17.1%) following six to ten weeks of training 18,31,21,32. Adaptations of RF are rather small and might thus be hardly detectable, because the MDC values for ACSAuto analyses were between 0.22 and 1.04 cm². In contrast, adaptations of VL are large and can likely be evaluated well with the ACSAuto plugin, because MDC values were between 0.98 and 2.4 cm². The inability to detect small changes in the ACSA of a muscle following resistance training with certainty is, however, unlikely due to errors in the ACSAuto script but potentially due to the variability of ultrasound measurements and manual evaluations in general 8,18,31.

Figure 3.
Bland-Altman plots of all modes using "Manual" outline-finder option comparing measurements of investigator1 to measurements of investigator2. M. rectus femoris (RF) and m. vastus lateralis (VL). The differences between measurements are plotted against measurement means. Dotted and solid lines illustrate 95% limits of agreement and bias. During the "Quadriceps" modalities both muscles are analysed in one image. During the "Quad" modalities, only one muscle is evaluated per image even though both muscles are displayed. During RF and VL modalities, both muscles were analysed in separate pictures.
Although we found no differences between outline-finder starting point options, we advise users to apply the "Manual" option. As muscles are highly variable in their anatomical shape, options using pre-defined starting points might yield inferior outline detection. Comparing the "manual" and "automatic" scaling, we found high agreement and small errors between measurements, with mean bias equal to 0.17 cm² and standardized mean bias smaller than 0.1 SDs. Automatic scaling can therefore be used without compromising the accuracy of the measurement, which reduces both the time required and the subjective influence of investigators. When conducting ACSAuto analyses, time saving was higher for an experienced investigator than for an inexperienced investigator. The time saved was on average 3 min per 10 images using "Manual" outline-finder starting points for all modes except the "Quadriceps" mode. Image analysis using the "Quadriceps" mode took the same time as manual evaluation. In general, time saving was smaller for the other outline-finder starting point options and is highly dependent on image quality.
In contrast to the fully automated TRAMA-algorithm developed by Salvi et al. 17, our script allows for the evaluation of EFOV pictures. The TRAMA-algorithm 17 is designed to measure the visible ACSA in static ultrasound images of several lower limb muscles. While this technique seems to be able to detect changes in muscle size and responses to musculoskeletal training 33,34, ACSA measurements in muscles exceeding the field of view of the ultrasound probe might be limited in meaningfulness. Chen et al. 15 recently demonstrated an automatic ACSA segmenting algorithm using a deep learning model. Deep learning is a type of machine learning that uses a deep neural network 15. The algorithm segments the ACSA of the RF in ultrasound images, and test images were recorded during contraction of the muscle 15. Deep learning is rapidly becoming the state of the art in medical image analysis 35, and might be more powerful than the ACSAuto script proposed here. However, the algorithm of Chen et al. 15 is only able to segment the ACSA of the RF and is limited in transferability to ultrasound images taken at rest. Apart from dynamic ultrasound imaging during movement, most investigations record images in resting participants. In addition, none of the abovementioned articles supply information on how to implement the program for common use. Yet, this might be important to increase comparability among investigations assessing ACSA of lower limb muscles.

Table 3. Intra-class correlation (ICC), standard error of measurement (SEM) and standardized mean difference with 95% compatibility interval. Mean bias with limits of agreement set to ±1.96 standard deviations and minimal detectable change (MDC). Values are displayed in cm² with standardized mean bias being displayed in SDs. All values calculated for m. rectus femoris (RF) and m. vastus lateralis (VL) comparing two measurements of investigator1. "Manual" (M), "Fixed pixels" (F) and "Automatic" (A) outline-finder starting points. "Quadriceps" (Qa), "Quad" (Q) and separate image modes were used.

Limitations
The following limitations of the ACSAuto script need to be mentioned. We compared the ACSAuto measurements to manual evaluation of ultrasound and not MRI images. The analysis is semi-automatic and therefore relies on subjective processing of the images. This limits objectivity and comparability among investigators 15–17.
In addition, we did not assess the between-day reliability and precision of our ultrasound measurements. So far, we only investigated EFOV ultrasound images of the RF and VL from highly trained individuals. Highly trained individuals have more muscle mass and less intramuscular fat than untrained persons, which might limit the reliability and comparability in other cohorts. Generally, ACSA measurements using the ACSAuto algorithm might be applicable for every muscle. For this, however, the image quality (homogeneity of grey values and contrast 14) must be high and outline-finding must be set to "Manual". Some of the analysis parameters are hardcoded and most are based on our sample images. Not all of these parameters can be adjusted without changing the script, limiting the robustness of the algorithm.

Conclusion
In conclusion, we developed a reliable novel tool to assess the ACSA of RF and VL muscles that is comparable to manual analysis. Our results show that ACSA measurement using the "rectus femoris" and "vastus lateralis" modalities yielded the best results. Additionally, the time needed for ACSA measurement can be reduced when using the ACSAuto script. Although semi-automatic, the ACSAuto script is free and openly accessible and can therefore partially reduce variability induced by manual analysis. In future investigations, more muscles need to be evaluated and the applicability of a deep learning model should be tested.

Data availability
The ACSAuto script, the dataset used for analysis, an installation guide and additional information for improved usage of the ACSAuto script are included in the supplementary information files. These materials are also available on GitHub in the ACSAuto repository and can be accessed using the following link: https://github.com/PaulRitsche/ACSAuto.

Figure 4.
Bland-Altman plots of all modes using "Manual" outline-finder option comparing two measurements of investigator1. M. rectus femoris (RF) and m. vastus lateralis (VL). The differences between measurements are plotted against measurement means. Dotted and solid lines illustrate 95% limits of agreement and bias. During the "Quadriceps" modalities both muscles are analysed in one image. During the "Quad" modalities, only one muscle is evaluated per image even though both muscles are displayed. During RF and VL modalities, both muscles were analysed in separate pictures.