Falls are one of the leading causes of injury and accidental death in adults over the age of 65. The total associated medical costs reached $31.3 billion in 2015 and are expected to rise exponentially by 2020. Improved health and decreased healthcare costs rely on early detection and intervention to prevent falls in older adults. Identified risk factors, a recorded history of falls, and gait or balance impairments are strong predictors of future falls. Although the World Health Organization suggests that fall prevention strategies should include fall screening within living environments, fall risk is typically measured at a hospital after a health event or at the primary care physician's office during an annual wellness visit.
A number of basic screens and assessment tools that identify adults’ fall risk are readily available; however, they rely on administration and observation by professionals in clinical settings and are unavailable as home-based self-screens. The Fall Risk Questionnaire (FRQ), available for download from the Centers for Disease Control and Prevention (CDC) Stopping Elderly Accidents, Deaths, and Injuries (STEADI) materials for older adults, is a validated 12-item self-assessment screening instrument intended to increase older adults’ awareness of their risk for falls and to facilitate conversations about falls with healthcare providers, family, and friends. In addition to questionnaires, safe objective assessments such as the One Leg Stance Test (OLST), also known as the Single Limb Stance Test, and the 30-Second Chair Stand Test (CST) can predict the risk of future falls by measuring balance and lower extremity strength.
Individuals at risk of falling need to engage with their healthcare system prior to the first fall or hospital stay. As healthcare evolves into a partnership model, individuals are also encouraged to be more accountable for their health and more involved in their care. Technology has the potential to help older adults monitor risk factors and manage health conditions outside clinical settings. For instance, self-monitoring of blood pressure by older adults in their home environment has been positively associated with better hypertension control and is commonly included in hypertension management. Similarly, self-monitoring of blood glucose by adults with diabetes has been linked to better glucose regulation.
Blue Marble Health Company created the Health in Motion© falls screening tool, which is accessible via an Android tablet or a computer with the Microsoft Kinect Sensor; the sensor uses a motion-capturing camera, microphones, and Windows software to recognize the user’s movements. The Health in Motion software consists of automated versions of the Fall Risk Questionnaire (FRQ), One Leg Stance Test (OLST), and 30-Second Chair Stand Test (CST). Preliminary research is needed to determine how well these automated versions compare to the associated clinic-based gold-standard assessments. Therefore, the purpose of this study is to measure the concurrent validity, absolute reliability, and relative reliability of the FRQ, OLST, and CST when performed by individuals aged 60 and older using a guided digital software program, Health in Motion© (Blue Marble Health Company, Altadena, CA). The program enables both self-report (SELF) via a Microsoft Surface digital tablet and sensor-based assessment (SENSOR) via the Microsoft Kinect Sensor; both were compared with a clinical standard measurement using stopwatch and observation (CLINICAL).
Snowball sampling methods were used to recruit 15 community-dwelling adults to participate in this study. Eligibility criteria were an age of 60 years or older, medical stability with no condition that would affect the ability to perform the balance assessments, the ability to speak and read English fluently, and fewer than two falls in the past 12 months. Exclusion criteria were any mental or physical condition that would interfere with participation, a history of seizure activity when watching TV or playing video games, an inability to learn to use the technology safely without direct supervision, and insufficient clear, safe exercise space in the home with internet access. Informed consent was reviewed and signed by participants. The study was approved by Alpha Independent Review Board (http://www.alphairb.com/).
Fall Risk Questionnaire. The FRQ comprises 12 questions that ask about a person’s history of falls and about conditions that could affect fall risk. For the CLINICAL version, the participant marks “Agree or Disagree” on paper; for the SELF version, the participant taps “Agree or Disagree” on a mobile tablet; and for the SENSOR version, the participant uses a lateral reach gesture to select an on-screen “Agree or Disagree” button. The participant responds to this questionnaire once per version (CLINICAL, SELF, SENSOR). The FRQ has established validity and a reliability coefficient alpha of 0.746. Scores greater than 4 indicate a higher risk of falls.
One Leg Stance Test. For the CLINICAL version, the participant stands with their eyes open and arms at their sides. At the “Go” command, the participant lifts one leg off the floor and stands on the weight-bearing leg without assistance for as long as they can without putting their foot down, or until they reach 30 seconds. For the SELF version, participants press the on-screen “Go” button simultaneously with lifting their foot (Figure 1) and press the on-screen “Stop” button when they put their foot back on the floor. For the SENSOR version, the Kinect sensor automatically tracks when the foot is lifted off the floor and when it is placed back on the floor (Figure 2). Participants are given three attempts for each leg. For this study, the best of the three trials, that is, the maximum number of seconds the foot was lifted off the floor, was used for analysis. Normative data have been established, and scores of less than 5 seconds indicate a higher risk of falls. Test-retest reliability and internal consistency have not been established. The OLST has good inter- and intra-rater reliability: between-rater ICCs ranged from 0.95 to 0.99 and within-rater ICCs from 0.73 to 0.93; for the eyes-open best of three trials, the ICC was 0.99.
Figure 1. OLST tablet version.
Figure 2. OLST Kinect version.
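The OLST scoring rule described above (best of three trials per leg, a 30-second ceiling, and a 5-second cutoff for higher risk) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the Health in Motion© implementation; the function names and the sample times are hypothetical.

```python
from typing import Sequence

MAX_HOLD_SECONDS = 30.0    # a trial ends at 30 s (from the text)
RISK_CUTOFF_SECONDS = 5.0  # best time under 5 s suggests higher fall risk (from the text)


def olst_best_time(trial_seconds: Sequence[float]) -> float:
    """Return the longest single-leg hold among the trials, capped at 30 s."""
    if not trial_seconds:
        raise ValueError("at least one trial is required")
    return max(min(t, MAX_HOLD_SECONDS) for t in trial_seconds)


def olst_higher_risk(trial_seconds: Sequence[float]) -> bool:
    """True when the best trial falls below the 5-second cutoff."""
    return olst_best_time(trial_seconds) < RISK_CUTOFF_SECONDS


# Hypothetical right-leg times (seconds) for one participant.
right_leg = [3.2, 4.1, 6.8]
print(olst_best_time(right_leg), olst_higher_risk(right_leg))
```

The same functions apply to each leg and each platform, since only the way the hold time is captured differs between CLINICAL, SELF, and SENSOR.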
30-Second Chair Stand Test. The participant is asked to perform their maximum number of sit-to-stand repetitions in 30 seconds. When used clinically, the gold-standard assessment requires one trial; for the purposes of this study, however, the participants were given three attempts, and both the mean and the maximum number of repetitions were used. A standard chair height is used, and the participant is instructed to stand and sit down as quickly and safely as they can. For the CLINICAL version, the clinician observes and records the number of repetitions completed in 30 seconds. For the SELF version, a timer appears on the screen; the participant taps the “Start” button and then taps the on-screen “+1” button each time they stand up (Figure 3). The participant can also keep track of their score “in their mind” and input the number of sit-to-stands after the 30-second timer runs out. For the SENSOR version, the Kinect Sensor automatically tracks the number of times the participant stands up in 30 seconds (Figure 4). The CST has been found to demonstrate excellent test-retest reliability (r = 0.89, 95% confidence interval 0.79 to 0.93) and interrater reliability (r = 0.95, 95% confidence interval 0.84 to 0.97). Norms have been established for community-dwelling older adults. The CST has demonstrated excellent criterion validity and excellent construct validity with the 50 ft. walk test.
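As a rough illustration of how the CST scores described above could be derived, the sketch below counts “+1” taps that fall inside the 30-second window and then summarizes three attempts by their mean and maximum. The tap timestamps, function names, and sample values are hypothetical and are not taken from the Health in Motion© software.

```python
from statistics import mean
from typing import Dict, Sequence

TEST_WINDOW_SECONDS = 30.0  # test duration (from the text)


def count_stands(tap_times: Sequence[float]) -> int:
    """Count '+1' taps recorded within the 30 s window.

    tap_times are seconds elapsed since the 'Start' tap (hypothetical format).
    """
    return sum(1 for t in tap_times if 0.0 <= t <= TEST_WINDOW_SECONDS)


def cst_summary(repetitions: Sequence[int]) -> Dict[str, float]:
    """Mean and maximum repetitions across attempts, as used in this study."""
    if not repetitions:
        raise ValueError("at least one attempt is required")
    return {"mean": mean(repetitions), "max": max(repetitions)}


# Hypothetical data: one tap log, then three attempt totals.
print(count_stands([2.1, 5.0, 29.9, 30.4]))
print(cst_summary([11, 13, 12]))
```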
Figure 3. CST tablet version.
Figure 4. CST Kinect version.
Each participant performed the FRQ, CST, and OLST with each of the three versions (i.e., CLINICAL, SELF, SENSOR), accompanied by a licensed occupational therapist for clinical measurement or a qualified research assistant to provide stand-by assistance in the event a participant lost their balance. Each participant completed each version three times in random order for a total of nine trials. Simple randomization of version assignment was used to avoid order effects, and each consecutive participant started the trials at a different station. Rest breaks were provided any time the participant requested and for at least five minutes after each assessment version. Each test session took an average of two hours from consent to completion of the ninth trial.
2.4. Statistical Analysis
A power analysis (G-Power) indicated that 15 subjects were required to detect a correlation of 0.6 with a power (1 − β) of 0.8 and α = 0.05. Statistical analyses were conducted using IBM SPSS Statistics 23 (IBM Corporation, USA) and R 3.2.4 for Windows (Microsoft, USA). To evaluate concurrent validity, the relationship between the best score of the three trials for each outcome on the SELF and SENSOR platforms and the CLINICAL standard was assessed using the Pearson product-moment correlation or Spearman’s rank-order correlation. Spearman’s coefficient was selected for data violating the normality assumption. The FRQ was administered only once per platform, so that single score was used in the calculation. According to Bishop, a correlation of 0.8 to 1.0 is very strong, 0.6 to 0.79 is strong, 0.4 to 0.59 is moderate, 0.20 to 0.39 is weak, and smaller than 0.20 is very weak. The coefficient of determination (r²) was used to indicate the percent of total variance shared. Significant findings were followed up with dependent t-tests, or Wilcoxon signed-rank tests for non-normally distributed data, to check for systematic differences between platforms. Lastly, the intraclass correlation coefficient (ICC) was used to compensate for the limitations of using correlation coefficients as the sole indicator. The ICC was estimated using information from two-way ANOVA tests:
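To make the correlation step concrete, the following standard-library Python sketch computes a Pearson coefficient, a Spearman coefficient (as Pearson on average ranks), the shared variance r², and the strength label from the bands quoted above. It is an illustrative sketch, not the SPSS or R procedure the authors ran; the normality check that decides between the two coefficients is not shown, and the sample scores are made up.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correlation_strength(r: float) -> str:
    """Label |r| using the bands quoted in the text."""
    a = abs(r)
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak"


# Hypothetical best-of-three CST scores: CLINICAL vs. SENSOR.
clinical = [10, 12, 14, 9, 15]
sensor = [11, 12, 13, 9, 16]
rho = spearman_rho(clinical, sensor)
print(correlation_strength(rho), rho ** 2)  # label and shared variance
```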
$$\mathrm{ICC}(3,1) = \frac{\mathrm{BMS} - \mathrm{EMS}}{\mathrm{BMS} + (k - 1)\,\mathrm{EMS}}$$
where BMS is the between-targets mean square, EMS is the residual mean square, and k is the number of raters. According to Portney and Watkins, an ICC of more than 0.75 is interpreted as good reliability, values between 0.5 and 0.75 as moderate reliability, and values less than 0.5 as poor reliability. The 95% confidence intervals for all ICCs were calculated. A bootstrap sampling distribution of the ICC based on 1000 bootstrap replications was applied to the ICC for non-normal distributions.
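The ICC computation from two-way ANOVA mean squares can be sketched as follows. The sketch assumes the consistency form ICC(3, 1), which is the form named later in the text and the one that matches the BMS, EMS, and k terms defined above. The function names are hypothetical, the data layout (one row per participant, one column per rater, platform, or trial) is an assumption, and the confidence intervals and bootstrap step are not shown.

```python
from typing import Sequence


def icc_3_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS).

    scores: n rows (targets/participants) by k columns (raters or trials).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


def reliability_label(icc: float) -> str:
    """Interpretation bands quoted in the text (Portney and Watkins)."""
    if icc > 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


# Hypothetical ratings: 6 participants scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
value = icc_3_1(ratings)
print(round(value, 2), reliability_label(value))
```

Because ICC(3, 1) measures consistency rather than absolute agreement, a constant offset between raters does not lower it, which is one reason the follow-up tests for systematic differences are reported alongside it.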
Relative reliability was assessed for each outcome using the Pearson product-moment correlation for normally distributed data or Spearman’s rank-order correlation for non-normally distributed data. Significant correlations were followed up with repeated measures analysis of variance (ANOVA) tests, or Friedman tests for non-normally distributed data, to check for systematic differences between trials. Using information from the two-way ANOVA results, the ICC (3, 1) was determined. A bootstrap sampling distribution of the ICC based on 1000 bootstrap replications was applied to the ICC for non-normal distributions. Absolute reliability was determined for each outcome measure within each platform by calculating the Standard Error of the Measurement (SEM) and the Minimal Detectable Change at the 95% confidence level (MDC95). The SEM was calculated as $\mathrm{SEM} = \sqrt{\mathrm{MSE}}$, where MSE is the mean square error, and $\mathrm{MDC}_{95} = \mathrm{SEM} \times 1.96 \times \sqrt{2}$.
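The absolute reliability indices can be illustrated with a short Python sketch. It assumes the commonly used definitions SEM = √MSE and MDC95 = 1.96 × √2 × SEM; these are assumptions about the intended formulas rather than expressions confirmed by the text. The function names and the sample MSE are hypothetical.

```python
from math import sqrt


def sem_from_mse(mse: float) -> float:
    """Standard error of measurement, assumed here to be sqrt(MSE)."""
    if mse < 0:
        raise ValueError("mean square error cannot be negative")
    return sqrt(mse)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence (assumed standard form).

    1.96 is the two-sided 95% z value; sqrt(2) accounts for measurement
    error in both the first and the second measurement.
    """
    return 1.96 * sqrt(2) * sem


# Hypothetical residual mean square from a repeated measures ANOVA.
sem = sem_from_mse(1.44)
print(round(sem, 2), round(mdc95(sem), 2))
```

Under these assumptions, a change between two measurements smaller than the MDC95 cannot be distinguished from measurement error.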
Fifteen Caucasian, non-working or retired adults (10 female) aged 63 to 80 (mean 70.67, SD = 5.35) participated in this study. One participant completed only seven of the nine trials due to frustration with the instruments and personal performance. One participant’s performance on one trial of the SELF condition was not captured due to equipment malfunction or researcher error. Because participants were asked to complete nine trials of the CST and OLST, fatigue was evaluated using repeated measures ANOVA and Friedman tests. No significant differences were found between the first three trials and the last three trials.
3.2. Clinical Standard
Concurrent validity values are presented in Table 1, and relative and absolute reliability values are presented in Table 2. For the gold-standard clinical version, statistically significant very strong positive correlations were found between trials of the CST, confirmed well by ICC. A follow-up ANOVA did not reveal any significant differences among trials. Relative and absolute reliability were also confirmed for the clinical version of the OLST, where a statistically significant positive correlation between trials was found and confirmed by ICC. A Friedman’s test revealed significant differences among trials for OLST-R, χ²(2) = 6.780, p = 0.034. Post hoc analyses with Wilcoxon signed-rank tests showed no significant differences between Trial 1 and 2 or between Trial 2 and 3, but a significant difference with a large effect between Trial 1 and 3 (Z = −2.096, p = 0.036, r = 0.54). A Friedman’s test did not reveal significant differences among trials for OLST-L.
Table 1. Concurrent validity of SENSOR and SELF digital versions with the CLINICAL standard.
a. *p < 0.05; **p < 0.001; (rS) = Spearman rank correlation; nd = not determined due to small sample size and/or high level of agreement. ICC = intraclass correlation coefficient; CI = confidence interval; SEM = standard error of measurement; MDC = minimal detectable change. FRQ = Fall Risk Questionnaire; CST = 30-Second Chair Stand Test; OLST-R/L = One Leg Stance Test, bearing weight through the right or left leg.
Table 2. Relative and absolute reliability of CLINICAL, SENSOR, and SELF versions.
a. *p < 0.05; **p < 0.001; (rS) = Spearman rank correlation. ICC = intraclass correlation coefficient; CI = confidence interval. b = Bias-corrected accelerated interval; FRQ = Fall Risk Questionnaire; CST = 30-Second Chair Stand Test; OLST-R/L = One Leg Stance Test, bearing weight through the right or left leg.
3.3. Sensor (Kinect Version)
For concurrent validity of the FRQ, Spearman’s rank-order correlation revealed a statistically significant very strong positive correlation, confirmed well by ICC. A follow-up Wilcoxon signed-rank test did not reveal significant differences between the two variables. For the CST, the best of three trials compared with the clinical standard showed a statistically significant strong positive correlation, confirmed well by ICC. For relative and absolute reliability of the CST, Spearman’s rank-order correlation revealed statistically significant positive correlations between Trial 1 and Trial 2 and between Trial 2 and Trial 3. A Friedman’s test across Trials 1 to 3 did not reveal significant differences.
For the OLST, the best of three trials for OLST-R and OLST-L showed statistically significant strong positive Spearman correlations with the clinical standard, confirmed by ICC. A follow-up Wilcoxon signed-rank test did not reveal significant differences between variables. For relative and absolute reliability of OLST-R, Spearman’s rank-order correlation did not reveal a statistically significant correlation between Trial 1 and 2; however, a statistically significant weak positive correlation was found between Trial 1 and 3, and a moderate positive correlation was found between Trial 2 and 3 and confirmed by ICC. A Friedman’s test revealed significant differences among trials, χ²(2) = 1.395, p < 0.001. Post hoc analyses with Wilcoxon signed-rank tests showed a significant difference with a large effect between Trial 1 and 2 (Z = −2.133, p = 0.033, r = 0.55) and a significant difference with a very large effect between Trial 1 and 3 (Z = −2.936, p = 0.003, r = 0.79).
For OLST-L, Spearman’s rank-order correlation revealed statistically significant moderate to strong positive correlations between trials, confirmed by ICC. A Friedman’s test revealed significant differences among trials, χ²(2) = 12.5, p = 0.002. Post hoc analyses with Wilcoxon signed-rank tests showed significant differences between Trial 2 and 3 (Z = −2.201, p = 0.028, r = 0.59) and between Trial 1 and 3 (Z = −2.654, p = 0.008, r = 0.71).
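The effect sizes reported with the Wilcoxon post hoc tests are numerically consistent with the common conversion r = |Z| / √N. The text does not state this formula or the N used for each comparison, so both are assumptions here; the sketch below simply reproduces two of the reported values under that assumption, with N = 15 (all participants) and N = 14 (one participant missing).

```python
from math import sqrt


def wilcoxon_effect_size(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N) for a Wilcoxon signed-rank test.

    Assumed formula: the source reports Z and r but not how r was derived.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return abs(z) / sqrt(n)


# CLINICAL OLST-R, Trial 1 vs. 3, assuming N = 15.
print(round(wilcoxon_effect_size(-2.096, 15), 2))  # 0.54, as reported
# SENSOR OLST-L, Trial 2 vs. 3, assuming N = 14.
print(round(wilcoxon_effect_size(-2.201, 14), 2))  # 0.59, as reported
```

By the widely used Cohen benchmarks (0.1 small, 0.3 medium, 0.5 large), both values fall in the large range, which matches the descriptions in the text.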
3.4. SELF (Tablet Version)
For concurrent validity of the FRQ, Spearman’s rank-order correlation revealed a statistically significant very strong positive correlation, confirmed well by ICC. A follow-up Wilcoxon signed-rank test did not reveal significant differences between the two variables. For the CST, there was a statistically significant very strong positive correlation, confirmed well by ICC. For relative and absolute reliability of the CST, statistically significant strong to very strong positive correlations were found between trials. A follow-up repeated measures ANOVA with Greenhouse-Geisser correction showed that scores differed significantly between trials (F(1.428, 18.56) = 15.028, p < 0.001). Post hoc tests using the Bonferroni correction revealed significant differences between Trial 1 and 2 (10.79 ± 3.70 vs 13 ± 3.8, p = 0.001) and between Trial 1 and 3 (10.79 ± 3.70 vs 13.36 ± 4.14, p = 0.005).
For concurrent validity of the OLST, there was a statistically significant weak positive correlation for OLST-R and a significant moderate positive correlation for OLST-L, both confirmed by ICC. A follow-up Wilcoxon signed-rank test did not reveal significant differences between variables. For relative and absolute reliability of OLST-R, statistically significant strong positive correlations were found between Trial 1 and 2 and between Trial 1 and 3, and a weak positive correlation was found between Trial 2 and 3; these were confirmed by ICC. A Friedman’s test did not reveal significant differences among trials. For OLST-L, statistically significant moderate to strong positive correlations were found between trials, confirmed by ICC, with a strong positive correlation between Trial 1 and 3. A Friedman’s test did not reveal significant differences among trials.
The purpose of this study was to evaluate whether a commercially available, software-based falls risk self-assessment tool can accurately measure falls risk, in an effort to increase individuals’ awareness and thus motivate their engagement with their healthcare system prior to the first fall. The results of this study indicate that digital versions of common falls risk assessment tools can be consistently administered by healthy community-dwelling adults over the age of 60. Since identification of balance impairments, history of falls, and depression are strong predictors of future falls, tools such as Health in Motion are important advancements in the delivery of in-home self-assessments. An added benefit of capturing falls risk information digitally is that changes in falls risk can be tracked over time. Currently, seniors can download a paper version of the Fall Risk Questionnaire from the CDC’s website; however, they are unable to easily track changes in their responses over time.
Older adults want to exercise control over their own lives and prefer home-based interventions. By enabling seniors to self-assess their falls risk at home, similar to the widely accepted and effective practice of self-monitoring blood pressure and blood glucose, tools such as Health in Motion may enable seniors to become more accountable for their health and more motivated to seek help prior to a potentially devastating fall. By making previously clinic-only assessments available in the home, tools such as those used in this study align strongly with the World Health Organization’s recommendation that falls prevention strategies include fall screens in the home.
This preliminary study was conducted with a small, homogeneous sample that was Caucasian and predominantly female, which limits application to a wider population. Based on the results of this study and others, the Blue Marble team has added new content and features to address usability and expand the self-assessment test battery offerings. As new content and features are added, Health in Motion© is further validated. Today, Health in Motion© is commercially available at http://www.bluemarblehealthco.com/. Ongoing research continues to validate Health in Motion© with larger, more diverse populations, to assess acceptance and usability, and to incorporate remote observation. The key to reducing the healthcare burden of fall-related injuries among older adults is to increase awareness of fall risks in order to encourage at-risk individuals to utilize the healthcare system prior to their first fall. The results from this preliminary study support the use of these technology-based objective falls risk screening tools. The Health in Motion© falls screening tool contains valid and reliable measures for identifying potential fall risks and can easily be used by older adults in the home.
Financial support was provided by the National Institute on Aging of the National Institutes of Health under Award Numbers (1R43AG040873-01, 1R43AG043191, 1R43AG047698, 2R44AG043191, 5R44AG043191-03 all to S.F.). The research content herein is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors also acknowledge Dr. Nadjib Bouzar for assistance with bootstrap analysis and Claire Strock, Karlie Lucas, Nicole Helmers, and Dale Walker for their assistance with the project.
Conflict of Interest
Dr. Sheryl Flynn is the co-founder and CEO of Blue Marble Health Company and could benefit financially from the sale of the product described in this paper. In an effort to avoid bias, participant recruitment, data collection, and data analysis were completed by the first author (BAW). None of the other co-authors has a conflict of interest, financial or otherwise, that would inappropriately influence or bias the research reported herein.