Johanna Spangler, Marc Mitjans, Ashley Collimore, Aysha Gomes-Pires, David M Levine, Roberto Tron, Louis N Awad, Automation of Functional Mobility Assessments at Home Using a Multimodal Sensor System Integrating Inertial Measurement Units and Computer Vision (IMU-Vision), Physical Therapy, Volume 104, Issue 2, February 2024, pzad184, https://doi.org/10.1093/ptj/pzad184
Abstract
Functional movement assessments are routinely used to evaluate and track changes in mobility. The objective of this study was to evaluate a multimodal movement monitoring system developed for autonomous, home-based, functional movement assessment.
Fifty frail and prefrail adults were recruited from the Brigham and Women’s Hospital at Home program to evaluate the feasibility and accuracy of applying the multimodal movement monitoring system to autonomously recognize and score functional activities collected in the home. Study subjects completed sit-to-stand, standing balance (Romberg, semitandem, and tandem), and walking test activities in likeness to the Short Physical Performance Battery. Test activities were identified and scored manually and by the multimodal movement monitoring system’s activity recognition and scoring algorithms, which were previously trained on lab-based biomechanical data to integrate wearable inertial measurement unit (IMU) and external red–blue–green-depth vision data. Feasibility was quantified as the proportion of completed tests that were analyzable. Accuracy was quantified as the degree of agreement between the actual and system-identified activities. In an exploratory analysis of a subset of functional activity data, the accuracy of a preliminary activity-scoring algorithm was also evaluated.
Activity recognition by the IMU-vision system had good feasibility and high accuracy. Of 271 test activities collected in the home, 217 (80%) were analyzable by the activity-recognition algorithm, which overall correctly identified 206 (95%) of the analyzable activities: 100% of walking, 97% of balance, and 82% of sit-to-stand activities (χ2(2) = 19.9). In the subset of 152 tests suitable for activity scoring, automatic and manual scores showed substantial agreement (Kw = 0.76 [0.69, 0.83]).
Autonomous recognition and scoring of home-based functional activities is enabled by a multimodal movement monitoring system that integrates inertial measurement unit and vision data. Further algorithm training with ecologically valid data and a kitted system that is independently usable by patients are needed before fully autonomous, functional movement assessment is realizable.
Functional movement assessments that can be administered in the home without a clinician present have the potential to democratize these evaluations and improve care access.
Introduction
Older adults are the fastest growing demographic in the USA.1 Along with increased age comes increased risk for frailty, mobility decline, and fall-related hospitalization and mortality.2–4 Frailty is a clinical syndrome with multisystem involvement that is characterized by decreased metabolic reserves and resistance to stressors, resulting in an increased risk for adverse clinical outcomes.2 Frailty is an independent predictor of falls, hospitalization, progressive disability, and all-cause mortality, and it is clinically defined by the presence of at least 3 of the following: unintentional weight loss, self-reported exhaustion, grip weakness, slow walking speed, or low physical activity.2 Identifying mobility decline at an early stage may provide intervention opportunities to prevent injurious falls and mitigate the onset of frailty.
In older adults, single-activity functional movement assessments, such as tests of walking speed and balance ability, can provide objective markers of emerging mobility decline.5,6 Building on single-activity assessments, multiactivity standardized assessments, such as the Short Physical Performance Battery (SPPB),7 harness performance scores across multiple functional activities to provide an objective and generalizable functional outcome score. SPPB scores independently predict hospitalization8 and mortality.9 Notably, SPPB scores can identify the early signs of frailty in the absence of a walking speed decline,10 allowing for an early diagnosis and targeted intervention at the very beginning of mobility decline.
Standardized functional movement assessments, such as the SPPB, while valuable and straightforward to implement, do not come at zero cost to the patient or the provider. In-clinic administration of these assessments requires clinician time and resources, takes up valuable 1:1 patient–provider time, and often requires dedicated clinic space. Costs to the patient include travel time and transportation costs, time away from work and other responsibilities, child care costs, and insurance copayments. Furthermore, in-clinic functional movement assessments may not accurately reflect everyday mobility.11–13 Indeed, functional movement is thought to emerge from the interaction between the environment, person, and task,14 suggesting that differences between clinical and home environments may result in differences in movement when completing similar tasks. Consequently, idealized and controlled clinical environments for standardized testing may mask the mobility impairment that is present in community and home settings,12,13 potentially resulting in missed detection of functional deficits and missed opportunities for necessary intervention. Movement-sensing technologies that enable the collection of functional mobility data in real-world settings, without the need for a clinician to be present, have the potential to reduce both provider and patient costs, yield clinically meaningful data on a patient’s functional impairments, and, longer term, improve access to quality clinical care for aging patients.
Inertial measurement units (IMUs) are extremely portable and relatively inexpensive wearable sensors with a growing evidence base that demonstrates their ability to replicate and extend measurements made by costly and largely inaccessible lab-based motion analysis systems.15,16 IMU-based technologies have moderate-to-strong concurrent validity in quantifying the deficits measured during single-activity functional movement assessments, including walking and balance performance,15,16 and IMU data collected during everyday activities have been shown to be useful for predicting standardized SPPB and Timed “Up & Go” test scores in individuals who are frail and prefrail using deep learning.17 IMU data captured during Timed “Up & Go” tests have also been used for fall risk assessment.18 Computer vision technologies offer distinct advantages over wearable IMU technologies; most notable is the ability to extract whole-body position information without requiring whole-body sensor sets. Like IMUs, computer vision-based measurement systems have been used in complex clinical settings, such as the ICU,19,20 and are rapidly emerging as clinically useful for both single-activity21 and multiactivity mobility assessment.22 Though substantially less expensive than the gold standard motion capture systems used in research laboratories, IMU and computer vision systems have their own hardware and software costs and require time and resources to train personnel in their operation. Despite these costs, the advance of portable motion analysis systems is necessary to study real-world mobility and to democratize access to motion analysis beyond specialized clinical and laboratory settings.
Our team has developed a multimodal movement assessment system that combines the inertial data collected by wearable IMUs with computer vision data collected by external RGB-D cameras to identify and evaluate the functional movement performance of individuals with mobility impairments. The IMU-vision system’s activity-recognition and activity-scoring algorithms were trained using lab-based human biomechanical data to leverage the synergistic information captured in inertial and vision data to track the 3D position of human lower-extremity joints. We have shown in a lab-based validation study that the IMU-vision system can both identify the functional activity being performed and provide an objective assessment of movement performance.23,24 The objective of the current study is to build on this foundational work by evaluating the feasibility and accuracy of applying the system’s autonomous activity-recognition capabilities to analyze functional activity data collected in the homes of recently discharged home hospital patients.25 Moreover, in an exploratory subanalysis, we evaluate the accuracy of the system’s autonomous activity-scoring capabilities.
Methods
Participants
We recruited 50 community-dwelling adults discharged from the Brigham and Women’s Home Hospital Program25 to participate in 1 home data collection visit with the combined IMU-vision measurement system. Inclusion criteria were age ≥18 years, a completed course of acute care in the Brigham and Women’s Home Hospital program, residence within 10 miles of Brigham and Women’s Hospital, ability to provide written informed consent, and ability to follow 3-step commands. Exclusion criteria were being unhoused, inability to communicate with study staff, active substance use, active psychosis, or additional comorbidities that prevented participation in the study. Demographic information was collected by subject interview, and frailty risk status was determined through the Program of Research to Integrate Services for the Maintenance of Autonomy (PRISMA)-7 questionnaire.26 All subjects provided written informed consent for participation in the study. All study procedures were approved by the Massachusetts General Brigham and Boston University Institutional Review Boards.
Home Data Collection
Data collection consisted of a single home visit and included completion of activities similar to the subtasks of the SPPB.7 The SPPB includes standing balance, walking, and sit-to-stand activities. The standing balance activities included timed trials of the Romberg, semitandem, and tandem stances and were scored on the participant’s ability to hold the position for 10 seconds without assistance. The walking activity included 2 timed trials of a 3-Meter Walk Test, with the faster of the 2 timed trials recorded and scored. The sit-to-stand activity included a single chair stand test without upper extremity support or assistance, evaluated for safety, followed by a timed 5 Times Sit-to-Stand Test that was administered only if the subject safely completed the single sit-to-stand trial. Though the SPPB standardized instructions indicate that sit-to-stand tests should be initiated from a sitting position, we included sit-to-stand tests that were initiated from a standing position. Though these tests can still be timed and scored in likeness to the SPPB standardized scoring instructions, the deviation in starting position may introduce a bias in the score calculation. However, because this bias would affect both the manual and automated times in the same manner, we have chosen to keep these sit-to-stand tests in the study’s analyses. The trained investigator who supervised and administered the SPPB activities also manually scored the activities.
IMU-Vision System
The IMU-vision system uses an Intel NUC9 small form factor mini PC (Intel Corporation, Santa Clara, CA, USA; operating system: Ubuntu Linux 18.04) equipped with an Intel i5 processor, an NVIDIA GeForce RTX 2070 graphics card (NVIDIA Corporation, Santa Clara, CA, USA), and full-disk encryption for data collection. IMU data were collected at 120 Hz by 4 IMUs (Xsens MTw) attached bilaterally, 1 on each lower (midtibia) and upper (midthigh) leg, and connected wirelessly to the PC via a recording station. Vision data were collected by 2 RGB-Depth cameras (Intel RealSense D435, Intel Corporation). For the purposes of this study, each camera had an effective field of view of 87° × 58°, an image resolution of 1280 × 720, and an effective frame rate of 25 Hz. The cameras were mounted on tripods and placed at least 3 m apart to maximize the capture space. It should be noted that the IMU-vision system’s processing and scoring algorithms (which are run offline on the same computer) use images captured by only 1 camera; 2 cameras are used during data collection to increase the likelihood of collecting analyzable data for each test. A cost breakdown of the off-the-shelf components used in the IMU-vision system, as equipped for this study, is provided in the Supplementary Table.
In addition to the data collection module, the IMU-vision system is equipped with 2 other modules: an activity-recognition module and an activity-scoring module. These modules were trained using annotated lab-based biomechanical data collected from 5 individuals who were healthy and who completed a series of structured functional activities consisting of the subtasks included in the SPPB, Timed “Up & Go” test, and 10-Meter Walk Test. More specifically, the training data were manually labeled on a per-frame basis and were preprocessed to extract the relevant features. However, it should be noted that the spatial resolution of these training data was limited by the camera’s capabilities (see the Activity Recognition Module section for more detail). As a result, the training data were not sufficient to train the activity-recognition module to differentiate between the individual balance subtasks of the SPPB (ie, tandem, semitandem, and Romberg standing balance tests). This capability will be added with further development of the system. To adapt the system for use in uncontrolled home settings, a custom neural network was trained to address the higher likelihood of vision occlusions and denoise the initial 3D measurements.
Activity Recognition Module
The IMU-vision system’s activity-recognition subsystem is driven by a multimodal measurement fusion system that combines IMU data, RGB images, and depth measurements over a sliding temporal window of 6 seconds (150 frames at 25 Hz) to estimate the 3D movement of the lower limbs. The fused measurements form a point cloud that is used to obtain the 3D skeletal reconstruction (Fig. 1; see our prior work for more detail23,24). The 3D temporal data are then input to a neural network that predicts, on a per-frame basis, the activity that the user is performing out of a set of predefined activities: standing, walking, sitting, standing up, and sitting down. Finally, the temporal sequence of per-frame activities is input to a manually prespecified decision tree that outputs the activity identified in each recorded sequence from the set of predefined functional activities (ie, for this study, the SPPB subtasks).

Figure 1. Three different perspective projections of the reconstructed 3D skeleton of a study participant sitting on a chair. The axes represent the X (right–left), Y (up-down), and Z (depth) coordinates in 3D space.
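For illustration, the sketch below shows the final, rule-based stage of this pipeline: collapsing a sequence of per-frame activity labels into a test-level label. The function name, rules, and thresholds are illustrative assumptions and are not the manually prespecified decision tree used by the IMU-vision system.

```python
# Minimal sketch of mapping a per-frame activity sequence to a test-level
# label. Rules and thresholds are illustrative assumptions only.
from collections import Counter

FRAME_RATE_HZ = 25                  # effective camera frame rate
WINDOW_FRAMES = 6 * FRAME_RATE_HZ   # 6-second sliding window (150 frames)

def classify_test(per_frame_labels: list[str]) -> str:
    """Map per-frame labels (standing, walking, sitting, standing_up,
    sitting_down) to an SPPB-like subtask category."""
    n = len(per_frame_labels)
    counts = Counter(per_frame_labels)
    sit_to_stand_transitions = sum(
        1 for a, b in zip(per_frame_labels, per_frame_labels[1:])
        if (a, b) == ("sitting", "standing_up")
    )

    if counts["walking"] / n > 0.3:        # illustrative threshold
        return "walking"
    if sit_to_stand_transitions >= 1:      # any detected sit-to-stand event
        return "sit_to_stand"
    if counts["standing"] / n > 0.8:       # mostly static standing
        return "standing_balance"
    return "unknown"

# Example: a sequence dominated by static standing is labeled a balance test.
print(classify_test(["standing"] * 240 + ["sitting_down"] * 10))
```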
As noted, due to limitations in the IMU-vision system’s data collection capabilities (more specifically, a high noise-to-signal ratio in the depth measurements), the IMU-vision system was not capable of differentiating the foot position differences between the tandem, semitandem, and Romberg standing balance tests. Thus, the activity-recognition module was not trained to distinguish between these balance subtasks. For the purposes of this study, these different tests were all classified as “standing balance tests.” Grouping the balance tests together in this way allows for the evaluation of the IMU-vision system’s ability to differentiate between the SPPB motor test categories (ie, walking, standing balance, and sit-to-stand).
Activity Scoring Module
The activity-scoring subsystem receives as input the identified SPPB subtask and the temporal sequence of per-frame activities. The activity score is generated from the automatically extracted time between subtask-specific significant events. For example, for the standing balance tests, the amount of time the participant maintains the standing position is extracted, and the final score is assigned based on the SPPB standard scoring guidelines. Because the current IMU-vision system has a limited ability to distinguish between the different standing balance tests (as noted above), and because tandem tests are scored on a different scale (ie, tandem tests are scored out of a maximum of 2 points, whereas the other balance tests are scored out of a maximum of 1 point), all tandem tests were excluded from analysis by the activity-scoring module.
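As a minimal sketch of this time-based scoring step, the functions below map a detected hold time or elapsed test time to a score. The 10-second balance criterion follows the description above; the cut points passed to `score_timed_test` are caller-supplied placeholders, not the published SPPB cut points, and the function names are illustrative.

```python
# Minimal sketch of time-based scoring from automatically extracted event times.
def score_standing_balance(hold_time_s: float) -> int:
    """Non-tandem balance tests: 1 point if the stance is held for 10 s."""
    return 1 if hold_time_s >= 10.0 else 0

def score_timed_test(elapsed_s: float, cut_points_s: list[float]) -> int:
    """Map an elapsed time to a 0-4 score using ascending time cut points,
    e.g. [t4, t3, t2, t1]: finishing within t4 scores 4, within t3 scores 3,
    and so on; slower than the last cut point scores 0."""
    score = 4
    for cut in cut_points_s:
        if elapsed_s <= cut:
            return score
        score -= 1
    return 0

# Example with placeholder cut points (seconds), chosen for illustration only.
print(score_standing_balance(10.4))                  # -> 1
print(score_timed_test(4.8, [3.1, 4.1, 5.2, 7.5]))   # -> 2
```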
Data Collection and Preparation
Of 316 data collection attempts across the 50 study participants, 45 did not produce complete datasets because of issues encountered during data collection or data preprocessing. That is, 271 complete IMU-vision datasets were available for this foundational study of the activity recognition and scoring capabilities of the IMU-vision system. Incomplete or missing datasets were the result of improper equipment setup during the test that prevented the IMU-vision system from collecting data or of manual data preprocessing errors, such as incorrectly cropped videos.
Data Analyses
To evaluate the feasibility of using the IMU-vision system’s activity-recognition module with functional activity data collected in the home, we calculated the percentage of tests with complete IMU-vision datasets that were analyzable. To evaluate the accuracy of the activity-recognition module’s outputs, we calculated the percentage of analyzable tests that were correctly identified. A chi-square (χ2) test evaluated the differences in activity recognition accuracy across the 3 potential activity types: standing balance, sit-to-stand, and walking.
In an exploratory subanalysis, we evaluated the accuracy of the activity-scoring module’s outputs. The activity-scoring module is designed to run in sequence with the output of the activity-recognition module, enabling an automatic end-to-end functional movement assessment pipeline. However, because the scoring criteria used for 1 activity are not translatable to other activities, errors in activity recognition will naturally lead to errors in activity scoring. Thus, for the purposes of this study of the IMU-vision system’s foundational recognition and scoring capabilities, we evaluated the performance of each module independently. More specifically, we did not include the 6 tests misclassified by the activity-recognition module in the exploratory subanalysis of the activity-scoring module’s accuracy. This subanalysis included calculating the percentage of scorable tests that were correctly scored and computing the Cohen weighted κ (Kw) to evaluate agreement between the system-identified activity scores and the manual activity scores provided by the trained in-person rater. A chi-square (χ2) test evaluated the differences in score accuracy across the 3 activity types: standing balance, sit-to-stand, and walking. α was set at .05 for all hypothesis testing.
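For illustration, a hedged sketch of these statistics using standard Python libraries is shown below; the counts and score arrays are invented examples rather than study data, and the choice of quadratic weights for the weighted κ is an assumption.

```python
# Hedged sketch of the accuracy and agreement statistics described above.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

# Correct/incorrect counts per activity type (illustrative numbers only).
contingency = np.array([
    [40, 9],   # sit-to-stand: correct, incorrect
    [95, 3],   # standing balance
    [70, 0],   # walking
])
chi2, p, dof, _ = chi2_contingency(contingency)
print(f"chi2({dof}) = {chi2:.1f}, P = {p:.3f}")

# Agreement between manual and automated ordinal scores (illustrative data).
manual    = [4, 3, 4, 2, 1, 4, 3, 0, 2, 4]
automated = [4, 3, 3, 2, 1, 4, 4, 0, 2, 4]
kw = cohen_kappa_score(manual, automated, weights="quadratic")
print(f"Weighted kappa = {kw:.2f}")
```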
Post Hoc Laboratory-Based Assessment of Walking Speed Estimation Accuracy
As a post hoc extension of our evaluation of the IMU-vision system’s activity-scoring capabilities, we sought to quantify the error present in the system’s walking speed estimations relative to a laboratory-based ground truth motion capture system. More specifically, we analyzed data collected as part of our prior study (see Mitjans et al 202124) to compare the walking speeds measured by the lab-based optical motion capture system (Qualisys, Qualisys AB, Göteborg, Sweden) with the walking speeds estimated by our IMU-vision system. This post hoc analysis included data collected from 1 individual after stroke and 4 adults who were healthy. In brief, each participant completed 6 10-Meter Walk Tests while motion data were collected concurrently by the lab-based and IMU-vision motion capture systems. Both IMU-vision and ground truth speeds were calculated as the average speed of the 3D trajectories: the former were obtained in the same manner as the in-home recordings, and the latter were obtained from the position of the optical markers. More specifically, because the motion capture data were discontinuous (caused by visual occlusions during the tests), the tracked joint with the longest visible motion capture trajectory was used to compute the ground truth speed, and the estimated trajectory of the same joint over the same time window was used to compute the average IMU-vision speed. To evaluate the IMU-vision system’s speed estimation accuracy, we plotted these data relative to an identity line and computed the average speed estimation error between the 2 approaches. This average error was computed as the weighted average of the relative errors across all tests, with each test weighted by the length of the time window used to compute its walking speed. These windows correspond to the longest time sequences with ground truth data present and can therefore differ in length across tests.
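A minimal sketch of this window-length-weighted error calculation is shown below; the function name and the example values are illustrative only.

```python
# Minimal sketch of the window-length-weighted average relative error used to
# summarize speed estimation accuracy.
def weighted_relative_error(estimated, ground_truth, window_lengths_s):
    """Window-length-weighted mean of |v_est - v_true| / v_true across tests."""
    rel_errors = [abs(v_est - v_true) / v_true
                  for v_est, v_true in zip(estimated, ground_truth)]
    total_weight = sum(window_lengths_s)
    return sum(e * w for e, w in zip(rel_errors, window_lengths_s)) / total_weight

# Example with made-up speeds (m/s) and ground truth window lengths (s).
print(weighted_relative_error([1.10, 0.85], [1.00, 0.95], [6.2, 7.8]))
```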
Role of the Funding Source
The funders played no role in the design, conduct, or reporting of this study.
Results
Participants
Fifty-nine individuals were initially enrolled in the study; however, 9 individuals withdrew prior to any study procedures, resulting in a final enrollment of 50 individuals. After accounting for all data collection attempts that resulted in tests with missing or incomplete IMU-vision datasets (Fig. 2), 271 tests from 47 individuals were ultimately included in our analysis of the IMU-vision system’s movement monitoring capabilities (Table). Frailty risk status was determined by the PRISMA-7 Questionnaire, with a cut-off score of ≥3 indicating frailty risk.26

Figure 2. Final test counts used to evaluate the multimodal system’s autonomous activity-recognition and activity-scoring capabilities for the IMU-vision data collected in the home. IMU = inertial measurement unit.
Table. Participant Characteristics (N = 47)a

| Characteristic | n (%) |
| --- | --- |
| Age, y, mean (SD) | 59.1 (16.72) |
| Sex: female | 22 (46.8) |
| Race and ethnicity | |
| White | 27 (57.5) |
| African-American | 12 (25.5) |
| Latino/a | 4 (8.5) |
| Multi/other | 4 (8.5) |
| Primary language | |
| English | 40 (85.1) |
| Spanish | 4 (8.5) |
| Creole | 1 (2.1) |
| Other | 2 (4.3) |
| Partner status | |
| Single | 12 (25.5) |
| Married | 24 (51.1) |
| Domestic partner | 2 (4.3) |
| Divorced | 3 (6.4) |
| Widowed | 5 (10.6) |
| Other | 1 (2.1) |
| Living situation | |
| Lives alone | 11 (23.4) |
| Lives with spouse/partner | 10 (21.3) |
| Lives with spouse/partner and family | 17 (36.2) |
| Lives with nonspouse family | 6 (12.8) |
| Other | 3 (6.4) |
| Education level | |
| Less than high school | 2 (4.3) |
| High school graduate/GED | 16 (34.0) |
| Some college | 11 (23.4) |
| College graduate | 8 (17.0) |
| Postgraduate education | 9 (19.2) |
| Other | 1 (2.1) |
| Employment status | |
| Employed | 14 (29.8) |
| Unemployed | 8 (17.0) |
| Disabled | 4 (8.5) |
| Retired | 17 (36.2) |
| Other | 4 (8.5) |
| Smoking status | |
| Never smoked | 21 (44.7) |
| Former smoker | 22 (46.8) |
| Active smoker | 4 (8.5) |
| Use of home health aide: yes | 6 (12.8) |
| At risk for frailtyb | 24 (51.0) |

a Continuous variables are reported as mean (SD) and categorical variables are reported as frequency (proportion).
b Frailty risk was determined by the PRISMA-7 Questionnaire26 with a cut-off score of ≥3 for risk of frailty.
Feasibility
The 271 tests with complete IMU-vision datasets were screened manually for their analyzability by the IMU-vision system’s activity-recognition module. Fifty-four of these tests were not analyzable: 18 due to external factors that resulted in incomplete visual information captured during testing (eg, cameras misplaced during collection, or a person or object obstructing view of the subject) and 36 due to internal system factors (eg, video freezes, corrupt files, or subject misdetection). Ultimately, 217 (80%) of the 271 tests with complete datasets were analyzable.
Automated Activity Recognition Accuracy
The activity-recognition module correctly identified 95% of the 217 analyzable tests. Accuracy rates differed across activities (χ2(2) = 19.9, P < .001), with correct identification of 100% of walking activities, 97% of standing balance activities, and 82% of sit-to-stand activities (Fig. 3A).

Figure 3. (A) Accuracy assessment of the IMU-vision system’s automated activity recognition algorithm. The proportion of all assessed subtasks that were accurately recognized (left) is shown alongside the proportion of each subtask (Sit-to-Stand, Standing Balance, and Walking) that was accurately recognized (right). (B) Accuracy assessment of the IMU-vision system’s automated activity scoring algorithm. The proportion of all assessed subtasks that were accurately scored (left) is shown alongside the proportion of each subtask (Sit-to-Stand, Standing Balance, and Walking) that was accurately scored (right). IMU = inertial measurement unit.
Exploratory Subanalysis: Automated Activity Scoring Accuracy
Of the 217 tests analyzable for activity recognition, only 152 were scorable by the activity scoring module: 33 tests were tandem standing balance tests that the activity-scoring module is not yet trained to score (see Activity Scoring Module in Methods), 26 tests lacked ground truth scores (ie, they were unscorable single sit-to-stand tests administered to assess safety before proceeding to the 5 Times Sit-to-Stand Test, or otherwise scorable tests where the in-person rater did not record a ground truth score), and 6 tests were not scorable because they were not accurately identified by the activity-recognition module (Fig. 2).
The activity-scoring module correctly scored 74% of the 152 scorable tests. Accuracy rates again differed across activity types (χ2(2) = 14.9, P < .001): 87% of scorable standing balance tests, 50% of sit-to-stand tests, and 66% of 3-Meter Walk Tests received the same manual and automated scores (Fig. 3B). The agreement analysis indicated that the automated test scoring algorithm had substantial agreement with the trained manual rater (Kw = 0.76 [0.69, 0.83]; P < .001).
Post Hoc Laboratory-Based Assessment of Walking Speed Estimation Accuracy
Given the lower-than-expected accuracy in 3-Meter Walk Test scores, we conducted a post hoc examination of the IMU-vision system’s walking speed estimation accuracy relative to a lab-based ground truth to establish a minimum expected accuracy in the home. The analysis included 29 10-Meter Walk Tests collected from 1 individual after stroke (man, 49 years old, 15 years after stroke) and 4 adults who were healthy (3 men, 1 woman; average age of 27.5 years). Each participant completed 6 tests, 1 of which was not usable. An average 0.16 m/s discrepancy between the speed estimations produced by the optical motion capture system and the IMU-vision system was observed, corresponding to an average relative error of 10.30% (Fig. 4).

Figure 4. Scatter plot of IMU-vision system speed estimations versus a lab-based (Qualisys) reference, with identity line (red). Each point is the average speed calculated from a single 10-Meter Walk Test. IMU = inertial measurement unit.
Discussion
Advances in movement sensor technologies and algorithms are enabling the automated administration and scoring of functional movement assessments. In this study, we sought to evaluate the activity recognition and scoring capabilities of a multimodal sensor system used to collect integrated IMU-vision data during the completion of walking, standing balance, and sit-to-stand activities by 50 community-dwelling adults recruited from a Hospital at Home program.25 The IMU-vision system’s activity-recognition capability demonstrated good feasibility and high accuracy when differentiating between the 3 SPPB subtasks (ie, sit-to-stand, walking, and standing balance). Moreover, the IMU-vision system’s current activity scoring capabilities demonstrated good accuracy for the subset of currently scorable activities. Taken together, the findings of this study demonstrate the potential of integrating inertial and vision sensor data in an in-home functional movement assessment system, while highlighting important research and development pathways toward fully autonomous, home-based movement monitoring.
Toward an End-to-End Functional Movement Assessment Pipeline
A key advantage of combining inertial and vision data in a multimodal movement monitoring system is the potential for combining activity recognition and activity scoring. To automatically match scores to completed activities, at-home movement assessment systems can employ other (simpler) strategies, such as direct input of the completed activity by the patient via a user interface. However, there are benefits to combining activity scoring with activity recognition, including greater usability by individuals with cognitive impairment, which is common in aging populations.2 Though this foundational study of the IMU-vision technology focuses on a limited subset of functional activities, with additional algorithm training, the basic technology has potential for broader application to a variety of functional tasks (eg, reaching, turning, and stair-climbing). Moreover, beyond structured functional activity assessment, we envision the ultimate goal of this line of research and development as unstructured movement monitoring that runs in the background of daily life. Movement-sensing technologies that can autonomously evaluate day-to-day mobility without requiring user input have significant potential to improve clinical care,27 including facilitating a shift in focus to preventative, rather than reactive, treatments for mobility decline.
Activity Recognition
As noted above, limitations in the training data prevented the development of an activity recognition module that could distinguish between the 3 balance subtasks included in the SPPB; this will be a key area for improvement in future iterations of the IMU-vision system. Despite this limitation, the IMU-vision system demonstrated noteworthy accuracy in identifying the SPPB subtask category: 95% of balance, sit-to-stand, and walking tests were correctly identified. This finding highlights one of the main strengths of the IMU-vision system. However, reduced accuracy in certain subtask categories (only 82% of sit-to-stand activities were correctly identified) reveals potential for refinement of the system’s activity recognition module. For example, unlike clinic-based sit-to-stand tests that are completed from a standard height chair, the sit-to-stand tests included in this home-based study were completed from regular household furniture that could be higher or lower than a standard height chair. Differences in starting height affected the accuracy of the system’s sit-to-stand detection. In the short term, specific instructions to complete home-based sit-to-stand tests from standard height chairs would immediately improve activity recognition accuracy. In the long term, the data collected from this uncontrolled home-based study can be used to refine the IMU-vision system’s machine learning algorithms. Indeed, the current algorithms were trained using biomechanical data collected under idealized lab-based conditions. The variability in accuracy observed across home-based activities is thus not surprising and highlights the importance of developing ecologically valid movement assessment algorithms when systems are being designed for real-world settings and diverse clinical populations.
Activity Scoring
The IMU-vision system’s activity-scoring module produced test scores with substantial agreement to the scores produced by the manual rater. While the high weighted Cohen κ indicates high reliability between the IMU-vision system and the in-person rater, the accuracy of the automated scores was inconsistent across activities, with the 3-Meter Walk Test and sit-to-stand test scores being less accurate. Because the SPPB standardized scoring instructions are time based, differences in scoring are largely the result of differences in measured times. With this context, the 3-Meter Walk Test and sit-to-stand test scoring errors could arise from 2 potential sources. On one hand, the manual use of a stopwatch to mark the start and end of the timed portion of a walk test is prone to error due to human reaction time delays, which are estimated to be about 250 milliseconds in response to visual stimuli.28 Though manual error should affect all tests in a similar manner, and theoretically should have the same effect at both the start and end of each test, any impact is likely to be larger on tests of shorter duration. Indeed, larger test–retest error was recently reported in 10-Meter Walk Test speeds among individuals who were faster versus individuals who were slower.29
On the other hand, and from a technical standpoint, the depth sensor used can, in some instances, provide highly noisy and biased spatial measurements. In fact, the ideal range of use recommended by the manufacturer of the depth cameras used in this study is between 0.3 and 3 m.30 Because home conditions often required our study participants to perform the tests at different distances from the camera, this recommendation was not always met. Though this source of error was anticipated in our development of the IMU-vision scoring algorithms, which include methods to filter out high-frequency noise in the skeletal reconstructions, this module was trained using laboratory data, which contained more accurate measurements and less distorted skeletal reconstructions. That is, inconsistency between the training data (collected in the lab) and the test data (collected in the home) has a direct impact on the performance of the machine learning algorithms that we trained to identify functional activities on a per-frame basis, and whose outputs are the inputs to the automatic scoring module. This error source becomes especially significant when scoring sit-to-stand tests because a noisy per-frame activity sequence can severely corrupt the automated time measurements. For an example of this source of error, see the supplementary figure, which displays a sample depth map illustrating this source of noise (Suppl. Fig. 1A) as well as its effect on per-frame activity recognition (Suppl. Fig. 1B).
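As a generic illustration (not the IMU-vision system's denoising network or its actual filtering code), the sketch below shows two simple mitigations consistent with the issues described above: discarding depth samples outside the manufacturer-recommended 0.3 to 3 m range and median-filtering a reconstructed joint trajectory to suppress spikes.

```python
# Generic illustration of simple mitigations for noisy depth data.
import numpy as np
from scipy.signal import medfilt

DEPTH_MIN_M, DEPTH_MAX_M = 0.3, 3.0   # recommended range for the D435 sensor

def mask_out_of_range(depth_m: np.ndarray) -> np.ndarray:
    """Replace depth samples outside the recommended range with NaN."""
    out = depth_m.astype(float)
    out[(out < DEPTH_MIN_M) | (out > DEPTH_MAX_M)] = np.nan
    return out

def smooth_trajectory(coord: np.ndarray, kernel: int = 5) -> np.ndarray:
    """Median-filter one coordinate of a reconstructed joint trajectory."""
    return medfilt(coord, kernel_size=kernel)

# Example on made-up traces: the 9.7 m reading is masked; the spike is smoothed.
print(mask_out_of_range(np.array([1.2, 1.3, 9.7, 1.4])))
print(smooth_trajectory(np.array([0.9, 0.9, 3.0, 0.9, 0.9])))
```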
The higher-than-expected scoring errors observed in the 3-Meter Walk Tests motivated a post hoc examination of the IMU-vision system’s speed estimation accuracy free from human sources of error, such as those due to incorrect starting and stopping of the stopwatch. This post hoc analysis was conducted using laboratory-based data to provide a quantitative insight into the magnitude of estimation error that should be expected due to noisy depth measurements. This analysis shows that our system can estimate walking speeds with an average error of about 10%. While the average error estimate determined by our speed estimation analysis (0.16 m/s) is less than the previously established minimal detectable change for walking speed (0.19 m/s),31 as seen in Figure 4, the IMU-vision system can generate outliers and still requires further development. In the short term, the use of a better depth sensor would greatly reduce noise in the skeleton reconstructions and thus improve the accuracy of the speed estimations. Regardless, it is noteworthy that the speed estimation accuracy study performed with the laboratory-based data (ie, vs an optical motion capture ground truth) greatly outperformed the accuracy assessment performed with the in-home 3-Meter Walk Test estimations (ie, vs a human rater). However, it is important to mention that the laboratory conditions were optimized to the experiment, whereas home environments are often uncontrolled and thus add much more noise to both the images and the point cloud depth information. Unfortunately, without a motion capture system available in participants’ homes, we are unable to quantitatively capture the system-based error in the home data. However, this lab-based analysis provides an objective expectation for the minimum system-based error, as it was equipped for this study.
Feasibility of In-Home Assessments
An important finding of this study is that 80% of the available complete datasets were analyzable by the IMU-vision system’s activity recognition module. While this suggests good feasibility, there is room for improvement. Examining the reasons why 20% of tests were not analyzable illuminates several opportunities for system improvement. In brief, tests were not usable for 2 primary reasons: external factors that affected data quality and internal system failures. As an example of an external factor, a bright window located within view of the vision system’s RGB-D camera can make the collected vision data susceptible to higher noise, resulting in very poor skeleton reconstructions and inaccurate results. Moreover, pets and children moving in front of the camera also affect the usability of the vision data. These sources of error require procedural controls and user-facing instructions for use. By contrast, an example of an internal system failure that can be addressed in future development efforts is limited computing resources caused by concurrent processing tasks: when the system performs multiple tasks at the same time, the high resource utilization slows system performance and may result in data losses during collection. Future iterations of the IMU-vision system can mitigate these challenges by using better cameras with higher resolution depth sensors, adding more cameras that the system can automatically switch to when a single camera’s data are not usable, and improving human detection algorithms to better differentiate between multiple moving bodies. Notably, the IMU-vision system achieved high activity recognition accuracy despite the integrated IMU and vision data being collected in home settings that differed significantly across participants and markedly from the lab-based settings that the system was originally built for. Indeed, the variability in home settings was not directly accounted for during the development of the IMU-vision system and is a key area for future improvement.
Alternative Approaches
Because one of the motivations of this study is to narrow the gap between newly developed mobility assessment technology and its adoption by clinicians, it is necessary to compare the technology presented in this study with alternatives in the literature. One of the main differences between our technology and others is our use of a gray-box pipeline with intermediate, explainable stages that favor transparency in the final score computations, as opposed to fully black-box deep learning models. In particular, the 3D reconstruction of the human skeleton can provide additional quantitative information to a remotely monitoring clinician, and the final automatic scores are computed from a straightforward reading of the time elapsed between key activity frames rather than being generated by a neural network.32 Additionally, our method requires only a single depth camera, as opposed to other studies that rely on triangulated measurements from multiple cameras, such as Yeung et al 2019.20 Other studies, such as Ng et al 2020,21 use 2D-only vision information, which prevents the calculation of spatial measures such as distances or velocities. Our system still has accuracy limitations in its speed estimations, but as discussed above, future iterations that use better cameras and sensors are likely to show significant improvements in this regard.
However, even with a future system that reports very high accuracy in both activity recognition and scoring, clinical adoption of this or a similar system would require additional validation module(s) before fully autonomous home assessment is clinically viable. That is, clinicians need to know how much trust they can place in the outputs of the automated system. Even though this challenge was not the focus of this study, features to address this could be added in 2 complementary but different ways: (1) passively, by improving how the outputs of the system are calculated or (2) actively, by developing a new external system to validate the given outputs. An example of the first pathway would be updating the system to output a CI together with point scores, which would provide the clinician with additional information indicating the precision of the obtained score. An example of an active external system would be the development of a fault detection system that would identify when either the data or the outputs of the intermediate stages (eg, the 3D skeletal reconstruction) show evident anomalies that could impede a proper score calculation by the automatic scoring module. We want to emphasize that addressing this problem is crucial for the advance of all automatic assessment systems and thus encourage future research in this direction.
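As an illustration of the first (passive) pathway only, the sketch below reports a percentile-bootstrap CI alongside a point estimate of walking speed; this is a generic example with made-up per-stride speeds and is not a feature of the current IMU-vision system.

```python
# Illustration of reporting an uncertainty interval alongside a point estimate.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(samples, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the mean of the given samples."""
    samples = np.asarray(samples, dtype=float)
    means = [rng.choice(samples, size=samples.size, replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return samples.mean(), (lo, hi)

# Hypothetical per-stride speeds (m/s) from a single walking test.
speed, (lo, hi) = bootstrap_ci([0.92, 0.88, 0.95, 0.90, 0.86, 0.93])
print(f"speed = {speed:.2f} m/s, 95% CI [{lo:.2f}, {hi:.2f}]")
```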
Study Limitations
Key limitations of the study include the data collection and preparation errors that resulted in tests with incomplete IMU-vision datasets, the inability of the current system and testing procedures to derive SPPB test scores, and system limitations and set-up requirements. Moreover, applying the technology to a wider set of structured activities from other standardized functional assessments (eg, the Timed “Up & Go” and/or functional gait assessment) and unstructured functional activities will be needed to advance this technology.
First, data collection and preparation errors due to human error in the set-up, collection, and postprocessing steps of data acquisition resulted in the loss of 45 tests in the dataset (from 316 test attempts to 271 tests with complete datasets). Improved training of individuals who set up the device and the development of a concurrent data monitoring system will substantially improve the proportion of test attempts collected by the IMU-vision system that result in complete datasets.
Second, though the functional activities completed in this study were inspired and motivated by the subtasks of the SPPB, this study was not able to evaluate the IMU-vision system’s ability to automatically produce full SPPB scores due to the lack of tandem balance and standardized 5 Times Sit-to-Stand Tests. Future studies with standardized clinical testing will be essential to demonstrate the capacity of the system to automatically score standardized functional mobility tests.
While this system has been designed to favor a low-cost and easy setup (as opposed to other vision-only or IMU-vision systems that may require multiple cameras), the design also comes with certain limitations. The physical space required for the camera to maintain a sufficient field of view of the subject is limiting, especially for tests that require long-range mobility, such as walking activities. Indeed, as currently trained, the vision component of the system requires seeing almost the whole body of the subject, and many of the incorrectly scored walking tests were the result of subjects being too close to the camera either at the start or at the end of the test, which ultimately impairs the system’s ability to detect the true start and end of the 3-Meter Walk Test. In addition, as discussed in the Methods section, another current system limitation is the high noise-to-signal ratio in the depth measurements, which does not allow the system to accurately distinguish between tandem, semitandem, and Romberg balance tests. Future system iterations that use more powerful depth cameras are needed to enable this level of analysis.
Another limitation relates to the specifications of the selected sensors, as discussed in previous sections. For example, using an upgraded camera, such as the Intel RealSense D455 (Intel Corporation), which extends the operating depth range up to 6 m,30 should lead to more favorable results. We also confirmed that the unstructured settings of in-home environments have a direct effect on the noise added to the RGB-D measurements. Because our algorithms were developed and trained using laboratory-based datasets, they were not sufficiently robust to this higher level of noise. Nonetheless, this study allowed us to collect a significant amount of new data that can be used in the future to fine-tune these algorithms and increase their robustness.
As for the automatic scoring algorithm, our system only considers the legs and torso of the participant, while ignoring the position of the arms. This poses a clear limitation for those tests that require the participant to hold the arms in a specific position. Future versions of the system must account for the position of the arms when scoring a mobility test.
Finally, the IMU-vision system in its current form is not autonomous in its data collection capabilities. Equipment set-up and user training are required and are currently completed by a trained clinical researcher. In a future autonomous system, the subject would require a minimal level of training for equipment set-up and donning/doffing of sensors prior to testing. Additionally, in this study, manual data processing was required to identify the analyzable tests for input into the system. We anticipate that future iterations of the IMU-vision system will improve the proportion of collected videos that are analyzable as well as the automaticity of the data processing pipeline. At such time, it will be necessary to test the feasibility of this fully autonomous system for in-home mobility monitoring.
Conclusions
The IMU-vision technology has the potential to advance clinical care paradigms that take advantage of the autonomous recognition and scoring of functional activities performed in everyday settings. The ability to reliably measure functional mobility in the home would give health care professionals access to valuable information that can help direct care and clinical decision-making. The current study demonstrates good feasibility for using home-collected data with an autonomous activity recognition algorithm that relies on integrated IMU and vision data. Additionally, our finding of high accuracy in the IMU-vision system’s activity recognition outputs advances critical functionality that can support broader future applications, such as monitoring of unstructured or unscripted activity in community and home settings. Indeed, the continuous collection of functional mobility data in everyday settings could improve clinical diagnostic capabilities and facilitate lifesaving or life-improving interventions that prevent hospitalizations, falls, and progressive mobility decline. We anticipate that scalable, home-deployable movement monitoring technologies with high-accuracy measurement capabilities will create new opportunities for longitudinally assessing functional mobility in diverse settings and contexts and ultimately facilitate timely interventions.
Author Contributions
Johanna Spangler (Data curation, Formal analysis, Methodology, Writing—original draft, Writing—review & editing), Marc Mitjans (Data curation, Formal analysis, Methodology, Software, Visualization, Writing—original draft, Writing—review & editing), Ashley Collimore (Formal analysis, Methodology, Writing—review & editing), Aysha Gomes-Pires (Investigation, Methodology, Writing—review & editing), David M. Levine (Conceptualization, Data curation, Funding acquisition, Project administration, Resources, Supervision, Writing—review & editing), Roberto Tron (Conceptualization, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Software, Supervision, Writing—review & editing), and Lou Awad (Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing—original draft, Writing—review & editing)
Acknowledgments
We acknowledge and are grateful for the research support of the Boston University Neuromotor Recovery Laboratory research physical therapists and assistants. We are also grateful to our study participants, who generously shared their time and opened their homes to enable scientific research.
Ethics Approval
All subjects provided written informed consent for participation in the study. All study procedures were approved by the Massachusetts General Brigham and Boston University Institutional Review Boards.
Funding
This work was supported by the National Institutes of Health (R01AG067394).
Disclosures and Presentations
The authors completed the ICMJE Form for Disclosure of Potential Conflicts of Interest and reported no conflicts of interest.
References
Author notes
Johanna Spangler and Marc Mitjans contributed equally.