Lina Goh, Serene S Paul, Colleen G Canning, Kaylena A Ehgoetz Martens, Jooeun Song, Stephanie L Campoy, Natalie E Allen, The Ziegler Test Is Reliable and Valid for Measuring Freezing of Gait in People With Parkinson Disease, Physical Therapy, Volume 102, Issue 12, December 2022, pzac122, https://doi.org/10.1093/ptj/pzac122
Abstract
The purpose of this study was to determine interrater and test–retest reliability of the Ziegler test to measure freezing of gait (FOG) severity in people with Parkinson disease. Secondary aims were to evaluate test validity and explore Ziegler test duration as a proxy FOG severity measure.
Physical therapists watched 36 videos of people with Parkinson disease and FOG perform the Ziegler test and rated FOG severity using the rating scale in real time. Two researchers rated 12 additional videos and repeated the ratings at least 1 week later. Interrater and test–retest reliability were calculated using intraclass correlation coefficients (ICCs). Bland–Altman plots were used to visualize agreement between the researchers for test–retest reliability. Correlations between the Ziegler scores, Ziegler test duration, and percentage of time frozen (based on video annotations) were determined using Pearson r.
Twenty-four physical therapists participated. Overall, the Ziegler test showed good interrater (ICC2,1 = 0.80; 95% CI = 0.65–0.92) and excellent test–retest (ICC3,1 = 0.91; 95% CI = 0.82–0.96) reliability when used to measure FOG. It was also a valid measure, with a high correlation (r = 0.72) between the scores and percentage of time frozen. Ziegler test duration was moderately correlated (r = 0.67) with percentage of time frozen and may be considered a proxy FOG severity measure.
The Ziegler test is a reliable and valid tool to measure FOG when used by physical therapists in real time. Ziegler test duration may be used as a proxy for measuring FOG severity.
Despite FOG being a significant contributor to falls and poor mobility in people with Parkinson disease, current tools to assess FOG are either not suitably responsive or too resource intensive for use in clinical settings. The Ziegler test is a reliable and valid measure of FOG, suitable for clinical use, and may be used by physical therapists regardless of their level of clinical experience.
Introduction
Approximately 50% of people with Parkinson disease will experience freezing of gait (FOG), which is defined as brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk.1 The incidence of FOG increases as disease progresses,2 with people with FOG likely to experience reduced mobility, increased falls,3,4 and reduced quality of life.5
To determine FOG severity and consequently identify suitable interventions to reduce freezing, clinicians and researchers need reliable and valid measures. Tools commonly used to assess FOG severity remain largely inadequate. For instance, the New Freezing of Gait Questionnaire6 has a minimal detectable change of 9.95 points on a 28-point scale and may not be suitably responsive to reflect a clinically meaningful change.7 The use of video annotations to measure FOG by calculating the percentage of time spent frozen while people with Parkinson disease perform FOG-provoking tasks (ie, total time spent frozen during FOG episodes divided by total time spent performing the tasks)8 or the use of wearable technologies and machine learning to detect FOG9 have potential to accurately capture FOG. However, both these methods require information to be recorded and then analyzed, are time and resource intensive, and hence are not yet feasible in a clinical setting.
The Ziegler test may be used clinically to assess FOG.10 It involves walking a course with 4 situations that commonly trigger freezing (start, clockwise turn, counterclockwise turn, and walking through a doorway) and is repeated under 2 dual-task conditions (carrying a tray with a cup of water and carrying a tray with a cup of water plus counting backwards). The performance of each situation under each condition is rated using a 4-point rating scale based on the type of freezing observed. It is easy to set up, fast to use, and suitable for use in clinics and at home. Although there was high interrater and test–retest reliability between the researchers who designed and tested the Ziegler test,10 the reliability and validity of the Ziegler test when used by physical therapists are unknown. Recent evidence also suggests that the time taken to complete the Ziegler test is an independent predictor of FOG severity11 and may be used as a proxy FOG severity measure. However, the utility of the Ziegler test duration as a proxy FOG severity measure when used by physical therapists is unknown.
The primary aim of this study was to determine the interrater and test–retest reliability of the Ziegler test when used by physical therapists in real time to assess the severity of FOG in people with Parkinson disease. The secondary aims were to evaluate the validity of the Ziegler test by determining correlations between the outcomes from the rating scale and percentage of time spent frozen measured using video annotations, to explore whether the Ziegler test duration is a valid proxy measure of FOG severity, to evaluate test usability, and to evaluate any differences in reliability based on level of clinical experience.
Methods
Participants
Physical therapists were recruited from public and private health care sectors and academic institutions in New South Wales and the Australian Capital Territory, Australia. Physical therapists were eligible to participate in this study if they held general registration to practice in Australia and had a minimum of 6 months of clinical experience.
Design
Ten people (9 males, 1 female) with Parkinson disease and FOG completed the Ziegler test as part of a previous study.12 They had a mean age of 70.6 years (SD = 7.7), moderate to severe Parkinson disease with a mean Movement Disorder Society-Unified Parkinson’s Disease Rating Scale Section III (motor) score of 37.3 (SD = 13.3), and moderate to severe FOG with a mean New Freezing of Gait Questionnaire score of 20.0 (SD = 4.3).
For the Ziegler test,10 they were asked to stand up from a chair positioned 3.4 m in front of a closed door, walk 1 m forward to a square outlined with tape on the ground (40 × 40 cm) in which they completed two 360-degree turns (clockwise then counterclockwise), walk a further 2 m forward towards the door, open the door, and walk through the doorway. Their performance at each test situation (ie, start, clockwise turn, counterclockwise turn, and doorway) was rated: 0 points when no festination and no FOG was detected, 1 point when festination or any hastening steps (“shuffling”) were observed but the person was able to continue moving forwards, 2 points when FOG (trembling-in-place or total akinesia) occurred in which the person was unable to move forwards but could overcome that freezing episode themselves, and 3 points for any abortion of the task or any need of interference by the examiner (eg, acoustic or visual one-off cue to overcome the freezing episode). People with Parkinson disease completed this test under 3 conditions in this order: (1) no additional task, (2) with an additional manual task (ie, holding a tray with a full cup of water), and (3) with an additional cognitive and manual task (ie, counting backwards from 100 in steps of 7 and holding a tray with a full cup of water). The total score ranged from 0 to 36 (4 test situations × 3 test conditions × a maximum of 3 points each). The performance of each test condition was captured on video, with a total of 3 videos generated per Ziegler test.
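As an illustration only, the scoring arithmetic described above can be sketched in Python. The identifiers (`SITUATIONS`, `CONDITIONS`, `ziegler_total`) and the nested-dictionary layout are hypothetical choices made for this sketch and are not part of the published test materials.

```python
from typing import Dict

# Hypothetical labels for the 4 test situations and 3 test conditions.
SITUATIONS = ("start", "clockwise_turn", "counterclockwise_turn", "doorway")
CONDITIONS = ("no_task", "manual_task", "manual_and_cognitive_task")
MAX_POINTS = 3  # highest rating for a single situation

# 4 situations x 3 conditions x 3 points = 36
MAX_TOTAL = len(SITUATIONS) * len(CONDITIONS) * MAX_POINTS


def ziegler_total(ratings: Dict[str, Dict[str, int]]) -> int:
    """Sum the 0-3 ratings over every situation in every condition."""
    total = 0
    for condition in CONDITIONS:
        for situation in SITUATIONS:
            score = ratings[condition][situation]
            if not 0 <= score <= MAX_POINTS:
                raise ValueError(f"rating out of range: {score}")
            total += score
    return total
```

A rater who scores every situation as 0 except one self-resolved freeze at the start (2 points) and one festination episode at the doorway (1 point) would record a total of 3 out of 36.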
People with Parkinson disease performed the Ziegler test during their “on” phase (ie, when their Parkinson medications were working optimally) in the clinic and “off” phase (ie, after their Parkinson medications were withdrawn for at least 12 hours overnight) in their homes. Assessment at home during the “off” phase was for pragmatic reasons to ensure patient safety. A subset of videos was chosen across “on” and “off” phases to include a broad representation of FOG types and severity because more severe forms of FOG such as trembling in place and akinesia are more likely to be triggered in the “off” phase. Videos representing a range of Ziegler scores reflecting different FOG types and severity were chosen to maximize the robustness of the calculations for intraclass correlations. The subset of videos was chosen by 2 researchers (L.G. and S.S.P.) not involved in the ratings used in this study.
Outcome
Interrater reliability was determined based on our sample of physical therapists who were recruited into the study plus 2 physical therapists from the research team. All physical therapists viewed the videos remotely on a secure web platform (REDCap) and rated 12 sets of 3 videos (ie, a total of 36 videos), with the order of each set of videos randomized. Instructions on how to rate the performance were provided as outlined in Ziegler et al10 (Suppl. Material 1). No additional instructions were provided to the physical therapists except to watch the videos only once at normal speed to best mimic use of the rating scale in a clinical setting in real time. For each test condition (ie, with or without additional tasks), physical therapists were asked to simultaneously determine the Ziegler test duration (ie, time taken to complete the Ziegler test from start to finish) using a stopwatch. At the end of the ratings, the physical therapists completed a closed-end questions survey to explore usability of the rating scale. At the end of the closed-end questions, they were given an opportunity to provide additional feedback via free text.
Two physical therapists from the research team, 1 Parkinson disease expert (N.E.A.) and 1 novice (S.L.C.), rated an additional 4 sets of 3 videos (ie, 12 additional videos, for a total of 48 videos). To determine the test–retest reliability, both researchers re-rated the 48 videos after at least 1 week. Two other members of the research team (J.S. and K.A.E.M.) determined the percentage of time spent frozen via video annotations using a predetermined protocol.12
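The percentage of time spent frozen is defined in the Introduction as total time spent frozen during FOG episodes divided by total time spent performing the tasks. A minimal sketch of that calculation follows. It assumes each annotated episode is stored as a (start, end) pair in seconds and that episodes do not overlap; the function name and data layout are hypothetical and do not reproduce the annotation protocol itself.

```python
from typing import List, Tuple


def percent_time_frozen(
    episodes: List[Tuple[float, float]], task_duration_s: float
) -> float:
    """Return 100 * (summed episode durations) / (task duration).

    Assumes episodes are non-overlapping (start, end) times in seconds.
    """
    if task_duration_s <= 0:
        raise ValueError("task duration must be positive")
    frozen_s = 0.0
    for start, end in episodes:
        if end < start:
            raise ValueError("episode ends before it starts")
        frozen_s += end - start
    return 100.0 * frozen_s / task_duration_s
```

For example, two episodes lasting 3 s and 2 s within a 20-s task correspond to 25% of time spent frozen.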
Statistical Analyses
The interrater and test–retest reliability of the Ziegler rating scale were calculated using intraclass correlation coefficients (ICC2,1 and ICC3,1, respectively). An ICC > 0.90 was considered excellent; 0.75 to 0.90, good; 0.50 to 0.74, moderate; and <0.50 poor.13 Overall interrater and test–retest reliability were calculated as well as reliability for individual test condition (ie, with or without additional tasks) and individual test situations (ie, starts, turns, doorways). Data from the clockwise and counterclockwise turns were pooled as “turns” due to their similar movement patterns. Bland–Altman plots were used to visualize agreement between the 2 researchers for test–retest reliability. Correlations between outcomes of the Ziegler rating scale, time taken to complete the test, and percentage of time frozen were determined using Pearson correlation coefficients (r). An r > 0.90 was considered very high; 0.70 to 0.90, high; 0.50 to 0.70, moderate; 0.30 to 0.50, low; and <0.30 negligible.14 Closed responses from the survey were analyzed using descriptive statistics, and open responses were analyzed thematically. Statistical analyses of the data were conducted using the SPSS Statistics for Windows, version 26.0 (IBM Corporation, Armonk, NY, USA).
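For readers who want to see what the two coefficients compute, the sketch below implements the single-rater ICC(2,1) and ICC(3,1) in the Shrout and Fleiss mean-square form, together with the interpretation bands quoted above. It is an illustration only: the study's analyses were run in SPSS, confidence intervals are not computed here, and the handling of values falling between 0.74 and 0.75 is an assumption of this sketch. Rows are rated targets (eg, videos) and columns are raters or rating occasions.

```python
from typing import Sequence, Tuple


def icc_single_rater(ratings: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (ICC(2,1), ICC(3,1)) from a targets-by-raters matrix."""
    n = len(ratings)        # number of targets (rows)
    k = len(ratings[0])     # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with >= 2 rows and columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-targets mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square

    icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_3_1 = (msr - mse) / (msr + (k - 1) * mse)
    return icc_2_1, icc_3_1


def interpret_icc(icc: float) -> str:
    """Apply the bands used in this study (boundary handling is assumed)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"
```

ICC(2,1) includes the between-rater mean square in its denominator, so systematic differences between raters lower it; ICC(3,1) omits that term, which suits the test–retest setting in which the same rater scores the same videos twice.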
Role of the Funding Source
The funding bodies played no role in the design, conduct, or reporting of this study.
Results
Participants
Twenty-four physical therapists (22 recruited, 2 researchers) participated in this study. They practiced in diverse health care settings, from public (n = 13) and private (n = 9) clinics to academic institutions (n = 2), across acute (n = 7), rehabilitation (n = 6), and community care (n = 4). Seven physical therapists had between 0.5 and 2 years of postgraduate clinical experience, 9 had between 3 and 10 years, and 8 had more than 10 years. Ten physical therapists reported seeing between 0 and 10 people with Parkinson disease in the past year, 11 reported seeing between 11 and 30, and 3 reported seeing more than 30. All physical therapists, except the Parkinson disease expert (N.E.A.), were unfamiliar with the Ziegler test.10 The 2 researchers were not involved in conducting the data analyses.
Interrater Reliability
The total score from the Ziegler test showed good interrater reliability (ICC2,1 = 0.80; 95% CI = 0.65–0.92). There was also good interrater reliability for individual test conditions with no additional tasks (ICC2,1 = 0.76; 95% CI = 0.59–0.90), and an additional manual task (ICC2,1 = 0.76; 95% CI = 0.60–0.90). However, there was only moderate interrater reliability for the test condition with an additional cognitive and manual task (ICC2,1 = 0.69; 95% CI = 0.51–0.87) and when the start, turns, and doorway test situations were rated in isolation across the 3 test conditions (start: ICC2,1 = 0.51, 95% CI = 0.40–0.65; clockwise and counterclockwise turns: ICC2,1 = 0.64, 95% CI = 0.56–0.72; doorway: ICC2,1 = 0.61, 95% CI = 0.50–0.73) (Tab. 1).
Table 1. Interrater Reliability (ICC2,1) With 95% CI of the Ziegler Test and When Rating Individual Test Conditions and Situations
Ziegler Test | n | ICC | 95% CI |
---|---|---|---|
Overall | 24 | 0.80 | 0.65 to 0.92 |
Test condition | | | |
No additional task | 24 | 0.76 | 0.59 to 0.90 |
Additional manual task | 24 | 0.76 | 0.60 to 0.90 |
Additional manual and cognitive tasks | 24 | 0.69 | 0.51 to 0.87 |
Test situation | | | |
Starts | 24 | 0.51 | 0.40 to 0.65 |
Turns | 24 | 0.64 | 0.56 to 0.72 |
Doorways | 24 | 0.61 | 0.50 to 0.73 |
ICC = intraclass correlation coefficient.
Physical therapists with high and low levels of overall or Parkinson disease–specific clinical experience showed moderate to good interrater reliability (Tab. 2).
Table 2. Interrater Reliability (ICC2,1) With 95% CI of the Ziegler Test Based on Physical Therapists’ Clinical Experience
Categories of Physical Therapists | n | ICC | 95% CI |
---|---|---|---|
Overall clinical experience, y | | | |
0.5–2 | 7 | 0.74 | 0.51 to 0.90 |
3–10 | 9 | 0.83 | 0.68 to 0.94 |
>10 | 8 | 0.83 | 0.67 to 0.94 |
No. of people with PD seen in past year | | | |
0–10 | 10 | 0.81 | 0.63 to 0.93 |
11–30 | 11 | 0.81 | 0.66 to 0.93 |
>30 | 3 | 0.84 | 0.47 to 0.95 |
ICC = intraclass correlation coefficient; PD = Parkinson disease.
Test–Retest Reliability
Overall, the total score from the Ziegler test showed excellent test–retest reliability (ICC3,1 = 0.91; 95% CI = 0.82–0.96). There was also excellent test–retest reliability for the test condition with an additional cognitive and manual task (ICC3,1 = 0.92; 95% CI = 0.85–0.96) and good test–retest reliability for test conditions with no additional task (ICC3,1 = 0.88; 95% CI = 0.76–0.94) and with an additional manual task (ICC3,1 = 0.88; 95% CI = 0.77–0.94). When test situations were rated in isolation, the test–retest reliability was good (start: ICC3,1 = 0.85, 95% CI = 0.77–0.90; clockwise and counterclockwise turns: ICC3,1 = 0.81, 95% CI = 0.75–0.85; doorway: ICC3,1 = 0.86, 95% CI = 0.80–0.91) (Tab. 3). The Bland–Altman plots did not show any systematic bias between the Ziegler scores at the 2 time-points (Fig. 1).
Table 3. Test–Retest Reliability (ICC3,1) With 95% CI of the Ziegler Test Rated at Least 1 Week Apart and When Rating Individual Test Conditions and Situations
Ziegler Test | Overall (n = 2): ICC | Overall (n = 2): 95% CI | Expert (n = 1): ICC | Expert (n = 1): 95% CI | Novice (n = 1): ICC | Novice (n = 1): 95% CI |
---|---|---|---|---|---|---|
Overall | 0.91 | 0.82 to 0.96 | 0.94 | 0.79 to 0.98 | 0.88 | 0.70 to 0.96 |
Test condition | | | | | | |
No additional task | 0.88 | 0.76 to 0.94 | 0.87 | 0.62 to 0.95 | 0.90 | 0.73 to 0.96 |
Additional manual task | 0.88 | 0.77 to 0.94 | 0.93 | 0.81 to 0.97 | 0.83 | 0.58 to 0.94 |
Additional manual and cognitive tasks | 0.92 | 0.85 to 0.96 | 0.95 | 0.75 to 0.98 | 0.89 | 0.72 to 0.96 |
Test situation | | | | | | |
Starts | 0.85 | 0.77 to 0.90 | 0.83 | 0.68 to 0.91 | 0.88 | 0.80 to 0.93 |
Turns | 0.81 | 0.75 to 0.85 | 0.86 | 0.79 to 0.91 | 0.74 | 0.62 to 0.82 |
Doorways | 0.86 | 0.80 to 0.91 | 0.88 | 0.79 to 0.93 | 0.84 | 0.74 to 0.91 |
ICC = intraclass correlation coefficient.

Figure 1. Bland–Altman plots showing agreement on the Ziegler test rated at 2 time-points at least 1 week apart. The solid line indicates the mean difference between the 2 scores, and the dashed lines indicate the limits of agreement (mean difference ± 1.96 SDs of the differences).
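The quantities drawn in a Bland–Altman plot can be computed as in the sketch below, which is illustrative only and does not reproduce the study's plotting procedure. It assumes paired total scores from the same rater at 2 time-points and uses the sample standard deviation of the differences; the function name and the returned keys are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Union


def bland_altman(
    first: Sequence[float], second: Sequence[float]
) -> Dict[str, Union[float, List[float]]]:
    """Return plot coordinates, mean difference, and limits of agreement."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with >= 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]          # y-axis values
    pair_means = [(a + b) / 2 for a, b in zip(first, second)]  # x-axis values
    bias = mean(diffs)        # solid line: mean difference
    sd = stdev(diffs)         # sample SD of the differences
    return {
        "pair_means": pair_means,
        "diffs": diffs,
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,  # dashed line
        "upper_limit": bias + 1.96 * sd,  # dashed line
    }
```

A mean difference close to 0, with points scattered evenly above and below it across the range of scores, is the pattern usually read as an absence of systematic bias between the 2 rating occasions.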
Both the expert and novice physical therapists showed good to excellent test–retest reliability when rating the overall Ziegler test or individual test conditions. Both therapists also showed good reliability when rating individual test situations, except for the novice physical therapist who showed moderate test–retest reliability when rating the turns (Tab. 3).
Validity
There was a high correlation between scores from the overall Ziegler test and percentage of time spent frozen (r = 0.72) and moderate to high correlations between scores from individual test conditions and their corresponding percentages of time spent frozen (no additional task: r = 0.79; additional manual task: r = 0.64; additional cognitive and manual task: r = 0.64) (Tab. 4).
Table 4. Correlations (r) Between the Ziegler Test Scores, Test Duration, and Percentage of Time Spent Frozen
Ziegler Test | % Time Frozen: Overall | % Time Frozen: No Additional Task | % Time Frozen: Additional Manual Task | % Time Frozen: Additional Manual and Cognitive Tasks |
---|---|---|---|---|
Score | | | | |
Overall | 0.72 | – | – | – |
No additional task | – | 0.79 | – | – |
Additional manual task | – | – | 0.64 | – |
Additional manual and cognitive tasks | – | – | – | 0.64 |
Duration | | | | |
Overall | 0.67 | – | – | – |
No additional task | – | 0.77 | – | – |
Additional manual task | – | – | 0.66 | – |
Additional manual and cognitive tasks | – | – | – | 0.57 |
When considering the time taken to complete the Ziegler test and percentage of time spent frozen, there was a moderate correlation between the overall Ziegler test duration and total percentage of time spent frozen (r = 0.67) and moderate to high correlations between duration of individual test conditions and their corresponding percentages of time spent frozen (no additional task: r = 0.77; additional manual task: r = 0.66; additional cognitive and manual task: r = 0.57) (Tab. 4).
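The correlation labels used in this section follow the bands given in the Statistical Analyses. The sketch below shows a plain Pearson r calculation and a classifier for those bands. It is an illustration only: applying the bands to the absolute value of r and assigning boundary values (eg, exactly 0.70) to the higher band are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with >= 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def interpret_r(r: float) -> str:
    """Classify |r| using the bands quoted in the Statistical Analyses."""
    magnitude = abs(r)
    if magnitude > 0.90:
        return "very high"
    if magnitude >= 0.70:
        return "high"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "low"
    return "negligible"
```

Under these bands, the reported r = 0.72 for the overall score is classed as high, and r = 0.67 for overall duration as moderate.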
Survey Responses
Of the 24 physical therapists, 21 (88%) strongly agreed or agreed that the rating scale was easy to use, 2 (8%) were neutral, and 1 (4%) disagreed. Twenty-one (88%) physical therapists strongly agreed or agreed that the instructions for using the rating scale were clear and 3 (13%) were neutral. Sixteen (67%) physical therapists strongly agreed or agreed it was easy to distinguish between the different scores of the rating scale, 5 (21%) were neutral, and 3 (13%) disagreed. Fifteen (63%) physical therapists strongly agreed or agreed the scores of the rating scale reflected the severity of FOG observed in the videos, 3 (13%) were neutral, and 6 (25%) disagreed. Of the 6 physical therapists who disagreed, 5 (83%) felt that the score underestimated freezing severity. Fifteen physical therapists (63%) reported they will use the Ziegler rating scale in their clinical practice (Fig. 2).

There was variability in how the physical therapists interpreted the start and end of test situations. Fifteen (63%) physical therapists rated any freezing that occurred during “start” from the start of walking up to the moment before the clockwise turn occurred, whereas 9 (38%) rated freezing occurring at the start of walking only. Fourteen (58%) physical therapists rated any freezing that occurred during “doorway” from the end of the counterclockwise turn up to the moment the person with Parkinson disease walked through the doorway; 4 (17%) rated while the person was approaching the door, opening the door, and walking through the doorway; 3 (13%) rated when the person was walking through the doorway only; 2 (8%) rated while the person was opening the door and walking through the doorway; and 1 (4%) rated when the person approached the door only.
Fifteen physical therapists provided additional feedback on the Ziegler scale (Suppl. Material 2). A few physical therapists expressed difficulties identifying the presence or type of FOG. For example, they reported it was difficult to distinguish if the person with Parkinson disease was walking/turning with small steps or experiencing festination, or if a stoppage in movement was due to dual tasking or FOG. Several therapists also reported difficulties interpreting test instructions because they felt it was unclear when test situations began and ended (eg, start and doorway) and if prompts by the examiners to perform the task were considered interference. Two therapists felt the rating scale did not reflect the severity of FOG observed because it was possible to have significant differences in FOG severity within the 1 score level. For instance, a single festination episode of a short duration and multiple festination episodes of long durations were both scored as 1 point. One therapist suggested using a combination of Ziegler scores and test duration to obtain a better indication of FOG severity.
Discussion
This study aimed to explore the clinical utility of the Ziegler test to assess FOG severity in people with Parkinson disease. Our results suggest the Ziegler test had good interrater and excellent test–retest reliability when used by physical therapists in real time. The Ziegler test was also found to be a valid measure with a high correlation with the percentage of time spent frozen, which is currently considered the “gold standard” for measuring FOG.15
The Ziegler test showed good interrater reliability when used in its entirety and moderate to good reliability when test conditions were individually rated. Further exploration to examine the effect of clinical experience on interrater reliability showed that physical therapists with low or high levels of overall or Parkinson disease–specific clinical experience demonstrated moderate to good interrater reliability. The Ziegler test also showed excellent test–retest reliability for the overall score and good to excellent reliability for the individual test conditions, indicating a high level of reproducibility over time. Both the expert and novice physical therapists demonstrated good to excellent test–retest reliability. Furthermore, the Ziegler test was shown to be a valid tool to measure FOG, with a high correlation between the overall Ziegler score and the percentage of time spent frozen. Correlations remained moderate to high when scores from individual test conditions were compared with corresponding percentages of time spent frozen and when the Ziegler test duration was compared with percentage of time spent frozen, indicating that the time taken to complete the Ziegler FOG-provoking course may be a valid proxy measure of FOG.
These results suggest the Ziegler test is a reliable and valid measure of FOG, and its use should be considered in clinical practice. The test provides several additional benefits over current FOG measures. It provides an empirical measure of FOG and presents further insights into FOG severity not captured by self-report questionnaires.15 The Ziegler test is cheap, easy to set up, and fast to implement and thus highly accessible to time- and resource-poor clinicians. The use of wearable technologies and automated algorithms shows potential in capturing FOG throughout the day and in the home, but it may be years before it can be successfully implemented in clinical practice due to the need to optimize testing procedures and fine-tune algorithms to accurately detect FOG, as well as the associated high cost and lack of resources and supporting infrastructure.9,16–18 In contrast, the Ziegler test can be implemented in current clinical practice and offers an immediate solution to gaps in objective FOG measurements.
Because it is often difficult to trigger FOG in the clinic,15 the Ziegler test situations and conditions provide a useful framework for triggering FOG. To implement the Ziegler test in clinical practice, physical therapists have a number of options. They may choose to complete the entire test, or, for those who lack time, it may be possible to reliably measure FOG severity by completing only 1 or 2 of the test conditions (ie, with or without additional tasks). Physical therapists may choose to assess people with mild disease using test conditions with additional tasks, which are necessary to trigger FOG, and assess people with moderate to severe disease using the test condition without an additional task to avoid placing people with Parkinson disease under undue stress. In addition to the numerical rating, observations made by the physical therapist during test performance may inform potential intervention strategies.12
All physical therapists can consider using the Ziegler test even if they have minimal overall or Parkinson disease–specific clinical experience, because their ratings agree with those of therapists with more experience. A second therapist may be required to ensure the safety of the person with Parkinson disease performing the test, given the high risk of falls in people with FOG. Our results also support outcomes from a previous study by Herman et al, who reported a moderate correlation between the Ziegler test duration and Ziegler score.11 Therefore, physical therapists who work alone may choose to time the Ziegler test as a proxy FOG severity measure because it may be easier to operate a stopwatch than to rate the performance in real time while supervising a person at high risk of falls during the test.
Despite the overall good to excellent reliability and validity, several limitations with using the Ziegler test to assess FOG remain. The Ziegler test is designed to measure FOG by identifying the type of freezing observed, that is, festination, trembling in place, or akinesia, with the assumption that trembling in place and akinesia are more severe than festination. However, it is unclear if this assumption is valid because individuals experience FOG and its impact differently, and different FOG types may not reflect increases in FOG severity. Whether the presence of different types of freezing reflects different freezing severity when measured by number and duration of FOG episodes, and how individuals with FOG perceive the relation between types of freezing and freezing severity, remain to be determined.
Although identifying the type of freezing to measure FOG severity may be appropriate as a simple and quick clinical tool, the Ziegler test does not quantify the frequency and duration of freezing, which may provide a more detailed assessment of freezing severity.15 Scores from the Ziegler test likely underestimate freezing severity in people with worse FOG, because individuals experiencing many freezing episodes of long duration were scored similarly to individuals experiencing a single freezing episode of short duration. There are also limitations to using the Ziegler test duration as a proxy measure for FOG severity because increases in time taken to complete the test conditions with additional tasks may be attributed to FOG and/or dual or triple demands and not FOG alone. This is reflected in the lower correlations between Ziegler test durations and percentages of time spent frozen for test conditions with additional tasks compared with no additional task.
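To make the distinction between a type-based score and a duration-based measure concrete, the sketch below computes the percentage of time spent frozen from annotated FOG episodes. It is an illustrative assumption about how such annotations might be represented (start and end times in seconds), not the annotation procedure used in this study; the function name `percent_time_frozen` and both example cases are hypothetical.

```python
def percent_time_frozen(episodes, test_duration_s):
    """Percentage of the test spent frozen.

    episodes: list of (start_s, end_s) tuples from video annotation (assumed format).
    Overlapping annotations are counted once; times beyond the test end are clipped.
    """
    if test_duration_s <= 0:
        raise ValueError("test duration must be positive")
    frozen = 0.0
    covered_until = 0.0  # latest time already counted as frozen
    for start, end in sorted(episodes):
        start = max(start, covered_until)
        end = min(end, test_duration_s)
        if end > start:
            frozen += end - start
            covered_until = end
    return 100.0 * frozen / test_duration_s


# Two hypothetical people who both show the same freezing type during one task
single_short = [(10.0, 12.0)]                            # one 2-second episode
many_long = [(5.0, 15.0), (20.0, 32.0), (40.0, 48.0)]    # three long episodes

print(percent_time_frozen(single_short, 40.0))  # 5.0
print(percent_time_frozen(many_long, 60.0))     # 50.0
```

In this made-up example, both people could receive a similar type-based rating for the task, while the percentage of time frozen differs tenfold, which is the underestimation concern described above.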
The Ziegler test situations were not well defined in the instructions provided by the test developers10 (Suppl. Material 1), and the physical therapists interpreted them in varying ways. Whereas it was apparent when the “turns” occurred, how physical therapists interpreted “start” and “doorway” was highly variable. For example, approximately half of the physical therapists defined “start” as the start of walking only, whereas the others defined “start” as the period from the start of walking up to the start of the clockwise turn. Hence, any FOG that occurred during forward walking just after the start of walking would be scored by some and not by others.15 Likewise, approximately half of the physical therapists defined “doorway” as the period from the end of the counterclockwise turn to after the person steps across the doorway, whereas others rated only part of the task, such as approaching the door only, walking through the doorway only, or opening the door and walking through the doorway only. FOG may again be scored variably. This may explain the lower reliability when physical therapists rated the “starts,” “turns,” and “doorways” movement types, and it suggests that rating Ziegler test situations (ie, starts, turns, doorways) in isolation is currently not recommended.
The reliability and validity of the Ziegler test may be improved by developing a more detailed rating scale that clearly outlines when test situations begin and end and accounts for frequency and duration of freezing episodes. The reliability between physical therapists when rating “turns” may also be improved by defining what constitutes a FOG episode during turning, because movements during turning do not show the same patterns as straight-line walking tasks.19 This is supported by emerging evidence that showed excellent reliability between raters assessing FOG using a set of established criteria describing FOG episodes triggered by alternating 360-degree turns.19 Future studies should examine if Ziegler test reliability and validity vary depending on duration of FOG episodes, with recent studies showing poor agreement between raters for very short (<1 second) FOG episodes and better agreement for short (2–5 seconds) and long (>5 seconds) FOG episodes measured via video analyses.9,16 Future studies should also explore the utility of combining the Ziegler score and duration as a FOG measure and investigate the responsiveness of the Ziegler test because its ability to detect changes in FOG over time is unknown.
Limitations
Due to the global COVID-19 pandemic, physical therapists were asked to rate videos remotely. Therapists were instructed to view videos only once to best mimic clinical practice. Although video files were compressed to make them suitable for online streaming and instructions were given to the physical therapists to load the videos prior to viewing, 10% of videos were viewed more than once due to lagging or other technical issues, and 3% of test durations could not be measured. The different sites used for “on” and “off” phase testing may also have influenced outcomes, with videos captured at home less likely to be optimized for viewing due to lighting or space constraints. To mitigate any possible differences in raters’ ability to view FOG, the camera in both the clinic and home settings was set up similarly with adequate lighting such that the person with Parkinson disease performing the Ziegler test was clearly visible for rating in all videos. For some videos, it was difficult to determine if there was forward advancement during a FOG episode and consequently difficult to distinguish between scores on the rating scale. The reliability and validity of the Ziegler test may also differ if scored by physical therapists in person. However, differences in scores are likely to be negligible because prior evidence showed good agreement between video and in-person ratings of Parkinson disease impairments.20 The low representation of women with Parkinson disease and FOG in this study may also limit the generalizability of our results on the reliability and validity of the Ziegler test across different genders.
The Ziegler test is a reliable and valid tool to measure FOG when used by physical therapists in real time. Scores from individual test conditions and time taken to complete the test may also be used to measure FOG. Further clarification of Ziegler test instructions and a rating scale that accounts for frequency and duration of freezing episodes may improve its clinical utility and suitability to use as an outcome measure.
Author Contributions and Acknowledgements
Concept/idea/research design: L. Goh, S.S. Paul, C.G. Canning, N.E. Allen
Writing: L. Goh, N.E. Allen
Data collection: L. Goh, S.S. Paul, C.G. Canning, K.A. Ehgoetz Martens, J. Song, S.L. Campoy, N.E. Allen
Data analysis: L. Goh, S.S. Paul, C.G. Canning, K.A. Ehgoetz Martens, J. Song, N.E. Allen
Project management: L. Goh
Fund procurement: L. Goh, C.G. Canning, K.A. Ehgoetz Martens, J. Song, N.E. Allen
Consultation (including review of manuscript before submitting): S.S. Paul, C.G. Canning, K.A. Ehgoetz Martens, J. Song, S.L. Campoy
Ethical Approval
The authors received ethical approval from The University of Sydney Human Research Ethics Committee (project number 2021/174). All data have been de-identified.
Funding
Author Lina Goh is supported by an Australian Government Research Training Program Scholarship. The University of Sydney Charles Perkins Centre Active Ageing Seeding Grant funded the original study, which generated data used in this study. The funding bodies played no role in the design, conduct, or reporting of this study.
Disclosures
The authors completed the ICMJE Form for Disclosure of Potential Conflicts of Interest and reported no conflicts of interest.
References