Pulmonary hypertension (PH) is defined as an increase in mean pulmonary arterial pressure, often accompanied by indicators such as dyspnea on exertion, exercise intolerance, and systemic muscle dysfunction. Various protocols exist that can indirectly assess these indicators through the sit-to-stand test (STST).
Objective
Assess the psychometric properties of different STST protocols in patients with PH.
Methods
This study is a systematic review. We searched the PubMed, EMBASE, SciELO, Cochrane Central Register of Controlled Trials (CENTRAL), and Web of Science databases. The risk of bias was assessed using the COSMIN tool and the certainty of evidence using the modified Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) classification. Two investigators performed the evaluations independently, and a third evaluator was consulted as needed.
Results
Out of a total of 7933 articles identified, only 5 met the inclusion criteria. Four psychometric properties were assessed across the three STST protocols used in the five studies. The 1-STST protocol provided high-quality evidence for both convergent validity and responsiveness. The 30-STST protocol showed moderate-quality evidence for convergent validity and responsiveness, while the 5-STST also demonstrated moderate-quality evidence for responsiveness. The quality of evidence for the between-groups validity and reliability of the 30-STST protocol was considered low and very low, respectively.
Conclusion
Despite the limited number of studies, we can infer that the most commonly used protocol is the 1-STST, which has a high degree of convergent validity and responsiveness when compared to other assessment tools.
Pulmonary hypertension (PH) is defined by increased mean pulmonary arterial pressure (mPAP). Its classification considers clinical, pathophysiological, and hemodynamic factors and comprises five groups, namely pulmonary arterial hypertension (PAH), PH associated with left heart disease, PH associated with lung disease, PH associated with pulmonary artery obstruction, and PH with multifactorial mechanisms.1 It is regarded as a life-threatening condition and is typically associated with progressive deterioration of function and increased mortality. Moreover, it affects approximately 1 % of the global population.2
The principal and earliest symptom is exercise-related dyspnea, which is progressive and reflects the inability of the cardiovascular system to increase cardiac output during exercise.1 Other symptoms may be present, including fatigue, presyncope, chest pain, and ankle edema, which may develop from right ventricular failure.1,3 At the muscular level, these individuals tend to suffer from systemic muscle dysfunction, with an increased risk of functional decline as a result of loss of muscle function, decreased trophism, and an increase in type II fibers. Additionally, proteins regulating mitochondrial fusion in skeletal muscle are expressed at lower levels than in healthy individuals, which is associated with exercise intolerance.4 While these pathophysiological mechanisms are observed in high-intensity exercise tests such as cardiopulmonary exercise testing (CPET),5 more feasible tests such as the submaximal sit-to-stand test (STST) are useful and practical tools that provide indirect and non-invasive evaluations of lower limb skeletal muscle strength6 and exercise tolerance.7,8 There are various protocols, such as the 5-repetition STST (5-STST), the 30-second STST (30-STST), and the 1-minute STST (1-STST).
The 5-STST protocol is a cost-effective tool9 that is reliable in the older population10 and for assessing lower limb strength and balance control in healthy individuals and adults with diseases.11 Moreover, it is a marker of low functional performance in patients with chronic obstructive pulmonary disease (COPD).12,13 The 30-STST is considered a safe and feasible method for assessing lower limb function and strength,14 as well as poor exercise tolerance in patients with COPD.15 The 1-STST is regarded as a reliable, valid, and responsive alternative for measuring physical capacity in healthy people and in some pathologies. It is correlated with the 6-minute walk test (6MWT), making it a suitable alternative for use in the office,16,17 and was recently shown to be capable of detecting exertional desaturation in patients post COVID-19.18
Understanding the psychometric properties and evaluation methodologies of exercise tolerance assessments in PH is crucial for obtaining valid measures.19,20 This approach ensures the selection of reliable, safe, and reproducible tests for patients, supporting clinical investigation and intervention applicability. Despite its clinical importance in patients with PH, there are few systematic reviews on the STST. This systematic review is crucial for guiding scientific decisions and developing optimal STST protocols for PH and its subtypes. Consequently, we aimed to identify the available evidence on the STST when used for patients with different forms of PH and present the psychometric properties and the number of repetitions achieved with different STST protocols.
Methods
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).21 This review also followed the PICOT mnemonic strategy, with the research question, "What are the psychometric properties of the different sit-to-stand test measures in individuals with pulmonary hypertension?". The risk of bias was assessed using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN).22,23 The protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO, no. CRD42021244271), and detailed methods have been previously published.24
Eligibility criteria
Randomized or quasi-randomized controlled clinical trials were included, as well as observational studies published in English. We considered studies conducted in inpatient, outpatient, or primary care settings.
This review included studies of adult participants with a clinical diagnosis of one of the five types of PH.1 Exclusion criteria were: systematic reviews, in vitro studies, conference abstracts, theses, dissertations, literature reviews, studies carried out with children or mixed populations, and studies in which the STST protocol did not meet the criteria described by Kahraman et al.25 Furthermore, studies were excluded if the data from patients with PH were not analyzed separately or could not be extracted or obtained even after contacting the authors.
Type of intervention
Included studies were those that assessed exercise tolerance using the STST (30-STST, 1-STST, or 5-STST) following the methodology proposed by Kahraman et al.,25 in which participants perform as many sit-to-stand cycles as possible from a standard chair without arm support, with arms crossed at the wrists against the chest, starting on the command "go" and moving to full standing and back to the seated position.
Type of outcome measures
Primary results
Psychometric properties (validity [criterion validity, construct validity], reliability [internal consistency and measurement error], and responsiveness) and the number of repetitions achieved in STST protocols by participants with PH.
Secondary results
- Association between STST and patient-reported outcome measures (PROMs);
- Symptoms reported during STST protocols using the Borg scale;
- Changes in the respiratory or cardiovascular system response to STST protocols before and after the intervention.
A comprehensive search of the PubMed/MEDLINE, EMBASE, SciELO, Cochrane Central Register of Controlled Trials (CENTRAL), and Web of Science databases was conducted in April 2023, with no defined publication time restriction. The search strategy for all the databases was performed with keywords such as pulmonary hypertension OR pulmonary arterial hypertension AND sit-to-stand test OR exercise tolerance (Supplemental online material).
Data collection and analysis
Two researchers (NC-LN) reviewed the selected studies that were entered into the Rayyan software26 and independently analyzed the titles and abstracts and removed irrelevant studies. A third researcher (JS) assessed any discrepancies as necessary. The full texts of the eligible studies were attached to the Rayyan software and analyzed by the same researchers (NC-LN). In case of conflict, the third researcher (JS) was consulted. The reasons for exclusion were recorded, and the screening process was summarized in a PRISMA flowchart (Fig. 1).
Data extraction and management
Data on measurement properties were extracted using the COSMIN23 data extraction form, which qualitatively summarizes the results of studies in checklist boxes, based on general observations, data such as intraclass correlation coefficients (ICC), or those required to determine that measurement property. For secondary outcomes, a previously tested word extraction form was used. All extractions were performed by two researchers (NC-JS), and the third evaluator (VR) intervened to resolve discrepancies.
Risk of bias (RoB) assessment
Two investigators (NC-JS) independently analyzed the RoB using the COSMIN.27,28 Disagreements were resolved by a third investigator (VR). Each consistent result of the individual study was summarized qualitatively and compared against criteria for good measurement properties, interpreted as very good, adequate, doubtful, or inadequate.28,29 The results of each study were classified according to specific criteria and updated for each measurement property as follows: sufficient (+), insufficient (-), or indeterminate (?).2
Evidence certainty assessment
The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) rating was employed to assess the certainty of the evidence. The evidence was classified as high, moderate, low, or very low,27 based on four domains: RoB, which evaluates the methodological quality of studies; inconsistency, which verifies unexplained inconsistency of results across studies; imprecision, which analyzes the total sample size of available studies; and indirectness, which evaluates evidence from populations other than the population of interest in the review.30 This evaluation was conducted by two independent evaluators (NC-JS), with the involvement of a third evaluator (VR) in case of discrepancies.
Data synthesis
We planned to analyze dichotomous data using the odds ratio, relative risk, or risk difference, to analyze continuous variables, and to carry out a quantitative synthesis (meta-analysis) with the software RevMan V.5.3.5.28 In cases of missing data or ongoing clinical trials, the authors were contacted via email. The objective was to conduct subgroup analyses by type of PH or by STST protocol. If necessary, we would perform sensitivity analyses to examine the effect of methodological quality on the pooled estimate by removing studies classified as having a high RoB.
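For readers less familiar with the three effect measures named in the data synthesis plan, the following is a minimal sketch of how each is derived from a 2x2 table. The class name, function name, and counts are hypothetical and for illustration only; they are not data from the included studies, and this is not the RevMan implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a dichotomous outcome in two groups (hypothetical layout)."""
    events_exposed: int
    total_exposed: int
    events_control: int
    total_control: int


def effect_measures(table: TwoByTwo) -> dict:
    """Return the odds ratio (OR), relative risk (RR), and risk difference (RD)."""
    a = table.events_exposed                        # events, exposed group
    b = table.total_exposed - table.events_exposed  # non-events, exposed group
    c = table.events_control                        # events, control group
    d = table.total_control - table.events_control  # non-events, control group
    if min(a, b, c, d) <= 0:
        # Zero cells need a continuity correction, which this sketch omits.
        raise ValueError("all four cells must be positive in this simple sketch")
    risk_exposed = a / (a + b)
    risk_control = c / (c + d)
    return {
        "OR": (a / b) / (c / d),
        "RR": risk_exposed / risk_control,
        "RD": risk_exposed - risk_control,
    }


# Hypothetical counts: 10 of 40 events in one group, 20 of 40 in the other.
example = effect_measures(TwoByTwo(10, 40, 20, 40))
print(example)
```

The three measures answer different questions (ratio of odds, ratio of risks, absolute difference in risks), which is why a synthesis plan usually names the one to be pooled in advance.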
Results
A total of 7933 articles were identified. Of these, 1668 were excluded as duplicates, leaving 6265 titles and abstracts for screening, of which 6251 were excluded. The remaining 14 articles proceeded to the full-text stage, during which nine were excluded: two were conference abstracts, two were the wrong publication type, two were clinical trials still in progress, one could not be accessed in its entirety, and two did not study the population of interest (Fig. 1).
The five studies included25,31-34 were conducted in diverse geographical locations, with a predominance of the 1-STST protocol, a high representation of group 1 (PAH), and a majority of female subjects. However, most studies included a low number of individuals in the sample. One RCT32 and four observational studies25,31,33,34 were included. A summary of the characteristics of the populations of the included studies is presented in Table 1.
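The selection flow can be checked with simple arithmetic. The sketch below uses only the counts reported in the paragraph above; the function name and dictionary keys are illustrative labels, not terms from the PRISMA statement.

```python
def prisma_flow() -> dict:
    """Recompute each stage of the selection flow from the reported counts."""
    identified = 7933
    duplicates = 1668
    screened = identified - duplicates            # titles and abstracts screened
    excluded_at_screening = 6251
    full_text = screened - excluded_at_screening  # articles read in full

    # Reasons for full-text exclusion as reported in the text.
    full_text_exclusions = {
        "conference abstract": 2,
        "wrong publication": 2,
        "clinical trial still in progress": 2,
        "full text not accessible": 1,
        "not the population of interest": 2,
    }
    included = full_text - sum(full_text_exclusions.values())
    return {"screened": screened, "full_text": full_text, "included": included}


flow = prisma_flow()
print(flow)
```

Recomputing the stages this way confirms that the reported numbers are internally consistent with five included studies.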
Population characteristics of included studies.
Authors | n | Age (years) | % Female | Disease | Test | Country |
---|---|---|---|---|---|---|
González-Saiz et al32 | 40 | IG = 46 ± 11; CG = 45 ± 12 | 60 % | PAH or CTEPH | 5-STST | Spain/USA |
Kahraman et al25 | 38 | 50.3 ± 18.0 | 81 % | PAH or CTEPH | 30-STST | Turkey |
Nakazato et al31 | 20 | 44.3 ± 13.2 | 80 % | IPAH or PAHCTD | 1-STST | Brazil |
Kronberger et al33 | 106 | 66 ± 15 | 57 % | All PH | 1-STST | Austria |
Keen et al34 | 75 | 52 ± 16.8 | 77 % | PAH or CTEPH | 1-STST | UK |
n, number of individuals; IG, intervention group; CG, control group; PAH, pulmonary arterial hypertension; CTEPH, chronic thromboembolic pulmonary hypertension; IPAH, idiopathic pulmonary arterial hypertension; PAHCTD, PAH associated with connective tissue disease; All PH, all groups of pulmonary hypertension; 5-STST, 5-repetition sit-to-stand test; 30-STST, 30-second sit-to-stand test; 1-STST, one-minute sit-to-stand test.
Four studies assessed convergent validity (Table 2).25,31,33,34 Kahraman et al.25 demonstrated a moderate correlation between the 30-STST and the strength of the knee extensor muscles, and moderate negative correlations with age and the New York Heart Association (NYHA) classification; all correlations were statistically significant. In the study by Nakazato et al.,31 moderate correlations were observed between the number of repetitions reached in the test and the accelerometer-measured number of steps and activity time (p < 0.05). Kronberger et al.33 found a high correlation with the Borg dyspnea score, while Keen et al.34 showed a moderate correlation with WHO Functional Class (WHO-FC), both with p < 0.001.
Characterization of the psychometric properties of the studies.
Types of tests | Authors | Tests | Psychometric properties evaluated | Results |
---|---|---|---|---|
STST | González-Saiz et al32 | 5-STST | Responsiveness | Before and after 8 weeks. IG: 5-STST (performance time, s): 7.5 ± 1.4 to 6.0 ± 1.1 (p < 0.001). CG: 5-STST (performance time, s): 7.0 ± 1.6 to 6.9 ± 1.4 (p > 0.05) |
STST | Kahraman et al25 | 30-STST | Test-retest reliability/ Convergent validity/ Validity between groups/ Responsiveness | Test-retest reliability: 30-STST (number): first evaluation 12.23 ± 3.77 / second evaluation 12.07 ± 3.87 / difference = 0.15 ± 1.19; ICC (95 % CI) 0.95 (0.90, 0.97). Convergent validity, 30-STST: age r = −0.61, p < 0.001; NYHA-FC r = −0.45, p = 0.004; knee extensor muscle strength r = 0.54, p < 0.001; IPAQ-SF r = 0.37, p = 0.02. Validity between groups: 30-STST (number) in NYHA Class II 13.68 (3.34) versus NYHA Class III 10.25 (3.49); comparison p = 0.004. Responsiveness (comparison with other outcome measurement instruments): 6MWT r = 0.66, p < 0.001 |
STST | Nakazato et al31 | 1-STST | Convergent validity | 1-STST was 23.8 ± 6.1. Accelerometer: steps per day (n) 4280.2 ± 2351.7; activity time (min) 41.6 ± 19.3. Convergent validity, 1-STST: accelerometer number of steps per day r = 0.59, p = 0.006; accelerometer activity time (min) r = 0.58, p = 0.007 |
STST | Kronberger et al33 | 1-STST | Convergent validity/ Responsiveness | 1-STST was 17 ± 7. Borg: 5.0 ± 2.3. Convergent validity, 1-STST: WHO-FC r = −0.59, p < 0.001; NT-proBNP r = −0.40, p < 0.001; mPAP r = −0.28, p < 0.001; BDS r = 0.70, p < 0.001. Responsiveness (comparison with other outcome measurement instruments): 6MWD r = 0.71, p < 0.001 |
STST | Keen et al34 | 1-STST | Convergent validity/ Responsiveness | 1-STST was 20.1. Borg: 3.6 ± 1.8. Convergent validity, 1-STST: WHO-FC r = −0.50, p < 0.001; NT-proBNP r = −0.26, p = 0.02; emPHasis10 r = −0.43, p < 0.001; age r = −0.39, p < 0.001. Responsiveness (comparison with other outcome measurement instruments): ISWT r = 0.70, p < 0.001 |
5-STST, 5-repetition sit-to-stand test; 30-STST, 30-second sit-to-stand test; 1-STST, one-minute sit-to-stand test; IG, intervention group; CG, control group; NYHA-FC, New York Heart Association functional class; ICC, intraclass correlation coefficient; IPAQ-SF, International Physical Activity Questionnaire-Short Form; 6MWT, 6-minute walk test; 6MWD, 6-minute walk distance; WHO-FC, WHO functional class; NT-proBNP, N-terminal pro-B-type natriuretic peptide; mPAP, mean pulmonary arterial pressure; BDS, Borg dyspnea score; emPHasis10, quality of life questionnaire for pulmonary hypertension; ISWT, Incremental Shuttle Walking Test.
Kahraman et al.25 investigated the validity between known groups by comparing the number of repetitions achieved in the test across functional classes and found a significant difference between FC II and FC III (Table 2). Reliability was assessed only in the study by Kahraman et al.,25 using test-retest, which demonstrated an excellent ICC.
Responsiveness was divided into responsiveness before and after training (assessed only by González-Saiz et al.32) and responsiveness assessed by comparing the STST with other tests such as the 6MWT25,33 or the incremental shuttle walking test (ISWT)34 (Table 2). González-Saiz et al.32 demonstrated a positive change in the intervention group after eight weeks of exercise (p < 0.001). Kahraman et al.25 and Kronberger et al.33 observed moderate and high correlations, respectively, between the STST protocols and the 6MWT, and both were statistically significant. With regard to the ISWT, only the study by Keen et al.34 assessed it, showing a strong correlation (p < 0.001).
Concerning the repetitions performed during the test, variability was observed due to the diversity of protocols. In their 30-STST study, Kahraman et al.25 reported 12.23 ± 3.77 repetitions at the first evaluation and 12.07 ± 3.87 repetitions at the second evaluation of the reliability analysis. With the 5-STST protocol, González-Saiz et al.32 reported that the intervention group took 7.5 ± 1.4 s at baseline and 6.0 ± 1.1 s post-intervention (p < 0.001). Among the studies that employed the 1-STST protocol, Nakazato et al.31 reported 23.8 ± 6.1 repetitions, Kronberger et al.33 reported 17 ± 7 repetitions, and Keen et al.34 recorded 20.1 repetitions.
Table 3 presents the results of the RoB assessment for the studies included in this review. The studies that evaluated convergent validity were rated as very good, as they met all the criteria determined by COSMIN.23 Responsiveness was assessed in the studies by González-Saiz et al.,32 Kahraman et al.,25 Kronberger et al.,33 and Keen et al.34 The reliability assessment in the study by Kahraman et al.25 was deemed inadequate because the evaluator was aware of the previous measurement during the second test. Moreover, the validity between known groups was considered doubtful because the authors did not define the characteristics of each subgroup studied. The COSMIN evaluation defines the updated criteria for good measurement properties, which can be seen in Table 3. The majority of the study results were determined to be sufficient, as they align with the proposal for each measurement property by Mokkink et al.,27 except for Kronberger et al.33 on convergent validity and Keen et al.34 on convergent validity and responsiveness. Neither of these studies stated a hypothesis, so their results were rated indeterminate.
Results of measurement properties risk of bias assessments and updated criteria for good measurement properties according to COSMIN.
5- STST, 5-repetition sit-to-stand test; 30- STST, 30-second sit-to-stand test; 1- STST, one minute sit-to-stand test; ICC, intraclass correlation coefficient; “+”, sufficient; “?”, indeterminate.
The certainty of the evidence for the psychometric properties of each study is presented in Table 4, which was evaluated based on the four modified GRADE criteria.29 The 30-STST and 1-STST studies that assessed convergent validity are considered of moderate and high certainty, respectively. The certainty of the evidence for the former was downgraded due to imprecision, as there is only one study with a low number of included individuals. For validity between known groups, the 30-STST was rated as low certainty due to imprecision (small sample size) and risk of bias (only one study of doubtful quality is available). Also for the 30-STST, reliability was rated as very low certainty of evidence due to imprecision (low number of individuals involved) and risk of bias (only one study of inadequate quality is available). Responsiveness of the 5-STST was rated as moderate certainty of evidence due to imprecision, as there is only one study with a low number of included individuals. Responsiveness of the 30-STST when compared to other tests also presented a moderate certainty of evidence for the same reason. Finally, responsiveness of the 1-STST had high certainty of evidence, as it fulfilled the necessary criteria.
Certainty of evidence by modified GRADE.
Properties | Summarized or grouped results | General evaluation | Certainty of evidence |
---|---|---|---|
Convergent validity | | | |
30-STST | Very good: 30-STST vs age r = −0.61, p < 0.001; 30-STST vs NYHA r = −0.45, p = 0.004; 30-STST vs knee extensor muscle strength r = 0.54, p < 0.001; 30-STST vs IPAQ-SF r = 0.37, p = 0.02 (Kahraman et al., 2020). | sufficient | Moderate1 due to imprecision |
1-STST | Very good: 1-STST vs accelerometer number of steps per day r = 0.59, p = 0.006; 1-STST vs accelerometer uptime (min) r = 0.58, p = 0.007 (Nakazato et al., 2021); 1-STST vs WHO-FC r = −0.59, p < 0.001; 1-STST vs NT-proBNP r = −0.40, p < 0.001; 1-STST vs mPAP r = −0.28, p < 0.001; 1-STST vs BDS r = 0.70, p < 0.001 (Kronberger et al., 2023); 1-STST vs WHO-FC r = −0.50, p < 0.001; 1-STST vs NT-proBNP r = −0.26, p = 0.02; 1-STST vs emPHasis10 r = −0.43, p < 0.001; 1-STST vs age r = −0.39, p < 0.001 (Keen et al., 2023). | indeterminate | High |
Validity between groups | | | |
30-STST | Doubtful: 30-STST (repetitions) in NYHA Class II 13.68 (3.34) versus NYHA Class III 10.25 (3.49); p = 0.004 (Kahraman et al., 2020). | sufficient | Low1,2 due to imprecision, risk of bias |
Reliability (test-retest) | | | |
30-STST | Inadequate: ICC 0.95 (0.90–0.97) (Kahraman et al., 2020). | sufficient | Very low1,3 due to imprecision, risk of bias |
Responsiveness (before and after) | | | |
5-STST | Very good: 5-STST (performance time, s): 7.5 ± 1.4 to 6.0 ± 1.1 (p < 0.001) (González-Saiz et al., 2017). | sufficient | Moderate1 due to imprecision |
Responsiveness (other outcome measurement instrument) | | | |
30-STST | Very good: 30-STST vs 6MWT r = 0.66, p < 0.001 (Kahraman et al., 2020). | sufficient | Moderate1 due to imprecision |
1-STST | Very good: 1-STST vs 6MWD r = 0.71, p < 0.001 (Kronberger et al., 2023); 1-STST vs ISWT r = 0.70, p < 0.001 (Keen et al., 2023). | indeterminate | High |
STST, sit-to-stand test; 5-STST, 5-repetition sit-to-stand test; 30-STST, 30-second sit-to-stand test; 1-STST, one-minute sit-to-stand test; NYHA, New York Heart Association; IPAQ-SF, International Physical Activity Questionnaire-Short Form; WHO-FC, WHO functional class; NT-proBNP, N-terminal pro-B-type natriuretic peptide; mPAP, mean pulmonary arterial pressure; BDS, Borg dyspnea score; emPHasis10, quality of life questionnaire for pulmonary hypertension; ICC, intraclass correlation coefficient; 6MWT, 6-minute walk test; 6MWD, 6-minute walk distance; ISWT, Incremental Shuttle Walking Test.
GRADE Working Group grades of evidence.
High: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low: We are very uncertain about the estimate.
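The certainty ratings in Table 4 can be read as a stepwise downgrade from "high". The sketch below is an illustration only: the number of levels subtracted for each concern (one for imprecision, one for a doubtful risk of bias rating, two for an inadequate one) is an assumption inferred from the ratings reported in this review, not a rule quoted from COSMIN or GRADE, and the function name is hypothetical.

```python
from typing import Mapping

# Ordered from lowest to highest certainty.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(downgrades: Mapping[str, int]) -> str:
    """Start at 'high' and move down one level per downgrade step.

    `downgrades` maps a domain name to the number of levels subtracted
    for it. The step sizes are assumptions made for this illustration.
    """
    if any(steps < 0 for steps in downgrades.values()):
        raise ValueError("downgrade steps must be non-negative")
    total_steps = sum(downgrades.values())
    index = max(0, len(LEVELS) - 1 - total_steps)  # never below 'very low'
    return LEVELS[index]


# Ratings for the 30-STST, reproduced under the assumed step sizes.
convergent = grade_certainty({"imprecision": 1})
known_groups = grade_certainty({"imprecision": 1, "risk of bias": 1})
reliability = grade_certainty({"imprecision": 1, "risk of bias": 2})
print(convergent, known_groups, reliability, sep=" | ")
```

An empty mapping returns "high", which corresponds to the 1-STST rows where no domain was downgraded.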
Regarding the secondary outcomes, dyspnea was assessed using the modified Borg scale (0–10), with final results as follows: Nakazato et al.31 reported 4.5 ± 1.5, Kronberger et al.33 reported 5.0 ± 2.3, and Keen et al.34 reported 3.6 ± 1.8. Cardiovascular alterations were reported in the study by Kronberger et al.,33 which demonstrated an increase in systolic BP of 12 ± 20 mmHg, diastolic BP of 2.3 ± 10 mmHg, HR of 21 ± 16 beats/min, and a decrease in SpO2 of 4.6 ± 5.9 %. Similarly, the study by Keen et al.34 also showed an increase in systolic BP of 10.1 ± 10.5 mmHg, diastolic BP of 2.9 ± 7.8 mmHg, HR of 9.4 ± 8.0 beats/min, and a decrease in SpO2 of 3.8 ± 4.0 %. The PROMs were not reported in any included studies. We did not find sufficient results to justify conducting a meta-analysis.
Discussion
This systematic review aimed to determine the psychometric properties of different STST protocols in patients with PH. The primary findings indicate that the hypothesis test for construct validity, particularly in terms of convergent validity, is confirmed for the 30-STST and especially for the 1-STST. Moreover, the psychometric property of responsiveness is validated for the 5-STST, 30-STST, and 1-STST in patients with PH.
Bowman et al.14 found a moderate correlation (rho = 0.49, p = 0.006) between the IPAQ-SF and the 30-STST protocol in the oncology population, which is consistent with the findings of this review and suggests good convergent validity across populations. Nakazato et al.31 also found a moderate correlation between the 1-STST and accelerometer measurements of daily activity. Thus far, the literature has not reported other studies relating the STST to accelerometers; only correlations with the 6MWT have been observed. Cho et al.35 demonstrated a moderate and significant correlation between step count and the 6MWT in individuals with sarcoidosis. Consequently, the use of these test protocols in various populations remains limited, which restricts comparisons with our findings.
In the present review, Kronberger et al.33 observed moderate negative correlations between the 1-STST and both WHO-FC and NT-proBNP, and a weak correlation with mPAP. In Keen et al.,34 negative correlations were noted, with moderate associations found with WHO-FC and emPHasis10, and weaker correlations observed with age and NT-proBNP. These variables are frequently cited as prognostic factors for risk classification in PH.1 Li et al.36 showed that medication use improved patient classification based on WHO-FC; however, effects on NT-proBNP were not observed. In the systematic review by Fu et al.37 on new drugs for PH, improvements were observed in WHO-FC and mPAP. Regarding physical training, the review by An et al.38 suggests that exercise improves mPAP. However, no study has demonstrated effects linking the STST with the variables identified in the present study.
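The verbal labels used for the correlations (weak, moderate, high) follow from the magnitude of r. The review does not state its cut-offs; the thresholds in this sketch (below 0.40 weak, 0.40 to below 0.70 moderate, 0.70 and above high) are an assumption chosen because they reproduce the labels given in the text, and the function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient by its absolute magnitude.

    The cut-offs are assumed for illustration; they are not stated
    in the review.
    """
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.70:
        return "high"
    if magnitude >= 0.40:
        return "moderate"
    return "weak"


# Coefficients reported for the 1-STST in Kronberger et al. (Table 2).
kronberger = {"WHO-FC": -0.59, "NT-proBNP": -0.40, "mPAP": -0.28, "6MWD": 0.71}
labels = {name: correlation_strength(r) for name, r in kronberger.items()}
print(labels)
```

The sign is ignored on purpose: a coefficient of −0.59 indicates an association as strong as 0.59, only in the opposite direction.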
Validity between groups was explored in a single PH study.25 The 30-STST protocol appears sufficiently sensitive to discern differences between NYHA classifications among patients with PH.39 Similarly, Tarrant et al.40 found significant differences using the 1-STST in postoperative and medical readmissions. Larger studies with robust statistical methods, including ROC curve analysis, are necessary for confirmation.
Reliability was assessed in one study25 in this review, indicating excellent reliability but very low certainty of evidence. Figueiredo et al.41 reported high reliability (ICC 0.93, 95 % CI: 0.86, 0.96) of the STST in hemodialysis patients. The systematic review by Bohannon et al.16 also showed high reliability across diverse populations, with ICCs ranging from 0.80 to 0.98 in hemodialysis,42 older adults,43 and cystic fibrosis.44 Mong et al.45 found the 5-STST highly reliable (ICC 0.98–0.99) in stroke patients. While these findings suggest excellent reliability in these groups, more studies are needed to validate reliability in PH patients.
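To make the ICC values above more concrete, the following is a minimal sketch of a test-retest ICC computed from a two-way analysis of variance. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the review does not state which form the cited studies used, and the repetition counts below are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per session."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of sessions (2 for test-retest)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for subjects, sessions, and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 30-STST repetitions for five participants, two sessions each.
hypothetical_scores = [[10, 11], [12, 12], [15, 14], [9, 10], [13, 13]]
icc = icc_2_1(hypothetical_scores)
print(round(icc, 2))
```

Because this form penalizes systematic shifts between sessions as well as random error, a learning effect on the second test would lower the coefficient even if participants kept the same rank order.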
In the present study, responsiveness was evaluated in two ways: the test's responsiveness before and after some type of training, and responsiveness when compared to other instruments. The first analysis utilized data from a single article.32 In this study, the authors assessed the responsiveness of the STST following an 8-week intervention involving aerobic, resistance, and inspiratory muscle training in patients with PH. They observed a reduction in test time, consistent with findings in the literature using a similar protocol. For instance, Augustín et al.46 demonstrated responsiveness of the test in patients with stroke, highlighting how the severity and stage of recovery can impact functional outcomes. Zampogna et al.47 demonstrated that both asthma and COPD populations experienced improvements in the 5-STST time following a pulmonary rehabilitation program. While the 5-STST shows promise as a tool for evaluating functional capacity response to intervention programs, further studies are needed to establish STST protocols in clinical practice for patients with PH and to explore additional properties of the test. Three studies were identified that assessed the responsiveness of the STST compared to other measurement instruments.25,33,34 It appears that the 30-STST protocol, and particularly the 1-STST, shows good responsiveness when compared to other measurement instruments. According to Kimberlin and Winterstein,39 it is crucial to evaluate the validity of an instrument by comparing it with a gold standard or other existing measures that assess the same construct.
This review compares three protocols measuring repetitions across specific populations. Gephine et al.48 reported 24 ± 5 repetitions for the 1-STST in COPD, similar to the values found in this review. In the 5-STST, patients with PH32 performed the test faster than the patients with asthma and COPD reported by Zampogna et al.47 (15.7 [IQR 12.7–17.3] seconds and 14.6 [IQR 12.1–16.6] seconds, respectively). Figueiredo et al.41 found that hemodialysis patients achieved 12.6 (11.8–13.4) repetitions in the 30-STST, comparable to PH patients.25 Despite limited data, STST performance in PH appears similar to that of other populations.
The secondary outcomes addressed in the included studies were dyspnea and cardiovascular changes. For dyspnea, all three studies31,33,34 reported final values described as "mild" on the Borg scale of 0–10, corroborating Ozalevli et al.,49 whose patients with COPD reported "mild, moderate" dyspnea. Briand et al.,50 in addition to agreeing with the aforementioned studies, also demonstrated that all three subgroups of patients studied had a significant difference in the most severe dyspnea, reported on the Borg scale, at the conclusion of the 1-STST compared with that observed during the 6MWT.
Limitations and clinical practice usefulness
A limitation of this systematic review is the relatively small sample sizes in some of the included articles; in addition, the methodological quality of the included studies was variable.
This study emphasizes the use of validated STST protocols in clinical practice for PH patients. The confirmed validity of the 30-STST and 1-STST suggests accurate measurement of lower limb strength and function. Also, the responsiveness of the 5-STST, 30-STST, and 1-STST highlights their utility in detecting changes in physical function and endurance over time. This is crucial for monitoring disease progression, evaluating treatment efficacy, and guiding rehabilitation strategies.
Conclusion
Based on our systematic review, the 5-STST shows responsiveness to treatment. Moreover, both the 30-STST and 1-STST protocols exhibit convergent validity and excellent responsiveness when compared to other measurement instruments in the context of PH. Despite limited studies with appropriate methodologies, STST protocols are relevant for this population, and further comparisons are needed to enhance the evidence on measurement properties.