# Estimating upper-extremity function from kinematics in stroke patients following goal-oriented computer-based training – Journal of NeuroEngineering and Rehabilitation

Dec 31, 2021

### Subjects

Our retrospective analysis uses data of 191 RGS sessions from 98 individuals with hemiparesis (age in [23, 87], mean 63; days post-stroke in [5, 3045], mean 400; Table 1) who were recruited between 2010 and 2015 to participate in studies conducted in Barcelona and Tarragona, Spain [18,19,20]. Participants met the following inclusion criteria: (1) ischemic strokes (middle cerebral artery territory) or hemorrhagic strokes (intracerebral); (2) mild-to-moderate upper limb hemiparesis (Medical Research Council scale for proximal muscles (>2)) after a first-ever stroke; (3) age between 20 and 90 years old; and (4) the absence of any significant cognitive impairment (Mini-Mental State Evaluation (>22)).

### Protocol

Participants followed a rehabilitation protocol including 3–5 weekly sessions of 30 minutes each for 3–12 weeks using the RGS in Fig. 1. The RGS consists of a PC, a 17 inch LCD touch display, an image-based motion capture device positioned on top of the screen [18]. The virtual tasks logic and graphics are implemented using the Torque 3D and Unity 3D game engines. The joint movements of the user’s head, shoulders and elbows are tracked and mapped onto an avatar through a biomechanical model using a custom-developed vision-based motion capture system. Arm movements are displayed on a screen from a first-person perspective, realising a rehabilitation paradigm that combines goal-oriented embodied action execution and observation.

For the RGS sessions in the main dataset (Table 1) the participants are instructed to intercept virtual spheres by executing horizontal bimanual movements over the surface of a table (‘Spheroids’ protocol, cf. inset in Fig. 1). The task parameters (the frequency of sphere appearance, their speed, their range, and size) are combined in a single parameter (‘difficulty level’) and automatically adjusted during the session in order to maintain the user’s performance between 70 and 80% success rate [17, 21]. The system allows for the storage and extraction of performance parameters as well as hand path trajectories derived from joints’ positions and rotations recorded at about 100 Hz.

During the rehabilitation patients are evaluated using standard clinical scales: Fugl-Meyer Assessment for the upper extremity (FM-UE), Chedoke Arm and Hand Activity Inventory (CAHAI) and Barthel Index (BI). When collecting the 191 samples (Table 1), the following measures are taken to improve data quality:

• The clinical score measurements (FM-UE, CAHAI, BI) are coupled to the RGS session closest in time, with a maximum time separation of 4 days between the measurement and the RGS session;

• The first two RGS sessions of a patient at the start of the rehabilitation trajectory are excluded to ensure that patients are familiar with the RGS environment for all collected samples.

### Outcomes and analysis

To analyse the potential of RGS-derived movement descriptors for capturing both impairment and recovery in standardised clinical scales, we first extract a set of variables that are known to correlate with the severity of hemiparesis [13, 16]. Next, we define a model that combines the information of several variables to estimate the patient’s score on the FM-UE clinical scale. The model includes an error estimate. We use repeated cross-validation to avoid overfitting, allocating 50% of the samples to the training set and 50% to the validation set. We study the model evaluating its convergent validity, test-retest reliability, external validity, and generalisation to other tasks and clinical scales.

#### Analysis of convergent validity: estimating impairment

To explore the convergent validity of RGS-derived kinematic descriptors in comparison to standardised clinical assessments, we compute Pearson correlations between all the variables (the RGS-derived descriptors and the baseline characteristics) and the clinical scores. By comparing these to a randomised outcome distribution we identify a threshold of (r simeq 0.081) for all the Pearson correlations variable-scores (Fig. 12 in Appendix). Subsequently, we adopt a nonlinear parametric model to combine information from several variables for estimating the patients’ impairment level (i.e., FM-UE scale). We use only the patient’s baseline characteristics and RGS-derived movement descriptors (extracted from a single session logfile) as predictors. In particular, we rule out the two variables sessions completed so far and time since stroke and functions of them. In this way, the estimate is intended to assess the clinical status of the patient at a given moment without knowledge of the rehabilitation history. To avoid overfitting, we perform repeated cross-validation with 50% of samples for training and 50% for validation, obtaining the optimal active set of variables possible for our dataset. In the cross-validation procedure, we define the accuracy of the model as the percentage of samples that are correctly estimated above or below the median score (e.g. 47 for FM-UE). Additionally, we report the values of the Leave-one-out cross-validation (LOOCV) for the optimal active set. To quantify the performance of the model, we compare true and estimated FM-UE scores reporting the average error, the Pearson correlation, and the coefficient of determination. Finally, we report the performance of the estimation of FM-UE obtained by a linear model with the same active set of variables.

#### Analysis of test–retest reliability

To evaluate the test-retest reliability, we consider two unseen datasets each composed of 921 RGS sessions, for a total of 1842 unseen RGS sessions. Each session in the first dataset ‘test’ is associated with a session in the second dataset ‘retest’ and obtained from the same patient within less than 48 h. The small time frame makes it plausible that the clinical state of the patient is unchanged between the two test–retest sessions, and so they can be used to assess the reliability of the regression models. These data were collected in the same trials as the main dataset, but they correspond to rehabilitation sessions for which we do not have an associated measurement of a clinical scale (so they cannot be used for training the model). To quantify the reliability in the estimation of the FM-UE scores, we evaluate the intraclass correlation coefficient (ICC) between the estimations obtained by the model for test and retest sessions.

#### Analysis of external validity: estimating the change in impairment

Starting from the original dataset (Spheroids, Table 1), we design a new dataset (recovery dataset, Table 2) composed of 54 samples where each sample represents a couple of sessions of the same patient for which the time lapsed between the two is larger than 16 days. Next, we compute the correlations between the change in the movement descriptors and clinical improvements. We utilise this dataset to analyse the external validity of the previous model (obtained for estimating impairment) in detecting changes in the clinical status of the same patient. Therefore, we adopt the same set of variables used for the estimate of impairment. We evaluate the performance of the model by comparing the true and predicted change of FM-UE score. Next, we identify 38 out of the 54 samples from the recovery dataset (Table 2) for which the associated (Delta {text {FM-UE}}) values exceed an MDC of 4 points. We then determine the sensitivity of the model evaluating the percentage of (Delta {text {FM-UE}}) that are correctly predicted above the MDC.

#### Analysis of task generalisation

The Spheroids protocol requires predominantly movements in the lateral (left/right) direction. To explore the potential of the model to generalise to other tasks that involve bimanual 2D planar reaching movements, we identify a second dataset of 37 samples from a previous study in which 19 subacute stroke patients with hemiparesis trained with a different RGS-based activity which is a variation on the well-known arcade game ‘Whac-A-Mole’ [18, 22]. The observed FM-UE scores in this dataset are in the range [5, 60], with a mean of 37 and an SD of 14. The gameplay of Whac-A-Mole requires movements on the full 2d plane so, besides the movement descriptors used for Spheroids, we define independent descriptors for the movements in the front/back direction. Subsequently, we adopt the nonlinear parametric model to combine information from different variables to estimate the impairment level as measured by the FM-UE scale. To quantify the performance of the model, we compare true and estimated FM-UE scores reporting the average error, the Pearson correlation, and the coefficient of determination.

#### Analysis of generalisation to other clinical scales

To explore the potential of the model to generalise for the estimation of impairment captured by other standardised clinical approaches, we considered the estimation of the CAHAI and BI scales that are available in the Spheroids dataset (Table 1). We, therefore, follow the same steps as above to analyse the convergent validity of the model in estimating CAHAI and BI scores. It is useful to anticipate here that the results for CAHAI are close to those for FM-UE. This similarity is expected as the two scales have a high relative correlations and comparable correlations to most of the variables, cf. Table 1. To be quantitative, we can consider the case in which the FM-UE score (S_i^{text {FM}}) of the ‘Spheroids’ dataset is estimated by simply rescaling the corresponding CAHAI value (S_i^{text {CAHAI}}) by 66/91; this leads to the standard error

begin{aligned} frac{1}{191}sqrt{sum _{i=1}^{191} left[ left( S_i^{text {FM}} – (66/91) S_i^{text {CAHAI}}right) ^2right] } simeq 10.1 end{aligned}

(1)

and an (R^2) value of (1 – sigma _{text {FM,CAHAI}}^2 / sigma ^2_{text {FM}} simeq 0.62). Estimating BI scores from kinematic descriptors is instead generally harder. This can be explained by the lower correlation of BI scores with most of the variables (Table 1). If we estimate the FM-UE score by a simple rescaling of the BI value by 66/100 we get a standard error of

begin{aligned} frac{1}{176}sqrt{sum _{i=1}^{176} [(S_i^{text {FM}} – (66/100) S_i^{text {BI}})^2 ]} simeq 15.5 end{aligned}

(2)

and a (R^2) value of (1 – sigma _{text {FM,BI}}^2 / sigma ^2_{text {FM}} simeq 0.12). So we see again that BI carries different information than FM-UE (or CAHAI). The above values give us a natural benchmark to assess the performance of the model estimation of the clinical scales.

We conducted these analyses using in-house software (SaddlePoint Signature v2.9.3).

### Identification of variables

Given the technical constraints of the setup (e.g., sampling rate) and the nature of the training protocol (i.e., execution of horizontal reaching movements), we extracted all task-relevant kinematic measures of arm use that we could identify in the literature [13] . In total we obtained 31 variables, 12 first-order variables and 19 second-order variables obtained as functions of the first-order ones.

We identify two groups of first-order variables: (1) demographic and physiological data at recruitment, variables 1–5 in Table 1, and (2) kinematic descriptors extracted for the more affected limb during training sessions, variables 6–12 in Table 1. For the evaluation of all kinematic descriptors, the first and last two minutes of each training session are discarded to avoid the interference of behaviours or events related to the start and the ending of the training session (i.e., revision of instructions, postural adjustments, exposure to the final score screen, etc.).

Second-order variables include chronicity (i.e. acute, sub-acute, and chronic categories) and the difference between the less and the more affected upper-limb in each of the quantitative first-order variables, as well as their logarithmic transformations. The descriptive statistics of second-order variables are given in Table 5 in the Appendix.

Among the above-mentioned variables, most are well-known [13]. The work area is computed as the area of the complex hull of the hand movements using standard methods (Jarvis’ Algorithm [23]). The distance covered refers to the total length of the hand paths. The performance success rate is defined as the ratio of spheres intercepted over the total. However, we introduce also the new descriptors ‘Smoothness’ and ‘Total-goal directed movement’ (TGDM). Specifically, to extract information on UE motor function we introduce an original kinematic descriptor, (J(sigma )), to assess the patient’s movements at a specific temporal resolution, (sigma). This metric allows us to isolate goal-oriented movements from the hand trajectory in a certain direction, assumed to be stored over time as a function f(t). For the main dataset, we consider the left/right direction, as it is the principal axes in the movement dynamics of the ‘Spheroids’ protocol. We assume measurements are taken at discrete time points (t_i = iDelta) for (i = 0,1,ldots ,T – 1), with (Delta) being the timestep (for the ‘Spheroids’ dataset we have (Delta simeq 0.01) s). We define the total hand displacement during goal-oriented movements (J(sigma )) as the difference between the actual movements and a smoothed version of the discrete movements. The smoothed hand path (f_sigma (t)) is obtained using a Gaussian smoothing process

begin{aligned} f_sigma (t_i)&= frac{sum _{j=0}^{T-1} f(t_j) exp left[ -frac{ (t_i – t_j)^2}{2sigma ^2}right] }{sum _{j=0}^{T-1} exp left[ -frac{ (t_i – t_j)^2}{2sigma ^2}right] } end{aligned}

(3)

where the parameter (sigma) defines how smooth the new trajectory will be, see an example in Fig. 2. Therefore (J(sigma )) is obtained as

begin{aligned} J(sigma ) = sqrt{frac{sum _{i=0}^{T-1} left[ f_sigma (t_i) – f(t_i) right] ^2}{T} } , . end{aligned}

(4)

Following this analysis method, we derive the two new variables corresponding to the value of (J(sigma )) at the two peaks in the (sigma)-dependent Pearson correlation with the clinical scales, Fig. 3: ‘Smoothness’ in correspondence to the high-frequency peak, and ’TGDM’ in correspondence to the low-frequency peak. The location of the two peaks is weakly dependent on the clinical scale considered, yet it appears to be related to the data structure: the high-frequency peak is linked to the time resolution of the data ((Delta simeq 0.01) s), while the low-frequency peak is related to the typical timescale of the Spheroids protocol (i.e. a set of spheres is launched every (sim 10) s).

### Models description

To combine variables for estimating clinical scores of impairment and recovery, we introduce a model that allows for the presence of noise on both the variables ({varvec{Z}}) and the score S, and we hence name it a double-noise parametric model. Its generative functional form is

begin{aligned} S({varvec{Z}} | varvec{theta })&= frac{B-A}{2} tanh left( varvec{beta } cdot {varvec{Z}} + beta _0 + sigma _1 u right) nonumber \&quad + frac{B+A}{2} + sigma _2 v end{aligned}

(5)

where (varvec{theta } = {varvec{beta }, beta _0, A, B, sigma _1, sigma _2 }) are the (p+5) model hyperparameters to be inferred: (varvec{beta }) (association parameters of p active variables), (beta _0) (parametric offset), A and B (range offsets), (sigma _1) and (sigma _2) (noise strengths). The sources of noise u (the covariate noise) and v (the score noise) are both assumed to be standard normally distributed. Since (tanh (-x) = – tanh (x)), we remove the resulting parameter sign ambiguity by enforcing (B ge A). Note that the saturation of the sigmoidal function captures the limits of the clinical scores so that the average over the noise of the estimated score S is constrained in the interval [AB].

The probability of a particular score S, given the variables and the hyperparameters, is given by

begin{aligned}&p(S|{varvec{Z}}, varvec{theta })\&quad = frac{1}{sigma _2sqrt{ 2pi }} int Dv ~ mathrm{e}^{-frac{1}{2 sigma _2^2} [S- atanh ( varvec{beta } cdot {varvec{Z}} + beta _0 +sigma _1 v) -b ] ^2 } nonumber end{aligned}

(6)

with the short-hands (D v = (2 pi )^{-frac{1}{2}} mathrm{e}^{-frac{1}{2} v^2 } dv), (a=(B-A)/2), (b=(B+A)/2). We use the following (improper) prior distribution over the parameters (varvec{theta }):

begin{aligned} p(varvec{theta }) = Z^{-1} mathrm{e}^{-frac{1}{2} d varvec{beta }^2} p(sigma _1) , p(sigma _2). end{aligned}

(7)

We take (p(sigma _1) sim mathrm{e}^{-q/sigma 1}) and similar for (sigma _2) with q being a very small number, typically of the order of the accuracy of the numerical work, e.g. (q sim 10^{-10}). These priors guarantee that the log-likelihood function is bounded from below.

We adopt Maximum A Posteriori (MAP) inference: given the dataset ({mathcal {D}}={({varvec{Z}}_1,S_1), ldots , ({varvec{Z}}_n,S_n)}), the optimal parameters (varvec{theta }) correspond to the maximum of the posterior probability or, alternatively, to the minimum of the regularised log-likelihood function:

begin{aligned} Omega (varvec{theta })&= -sum _{i=1}^n log int Dz ~ mathrm{e}^{-frac{1}{2sigma _2^2} [S_i – atanh ( varvec{beta } cdot {varvec{Z}}_i + beta _0 +sigma _1 z) -b ] ^2 } nonumber \&quad +frac{1}{2} d varvec{beta }^2 +nlog (sigma _2) + frac{q}{sigma _2} + frac{q}{sigma _1} , . end{aligned}

(8)

The errors on the inferred parameters (varvec{theta }) are estimated from the curvature of the regularised log-likelihood function at the minimum.

Two simpler models derived from Eq. 8 can be considered corresponding to having either score noise only or covariate noise only:

• Score noise model, (sigma _1 = 0). Taking the limit (sigma _1rightarrow 0) in Eq. 8 gives

begin{aligned} Omega _{text {sco}}(varvec{theta })&=frac{1}{2sigma _2^2} sum _{i = 1}^n left[ S_i – b – a tanh (varvec{beta } cdot {varvec{Z}}_i + beta _0) right] ^2 nonumber \&quad + n log (sigma _2 ) + frac{1}{2} d varvec{beta }^2 +frac{q}{sigma _2} , end{aligned}

(9)

• Covariate noise model, (sigma _2 = 0) Taking the limit (sigma _2rightarrow 0) in Eq. 8 gives

begin{aligned} Omega _{text {cov}}(varvec{theta })&=frac{1}{2sigma _1^2}sum _{i=1}^n Big [ tanh ^{-1}Big (frac{S_i – b}{a}Big ) – varvec{beta } cdot {varvec{Z}}_i – beta _0 Big ] ^2 nonumber \&quad + sum _{i=1}^n log big [a^2 – (S_i – b)^2big ] +nlog Big (frac{sigma _1}{a}Big ) nonumber \&quad +frac{1}{2} d varvec{beta }^2 +frac{q}{sigma _1} end{aligned}

(10)

provided (|S_i – b|<a) for all i, otherwise (Omega _{text {cov}}(varvec{theta }) = infty). This implies that for each b the minimisation over a is to be carried out strictly over the open interval (a>mathrm{max}_i |S_i – b|).