A multi-centre cross-sectional design was used. The Regional Ethics Review Boards in Uppsala and Stockholm approved all procedures (approval numbers 2014/1489-31/4 and 2015/339, respectively).
Participants and procedure
To obtain 99% confidence that the item calibration (item difficulty measure) is within ±½ logit of its stable value, a minimum sample size of 243 is recommended. To ensure the stability of item difficulty between participant groups (in other words, to limit item bias), at least 100 participants per group are recommended. Since we planned such analyses in groups based on sex (two groups), age (four groups) and diagnosis (seven groups), we required at least 700 participants (seven diagnostic groups × 100 participants). A cross-sectional convenience sample was chosen because no control over the recruitment process was possible. Patients were recruited at 20 psychiatric outpatient units in four regions in Central Sweden (Dalarna, Uppsala, Örebro, and Stockholm). Data were collected between December 2014 and December 2017. The inclusion criterion was the ability to read and understand Swedish. During a regular visit, the attending clinician provided written and oral information about the study and collected demographic and clinical information. In total, 837 patients agreed to participate in the study. All participants signed an informed consent form and completed the 36-item WHODAS 2.0 questionnaire.
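The 700-participant target follows from the grouping factor with the most groups. The snippet below is a minimal sketch of that arithmetic, assuming equally sized groups; in a convenience sample the groups are rarely balanced, so the result is a lower bound rather than a guarantee. The names `planned_groups` and `required_sample` are hypothetical.

```python
from typing import Dict

# Minimum number of participants per group for stable item calibration (from the text).
MIN_PER_GROUP = 100

# Planned DIF grouping factors and their number of groups (from the text).
planned_groups: Dict[str, int] = {"sex": 2, "age": 4, "diagnosis": 7}


def required_sample(groups: Dict[str, int], min_per_group: int) -> int:
    """Smallest total N giving min_per_group participants in every group of the
    factor with the most groups, assuming (hypothetically) equal group sizes."""
    return max(groups.values()) * min_per_group


print(required_sample(planned_groups, MIN_PER_GROUP))  # 700
```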
In line with the recommendations in the WHODAS 2.0 manual, data from participants with a maximum of two missing responses, and no more than one missing response in any domain, were included in the analyses. Consequently, 57 participants were excluded, leaving 780 for the final analyses. Each participant’s main diagnosis was reported by the clinician; if the main diagnosis was missing or ambiguously reported, it was inferred from the type of clinic from which the participant was recruited. In 22 cases this was not possible, and these participants therefore had no recorded diagnosis. The mean age (standard deviation, SD) was 39.5 (15.7) years, and 65.6% of the participants were women. The distribution of participants by sex, age group and diagnosis is reported in Table 1.
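The inclusion rule for missing data can be expressed as a simple check per participant. This is a minimal sketch, assuming responses are stored in a dictionary keyed by hypothetical item codes ("D1.1" to "D6.8") with `None` for a missing answer, and assuming that D5 (D5.1 to D5.8) is treated as one domain; the text does not state whether D5a and D5b are counted separately.

```python
from typing import Dict, List, Mapping, Optional

# Item codes per domain of the 36-item WHODAS 2.0 (hypothetical coding scheme).
# Assumption: D5 (D5.1-D5.8) is treated as a single domain for this rule.
DOMAINS: Dict[str, List[str]] = {
    "D1": [f"D1.{i}" for i in range(1, 7)],
    "D2": [f"D2.{i}" for i in range(1, 6)],
    "D3": [f"D3.{i}" for i in range(1, 5)],
    "D4": [f"D4.{i}" for i in range(1, 6)],
    "D5": [f"D5.{i}" for i in range(1, 9)],
    "D6": [f"D6.{i}" for i in range(1, 9)],
}


def is_eligible(responses: Mapping[str, Optional[int]],
                max_total_missing: int = 2,
                max_missing_per_domain: int = 1) -> bool:
    """True if a participant has at most two missing responses overall and
    at most one missing response within any single domain."""
    # A response counts as missing if it is None or the item is absent.
    missing_per_domain = {
        domain: sum(responses.get(item) is None for item in items)
        for domain, items in DOMAINS.items()
    }
    total_missing = sum(missing_per_domain.values())
    return (total_missing <= max_total_missing
            and max(missing_per_domain.values()) <= max_missing_per_domain)


complete = {item: 0 for items in DOMAINS.values() for item in items}
two_domains = dict(complete, **{"D1.1": None, "D3.2": None})
same_domain = dict(complete, **{"D1.1": None, "D1.2": None})
print(is_eligible(two_domains), is_eligible(same_domain))  # True False
```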
The WHODAS 2.0 is a generic standardized questionnaire available in 12-item, 12 + 24-item, and 36-item versions. In the 12 + 24-item version, the 12 items are used to screen for problematic areas of functioning and, depending on the responses to these items, respondents may be given up to 24 additional questions from the 36-item version. The WHODAS 2.0 measures difficulty in activity performance and participation through six domains: D1, Understanding and communicating; D2, Getting around; D3, Self-care; D4, Getting along with people; D5, Life activities; and D6, Participation in society. D5 (Life activities) is divided into two areas: D5a, Domestic responsibilities, and D5b, Work and school. In the 36-item version, the items are distributed across the domains as follows: Cognition (D1.1–D1.6; six items), Mobility (D2.1–D2.5; five items), Self-care (D3.1–D3.4; four items), Getting along (D4.1–D4.5; five items), Life activities (D5.1–D5.4 [D5a] and D5.5–D5.8 [D5b]; four items each), and Participation (D6.1–D6.8; eight items). All items are scored on a common five-point Likert scale ranging from 0 = no difficulty to 4 = extreme difficulty or cannot do; thus, a higher score indicates a higher level of disability. The full version of the original WHODAS 2.0 can be found elsewhere.
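The item structure described above can be written down as a small lookup table, which also makes the raw score range of each domain explicit (number of items × 4). This is a sketch only; the dictionary keys and item codes are a hypothetical labelling that follows the domain names in the text.

```python
from typing import Dict, List

# Items of the 36-item WHODAS 2.0 per domain/area, following the text.
ITEMS: Dict[str, List[str]] = {
    "D1 Understanding and communicating": [f"D1.{i}" for i in range(1, 7)],
    "D2 Getting around": [f"D2.{i}" for i in range(1, 6)],
    "D3 Self-care": [f"D3.{i}" for i in range(1, 5)],
    "D4 Getting along with people": [f"D4.{i}" for i in range(1, 6)],
    "D5a Domestic responsibilities": [f"D5.{i}" for i in range(1, 5)],
    "D5b Work and school": [f"D5.{i}" for i in range(5, 9)],
    "D6 Participation in society": [f"D6.{i}" for i in range(1, 9)],
}
MAX_CATEGORY = 4  # 0 = no difficulty ... 4 = extreme difficulty or cannot do

for domain, codes in ITEMS.items():
    print(f"{domain}: {len(codes)} items, raw score 0-{len(codes) * MAX_CATEGORY}")

n_items = sum(len(codes) for codes in ITEMS.values())
print(f"Total: {n_items} items, raw score 0-{n_items * MAX_CATEGORY}")  # 36 items, 0-144
```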
The WHODAS 2.0 can be completed through self-report, interviewer administration, or proxy report. For this study, the Swedish 36-item self-report version was used.
Since each of the WHODAS domains can be used on its own or combined into a total summary score, we ran the analyses both for each domain separately and for all domains together. Furthermore, the WHODAS 2.0 complex scoring method uses two different rating scale structures (the original five categories and a collapsed version with three categories), which may indicate problems with the rating scale structure. Hence, even though all items in WHODAS 2.0 share the same rating categories, we followed other studies and used the Rasch partial credit model, which estimates the rating scale structure separately for each item [25, 26]. In Rasch analysis, the data are evaluated against the model's assumptions, such as unidimensionality (the assumption that all items reflect one single dimension, the latent variable, which in this study is disability). The recommended criterion values described below define the hypotheses against which the data were tested. By investigating the psychometric properties of the instrument in this way, we accumulate evidence for the validity of the WHODAS 2.0. More information about Rasch analysis can be found elsewhere.
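To illustrate what "partial credit" means here: each item gets its own set of step difficulties (thresholds), so two items with the same five labelled categories can have differently spaced categories. The function below is a minimal sketch of the standard partial credit model category probabilities; it is not the WINSTEPS estimation routine, and the threshold values are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m | theta) for one item under the
    Rasch partial credit model.

    thresholds[j - 1] is the item-specific step difficulty (logits) for moving
    from category j - 1 to category j.
    """
    # Exponent for category 0 is an empty sum (0); for category k it is
    # sum_{j=1..k} (theta - delta_j).
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating for numerical stability.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical items sharing categories 0-4 but with their own step
# difficulties; these values are illustrative, not estimates from this study.
item_a = [-2.0, -1.0, 1.0, 2.0]
item_b = [-1.5, 0.5, 0.6, 3.0]
for name, steps in [("item_a", item_a), ("item_b", item_b)]:
    print(name, [round(p, 3) for p in pcm_probabilities(0.0, steps)])
```

Under a rating scale model both items would share one set of thresholds shifted by an item location; the partial credit model drops that constraint, which is why it suits a scale whose category structure is in question.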
In the original category order of WHODAS 2.0, a higher score indicates a higher level of disability. In the Rasch output, items and persons are reported on the same logit scale, on which more difficult items have higher measures; with the original scoring, however, abler persons would receive low measures. To make a higher person measure indicate greater ability, the order of the rating scale categories was reversed before the analyses as follows: 0 = extreme/cannot, 1 = severe, 2 = moderate, 3 = mild, 4 = no difficulty.
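The reversal is a linear recoding, new score = 4 − original score, that leaves missing responses missing. A minimal sketch, with `reverse_category` as a hypothetical helper name:

```python
from typing import Optional

MAX_CATEGORY = 4  # highest category on the five-point WHODAS 2.0 scale


def reverse_category(score: Optional[int],
                     max_category: int = MAX_CATEGORY) -> Optional[int]:
    """Map original coding (0 = no difficulty ... 4 = extreme/cannot) to the
    reversed coding (0 = extreme/cannot ... 4 = no difficulty).
    Missing responses (None) stay missing."""
    if score is None:
        return None
    if not 0 <= score <= max_category:
        raise ValueError(f"score out of range: {score}")
    return max_category - score


print([reverse_category(s) for s in [0, 1, 2, 3, 4, None]])  # [4, 3, 2, 1, 0, None]
```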
Evidence for the validity of the WHODAS 2.0 was investigated based on six aspects:
(I) Item fit: The data were considered to fit the Rasch model usefully if at least 34 of the 36 items (approximately 95%) had an infit mean square within the range 0.6–1.5 [28, 29]. Infit is information-weighted and is therefore more sensitive to the response patterns of persons on items targeted near their ability level (and vice versa); it thus reflects whether the item hierarchy is similar for all respondents. Outfit is more sensitive to outlying responses, that is, the responses of persons located far from the item’s location.
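The difference between infit and outfit is easiest to see in how they are computed. The sketch below uses the standard definitions: outfit is the unweighted mean of squared standardized residuals, while infit divides the summed squared residuals by the summed model variances, so well-targeted responses (with larger variance) weigh more. It assumes person measures and the item's partial credit thresholds are already estimated; the helper names and example values are hypothetical, and WINSTEPS may differ in details such as handling of extreme scores.

```python
import math
from typing import List, Sequence, Tuple


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """P(X = 0..m | theta) under the partial credit model (steps in logits)."""
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(responses: Sequence[int], thetas: Sequence[float],
             thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (infit MnSq, outfit MnSq) for one item.

    responses[n] is person n's observed category and thetas[n] that person's
    ability measure; thresholds are the item's step difficulties.
    """
    sum_sq_resid = 0.0  # infit numerator: sum of squared residuals
    sum_variance = 0.0  # infit denominator: sum of model variances
    sum_z_sq = 0.0      # outfit: sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        probs = pcm_probabilities(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        sq_resid = (x - expected) ** 2
        sum_sq_resid += sq_resid
        sum_variance += variance
        sum_z_sq += sq_resid / variance
    infit = sum_sq_resid / sum_variance  # information-weighted mean square
    outfit = sum_z_sq / len(responses)   # unweighted mean square
    return infit, outfit


def fits_model(infits: Sequence[float], low: float = 0.6, high: float = 1.5,
               min_fitting: int = 34) -> bool:
    """Criterion from the text: at least 34 items with infit in 0.6-1.5."""
    return sum(low <= v <= high for v in infits) >= min_fitting


infit, outfit = item_fit([0, 1, 2, 3, 4, 2], [-2.5, -1.0, 0.0, 1.0, 2.5, 0.5],
                         [-2.0, -1.0, 1.0, 2.0])
print(round(infit, 2), round(outfit, 2))
```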
(II) Unidimensionality: The Rasch model assumes that items reflect only one main dimension. Principal component analysis (PCA) of residuals was used to test the data against this assumption, that is, to examine whether the unexplained part of the data (the residuals) is random noise or reflects another meaningful dimension [31, 32]. Unidimensionality is supported when the variance explained by the main dimension is at least 60% of the total variance and the eigenvalue of the first contrast in the unexplained variance is less than 2 (eigenvalue units) [31, 32]. Another indicator of unidimensionality is the point-biserial correlation; a positive point-biserial correlation indicates that an item contributes positively to the total raw score [34, 35]. The disattenuated correlation (a correlation corrected for measurement error) indicates whether subsets of items within the same domain or instrument are correlated with each other, which supports unidimensionality. A disattenuated correlation of approximately 1 indicates that the item subsets measure the same dimension (the same latent variable); the cut-off for the disattenuated correlation was > 0.7. A further assumption was local item independence, meaning that, conditional on the latent variable, the response to one item does not depend on the responses to other items. Local independence was evaluated from the correlations of residuals between pairs of items and was assumed if the correlation coefficient was < 0.70.
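Two of these checks reduce to short formulas. The disattenuated correlation divides the observed correlation between two subset measures by the square root of the product of their reliabilities, r / √(R₁R₂), and local dependence can be screened by correlating item residuals pair by pair. The sketch below assumes the residuals (or standardized residuals) have already been exported per item; the residual values and item labels are hypothetical, and the PCA of residuals itself is left to the Rasch software.

```python
import math
from itertools import combinations
from typing import List, Mapping, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for measurement error in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


def locally_dependent_pairs(residuals: Mapping[str, Sequence[float]],
                            cutoff: float = 0.70) -> List[Tuple[str, str, float]]:
    """Item pairs whose residual correlation reaches the cut-off, i.e. pairs
    for which local independence (r < 0.70) is not supported."""
    flagged = []
    for a, b in combinations(sorted(residuals), 2):
        r = pearson(residuals[a], residuals[b])
        if r >= cutoff:
            flagged.append((a, b, r))
    return flagged


print(round(disattenuated_correlation(0.6, 0.8, 0.8), 2))  # 0.75
example = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [4, 1, 3, 2]}
print([(a, b) for a, b, _ in locally_dependent_pairs(example)])  # [('A', 'B')]
```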
(III) Reliability and separation of persons and items: These were calculated from person and item measures (in logits), respectively [33, 34]. Cronbach’s alpha was calculated from raw scores to investigate internal consistency; an alpha value > 0.80 was considered acceptable, although a value > 0.90 is recommended for instruments used in clinical evaluation. Item and person separation are additional reliability indices. Item separation indicates how many strata of item difficulty can be differentiated by the respondents, that is, how well the item difficulty hierarchy is defined; low item separation indicates that the sample is not large enough to confirm the item difficulty hierarchy. Low person separation with an appropriate sample size may indicate that the instrument is not sensitive enough to distinguish between persons of different ability. A minimum separation value of 3 is recommended.
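The indices above are related by standard formulas. Cronbach's alpha is k/(k − 1) × (1 − Σ item variances / total-score variance), and a Rasch reliability R converts to a separation index G = √(R/(1 − R)), so a separation of 3 corresponds to a reliability of 0.90. The sketch below also includes the commonly used strata formula (4G + 1)/3. It assumes complete raw-score data for alpha; the data values are hypothetical.

```python
import math
from typing import Sequence


def sample_variance(values: Sequence[float]) -> float:
    """Sample variance with n - 1 in the denominator."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores[n][i] is the raw score of person n on item i (complete data)."""
    k = len(scores[0])
    item_vars = [sample_variance([row[i] for row in scores]) for i in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def separation_from_reliability(rel: float) -> float:
    """Separation index G = sqrt(R / (1 - R)) for a Rasch reliability R."""
    return math.sqrt(rel / (1 - rel))


def strata(separation: float) -> float:
    """Number of statistically distinct levels, (4G + 1) / 3."""
    return (4 * separation + 1) / 3


print(round(separation_from_reliability(0.90), 2))  # 3.0
print(round(cronbach_alpha([[0, 1, 1], [2, 2, 3], [3, 4, 4], [1, 1, 2]]), 3))
```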
(IV) Targeting between item difficulty and participant ability: Targeting was evaluated from the difference between the mean item and mean person measures, the ceiling and floor effects, and the effective operational range. The effective operational range encompasses participants who have a more than 50% chance of being rated above the bottom category of the least difficult item and below the top category of the most difficult item. This range is reported as the proportion of participants whose ability measures were covered by the instrument (all items); in this study, coverage of 90% of the participants was considered highly satisfactory.
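The operational range definition translates directly into two root-finding problems under the partial credit model: the lower bound is the ability at which P(X > 0) = 0.5 for the least difficult item, and the upper bound is the ability at which P(X = top) = 0.5 for the most difficult item (equivalently, P(X < top) = 0.5). The sketch below solves both by bisection and then reports coverage and floor/ceiling proportions. It assumes the reversed coding (higher = more able), uses hypothetical thresholds and measures, and is not necessarily how the published ranges were derived.

```python
import math
from typing import Callable, List, Sequence, Tuple


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """P(X = 0..m | theta) under the partial credit model (steps in logits)."""
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def solve_increasing(f: Callable[[float], float], target: float,
                     lo: float = -20.0, hi: float = 20.0,
                     tol: float = 1e-8) -> float:
    """Bisection for f(theta) = target when f increases with theta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def operational_range(easiest: Sequence[float],
                      hardest: Sequence[float]) -> Tuple[float, float]:
    """Abilities where P(above bottom category of easiest item) = 0.5 and
    P(top category of hardest item) = 0.5."""
    lower = solve_increasing(lambda t: 1 - pcm_probabilities(t, easiest)[0], 0.5)
    upper = solve_increasing(lambda t: pcm_probabilities(t, hardest)[-1], 0.5)
    return lower, upper


def proportion_covered(thetas: Sequence[float], lower: float, upper: float) -> float:
    return sum(lower <= t <= upper for t in thetas) / len(thetas)


def floor_ceiling(raw_totals: Sequence[int], max_total: int) -> Tuple[float, float]:
    """Proportions at the minimum (floor) and maximum (ceiling) raw score."""
    n = len(raw_totals)
    return (sum(s == 0 for s in raw_totals) / n,
            sum(s == max_total for s in raw_totals) / n)


low, high = operational_range([-4.0, -3.0, -2.0, -1.0], [0.5, 1.5, 2.5, 3.5])
print(round(low, 2), round(high, 2))
```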
(V) Rating scale functioning: Linacre’s guidelines specify the following minimum requirements: each rating scale category should include at least 10 observations; the outfit mean square (MnSq) for each category should be below 2.0; average measures and step difficulties should increase monotonically across categories (in other words, a more difficult category should have a higher logit value); and categories should be ordered as intended, with an acceptable distance between adjacent step difficulties (recommended distance 1.4 to 5 logits).
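These guidelines can be applied mechanically to the category statistics reported by the Rasch software. The function below is a minimal sketch that returns one pass/fail flag per criterion listed in the text; it assumes the category counts, category outfit MnSq, average measures and step difficulties (thresholds) have already been extracted, and the example values are hypothetical.

```python
from typing import Dict, Sequence


def strictly_increasing(values: Sequence[float]) -> bool:
    return all(b > a for a, b in zip(values, values[1:]))


def check_rating_scale(counts: Sequence[int], outfits: Sequence[float],
                       avg_measures: Sequence[float], thresholds: Sequence[float],
                       min_count: int = 10, max_outfit: float = 2.0,
                       min_gap: float = 1.4, max_gap: float = 5.0) -> Dict[str, bool]:
    """Pass/fail flags for the rating scale guidelines listed in the text.

    counts, outfits and avg_measures have one value per category (0..m);
    thresholds has one step difficulty per category transition (1..m).
    """
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return {
        "at_least_10_observations": all(c >= min_count for c in counts),
        "outfit_below_2": all(o < max_outfit for o in outfits),
        "average_measures_increase": strictly_increasing(avg_measures),
        "thresholds_ordered": strictly_increasing(thresholds),
        "threshold_gaps_1.4_to_5": all(min_gap <= g <= max_gap for g in gaps),
    }


result = check_rating_scale(counts=[50, 80, 120, 200, 300],
                            outfits=[1.1, 0.9, 1.0, 0.8, 1.2],
                            avg_measures=[-2.0, -1.0, 0.0, 1.0, 2.0],
                            thresholds=[-3.0, -1.0, 0.6, 3.4])
print(result)
```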
(VI) Differential item functioning (DIF): DIF analysis investigates whether item difficulty in the total dataset is stable across participant groups (item bias), here defined by sex, age and diagnosis. DIF analysis is recommended when there are at least 100 participants per group; therefore, only two diagnostic groups (“affective disorders” and “Attention Deficit Hyperactivity Disorder (ADHD) and autism spectrum disorders”) were included in the DIF analysis for diagnosis. Four age groups were defined and used for the DIF analyses (see Table 1); owing to the low number of older participants, the 65+ age group had fewer than 100 participants. DIF was considered present when both of the following criteria were met: 1) a difference in item measures between groups (DIF size) of > 0.5 logits, which is large enough to have substantial consequences; and 2) a statistical significance level (p-value) < 0.05 [24, 47]. The analyses were performed using WINSTEPS 3.90.
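The two DIF criteria can be combined into a single flag per item and group comparison. The sketch below takes an item's difficulty and standard error in each group, computes the DIF contrast, and tests it with t = contrast / √(SE₁² + SE₂²). As a simplifying assumption, the two-sided p-value uses a normal approximation (reasonable for large groups) rather than the Welch t-test with its degrees of freedom that Rasch software typically reports; the input values are hypothetical.

```python
import math
from typing import Tuple


def dif_test(d1: float, se1: float, d2: float, se2: float,
             min_size: float = 0.5, alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Return (DIF contrast in logits, two-sided p, flagged) for one item.

    d1, d2 are the item difficulties estimated in groups 1 and 2, with
    standard errors se1, se2. Assumption: normal approximation for p.
    """
    contrast = d1 - d2
    t = contrast / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for standard normal Z
    flagged = abs(contrast) > min_size and p < alpha
    return contrast, p, flagged


print(dif_test(0.8, 0.1, 0.1, 0.1))  # large and significant: flagged
print(dif_test(0.8, 0.5, 0.0, 0.5))  # large but not significant: not flagged
```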
To explore the linear relationship between methods of calculating overall scores, Pearson’s correlations were computed among three overall scores: two based on the WHODAS 2.0 scoring models and one based on the Rasch analysis. All three scores had a possible range of 0–100 and were calculated from the observed data as follows: (i) missing data were imputed, and each person’s raw scores were converted to an overall score on a 0–100 scale according to the IRT-based scoring model (WHODAS-complex model); (ii) each person’s raw scores were summed and divided by the maximum possible score to create an overall score on a 0–100% scale according to the simple scoring model (WHODAS-simple model); and (iii) each person’s ability measure from the Rasch analysis (in logits) was converted to a 0–100 scale in WINSTEPS (Rasch 0–100 scale). For the third score, no imputation of missing data was performed, because Rasch analysis can handle missing data. For the first two scores, the imputation method described in the WHODAS 2.0 manual was used: when one item in a domain is missing, the mean score of the answered items within that domain is assigned to the missing item.
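The simple scoring model with domain-mean imputation can be sketched as follows: fill a single missing item in a domain with the mean of that domain's answered items, sum all item scores, and divide by the maximum possible score (number of items × 4) before multiplying by 100. The complex (IRT-based) model and the WINSTEPS 0–100 transformation are not reproduced here. The two-domain example is hypothetical; for the study data, the six WHODAS domains with their 36 items would be passed in.

```python
from statistics import mean
from typing import Dict, List, Mapping, Optional


def impute_domain_mean(responses: Mapping[str, Optional[float]],
                       domains: Mapping[str, List[str]]) -> Dict[str, float]:
    """Replace one missing item per domain with the mean of the answered
    items in that domain (None or absent means missing)."""
    filled: Dict[str, float] = {}
    for items in domains.values():
        answered = [responses[i] for i in items if responses.get(i) is not None]
        missing = [i for i in items if responses.get(i) is None]
        if len(missing) > 1:
            raise ValueError("more than one missing item in a domain")
        for i in items:
            filled[i] = mean(answered) if i in missing else responses[i]
    return filled


def simple_score_0_100(responses: Mapping[str, Optional[float]],
                       domains: Mapping[str, List[str]],
                       max_category: int = 4) -> float:
    """Summed raw score as a percentage of the maximum possible score."""
    filled = impute_domain_mean(responses, domains)
    n_items = sum(len(items) for items in domains.values())
    return 100 * sum(filled.values()) / (n_items * max_category)


toy_domains = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
toy = {"a1": 4, "a2": 2, "a3": None, "b1": 0, "b2": 1}
print(simple_score_0_100(toy, toy_domains))  # a3 imputed as 3 -> 10 / 20 -> 50.0
```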
Correlation coefficients are reported with 95% confidence intervals (CIs); these analyses were performed using SPSS v.25 (IBM Corp, Armonk, NY).
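A common way to obtain a 95% CI for a Pearson correlation is the Fisher z-transformation: z = atanh(r), standard error 1/√(n − 3), and back-transformation with tanh. The sketch below shows that approach; it is an assumption that this matches the method used to produce the reported intervals. The example uses a hypothetical r = 0.9 with the study's n = 780.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = fisher_ci(0.9, 780)
print(round(low, 3), round(high, 3))
```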
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.