Commentary characteristics

A total of 618 discrete first-draft comments were provided across the three rounds of Task 2 rehearsal essay writing. This works out as an average of approximately 25 comments per essay, indicating the feedback was extensive (Pearson, 2018b; Yang & Badger, 2015). Table 2 shows the frequencies and proportions of the various characteristics of written feedback commentary. The majority were marginal comments targeting issues in the text body (n = 321), although, as can be seen, their distribution across the four assessment criteria varied. Comments addressing how learners responded to the task, including the clarity, development, and relevance of the ideas presented, constituted 38.6% of all marginal comments and just over a quarter of end comments. As in other test preparation activities, this reflects the role of written feedback in helping test-takers improve their awareness of, and cope with, the demands of the testing system (Brown, 1998; Hayes & Read, 2008; Mickan & Motteram, 2008; Saif et al., 2021; Yang & Badger, 2015). Language was not ignored (Yang & Badger, 2015), with 31.2% of all marginal comments addressing Lexical Resource, often the appropriacy and naturalness of word forms, word choices, or collocations. Coherence and Cohesion was rarely brought up in marginal commentary, reflecting the relative inaccessibility of such features compared to lexicogrammar and Task Response (TR) (Cotton & Wilson, 2011; Riazi & Knox, 2013).
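As a quick arithmetic check on the counts reported in this section, the short sketch below derives the share of marginal and end comments from the two frequencies given in the text (n = 321 and n = 297). The variable and function names are illustrative only and do not come from the study.

```python
# Comment counts as reported in the text (marginal n = 321, end n = 297).
marginal_comments = 321
end_comments = 297
total_comments = marginal_comments + end_comments


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


marginal_share = share(marginal_comments, total_comments)
end_share = share(end_comments, total_comments)

print(total_comments)             # 618
print(marginal_share, end_share)  # 51.9 48.1
```

The two counts sum to the 618 comments reported, and marginal comments form a slim majority of roughly 52%.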

Table 2 Characteristics of teacher commentary

The content and delivery of marginal comments exhibited a number of consistent characteristics. They most frequently featured an advisory pragmatic function (39.7%), often explicitly outlining a strategy the learner could adopt to improve an aspect of their essay (61.1%), usually in regard to Task Response or Lexical Resource. They were typically short, with 67% being below 20 words, though hedging (the most frequent mitigation strategy and one which generally increases the length of comments) occurred in 24.3% of marginal comments. The 33% of marginal comments that were 20 words or longer usually affixed criticism or praise to an advisory statement, with the former serving to problematise the issue and the latter to soften the blow (Hyland & Hyland, 2001). A less common function overall, although still notable, was criticism (20.9%), which often co-occurred with implicit feedback, requiring the learner to work out how to address the issue.

In contrast, end comments (n = 297) were more evenly distributed across the assessment criteria, reflecting their contrasting role in Task 2 rehearsal essay feedback. The pragmatic function of describing learner performance vis-à-vis the public band descriptors featured more prominently (22.3%), as did praise (24.2%), coded in comments that reiterated or explained the key messages of marginal comments, often in unmitigated statements of 20 words or more (70%). End comments did not always require revisions (40.1%), perhaps because they consisted of a general remark on performance (6.4%) or constituted praise. Those that did mostly conveyed an advisory sentiment implicitly (36%), as such comments often addressed students’ texts from a global perspective, making it difficult to provide specific strategies. End comments did not forcefully instruct learners to make changes through the use of the imperative, unlike a small number of marginal comments (5.4%).

The prevalence of advisory and critical comments reflects the highly evaluative nature of WFC on IELTS Task 2 rehearsal essays (Pearson, 2018b), where the teacher judges the correctness of the work and justifies the marks given (Weaver, 2006) in an effort to improve future test outcomes from a deficit perspective. The teacher, with superior knowledge of both the language and the testing system, is legitimised as the ultimate authority on the test (Saif et al., 2021). Such an imbalance is visible in the prevalence of explicit commentary, often appropriations encouraging the writer to shift her/his position by injecting the teacher’s own meaning into the students’ words through reformulations or suggested topic ideas/development (Goldstein, 2004; Tardy, 2019). Clearly, critical WFC is liable to constitute a threat to students’ self-concept (Hyland & Hyland, 2001), which might explain the presence of extensive mitigation. However, the pressure to achieve goals may help ‘immunise’ some students against the potential harm of critical WFC (Han & Hyland, 2019).

Ratings of student revisions

The outcomes of student revisions in response to actionable content and end form-focused comments, measured as both the extent of the revision and the effect on textual quality, are presented in Table 3. It can be seen that notable proportions of marginal (26.1%) and end comments (41.1%) were not acted upon by learners, possibly indicating a lack of engagement (Han & Hyland, 2015). These figures are higher than those reported in studies of teacher WFC in tertiary-level process writing environments (Christiansen & Bloch, 2016; Conrad & Goldstein, 1999; Ene & Upton, 2014; Hyland, 2003; Nurmukhamedov & Kim, 2009; Ranalli, 2021; Zhang & Hyland, 2018), where comments across multiple drafts serve as scaffolding to help students develop their texts, with an initial focus on content and organisation and a later focus on grammar and mechanics. Two studies uncovered higher rates of no response (Ferris, 1997; Sugita, 2006), although Ferris (1997) acknowledges the frequency of praise reduced students’ agency to revise. Additionally, the participants of this study were generally more disposed towards undertaking minimal textual changes (41.2%) than substantive ones (22.8%), mirroring some studies in contexts where writing was not undertaken for assessment purposes (Nurmukhamedov & Kim, 2009; Sugita, 2006). Absent or perfunctory revisions were notable responses to end comments, of which only 17.8% were met with a substantive revision.

Table 3 Ratings of content and form-focused end comment revisions

Students’ unwillingness or inability to revise can be considered surprising given that they rarely met their band score goals in first drafts and given the large amount of explicit, advisory WFC that outlined ways forward. However, significant proportions of comments (especially end comments) constituted praise or described aspects of written performance, implying no revision was necessary (Ferris, 1997; Hyland & Hyland, 2001). Non-simulated writing may not have been considered reflective of authentic written outcomes, resulting in participants not perceiving a valid purpose in response (Zareekbatani, 2015; Zheng et al., 2020). It could also be the case that, as in other writing settings, there was disagreement with the commentary (Goldstein & Kohls, 2002; Pratt, 1999), stemming from a lack of trust in the credibility of the feedback provider (Ranalli, 2021), who was not initially known to the participants. This is not an insignificant issue since purported experts on IELTS preparation abound on social networking groups (Pearson, 2018a), along with much ‘folk knowledge’ passed off as test-taking gospel (Allen, 2016). Alternatively, it is possible that the comprehensive WFC made it difficult for the learners to respond to all messages, while frequent mitigation may have lessened the impetus to revise by diluting the importance with which a textual issue was framed (Hyland & Hyland, 2001).

Rates of successful response to WFC appeared lower than in several prior studies (Conrad & Goldstein, 1999; Ferris, 1997; Nurmukhamedov & Kim, 2009; Sugita, 2006), painting an unclear picture of feedback effectiveness in this context. Marginal comments about content induced a positive effect on students’ texts 40.5% of the time, although many successful revisions were minimal in scope (21.7%). In terms of raw frequency, end comments resulted in more numerous instances of enhanced textual quality (n = 76), though a greater percentage induced a mixed impact on the text as opposed to definitively improving it. It could be that certain key messages became diluted in the lengthy end comment descriptions, deeply coded using the conventions of language assessment specialists (Weaver, 2006). Alternatively, the comprehensiveness of the information may have proved overwhelming and unmanageable (Evans et al., 2010; Lee, 2019), especially for the weaker learners with ambitious IELTS band score targets.

Greater success was exhibited by the learners in addressing marginal form comments. Target-like revisions occurred at a rate of 51.4%, although many comments directly treated student errors or contained explicit reformulations. A further 24% of marginal form comments resulted in deletion of the problematic feature, suggesting student avoidance of the issue (Han & Hyland, 2015), while for a somewhat concerning 14.2% the problematic feature remained in draft two. However, as found elsewhere (Ferris, 1997; Nurmukhamedov & Kim, 2009; Sugita, 2006), occurrences of content revisions that worsened the text were rare (n = 5), explained by the advisory, appropriating nature of the WFC and by the students’ apparent reluctance to take risks in response to comments. Likewise, just 10.4% of form revisions led to non-target-like outcomes.
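The four form-focused percentages reported above can be checked for internal consistency with a minimal sketch such as the following. It assumes, as the figures themselves suggest but the text does not state outright, that the four outcomes are mutually exclusive, exhaustive, and share the same denominator (all marginal form comments); the category labels are paraphrases rather than the study's coding terms.

```python
# Percentages of marginal form-focused comments by revision outcome, as reported in the text.
# Assumption: the four outcomes are mutually exclusive and share one denominator.
form_outcomes = {
    "target-like revision": 51.4,
    "deletion of the feature": 24.0,
    "unchanged in draft two": 14.2,
    "non-target-like revision": 10.4,
}

# Round to one decimal place to avoid floating-point noise in the sum.
total = round(sum(form_outcomes.values()), 1)

# Share of comments that prompted some change to the text (everything except "unchanged").
changed = round(total - form_outcomes["unchanged in draft two"], 1)

print(total)    # 100.0
print(changed)  # 85.8
```

Under that assumption the categories account for all marginal form comments, and roughly 86% of them prompted some textual change, whether target-like or not.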

The influence of comment characteristics on student revisions

Table 4 shows the influence of the five categories of commentary characteristics investigated in the present study, with negative effects on textual quality excluded owing to the infrequency of such occurrences. First, with regard to textual focus, the greatest frequency of substantive revisions was brought about by comments targeting Task Response (34.6%). This is not surprising since such comments typically encouraged learners to improve the clarity, support, or extension of main or supporting ideas, requiring substantial changes. With a notable 21.8% of comments resulting in substantive, positive changes, TR constituted the criterion most likely to bring about tangible textual improvements through WFC. However, it was also the case that TR comments were frequently ignored (30.3%) or resulted in minimal revisions that did not definitively improve the text (19.7%). As in other contexts (Christiansen & Bloch, 2016; Ferris, 1997; Sugita, 2006; Uscinski, 2017), students preparing for Task 2 both paid attention to written feedback that helped them make substantive, effective revisions and disregarded suggestions, highlighting the salience of addressing individual student factors in conjunction with content and delivery attributes of WFC (Conrad & Goldstein, 1999) in this context.

Table 4 Relationship between content and end form comment characteristics and revision ratings

Substantive changes in a text’s Coherence and Cohesion occurred at the much lower rate of 20.6%, with a 6.6% lower proportion of positive revision outcomes, suggesting learners struggled to address CC issues, mirroring the experiences of teachers (Cotton & Wilson, 2011; Riazi & Knox, 2013). While a significant proportion (55.7%) of comments addressing Lexical Resource (a significant focal area of WFC) resulted in no change, LR constituted the criterion in which learners tended to perform closest to their targets, meaning many such comments praised the overall resource or specific items used. In contrast, grammar-focused end comments were seldom addressed with substantive or positive revisions, although as such comments tended to be descriptive, they may have helped reinforce or explicate the messages contained in marginal form-focused comments and indirectly treated errors (Ferris, 1997).

In terms of length, it was found that comments of 1–5 words offered little utility, particularly in facilitating successful revisions, which occurred only four times. This is because they often featured praise (Ferris, 1997) or were facile (Treglia, 2008; Walker, 2009; Weaver, 2006). Comments of average length also featured a low take-up rate (47.7%), though they did contribute to small-scale improvements in textual quality (20.7%). In comparison, long comments seemed to offer more utility, with 14.5% fewer being ignored and 3.5% more positive outcomes. Importantly, substantive revisions with a positive effect were the most frequent outcome of very long comments, accompanied by low rates of no change (16.4%). This could be because longer comments tended to combine description of problematic textual features with advisory information to help the learners resolve the issue (Conrad & Goldstein, 1999) or because the amount of text dedicated to the issue conveyed its seriousness to the learner. Nevertheless, the high rate of mixed effects (41.1%) indicates learners experienced difficulties acting on detailed WFC, a phenomenon not unique to this context (Christiansen & Bloch, 2016; Conrad & Goldstein, 1999; Ferris, 1995).

Comparable rates of substantive revisions resulted from explicit (32.2%) and implicit WFC (29%). This could be because criticism, a key semantic function underlying implicit feedback (Ene & Upton, 2014), served to highlight that something was wrong, thereby triggering a substantive revision attempt. Nevertheless, the 7.7% higher frequency of substantive, positive revisions suggests feedback that explains and scaffolds what learners need to do to better meet their goals is more helpful at encouraging revisions than merely criticising the work (Treglia, 2008). Identical rates of marginal, positive responses show the inclusion of specific revision strategies did not always significantly affect the quality of subsequent revisions (Conrad & Goldstein, 1999), perhaps because learners lacked the assessment literacy to translate commentary deeply coded in the language of assessment into actionable strategies (Weaver, 2006). Alternatively, since there was a higher rate of explicit feedback not being acted upon (by 3.4%), learners may have disagreed with the information (Goldstein & Kohls, 2002; Pratt, 1999) as it did not align with their schema of what constituted an effective response or a workable approach in test conditions. Perhaps unsurprisingly, in 82.3% of cases where WFC did not outline or imply a response to a problematic issue, no revision attempt was made on the highlighted issue.

Several salient patterns emerged in learners’ responses to comments of varying semantic function and mitigation. The functions least able to induce a revision response were, unsurprisingly, praise (74.5% no change), mirroring the findings of Ferris (1997), and reader reflection (71.4%), albeit the latter was a far less frequent comment type. In contrast, criticism constituted a polarising pragmatic function, accounting for both a high proportion of substantive revisions with positive effects (26%) and the most occurrences of marginal changes with mixed effects (30.7%). This is perhaps because learners lacked an understanding of the framing of problematic textual issues (in relation to the band descriptors) (Conrad & Goldstein, 1999; Goldstein, 2004), did not agree the issues were problematic (Goldstein & Kohls, 2002), or perceived a reduced self-concept stemming from repeatedly performing below their target (Estaji & Tajeddin, 2012). Interestingly, unmitigated comments were likely to be ignored (44%) or acted upon perfunctorily (20.7%), suggesting the participants appreciated the sting being taken out of face-threatening feedback (Hyland & Hyland, 2001; Treglia, 2008). Personal attribution exhibited the highest rates of substantive, positive changes, perhaps because test preparation candidates are known to highly value the input of outside experts (Allen, 2016; Mickan & Motteram, 2009), and thus perceived such messages as insider information.

Descriptive comments that characterised learners’ texts did not act as a catalyst for extensive revisions, with 37% not being acted on and 25.9% resulting in marginal, positive effects. A likely explanation is that the absence or implicitness of a revision imperative, combined with the generality of such comments, made them difficult to act upon (Ferris, 1997). Interestingly, 26.2% of all questions posed led to substantive, positive textual changes, possibly because learners were encouraged to think more deeply about the identified issue and/or consult the assessment criteria/test preparation materials. It may not be the case that merely rephrasing WFC in the interrogative triggers such a response, as the equivalent outcomes of comments hedged using interrogative syntax were significantly lower (12.5%). Comparable rates of positive (41.2%) and mixed revision effects (38%) and the 9.2% lower rate of substantive, positive outcomes for the most common function, advisory, provide further evidence that learners struggled to act on WFC requesting changes to their essays (Christiansen & Bloch, 2016; Conrad & Goldstein, 1999; Ferris, 1995), a phenomenon requiring additional exploration.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

