Concerns over the effectiveness of placement tests
The central purpose of placement tests is to assess learners’ language proficiency so that learners of similar language ability can be placed in appropriate language courses or classes (Hughes, 2003; Li, 2015; Weaver, 2016). Groupings can be based either on a student’s relative standing along the score continuum, as in a norm-referenced test (NRT), or on a student’s mastery or nonmastery of specific learning objectives, as in a criterion-referenced test (CRT) (Green, 2012; Long et al., 2018).
Placement tests can also be categorized as either language course-based or proficiency-oriented tests (Wall et al., 1994). The first type “has a pre achievement orientation such as the English Placement Test at the University of Illinois at Urbana Champaign that reverberates the academic demands of the courses offered at the university” (Weaver, 2016, p. 6), whereas the second type has a general proficiency orientation, as with the General English Proficiency Test (GEPT), TOEFL, TOEIC, and IELTS, and bears no strong association with the content of any particular language course.
Since the primary goal of a placement test is to place each student correctly into an appropriate class (Brown, 2004), it is particularly important to examine its effectiveness in terms of the accuracy and validity of the placement decisions made on the basis of its scores. Inaccurate placement tends to exert a negative influence on teaching and learning, that is, harmful washback (also known as backwash), which is one aspect of the impact of test use (Bachman & Palmer, 1996) and is connected with construct validity (Messick, 1996).
Mismatches between test content and the objectives and academic demands of a course have been identified as a crucial factor in misplacement (McMillan & Joyce, 2011). One of the critical criteria for a placement test is, thus, that it include test tasks which “enable valid inferences regarding mastery of knowledge, abilities or skills taught on the course” (Green & Weir, 2004, p. 469), so as to counter the two threats to construct validity known as “construct under-representation” and “construct-irrelevant variance” (Messick, 1996, p. 244). In this regard, the extent to which test task characteristics, including text and task features, align with the content of instruction emerges as an important indicator of the effectiveness of placement decisions (Hille & Cho, 2020).
To ensure that placement tests accord with the instructional content and objectives of their language programs, some schools develop their own placement tests to correspond to their curricula. However, some researchers (e.g., Kung & Wu, 2010) have cast doubt on the reliability of these internally developed placement tests and raised concerns about the inferences drawn from their scores, which in turn imperils the accuracy of placement decisions. In Su’s (2010) survey of 452 college students, only 35.5% of the responses regarding the use of in-house placement tests to group students were positive.
Another challenge lies in students’ attitudes towards taking post-entry placement tests. Some students’ intention to deliberately fail placement tests in order to obtain good grades or an easy pass raises concerns over test fairness (Su, 2016). Furthermore, developing a high-quality placement test is time-consuming and resource-intensive. Without sufficient support and sustained research effort, it is not easy for a local language program to develop a credible placement test with high reliability and validity (Kung & Wu, 2010).
Given the seemingly insurmountable issues involved in self-designed placement tests and the considerable time and human resources required for test development, there has unsurprisingly been an increase in the use of commercial standardized proficiency tests for placement, grounded in the belief that most commercially available standardized tests have been carefully developed and can therefore support correct placement decisions. A recent survey study (Ling et al., 2014) revealed that using commercially available tests for placement purposes has gained popularity.
Wang et al. (2008) presented evidence in favor of using the TOEFL iBT to place ESL students. Papageorgiou and Cho (2014) also found a strong relationship between secondary school students’ TOEFL Junior Standard scores and teacher judgments of placement levels, as well as moderate accuracy in predicting placement from the test scores, ranging from 72% (School B) to 79% (School A).
Yet not all proficiency tests can provide specific diagnostic information (Green & Weir, 2004). Standardized proficiency tests are usually designed to assess skills, knowledge, and abilities within a certain target language domain (Fulcher & Davidson, 2009), which may differ from students’ learning experience and environment (Weaver, 2016) or may fail to address the specific needs of a particular language program (Kokhan, 2012). Such a mismatch may threaten the validity of score interpretations and of the placement decisions based on them.
Kokhan (2013) investigated the appropriateness of using pre-arrival standardized test scores for ESL placement and argued against this method because of the high probability that students would be misplaced. In a similar vein, Fox (2009) found large variability among students placed in the same level of an English for Academic Purposes (EAP) program on the basis of commercially available proficiency test scores and raised concerns over misplacement, which had a negative impact on teaching and prompted teachers to rely on additional information about their students in order to meet their needs.
Conflicting results in previous studies have attracted increasing research attention. Hille and Cho (2020) recently examined placement accuracy by comparing ESL students’ scores on two commercially available tests and one locally developed writing test with their actual placement levels as determined by their teachers. The results revealed that each test on its own performed similarly in predicting student placement and that the combined use of all three tests produced the highest accuracy; yet neither the individual tests nor any combination of them could significantly predict students’ course grades or advancement to the next level.
Hille and Cho’s (2020) study contributes to the understanding of the predictive value of commercial standardized tests and in-house placement tests in informing placement decisions and predicting students’ subsequent performance in class. However, as in most previous studies, the ESL context of their study, where taking a standardized English proficiency test is usually a pre-entry requirement for matriculated international students, may make it difficult to generalize the findings to EFL contexts, where students are generally required to take nationwide entrance exams developed on the basis of national curriculum standards.
In many non-English-speaking Asian countries, English is a compulsory subject in college entrance exams. Some schools therefore rely on students’ scores on the English admission test to place them into levels of English courses for reasons of economy and convenience, a practice that is ubiquitous in universities and colleges in Taiwan.
In Taiwanese higher education, the Ministry of Education has encouraged the implementation of ability grouping for English instruction since 2001 (Chien et al., 2002). The means of and criteria for placement, however, vary across tertiary institutions (Su, 2010). Students may be placed into different levels of English classes based on their performance on the English college entrance exams (Tsai, 2008), English proficiency tests (Liu, 2009), or language course-based placement tests (Tsai, 2008). Some schools may also regroup students according to their achievement during the semester (Yu, 1994). These placement practices are implemented and favored for different reasons. Given the popularity of using college entrance examinations for placement in EFL settings, the following discussion centers on the issues surrounding this practice.
Previous studies on the use of English college entrance examinations for EFL placement
In contrast to ESL settings, where most international students are required to take standardized proficiency tests before entry to prove their English language ability, EFL learners in many non-English-speaking countries face the challenge of taking a nationwide English admission test as a prerequisite for studying in college. The credibility and accessibility of college entrance exams, together with the growing popularity of ability-grouped instruction, have made these admission tests a prevalent alternative placement tool in some Asian countries, such as Japan (LeTendre et al., 2003) and Taiwan (Yang & Lee, 2014).
English ability grouping has been commonly implemented in many universities and colleges in Taiwan, with students’ scores on college entrance exams used to make placement decisions (Tsao, 2003). Yet questions remain as to how well these college admission tests function as placement tests. One particular area of concern is the accuracy of the score-based classification decisions and the predictive validity of these entrance exams in placement testing contexts. Regrettably, these issues have been insufficiently researched.
Feng and Chang (2010) compared 1511 Taiwanese non-English-major freshmen’s scores on the TVE (Technological & Vocational Education) Joint College Entrance Examination and on the Global English Test (GET). They found that the TVE English scores were reliable and could usefully serve as a basis for grouping students, and they claimed that additional placement tests were unnecessary. Nevertheless, the feasibility of using college entrance exams as placement tests was not fully discussed.
Given concerns over the appropriateness of using the English GSAT (General Scholastic Ability Test), which does not include a listening component, for placement into both English reading and listening courses, Yu (2009) compared 2347 college students’ English GSAT scores with their scores on self-developed English reading and listening tests that followed the format and difficulty level of the intermediate-level GEPT reading and listening tests. English GSAT scores appeared to be highly correlated with scores on the internally developed reading and listening tests, indicating that the English GSAT had a high level of criterion-related validity and worked well for streaming students into different reading and listening classes. However, the content alignment between these tests and the courses, as well as the use of cut scores (Hille & Cho, 2020), was not thoroughly investigated.
A related question is whether the English GSAT can still differentiate students effectively for placement in the second semester. In the same study, Yu and her colleagues (Yu, 2009) administered another form of the reading and listening proficiency test to 2118 students who had completed their first-semester English course. The results revealed that the English GSAT had no significant correlation with the second proficiency test, nor did it strongly predict scores on that test, suggesting that differences in students’ proficiency had widened and thus warranted another administration of a proficiency test for placement. Yet the factors that might account for the non-significant correlation between these tests were not explored.
In Yang and Lee’s (2014) recent study, 805 Taiwanese college freshmen taking a required year-long English reading and writing course were placed into three levels based on their college entrance exam scores. At the beginning of the first semester, they took the Oxford Online Placement Test, which assessed their reading ability. Their performance on this test differed consistently across the three levels and correlated moderately with their college entrance exam scores. Yang and Lee contended that it seemed feasible to use college entrance exam scores for placement into the reading and writing course. In a slight departure from Yu (2009), however, they took content alignment into account and were cautious about using the English GSAT for placement into English listening and speaking courses.
Su (2016) likewise found that EFL teachers participating in her survey study expressed a similar qualm, and she argued that “students who score high on reading and writing tests may not necessarily perform well in speaking and listening skills” (p. 66). Along the same lines, a majority of respondents in Tsai’s (2008) survey study favored placing students on the basis of both college entrance exam scores and internally developed placement test scores so that their current language proficiency could be assessed more accurately, a view that resonates with some researchers’ (Hille & Cho, 2020; Kokhan, 2012) arguments for multiple measures.
Because English college entrance exams aim to select prospective students for universities, not to place students into EFL classes in a specific local program, the specific skills and dimensions of language competence assessed in these admission tests may not correspond to the teaching objectives and language skills demanded by a particular English course. The feasibility of using a college admission test for EFL placement is therefore open to question, yet it remains insufficiently researched.
A complete picture of placement accuracy, in terms of the relations of college admission tests to commercial and/or in-house tests and to students’ academic success, has not yet been obtained. In particular, little research to date has examined these relationships with attention to content alignment and cut-off scores across levels of English classes, or compared the relative power of different tests to predict students’ subsequent performance in EFL classes.
The current research
As the preceding review demonstrates, the appropriate use of tests to make fair and correct placement decisions is one of the decisive factors in the success of ability-grouped instruction. Notwithstanding the ubiquity of high-stakes English college entrance exams as placement tools, research into the feasibility and validity of using these admission tests for placement has thus far been remarkably scarce, particularly in EFL settings.
The purpose of the study is therefore twofold: (a) to investigate whether a college admission test can serve as a fair and valid placement tool and how it relates to other English proficiency tests across class levels, and (b) to examine its power, in comparison with other proficiency tests, to predict students’ performance in English classes. The specific research questions are as follows:
To what degree can the English GSAT correctly place students into appropriate EFL classes? Specifically, what is the placement agreement between the English GSAT and GEPT?
How are the scores on the English GSAT and GEPT inter-correlated?
To what extent can the English GSAT, in comparison with the GEPT, predict students’ subsequent performance in EFL classes?
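The first two research questions rest on standard agreement and correlation statistics. As a hedged illustration only (the study's actual analyses are not described in this section), the Python sketch below computes exact percent agreement and Cohen's kappa between two sets of level assignments, and a Pearson correlation between two score lists. The function names and the six-student data set are hypothetical.

```python
from collections import Counter
from math import sqrt


def percent_agreement(levels_a, levels_b):
    """Proportion of students assigned to the same level by both placement methods."""
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a == b for a, b in zip(levels_a, levels_b)) / len(levels_a)


def cohens_kappa(levels_a, levels_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(levels_a)
    p_o = percent_agreement(levels_a, levels_b)
    count_a, count_b = Counter(levels_a), Counter(levels_b)
    # Chance agreement estimated from each method's marginal level proportions
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists
    (undefined, and raises ZeroDivisionError, if either list has zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical levels for six students under two placement methods
by_test_a = [1, 1, 2, 2, 3, 3]
by_test_b = [1, 2, 2, 2, 3, 1]
print(round(percent_agreement(by_test_a, by_test_b), 3))  # 0.667
print(round(cohens_kappa(by_test_a, by_test_b), 3))       # 0.5
```

Kappa is shown alongside raw agreement because two methods can agree fairly often by chance alone, simply because both assign many students to the same levels; in this toy example, two-thirds raw agreement shrinks to a kappa of 0.5 once chance agreement is removed.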
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.