In the United States, the National Center for Education Statistics (NCES, 2004) reported that “The number and percentage of language minority youth and young adults—that is, individuals who speak a language other than English at home—increased steadily in the United States between 1979 and 1999” (p. 1). NCES added, “Of those individuals ages 5–24 in 1979, 6 million spoke a language other than English at home. By 1999, that number had more than doubled, to 14 million. Accordingly, of all 5- to 24-year-olds in the United States, the percentage who were language minorities increased from 9 percent in 1979 to 17 percent in 1999” (p. 1). The number of ESL students in U.S. public schools has almost tripled over the last decade (Goldenberg, 2006). In 2004 Crawford observed that one-fourth of the school-age students in the United States were from homes where a language other than English was spoken. The school-age population (K–12) is projected to reach about 40% ESL in about 20 years (Center for Research on Education, Diversity, and Excellence, 2002). Between 1990 and 2000, the number of Spanish speakers increased from about 20 to 31 million (U.S.
Census Bureau, 2001). The Census Bureau report also showed a significant increase in the number of speakers from other linguistic groups, particularly Chinese and Russian. Individuals at all ages enter school to learn the English skills they need to learn, gain employment and participate in society. Planning for their instruction is a significant issue for teachers at all levels and assessment becomes central. In this chapter we first define and differentiate terms such as ESL and ELL and describe the populations they represent.
The use of assessment measures to place students into appropriate instructional groups is described and the distinction between interpersonal and academic language is reviewed. The use of assessment in the classroom and as a gate-keeping tool is addressed in addition to the appropriateness of the use of published measures to assess ESL students. The first issue addressed is terminology.

Defining ELL

Over the years students who speak a language other than English have been called English as a Second Language (ESL) learners.
However, English in some cases is not the second language (L2), but may be the third (L3), the fourth (L4), etc., language, and, as a result, members of this population have different linguistic resources to draw on. The term “English Language Learner” (ELL) has been adopted by educators, primarily in the United States, to better reflect the fact that English may not be the L2. However, it is not a particularly good term because students who speak English as a First Language (L1) are also English language learners (Gunderson, 2008).
The term “Teaching English to Speakers of Other Languages” (TESOL) is used outside of the United States. Students who learn English in environments where it is not the language of the community are referred to as English as a Foreign Language (EFL) students. The pedagogy related to EFL is different from ESL (ELL) because students are not immersed in English in the community and the major task of the teacher is to try to provide them English models (Gunderson, 2008, 2009). An added difficulty with the term “ESL” or “ELL” is that it does not adequately characterize the diversity of human beings it represents.
Those who use the term “ELL” do so to describe those K–12 students who come from homes in which the language used for daily communications is not English and who must learn English to succeed in schools where the medium of instruction is English.

The ELL (ESL) Population

A serious problem with the ELL (ESL) conceptualization is that it does not adequately describe the underlying complexities of differences in age, motivation, literacy background, and first and second language achievement (Gunderson, 2008, 2009).
Those classified as ELL or ESL vary in age from pre-school to senior adults. Many speak no English at all, while others vary in oral English proficiency. Many have never attended school, while others have earned high academic credentials in the language of instruction in their home countries. They are from diverse cultural backgrounds that vary in the way they perceive the importance of teaching and learning. Many are immigrants to an English-speaking country, while many ELL learners are born in an English-speaking country, but speak a different language at home (Gunderson, 2008, 2009).
Indeed, in the Vancouver, Canada, school district 60% of the kindergarten students are ESL and 60% of this number were born in Canada (Gunderson, 2007, 2009). Many immigrant ESL students come from impoverished refugee backgrounds, while others have high levels of education and socioeconomic status. Thus, the terms ESL and ELL do not adequately represent the underlying complexity of the human beings in the category.

Assessment Issues in ELL
Instruction in mainstream classes, those typically enrolling students of different abilities but of the same relative age in the same classrooms, is based broadly on the notion that the acquisition of English is developmental and occurs over time as human beings grow into maturity. It is also thought that there is a relationship between language development and “grade level.” Grade 1 students differ from Grade 7 students in systematic ways. Their teachers design instruction that is appropriate for their grade levels.
ESL (ELL) students represent a more complex problem because their English and their cultural and learning backgrounds vary in many different ways, even in individuals who are the same chronological age (Gunderson, 2009). In addition, Cummins (1979a, 1979b, 1981, 1983, 2000) and Cummins and Swain (1986) argued there are two basic kinds of English a learner has to learn: “basic interpersonal communicative skill” [BICS] and “cognitive academic language proficiency” [CALP], the language of instruction and academic texts. BICS appears to take about 2 to 3 years to develop and CALP about 5 to 7. “Hello, how are you?” and “What is your name?” represent BICS, while “Identify a current controversial world political issue and develop and defend your position” is an example of CALP. Teachers are faced with the task of determining what learning activities and materials are appropriate for instruction and measurement of learning. Institutions such as universities and some governments, by contrast, are interested in determining whether an individual’s English ability is advanced enough either to enter a post-secondary program or to be integrated into a society and, therefore, be eligible to immigrate.
Thus, in some instances, assessment serves to guide learning by informing teachers of students’ needs, while in others it serves as a gatekeeper by excluding those who do not meet its standards.

Instructional Levels: Determining Appropriate Instructional Strategies

Language teachers have for some time opted to assess their students to ascertain their “level” of English language proficiency. The difficulty with the levels approach is that levels do not really exist (Gunderson, 2009). A popular levels approach was developed in 1983 by the American Council for the Teaching of Foreign Languages (ACTFL).
The assessment is a one-on-one assessment focusing primarily on oral language. Three levels (beginner, intermediate, and advanced) are distinguished (see ACTFL, 1983). A learner can be identified as a low beginner or a high intermediate, etc. The behaviors that determine inclusion in a particular group are usually described in an assessment matrix. The assessor asks a series of questions to elicit knowledge of vocabulary, syntax, and pragmatics. The following is an example of a matrix developed by Gunderson (2009) showing oral language “levels” and their attendant features.

* 0-Level English
  1. Cannot answer even yes/no questions
  2. Is unable to identify and name any object
  3. Understands no English
  4. Often appears withdrawn and afraid
* Beginner
  1. Responds to simple questions with mostly yes/no or one-word responses
  2. Speaks in 1–2 word phrases
  3. Attempts no extended conversations
  4. Seldom, if ever, initiates conversations
* Intermediate
  1. Responds easily to simple questions
  2. Produces simple sentences
  3. Has difficulty elaborating when asked
  4. Uses syntax/vocabulary adequate for personal, simple situations
  5. Occasionally initiates conversations
* Advanced
  1. Speaks with ease
  2. Initiates conversations
  3. May make phonological or grammatical errors, which can then become fossilized
  4. Makes errors in more syntactically complex utterances
  5. Freely and easily switches codes

More elaborate approaches involve the assessment of English listening, speaking, reading and writing skills, e.g., the Canadian Language Benchmarks (CCLB, 2007). The notion of levels is an important one for teachers because levels are thought to predict a student’s probability of succeeding within a particular teaching and learning environment. A beginner is different from an intermediate in various ways, and the instruction they are involved in is also different.
Teachers often refer to ESL students as Level 1 or Level 5, depending upon their performance on an assessment measure. The notion of levels varies widely from jurisdiction to jurisdiction. In some cases there are 3, 4, 5, 8, or 10 levels, which are determined most often by locally developed informal assessment measures (Gunderson & Murphy Odo, 2010). Good assessment is essential to the design of appropriate instructional programs. The difficulty for classroom teachers is that there are few, if any, appropriate measures for them to use.

Classroom Assessment
Black and Wiliam (1998) reviewed more than 250 studies and found that there was a relationship between good classroom assessment and student performance. Most classroom-based assessment has been developed by teachers (Frisby, 2001; Wiggins, 1998). Unfortunately, most teachers report they are unprepared to assess and teach ESL students (Fradd & Lee, 2001). According to Pierce (2002), the majority of teachers employ the kinds of assessments they remember from their own schooling: multiple-choice, cloze-like measures, matching, and true/false tests.
This seems to have been the pattern for 50 years (Bertrand, 1994). Unfortunately, it seems, “... many teachers are unprepared for the special needs and complexities of fairly and appropriately assessing ELLs” (Ehlers-Zavala, Daniel, & Sun-Irminger, 2006, p. 24). Gunderson and Murphy Odo (2010) have recently reviewed the measures used by teachers in 12 local school districts to assess ESL students. The number of different measures and approaches in use was surprising. The Idea Proficiency Test (IPT) (see Ballard, Dalton, & Tighe, 2001a, 2001b) was the measure most often used for primary level ESL students.
Other assessments mentioned were the Brigance (1983), the Bilingual Syntax Measure (Burt, Dulay, & Hernandez, 1976), the Woodcock Reading Mastery Test (Woodcock, various dates), the Woodcock-Munoz (Woodcock-Munoz-Sandoval, 1993), the Pre-IPT, the Comprehensive English Language Test (CELT; Harris & Palmer, 1986), informal reading inventories, the Waddington Diagnostic Reading Inventory (Waddington, 2000), the Alberta Diagnostic Reading Inventory, the SLEP, the Gap (McLeod & McLeod, 1990), PM Benchmarks (a system for placing students in leveled books), the RAD (Reading Achievement District, a local assessment measure), the Peabody Picture Vocabulary Test (PPVT; Dunn & Dunn, 1997), and a variety of locally developed listening, speaking, reading, and writing assessments. A serious difficulty is that most of these measures were not designed to provide ESL instructional levels, so districts developed different heuristics to translate scores into levels.
The designation “beginner,” for instance, varies significantly across districts as a result of the measures involved and the number of levels districts chose to identify. Two school districts reported the development and norming of tests for elementary and secondary students comprised of leveled passages taken from academic textbooks that were transformed into maze passages (see Guthrie, Seifert, Burnham, & Caplan, 1974). Scores from these measures were used to compute ESL levels; four in one case and five in the other. Interestingly, different metrics were used to compute instructional levels. So, for instance, a CELT score was used to determine ESL levels based on local intuition and experience.
Most often the locally developed assessments involved one-on-one interviews in which students respond to tasks that require recognition of colors, body parts, school items, and the ability to answer simple questions (see, for example, Gunderson, 2009). There are also standardized assessments used by personnel at post-secondary institutions to make decisions concerning admissions to their programs.

Predicting Academic Success

The best known standardized English assessment measure is the Test of English as a Foreign Language (TOEFL) published by Educational Testing Service (ETS). The publisher notes: “In fact, more institutions accept TOEFL test scores than any other test scores in the world — more than 7,000 colleges, universities and licensing agencies in more than 130 countries, to be exact” (ETS, 2009a). There are different forms of the TOEFL. The classic paper-and-pencil form had standardized scores with 500 being the mean and 50 being the standard deviation. There are newer versions, including a computer-based and an Internet-based version, that have different scoring criteria (see the score comparison tables; ETS, 2009b). The online version is based on a “communicative competence” model that requires learners, for example, to view clips of science lessons, take notes, and respond to questions.
TOEFL scores are used by post-secondary institutions to screen students for admission to their programs. The criteria for admission to programs vary from institution to institution and among departments within institutions (see, for instance, University of British Columbia, 2009). There is evidence, however, that TOEFL scores are not highly predictive of success in university (Al-Musawi & Al-Ansari, 1999), although they continue to be used for that purpose. ETS also produces the Test of English for International Communication (TOEIC) and the Secondary Level English Proficiency (SLEP), both standardized assessment measures. The primary users of the SLEP are secondary teachers.
The SLEP “measures the ability to understand spoken English” and “the ability to understand written English,” focusing on grammar, vocabulary, and reading comprehension (ETS, 2009c). The International English Language Testing System (IELTS) is a test of English language proficiency developed by the University of Cambridge Local Examinations Syndicate (2009). There are two versions: individuals who want to gain admission to a university in an English-speaking country take the academic version, while the other version is appropriate for trade schools and other purposes. Scores range from 1 to 9, with 1 indicating zero-level English and 9 indicating native-like ability. Different universities require different IELTS scores for admission.
Both ETS and Cambridge have international centers around the world where students can take these tests. Issues of ELL assessment and standardized testing are also relevant to large-scale achievement testing in the United States.

Large Scale or High-Stakes Testing

According to Abedi, Hofstetter, and Lord (2004), “Historically, English language learners in the United States were excluded from participation in large-scale student assessment programs; there were concerns about the confounding influences of language proficiency and academic achievement” (p. 1). However, the United States has seen a focus on large-scale assessments due to the accountability requirements of the No Child Left Behind Act of 2001 (PL 107-110).
No Child Left Behind permits assessing ELLs in their first language for up to 3 years, but few states do. In 2005 a group of school districts sued the state of California to force it to allow Spanish-speaking students to take state-mandated tests in Spanish. Plaintiffs in Coachella Valley Unified School District v. California argued that the state “violated its duty to provide valid and reliable academic testing” (King, 2007). On July 30, 2009, “The First District Court of Appeal in San Francisco rejected arguments by bilingual-education groups and nine school districts that English-only exams violate a federal law’s requirement that limited-English-speaking students ‘shall be assessed in a valid and reliable manner’” (Egelko, 2009).
A lawyer for the school districts and advocacy groups stated, “The court dodges the essential issue in the lawsuit, which is: What is the testing supposed to measure? If you don’t have to evaluate the testing, California gets a free pass on testing kids (who) don’t speak English, using tests that they have literally no evidence of their validity” (Egelko, 2009). The ruling was that “The law does not authorize a court to act as ‘the official second-guesser’ of the reliability of a state’s testing methods.” The difficulty is that English measures are neither reliable nor valid when ESL students are involved. In some cases, accommodations are made for them.
The procedures for providing ELL students with accommodations during assessment sessions vary across jurisdictions, but include such activities as lengthening the time allowed to take a test, allowing ELLs to be tested in separate rooms, allowing students to use bilingual dictionaries, providing two versions of the test at the same time, one written in English and one in students’ first languages, providing oral translations for students, and allowing students to compose responses in their first languages. In 1998–1999, 39 states reported using test accommodations (Rivera, Stansfield, Scialdone, & Sharkey, 2000). There is considerable controversy about providing accommodations, however.
At the time of writing, accommodating students through the provision of L1 assessments has been judged not to be required.

ELLs, Assessment, and Technology

Advances in technology have made it possible for assessments to be administered as computer- or Internet-based measures. These developments have already taken place with measures such as the TOEFL (see above). An increasing use of technology to administer standardized and non-standardized assessments has raised interest in issues relating to mode-effects (e.g., computer displays versus print form) and familiarity with computers, which have significant implications for ELLs.
There is evidence that performance in paper-based and computer-based modes of assessment may vary due to ethnicity or gender (Gallagher, Bridgeman, & Cahalan, 2002). In addition, familiarity with computers is known to influence performance on high-stakes tests in writing (Horkay, Bennett, Allen, Kaplan, & Yan, 2006) and mathematics (Bennett et al., 2008). These issues need to be taken into consideration with ELLs, particularly immigrant and refugee students. A related problem has to do with access. Access to computers and to the Internet varies widely and, therefore, introduces systematic differences among test takers. These are all areas that need further research.

The State of the Art of ELL Assessment Research
As noted above, the category ESL or ELL is deceptive in that it represents millions of human beings who vary in age, first-language development, English achievement (both interpersonal and academic), educational backgrounds, immigration status, motivation, socioeconomic background, cultural views of teaching and learning, professional backgrounds, and social and academic aspirations. It is not, therefore, possible to review the breadth and depth of available research in this chapter. There are, however, some overall generalizations that can be made. Generally, the assessment practices and approaches designed for and used with native English speakers have been adopted and used with ELL students. This phenomenon is especially apparent in jurisdictions such as the United States where high-stakes assessments have become so important.
There are serious validity and reliability concerns associated with this practice. It is not clear that the notion of accommodation, one borrowed from special education, helps in either case. Leung and Lewkowicz (2008) argue that this “common educational treatment irrespective of differences in language backgrounds” (p. 305) is emblematic of the view that both treatment and assessment should be inclusive. It does not account, among other features, for cultural differences that can cause difficulties for ESL students (Fox, 2003; Fox & Cheng, 2007; Norton & Stein, 1998). Overall, English proficiency is a significant variable in ELL assessment.
In addition to the BICS/CALP distinction mentioned above, Bailey (2005) proposes that there is a language of tests that is a different “register” or “discourse domain.” The use of such language creates a problem of “face validity.” Is the test actually testing what it is designed to test or is it a test of the language of tests? English as a Foreign Language (EFL) students around the world are assessed using many of the same measurements that are used to assess ELL students. EFL students are enrolled in programs in non-English contexts such as Japan where the language of the community is not English. They do not have the ready access to native models of English that ELL students usually do. This is very much like the way students learn Latin in secondary school.
It appears that EFL assessments, such as the ACTFL mentioned previously, are generally used to measure oral language ability. Our review of the assessment procedures and methods in use in K–12 schools in 12 school districts raised several issues related to ESL learners’ assessment that were not found in studies such as Bertrand (1994), so we present them here. First, we found that there was a need for a measure that would discriminate students with language pathologies and/or learning disabilities from those who only needed English instruction. District members also expressed the need for a reliable measure to distinguish secondary students’ content knowledge from their linguistic knowledge.
Lastly, they contended that assessments should be developed to isolate ESL students’ specific areas of weakness so that teachers could use the results more effectively to guide instruction.

Summary and Conclusions

The use of ELL or ESL is unfortunate because it masks the underlying complexity of the human beings included in the category. ELL is inaccurate as a term because native English-speaking adults continue to be English language learners well into old age. Perceptions and pedagogical prescriptions are the most troubling aspects of the use of these terms. In article after article, ESL or ELL is used as though it represents a homogeneous group of human beings.
Pedagogical recommendations are made on the notion that they are a single group with the same skills and abilities. Of course, this is far from the truth. Our experience is that teachers use the term to represent all students who speak English as an additional language. In addition, they appear to perceive ESL students as human beings who have trouble learning to read (English). This, too, is far from the truth for some students, though not for others. ESL (ELL) is a term that should either be qualified when used or discarded as a general term. The assessment of ELL/ESL/EFL learners is a significant foundational process for teachers to determine the appropriate teaching and learning programs for their students from kindergarten to the mature adult level.
ELL assessment traditionally includes measures of listening, speaking, reading, and writing. There are three basic kinds of assessment instruments. The first is purely instructional in that it is designed to indicate the level at which students should be placed for instruction. The second type of measure is designed to provide an estimate of proficiency related to norm groups and involves scores such as percentiles and NCEs. The third is designed to provide predictive information concerning how well a student will succeed academically. Unfortunately, it appears that most measures are based on native English models. Another difficulty is that students’ English proficiency has a profound effect on their ability to succeed on a test.
It is often difficult for a student to succeed on a test when the language of the test is difficult or unknown to them. Some have noted that the language of tests is also unique. Recently, assessment measures have been computerized and some have been put on the Internet. This raises serious questions of access, especially for students from countries where access is difficult or non-existent. For example, we have been told that the cost of taking an online test in a country like Zimbabwe is prohibitive. Educators from many jurisdictions have borrowed the concept of accommodation from special education to make the assessment procedures fair to ELLs who differ in various ways from native English speakers.
There is disagreement concerning the validity of test results obtained with accommodations, since accommodations are not often included in the norming procedures of the instruments. We have heard some opine that accommodation is not itself fair, and that the results of standardized assessment provide information about how well students will do in an English-speaking instructional setting. It has been recommended that assessment measures be constructed that are written in different first languages. Some have argued that the number of first languages in schools would make this an expensive and impractical approach. In July 2009 the use of English-only assessment measures was upheld by a state appeals court in California.
It is clear from a review of existing assessment practices that school-based personnel use a wide variety of instruments and procedures. It is also clear that there is the belief that it is important to identify a student’s “English level” for instructional purposes, but there is little agreement on how many levels should be identified. The precise process for determining a level is somewhat fuzzy, but it involves the interpretation of a variety of scores from a variety of tests. The research base concerning ELL assessment is not substantial. It focuses on measures originally designed for native English speakers. ELLs generally do not do well on such measures. Indeed, they do not do well in school and a great number drop out, particularly those from lower socioeconomic groups.
The state of the art of assessment and instruction involving ELLs is extremely dire. The issues of ELL assessment need urgent attention since ELLs are the most rapidly growing group in our schools.

References

Abedi, J., Hofstetter, C. G., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74, 1-28.
Al-Musawi, N. M., & Al-Ansari, S. H. (1999). Test of English as a foreign language and first certificate of English tests as predictors of academic success for undergraduate students at the University of Bahrain. System, 27(3), 389-399.
American Council for the Teaching of Foreign Languages (ACTFL). (1983). ACTFL proficiency guidelines. Hastings-on-Hudson, NY: ACTFL Materials Center.
Bailey, A. L. (2005). Language analysis of standardized tests: Considerations in the assessment of English language learners. In Abedi, J., Bailey, A., Castellon-Wellington, M., Leon, S., & Mirocha, J. (Eds.), The validity of administering large-scale content assessments to English language learners: An investigation from three perspectives (pp. 79-100). Los Angeles: Center for Research on Evaluation/National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Ballard, W., Dalton, E., & Tighe, P. (2001a). IPT I oral grades K-6 examiner’s manual. Brea, CA: Ballard & Tighe.
Ballard, W., Dalton, E., & Tighe, P. (2001b). IPT I oral grades K-6 technical manual. Brea, CA: Ballard & Tighe.
Bennett, R. E., Braswell, J., Oranje, A., Sandene, B., Kaplan, B., & Yan, F. (2008). Does it matter if I take my mathematics test on computer? A second empirical study of mode effects in NAEP. The Journal of Technology, Learning and Assessment, 6(9), 1-40.
Bertrand, J. E. (1994). Student assessment and evaluation. In Harp, B. (Ed.), Assessment and evaluation for student centered learning (pp. 7-45). Norwood, MA: Christopher-Gordon.
Black, P., & Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 141-148.
Burt, M. K., Dulay, H. C., & Hernandez, E. (1976). Bilingual syntax measure. New York: Harcourt Brace Jovanovich.
Brigance, A. H. (1983). Brigance Comprehensive Inventory of Basic Skills II (CIBS II). North Billerica, MA: Curriculum Associates.
Cambridge University Press. (2009). IELTS catalogue. Retrieved July 14, 2010, from http://www.cambridgeesol.org/.
Centre for Canadian Language Benchmarks (CCLB). (2007). Canadian language benchmarks. Retrieved August 10, 2009, from http://www.language.ca/display_page.asp?page_id=206.
Center for Research on Education, Diversity, and Excellence. (2002). A national study of school effectiveness for language minority students’ long-term academic achievement final report. Retrieved August 10, 2009, from http://www.crede.ucsc.edu/research/llaa/1.1_final.html.
Cummins, J. (1979a). Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, 19, 175-205.
Cummins, J. (1979b). Linguistic interdependence and the educational development of bilingual children. Review of Educational Research, 49(2), 222-251.
Cummins, J. (1981). Age on arrival and immigrant second language learning in Canada: A reassessment. Applied Linguistics, 2(2), 132-149.
Cummins, J. (1983). Language proficiency and academic achievement. In Oller, J. W. (Ed.), Issues in language testing research (pp. 108-129). Rowley, MA: Newbury House.
Cummins, J. (2000). Language, power and pedagogy. Toronto, ON: Multilingual Matters.
Cummins, J., & Swain, M. (1986). Linguistic interdependence: A central principle of bilingual education. In Cummins, J. & Swain, M. (Eds.), Bilingualism in education (pp. 80-95). New York: Longman.
Crawford, J. (2004). Educating English learners: Language diversity in the classroom (5th ed.). Los Angeles: Bilingual Educational Services.
Dunn, L. M., & Dunn, D. M. (1997). Peabody picture vocabulary test. San Antonio, TX: Pearson.
Educational Testing Service (ETS). (2009a). TOEFL® Internet-based Test (iBT). Retrieved August 10, 2009, from http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=f138af5e44df4010VgnVCM10000022f95190RCRD&vgnextchannel=b5f5197a484f4010VgnVCM10000022f95190RCRD.
Educational Testing Service (ETS). (2009b). TOEFL® Internet-based Test (iBT). Retrieved August 10, 2009, from http://www.ets.org/Media/Tests/TOEFL/pdf/TOEFL_iBT_Score_Comparison_Tabl