Science Benchmarking Report TIMSS 1999–Eighth Grade




CHAPTER 2: Performance at International Benchmarks

The TIMSS 1999 international benchmarks delineate performance of the top 10 percent, top quarter, top half, and lower quarter of students in the entities participating in the study. To help interpret the achievement results, Chapter 2 describes eighth-grade science achievement at each of these benchmarks together with examples of the types of items typically answered correctly by students performing at the benchmark.


To provide an idea of the science understandings and skills displayed by students performing at different levels on the TIMSS science achievement scale, TIMSS described performance at four international benchmarks. The TIMSS 1999 international benchmarks delineate performance of the top 10 percent, top quarter, top half, and lower quarter of students in the countries participating in the TIMSS 1999 study. (The benchmarks were set at the 90th, 75th, 50th, and 25th percentiles, respectively.)
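
As a rough illustration of percentile-based benchmarks, the sketch below takes a pooled list of scale scores (students from all countries, regardless of country) and returns the four cut points. It is a simplified sketch under stated assumptions: it treats every score equally, whereas the actual TIMSS computation is more involved (for instance, it applies sampling weights). The function name `international_benchmarks` is hypothetical.

```python
import statistics

def international_benchmarks(pooled_scores):
    """Return scale scores at the 90th, 75th, 50th and 25th percentiles.

    Simplifying assumption: every pooled score counts equally (no weights).
    """
    # quantiles(..., n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(pooled_scores, n=100)
    return {
        "Top 10%": cuts[89],
        "Upper Quarter": cuts[74],
        "Median": cuts[49],
        "Lower Quarter": cuts[24],
    }
```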

As states and school districts spend time and energy on improving students' science achievement, it is important that educators, curriculum developers, and policy makers understand what students know and can do in science, and what areas, concepts, and topics need more focus and effort. To help interpret the range of achievement results for the TIMSS 1999 Benchmarking participants presented in Chapter 1, this chapter describes eighth-grade science achievement at each of the TIMSS 1999 international benchmarks, explaining the types of science understandings and skills typically displayed by students performing at the benchmarks. The benchmark descriptions are presented together with examples of the types of science test questions typically answered correctly by students reaching the benchmark. Appendix D contains the descriptions of the understandings and skills assessed by each item in the TIMSS 1999 assessment at each benchmark.(1)

For each of the example test questions, the percentages of correct responses are provided for selected countries as well as for the jurisdictions participating in the TIMSS 1999 Benchmarking project. The countries and Benchmarking jurisdictions are presented in descending order, with those performing highest shown first. The countries included for purposes of comparison are the United States and a dozen other countries of interest. These include several high-performing European countries (Belgium (Flemish), the Czech Republic, the Netherlands, and the Russian Federation), countries that are major economic trading partners of the United States (Canada, England, and Italy), and the top-scoring Asian countries of Chinese Taipei, Hong Kong, Japan, Korea, and Singapore.

Presented previously in Chapter 1, Exhibit 1.4 shows the percentages of students in each participating entity reaching each international benchmark: Top 10%, Upper Quarter, Median, and Lower Quarter. An entity with high average science achievement and a large percentage of its students at or above the upper benchmarks has many of its students concentrated among the highest-achieving students internationally. For example, top-performing Singapore had nearly one-third (32 percent) of its students reaching the Top 10% Benchmark and more than half (56 percent) reaching the Upper Quarter Benchmark, the point on the scale that only 25 percent of students would be expected to reach if achievement were distributed the same way in every country. Four-fifths of the Singaporean students (80 percent) reached the Median Benchmark. Performance in the United States was somewhat better than would be expected under that same assumption: 15 percent of the students reached the Top 10% Benchmark, 34 percent reached the Upper Quarter Benchmark, and 62 percent reached the Median Benchmark.
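
The comparison with the "same distribution in every country" reference point is simple arithmetic. Because the benchmarks are percentiles of the pooled international distribution, a country whose students were distributed like the pooled sample would have 10, 25, and 50 percent of its students reaching the Top 10%, Upper Quarter, and Median Benchmarks. The hypothetical helper below, a minimal sketch, subtracts those reference values from the Singapore and U.S. percentages quoted above to show the gap in percentage points.

```python
# Reference shares (percent of students) reaching each benchmark if every
# country's achievement were distributed like the pooled international sample.
EXPECTED = {"Top 10%": 10, "Upper Quarter": 25, "Median": 50}

def compare_to_expected(observed):
    """Return observed minus expected percentage points for each benchmark."""
    return {name: observed[name] - EXPECTED[name] for name in observed}

# Percentages reaching each benchmark, as reported in the text above.
singapore = {"Top 10%": 32, "Upper Quarter": 56, "Median": 80}
united_states = {"Top 10%": 15, "Upper Quarter": 34, "Median": 62}
```

Positive values mean more students reached a benchmark than the equal-distribution reference would predict; Singapore's gaps are several times larger than those of the United States.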

The analysis of performance at these benchmarks in science suggests that six primary factors differentiated performance at the four levels:

The depth and breadth of content area knowledge

The level of understanding and use of technical vocabulary

The context of the problem (progressing from practical to more abstract)

The level of scientific investigation skills

The complexity of diagrams, graphs, tables, and textual information

The completeness of written responses

For example, there is evidence that students performing at the lower end of the scale could recognize basic facts from the earth, life, and physical sciences presented in non-technical language and could interpret and use information presented in simple diagrams. In contrast, students performing at the higher end of the scale demonstrated a grasp of more complex and abstract science concepts; applied knowledge to solve problems; interpreted and used information in diagrams, tables and graphs; and could provide written explanations to communicate their scientific knowledge.

How Were the Benchmark Descriptions Developed?

To develop descriptions of achievement at the TIMSS 1999 international benchmarks, the International Study Center used the scale anchoring method. Scale anchoring is a way of describing students' performance at different points on the TIMSS 1999 achievement scale in terms of the types of items they answered correctly. It involves an empirical component in which items that discriminate between successive points on the scale are identified, and a judgmental component in which subject-matter experts examine the content of the items and generalize to students' knowledge and understandings.

For the scale anchoring analysis, the results of students from all the TIMSS 1999 countries were pooled, so that the benchmark descriptions refer to all students achieving at that level. (That is, it does not matter which country the students are from, only how they performed on the test.) Criteria were then applied to the TIMSS 1999 achievement scale results to identify the sets of items that students reaching each international benchmark were likely to answer correctly and that students at the next lower benchmark were unlikely to answer correctly.(2) The sets of items thus produced represented the accomplishments of students reaching each benchmark and were used by a panel of subject-matter experts from the TIMSS countries to develop the benchmark descriptions.(3) The panel's work involved writing a short description of the scientific understandings demonstrated by students who answered each item correctly, summarizing students' knowledge and understandings across the set of items for each benchmark to provide more general statements of achievement, and selecting example items to illustrate the descriptions.
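
The item-selection rule given in footnote 2 can be expressed as a short check. The sketch below is a minimal illustration, not the International Study Center's code. It assumes each item comes with the percent correct among students scoring at each of the four benchmark points, and it uses the 65 percent and 50 percent thresholds from footnote 2. As an assumption not stated in this chapter, it applies only the 65 percent condition at the Lower Quarter Benchmark, which has no lower benchmark to compare against. The function names `anchors_at` and `anchor_items` are hypothetical.

```python
# Benchmarks ordered from the highest scale point to the lowest.
BENCHMARKS = ["Top 10%", "Upper Quarter", "Median", "Lower Quarter"]

def anchors_at(pct_correct, benchmark, likely=65.0, unlikely=50.0):
    """Check the footnote-2 rule for one item at one benchmark.

    pct_correct maps each benchmark name to the percent of students scoring
    at that scale point who answered the item correctly.
    """
    if pct_correct[benchmark] < likely:
        return False  # students at this benchmark are not likely to answer it
    i = BENCHMARKS.index(benchmark)
    if i == len(BENCHMARKS) - 1:
        # Assumption: the lowest benchmark has no lower point to compare with,
        # so only the 65 percent condition is applied there.
        return True
    # Students at the next lower benchmark must be unlikely to answer it.
    return pct_correct[BENCHMARKS[i + 1]] < unlikely

def anchor_items(items):
    """Group item IDs by the benchmark at which each item anchors, if any."""
    groups = {b: [] for b in BENCHMARKS}
    for item_id, pct in items.items():
        # Check from the lowest benchmark upward and keep the first match,
        # so each item is assigned to at most one benchmark.
        for b in reversed(BENCHMARKS):
            if anchors_at(pct, b):
                groups[b].append(item_id)
                break
    return groups
```

Items that meet no benchmark's criteria (for example, items answered correctly by fewer than 65 percent of students even at the Top 10% point) are simply left out of every group.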

How Should the Descriptions Be Interpreted?

In general, the parts of the descriptions that relate to knowledge of science concepts and to skills are relatively straightforward. It must be acknowledged, however, that the cognitive behavior needed to answer some items correctly may vary with students' experience. An item may require only simple recall from a student familiar with its content and context, but call for problem-solving strategies from a student unfamiliar with the material. Nevertheless, the descriptions reflect the panel's judgment of how the great majority of eighth-grade students could be expected to perform.

It also needs to be emphasized that the descriptions of achievement characteristic of students at the international benchmarks are based solely on student performance on the TIMSS 1999 items. Because those items were developed specifically to sample the science domains prescribed for this study, neither the set of items nor the descriptions based on them purport to be comprehensive. There are undoubtedly other elements of the science curriculum on which students at the various benchmarks would have been successful had they been included in the assessment.

Please note that students reaching a particular benchmark demonstrated the knowledge and understandings characterizing that benchmark as well as those characterizing the lower benchmarks. The description of achievement at each benchmark is cumulative, building on the description of achievement demonstrated by students at the lower benchmarks.

Finally, it must be emphasized that the descriptions of the international benchmarks are one possible way of beginning to examine student performance. Some students scoring below a benchmark may indeed know or understand some of the concepts that characterize a higher level. Thus, it is important to consider performance on the individual items and clusters of items in developing a profile of student achievement in each participating entity.

Several example items are included for each benchmark to complement the descriptions by giving a more concrete notion of the abilities students demonstrated. Each example item is accompanied by the percentage of correct responses for each TIMSS 1999 Benchmarking participant. Percentages are also provided for selected countries, along with the international average for all 38 countries that participated in TIMSS 1999. In general, the entities scoring highest on the overall test also scored highest on many of the example items. Not surprisingly, this held for items assessing the full range of performance expectations: recognizing basic facts; understanding simple and complex information; applying scientific understanding to solve problems and provide explanations; interpreting and using data in tables, graphs, and diagrams; and demonstrating scientific investigation skills.

Item Examples and Student Performance

The remainder of this chapter describes each benchmark and presents four to six example items illustrating what students know and can do at that level. The correct answer is circled for multiple-choice items. For open-ended items, the answers shown exemplify the types of student responses that were given full credit. The example items are ones that students reaching each benchmark were likely to answer correctly, and they represent the types of items used to develop the description of achievement at that benchmark.(4)


1 For a detailed description of the items and benchmarks for TIMSS 1995 at fourth and eighth grades and how they compare to the National Research Council's National Science Education Standards, see Smith, T.A., Martin, M.O., Mullis, I.V.S., and Kelly, D.L. (2000), Profiles of Student Achievement in Science at the TIMSS International Benchmarks: U.S. Performance and Standards in an International Context, Chestnut Hill, MA: Boston College.
2 For example, for the Top 10% Benchmark, an item was included if at least 65 percent of students scoring at the scale point corresponding to this benchmark answered the item correctly and less than 50 percent of students scoring at the Upper Quarter Benchmark answered it correctly. Similarly, for the Upper Quarter Benchmark, an item was included if at least 65 percent of students scoring at that point answered the item correctly and less than 50 percent of students at the Median Benchmark answered it correctly.
3 The participants in the scale anchoring process are listed in Appendix E.
4 Some of the items used to develop the benchmark descriptions are being kept secure to measure achievement trends in future TIMSS assessments and are not available for publication.


TIMSS 1999 is a project of the International Study Center
Boston College, Lynch School of Education