TIMSS 1999 Benchmarking Science Report, Chapter 1

Chapter 1 summarizes eighth-grade achievement on the TIMSS 1999 science assessment for each of the Benchmarking states, districts, and consortia, as well as for each participating country. Comparisons of participants’ performance against international benchmarks, as well as gender differences in performance, are also provided.

How Do Participants Differ in Science Achievement?

Exhibit 1.1 presents the distribution of student achievement for the 38 TIMSS 1999 countries and the 27 Benchmarking participants in a two-page display.(1) The left-hand page lists countries and Benchmarking participants together, in decreasing order of average (mean) scale score. It also indicates whether each participant's average is significantly higher or lower than the international average of 488. The international average is the average of the mean scores of the 38 participating countries, so each country contributes equally regardless of its size. The right-hand page is a table of average achievement, along with the number of years of formal schooling and the average age of the students tested.
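The short Python sketch below shows how the international average is formed. It assumes each country's mean is already known. For contrast, it also computes a student-weighted average. The function names and all numbers are hypothetical and are not TIMSS data. The values were chosen only so that the unweighted result happens to equal 488.

```python
def international_average(country_means: dict[str, float]) -> float:
    """Unweighted mean of per-country means: every country counts once,
    regardless of how many students it tested."""
    if not country_means:
        raise ValueError("need at least one country mean")
    return sum(country_means.values()) / len(country_means)


def student_weighted_average(country_means: dict[str, float],
                             country_sizes: dict[str, int]) -> float:
    """For contrast only: weights each country's mean by a (hypothetical)
    student count, so larger countries pull the result harder."""
    total = sum(country_sizes[c] for c in country_means)
    return sum(country_means[c] * country_sizes[c] for c in country_means) / total


# Hypothetical inputs, for illustration only.
example = {"Country A": 550.0, "Country B": 480.0, "Country C": 434.0}
sizes = {"Country A": 1000, "Country B": 1000, "Country C": 2000}
print(round(international_average(example), 1))  # → 488.0
print(student_weighted_average(example, sizes))  # → 474.5
```

The unweighted version counts a small country the same as a large one, so the two averages generally differ. Only the first matches the averaging described above. Benchmarking participants would not appear in the input, because the international average covers the 38 countries only.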

Many of the Benchmarking participants performed fairly well on the TIMSS 1999 science assessment. Average performance for the 13 Benchmarking states was generally clustered in the upper half of the international distribution of results for the 38 countries. All but three of the Benchmarking states performed significantly above the international average. The United States as a whole also had average science achievement just above the international average.

The Benchmarking Study underscores the extreme importance of looking beyond the averages to the range of performance found across the nation. Performance across the participating school districts and consortia reflected nearly the full range of achievement internationally. The highest-achieving Benchmarking participants were the Naperville School District, the First in the World Consortium, the Michigan Invitational Group, and the Academy School District. These four were among the Benchmarking participants with the lowest percentages of students from low-income families (Naperville, 2 percent; First in the World, 14 percent; Michigan Invitational Group, 22 percent; Academy School District, 4 percent).(2) The Benchmarking participants with the lowest average science achievement included four urban school districts with high percentages of students from low-income families: the Rochester City School District (73 percent), the Chicago Public Schools (71 percent), the Jersey City Public Schools (89 percent), and the Miami-Dade County Public Schools (59 percent). These districts did not score quite as low as the lowest-scoring countries in TIMSS 1999. Even so, the range of average performance across the Benchmarking districts and consortia was almost as broad as that across all the TIMSS 1999 countries.

Exhibit 1.1 also shows the distribution of student performance within each entity. It illustrates that achievement is distributed broadly within, as well as across, participating entities. Achievement for each participant is shown at the 25th and 75th percentiles as well as at the 5th and 95th percentiles.(3) Each percentile point indicates the percentage of students performing below and above that point on the scale. For example, 25 percent of the eighth-grade students in each participating entity performed below the 25th percentile for that entity, and 75 percent performed above it. The range between the 25th and 75th percentiles represents the performance of the middle half of students. In most entities, this range was between 100 and 150 scale-score points. Performance at the 5th and 95th percentiles represents the extremes of lower and higher achievement. The range between these two points includes 90 percent of the population and was between 250 and 300 points for most participants. The dark boxes at the midpoints of the distributions show the 95 percent confidence intervals around the average achievement in each entity.(4)
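The percentile summaries and confidence intervals described above can be sketched with Python's standard `statistics` module. This is a simplification that rests on several assumptions:

- It treats the scores as a plain, unweighted list, whereas TIMSS estimates follow the procedures in Appendix A.
- It interpolates percentiles linearly.
- It takes the standard error as an input rather than estimating it.

The function names and example scores are hypothetical.

```python
import statistics


def percentile_summary(scores: list[float]) -> dict[str, float]:
    """Unweighted 5th, 25th, 75th and 95th percentiles, plus the two spreads
    discussed in the text. Percentiles are linearly interpolated."""
    # n=20 yields 19 cut points at the 5th, 10th, ..., 95th percentiles.
    cuts = statistics.quantiles(scores, n=20, method="inclusive")
    p5, p25, p75, p95 = cuts[0], cuts[4], cuts[14], cuts[18]
    return {
        "p5": p5,
        "p25": p25,
        "p75": p75,
        "p95": p95,
        "middle_half_range": p75 - p25,  # spread of the middle 50 percent
        "p5_to_p95_range": p95 - p5,     # spread of the middle 90 percent
    }


def confidence_interval_95(mean: float, standard_error: float) -> tuple[float, float]:
    """Normal-approximation 95 percent interval: mean +/- 1.96 standard errors."""
    half_width = 1.96 * standard_error
    return mean - half_width, mean + half_width


# Hypothetical, evenly spaced scores (not TIMSS data).
scores = [float(s) for s in range(400, 601)]
summary = percentile_summary(scores)
print(summary["p25"], summary["p75"], summary["middle_half_range"])  # → 450.0 550.0 100.0
print(confidence_interval_95(statistics.fmean(scores), standard_error=4.0))
```

With real data, the two spreads would be read directly off Exhibit 1.1. Here, the evenly spaced example just shows which cut points each spread uses.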

As well as showing the wide spread of student achievement within each entity, the percentiles also provide a perspective on the size of the differences among entities. Even though performance generally differed very little between one participant and the next higher- or lower-performing one, the range across participants was very large. For example, average performance in Chinese Taipei exceeded performance at the 95th percentile in the lower-performing countries such as the Philippines, Morocco, and South Africa. This means that only the most proficient students in the lower-performing countries approached the level of achievement of students of average proficiency in Chinese Taipei.

Exhibit 1.2 compares overall mean science achievement among individual entities. It shows whether the difference in average achievement between each pair of participants is statistically significant. To read the exhibit, select a participant of interest and read across its row. A triangle pointing up indicates significantly higher performance than the comparison participant listed across the top. A circle indicates no significant difference in performance. A triangle pointing down indicates significantly lower performance.
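Each cell of Exhibit 1.2 amounts to a two-sided significance test on the difference between two means. The sketch below shows one common way to make that call. It assumes the two participants are independent samples with known standard errors and uses a 1.96 critical value. It applies no multiple-comparison adjustment, so it should not be taken as the exact procedure behind the exhibit. Appendix A describes how the TIMSS standard errors were calculated. The function name `compare_means` and the example values are hypothetical.

```python
import math


def compare_means(mean_a: float, se_a: float, mean_b: float, se_b: float,
                  critical_z: float = 1.96) -> str:
    """Classify participant A against participant B for one cell of a
    pairwise-comparison table.

    Returns "higher" (up triangle), "no difference" (circle) or
    "lower" (down triangle). Assumes independent samples, so the standard
    error of the difference is sqrt(se_a**2 + se_b**2); no adjustment for
    multiple comparisons is made.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / se_diff
    if z > critical_z:
        return "higher"
    if z < -critical_z:
        return "lower"
    return "no difference"


# Hypothetical values: the same 20-point gap with small and with large errors.
print(compare_means(540, 4.0, 520, 4.0))   # higher         (z is about 3.54)
print(compare_means(540, 12.0, 520, 4.0))  # no difference  (z is about 1.58)
```

The 20-point gap is held fixed in both calls. Raising one participant's standard error from 4 to 12 moves the result from significant to not significant. This is the mechanism behind the observation, later in this section, that Texas's large standard error reduced its number of statistically significant differences.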

The data in Exhibit 1.2 reinforce the point that, when ordered by average achievement, adjacent participants usually did not significantly differ from each other, although the differences in achievement between the high-performing and low-performing participants were very large.

The Naperville School District, Chinese Taipei, Singapore, the First in the World Consortium, the Michigan Invitational Group, and the Academy School District had the highest average performance, closely followed by Hungary, Japan, and Korea. Naperville, First in the World, the Michigan Invitational Group, and the Academy School District all had average achievement comparable to that of high-performing Chinese Taipei and Singapore. The difference in performance from one participant to the next was often negligible.

Among Benchmarking jurisdictions, several groups emerged:

- Michigan, the Southwest Pennsylvania Math and Science Collaborative, the Project SMART Consortium, Oregon, Indiana, Guilford County, Massachusetts, and Connecticut were outperformed by very few entities. Each had higher average achievement than almost half of all participants.
- Montgomery County (Maryland), Pennsylvania, Idaho, Missouri, and Illinois also had very similar performance. Each outperformed slightly more than 20 other entities and was outperformed by nine or fewer.
- The Fremont/Lincoln/Westside Public Schools, South Carolina, North Carolina, Maryland, and the Delaware Science Coalition had roughly similar achievement. Each performed better than about 20 other entities and was outperformed by about 20 entities.
- Texas had similar achievement, but its large standard error reduced the number of statistically significant differences.
- The Rochester City School District, the Chicago Public Schools, the Jersey City Public Schools, and the Miami-Dade County Public Schools had average eighth-grade science performance lower than that of most of the TIMSS 1999 countries. Their performance was comparable to that of Jordan, Iran, Indonesia, Turkey, and Tunisia.

1	TIMSS used item response theory (IRT) methods to summarize the achievement results on a scale with a mean of 500 and a standard deviation of 100. Under the matrix-sampling approach, individual students responded to different subsets of items. IRT scaling combines students' responses in a way that accounts for differences in the difficulty of those subsets. This allows students' performance to be summarized on a common metric. For more detailed information, see the "IRT Scaling and Data Analysis" section of Appendix A.
2	Low-income figures are percentages of students eligible to receive free or reduced-price lunch through the National School Lunch Program, as reported by participating schools.
3	Tables of the percentile values and standard deviations for all participants are presented in Appendix C.
4	See the “IRT Scaling and Data Analysis” section of Appendix A for more details about calculating standard errors and confidence intervals for the TIMSS statistics.
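Footnote 1 notes that results are reported on a scale with a mean of 500 and a standard deviation of 100. The final step of placing proficiency estimates on such a metric can be sketched as the linear transformation score = 500 + 100(θ − μ)/σ. This assumes the estimates θ have already been produced by an IRT model, with μ and σ the mean and standard deviation of θ in the reference sample. The IRT estimation itself, and the actual TIMSS procedure, are considerably more involved and are described in Appendix A. The function name and values here are hypothetical.

```python
import statistics


def to_reporting_scale(thetas: list[float], target_mean: float = 500.0,
                       target_sd: float = 100.0) -> list[float]:
    """Linearly map proficiency estimates onto a reporting metric so that,
    within this reference sample, they have the target mean and SD."""
    mu = statistics.fmean(thetas)
    sigma = statistics.pstdev(thetas)  # SD of the reference sample itself
    if sigma == 0:
        raise ValueError("estimates have no spread; scale is undefined")
    return [target_mean + target_sd * (t - mu) / sigma for t in thetas]


# Hypothetical estimates on the IRT (logit) metric.
print(to_reporting_scale([-1.0, 1.0]))  # → [400.0, 600.0]
```

Because the mapping is linear, it changes only the units of the scale. The rank order of students and the relative distances between them are unchanged.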

TIMSS 1999 is a project of the International Study Center
Boston College, Lynch School of Education