Exhibit A.1 The Three Aspects and Major Categories of the Mathematics Frameworks
Exhibit A.2 Distribution of Mathematics Items by Content Reporting Category and Performance Category
Exhibit A.3 Coverage of TIMSS 1999 Target Population: Countries
Exhibit A.4 School Sample Sizes: Countries
Exhibit A.5 Student Sample Sizes: Countries
Exhibit A.6 Overall Participation Rates: Countries
Exhibit A.7 TIMSS 1999 Within-Country Free-Response Scoring Reliability Data for Mathematics Items
Exhibit A.8 Cronbach's Alpha Reliability Coefficient: TIMSS 1999 Mathematics Test
Exhibit A.9 Country-Specific Variations in Mathematics Topics in the Curriculum Questionnaire
           
          History
          TIMSS 1999 represents the continuation of a long series of studies 
            conducted by the International Association for the Evaluation of Educational 
            Achievement (IEA). Since its inception in 1959, the IEA has conducted 
            more than 15 studies of cross-national achievement in the curricular 
            areas of mathematics, science, language, civics, and reading. The 
            Third International Mathematics and Science Study (TIMSS), conducted 
            in 1994-1995, was the largest and most complex IEA study, and included 
            both mathematics and science at third and fourth grades, seventh and 
            eighth grades, and the final year of secondary school. 
            In 1999, TIMSS again assessed eighth-grade students in both mathematics 
            and science to measure trends in student achievement since 1995. TIMSS 
            1999 was also known as TIMSS-Repeat, or TIMSS-R.(1)
          To provide U.S. states and school districts with an opportunity to 
            benchmark the performance of their students against that of students 
            in the high-performing TIMSS countries, the International Study Center 
            at Boston College, with the support of the National Center for Education 
            Statistics and the National Science Foundation, established the TIMSS 
            1999 Benchmarking Study. Through this project, the TIMSS mathematics 
            and science achievement tests and questionnaires were administered 
            to representative samples of students in participating states and 
            school districts in the spring of 1999, at the same time the tests 
            and questionnaires were administered in the TIMSS countries. Participation 
            in TIMSS Benchmarking was intended to help states and districts understand 
            their comparative educational standing, assess the rigor and effectiveness 
            of their own mathematics and science programs in an international 
            context, and improve the teaching and learning of mathematics and 
            science.
          Participants in TIMSS Benchmarking
Thirteen states took advantage of the opportunity to participate in the Benchmarking Study. Eight public school districts and six consortia also participated, for a total of fourteen districts and consortia.
            They are listed in Exhibit 
            1 of the Introduction, together with the 38 countries that took 
            part in TIMSS 1999.
          Developing the TIMSS 1999 Mathematics Test
          The TIMSS curriculum framework underlying the mathematics tests was 
            developed for TIMSS in 1995 by groups of mathematics educators with 
input from the TIMSS National Research Coordinators (NRCs). As shown
            in Exhibit 
            A.1, the mathematics curriculum framework contains three dimensions 
            or aspects. The content aspect represents the subject matter content 
            of school mathematics. The performance expectations aspect describes, 
            in a non-hierarchical way, the many kinds of performances or behaviors 
            that might be expected of students in school mathematics. The perspectives 
aspect focuses on the development of students' attitudes, interest,
            and motivation in mathematics. Because the frameworks 
            were developed to include content, performance expectations, and perspectives 
            for the entire span of curricula from the beginning of schooling through 
the completion of secondary school, some aspects may not be reflected
            in the eighth-grade TIMSS assessment.(2) Working 
            within the framework, mathematics test specifications for TIMSS in 
            1995 were developed that included items representing a wide range 
            of mathematics topics and eliciting a range of skills from the students. 
            The 1995 tests were developed through an international consensus involving 
            input from experts in mathematics and measurement specialists, ensuring 
they reflected current thinking and priorities in mathematics.
          About one-third of the items in the 1995 assessment were kept secure 
            to measure trends over time; the remaining items were released for 
            public use. An essential part of the development of the 1999 assessment, 
            therefore, was to replace the released items with items of similar 
            content, format, and difficulty. With the assistance of the Science 
            and Mathematics Item Replacement Committee, a group of internationally 
            prominent mathematics and science educators nominated by participating 
            countries to advise on subject-matter issues in the assessment, over 
            300 mathematics and science items were developed as potential replacements. 
            After an extensive process of review and field testing, 114 items were 
            selected for use as replacements in the 1999 mathematics assessment. 
          
          Exhibit 
            A.2 presents the five content areas included in the 1999 mathematics 
            test and the numbers of items and score points in each area. Distributions 
            are also included for the five performance categories derived from 
            the performance expectations aspect of the curriculum framework. About 
            one-fourth of the items were in the free-response format, requiring 
students to generate and write their own answers. The free-response questions were designed to take about one-third of students' testing time; some asked for short answers, while others required extended responses in which students showed their work or explained their answers.
            The remaining questions used a multiple-choice format. In scoring 
            the tests, correct answers to most questions were worth one point. 
            Consistent with the approach of allotting students longer response 
            time for the constructed-response questions than for multiple-choice 
            questions, however, responses to some of these questions (particularly 
            those requiring extended responses) were evaluated for partial credit, 
            with a fully correct answer being awarded two points (see later section 
            on scoring). The total number of score points available for analysis 
            thus somewhat exceeds the number of items. 
          Every effort was made to help ensure that the 
            tests represented the curricula of the participating countries and 
            that the items exhibited no bias towards or against particular countries. 
The final forms of the tests were endorsed by the NRCs of the participating
            countries.(3)
          TIMSS Test Design
          Not all of the students in the TIMSS assessment responded to all 
            of the mathematics items. To ensure broad subject-matter coverage 
            without overburdening individual students, TIMSS used a rotated design 
            that included both the mathematics and science items. Thus, the same 
            students participated in both the mathematics and science testing. 
            As in 1995, the 1999 assessment consisted of eight booklets, each 
            requiring 90 minutes of response time. Each participating student 
            was assigned one booklet only. In accordance with the design, the 
            mathematics and science items were assembled into 26 clusters (labeled 
            A through Z). The secure trend items were in clusters A through H, 
            and items replacing the released 1995 items in clusters I through 
Z. Eight of the clusters were designed to take 12 minutes to complete; ten clusters, 22 minutes; and eight clusters, 10 minutes. In all,
            the design provided 396 testing minutes, 198 for mathematics and 198 
            for science. Cluster A was a core cluster assigned 
            to all booklets. The remaining clusters were assigned to the booklets 
            in accordance with the rotated design so that representative samples 
            of students responded to each cluster.(4) 
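As an aside for readers checking the arithmetic, the cluster timing described above can be verified with a short calculation. The sketch below is in Python; it reproduces only the figures stated in the text and is not drawn from the TIMSS operational software.

    # Cluster durations as described above: eight clusters of 12 minutes,
    # ten of 22 minutes, and eight of 10 minutes, for 26 clusters (A-Z).
    cluster_minutes = [12] * 8 + [22] * 10 + [10] * 8

    assert len(cluster_minutes) == 26    # clusters A through Z
    total = sum(cluster_minutes)
    assert total == 396                  # total testing minutes in the design
    assert total // 2 == 198             # 198 for mathematics, 198 for science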
          Background Questionnaires
          TIMSS in 1999 administered a broad array of questionnaires to collect 
            data on the educational context for student achievement and to measure 
            trends since 1995. National Research Coordinators, with the assistance 
            of their curriculum experts, provided detailed information on the 
            organization, emphases, and content coverage of the mathematics and 
            science curriculum. The students who were tested answered questions 
            pertaining to their attitudes towards mathematics and science, their 
            academic self-concept, classroom activities, home background, and 
            out-of-school activities. The mathematics and science teachers of 
            sampled students responded to questions about teaching emphasis on 
            the topics in the curriculum frameworks, instructional practices, 
            professional training and education, and their views on mathematics 
            and science. The heads of schools responded to questions about school 
            staffing and resources, mathematics and science course offerings, and 
            teacher support. 
          Translation and Verification
          The TIMSS instruments were prepared in English and translated into 
            33 languages, with 10 of the 38 countries collecting data in two languages. 
In addition, it was sometimes necessary to modify the international
            versions for cultural reasons, even in the nine countries that tested 
            in English. This process represented an enormous effort for the national 
            centers, with many checks along the way. The translation 
effort included (1) development of explicit guidelines for translation and cultural adaptation; (2) translation of the instruments by the
            national centers in accordance with the guidelines, using two or more 
            independent translations; (3) consultation with subject-matter experts 
            on cultural adaptations to ensure that the meaning and difficulty 
            of items did not change; (4) verification of translation quality by 
            professional translators from an independent translation company; 
            (5) corrections by the national centers in accordance with the suggestions 
            made; (6) verification by the International Study Center that corrections 
            were made; and (7) a series of statistical checks after the testing 
            to detect items that did not perform comparably across countries.(5) 
          
          Population Definition and Sampling
          TIMSS in 1995 had as its target population students enrolled in the 
            two adjacent grades that contained the largest proportion of 13-year-old 
            students at the time of testing, which were seventh- and eighth-grade 
            students in most countries. TIMSS in 1999 used 
            the same definition to identify the target grades, but assessed students 
            in the upper of the two grades only, which was the eighth grade in 
            most countries, including the United States.(6) 
            The eighth grade was the target population for all of the Benchmarking 
            participants. 
          The selection of valid and efficient samples was essential to the 
            success of TIMSS and of the Benchmarking Study. For TIMSS internationally, 
            NRCs, including Westat, the sampling and data collection coordinator 
            for TIMSS in the United States, received training in how to select 
            the school and student samples and in the use of the sampling software, 
            and worked in close consultation with Statistics Canada, the TIMSS 
            sampling consultants, on all phases of sampling. As well as conducting 
            the sampling and data collection for the U.S. national TIMSS sample, 
            Westat was also responsible for sampling and data collection in each 
            of the Benchmarking states, districts, and consortia. 
          To document the quality of the school and student samples in each 
            of the TIMSS countries, staff from Statistics Canada and the International 
            Study Center worked with the TIMSS sampling referee (Keith Rust, Westat) 
            to review sampling plans, sampling frames, and sampling implementation. 
            Particular attention was paid to coverage of the target population 
            and to participation by the sampled schools and students. The data 
            from the few countries that did not fully meet all of the sampling 
            guidelines are annotated in the TIMSS international reports, and are 
            also annotated in this report. The TIMSS samples for the Benchmarking 
            participants were also carefully reviewed in light of the TIMSS sampling 
            guidelines, and the results annotated where appropriate. Since Westat 
            was the sampling contractor for the Benchmarking project, the role 
            of sampling referee for the Benchmarking review was filled by Pierre 
            Foy, of Statistics Canada. 
          Although all countries and Benchmarking participants were expected 
            to draw samples representative of the entire internationally desired 
            population (all students in the upper of the two adjacent grades with 
            the greatest proportion of 13-year-olds), the few countries where 
            this was not possible were permitted to define a national desired 
            population that excluded part of the internationally desired population. 
            Exhibit 
            A.3 shows any differences in coverage between the international 
            and national desired populations. Almost all TIMSS countries achieved 
            100 percent coverage (36 out of 38), with Lithuania and Latvia the 
            exceptions. Consequently, the results for Lithuania are annotated, 
            and because coverage fell below 65 percent for Latvia, the Latvian 
results are labeled Latvia (LSS), for Latvian-Speaking
            Schools. Additionally, because of scheduling difficulties, Lithuania 
            was unable to test its eighth-grade students in May 1999 as planned. 
            Instead, the students were tested in September 1999, when they had 
            moved into the ninth grade. The results for Lithuania are annotated 
to reflect this as well. Exhibit
            A.3 also shows that the sampling plans for the Benchmarking participants 
            all incorporated 100 percent coverage of the desired population. Four 
            of the 13 states (Idaho, Indiana, Michigan, and Pennsylvania) as well 
            as the Southwest Pennsylvania Math and Science Collaborative included 
            private schools as well as public schools.
          In operationalizing their desired eighth-grade population, countries 
            and Benchmarking participants could define a population to be sampled 
            that excluded a small percentage (less than 10 percent) of certain 
            kinds of schools or students that would be very difficult or resource-intensive 
            to test (e.g., schools for students with special needs or schools 
            that were very small or located in extremely rural areas). Exhibit 
            A.3 also shows that the degree of such exclusions was small. Among 
            countries, only Israel reached the 10 percent limit, and among Benchmarking 
            participants, only Guilford County and Montgomery County did so. All 
            three are annotated as such in the achievement chapters of this report.
          Within countries, TIMSS used a two-stage sample design, in which 
            the first stage involved selecting about 150 public and private schools 
            in each country. Within each school, countries were to use random 
            procedures to select one mathematics class at the eighth grade. All 
            of the students in that class were to participate in the TIMSS testing. 
            This approach was designed to yield a representative sample of about 
            3,750 students per country. Typically, between 450 and 3,750 students 
            responded to each achievement item in each country, depending on the 
            booklets in which the items appeared.
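The two-stage logic can be illustrated with a minimal sketch in Python. The operational design selected schools with probability proportional to size; the simple random draws and the data structure below are simplifying assumptions for illustration only.

    import random

    def two_stage_sample(schools, n_schools=150, seed=1):
        """Illustrative two-stage draw: sample schools, then one intact
        eighth-grade mathematics class per sampled school. `schools` is
        a hypothetical mapping from school id to that school's list of
        eighth-grade mathematics classes (each a list of student ids)."""
        rng = random.Random(seed)
        chosen = rng.sample(list(schools), min(n_schools, len(schools)))
        students = []
        for school_id in chosen:
            one_class = rng.choice(schools[school_id])  # one class per school
            students.extend(one_class)  # every student in the class is tested
        return students

With about 150 schools and one class of roughly 25 students per school, a draw of this kind yields the target of about 3,750 students per country.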
          States participating in the Benchmarking study were required to sample 
            at least 50 schools and approximately 2,000 eighth-grade students. 
            School districts and consortia were required to sample at least 25 
            schools and at least 1,000 students. Where there were fewer than 25 
            schools in a district or consortium, all schools were to be included, 
            and the within-school sample increased to yield the total of 1,000 
            students.
          Exhibits A.4 
            and A.5 
            present achieved sample sizes for schools and students, respectively, 
            for the TIMSS countries and for the Benchmarking participants. Where 
            a district or consortium was part of a state that also participated, 
            the state sample was augmented by the district or consortium sample, 
            properly weighted in accordance with its size. Schools in a state 
            that were sampled as part of the U.S. national TIMSS sample were also 
            used to augment the state sample. For example, the Illinois sample 
            consists of 90 schools, 41 from the state Benchmarking sample (including 
            five schools from the national TIMSS sample), 27 from the Chicago 
            Public Schools, 17 from the First in the World Consortium, and five 
            from the Naperville School District. 
          Exhibit 
            A.6 shows the participation rates for schools, students, and overall, 
            both with and without the use of replacement schools, for TIMSS countries 
and Benchmarking participants. All of the countries met the guideline for sampling participation: 85 percent of both the schools and students, or a combined rate (the product of school and student participation) of 75 percent. Belgium (Flemish), England, Hong Kong, and the Netherlands, however, did so only after including replacement schools, and are annotated accordingly in the achievement chapters.
          With the exception of Pennsylvania and Texas, all the Benchmarking 
            participants met the sampling guidelines, although Indiana did so 
            only after including replacement schools. Indiana is annotated to 
reflect this in the achievement chapters, and Pennsylvania and Texas
            are italicized in all exhibits in this report.
          Data Collection
          Each participating country was responsible for carrying out all aspects 
            of the data collection, using standardized procedures developed for 
            the study. Training manuals were created for school coordinators and 
            test administrators that explained procedures for receipt and distribution 
            of materials as well as for the activities related to the testing 
            sessions. These manuals covered procedures for test security, standardized 
scripts to regulate directions and timing, rules for answering students' questions, and steps to ensure that identification on the test booklets
            and questionnaires corresponded to the information on the forms used 
            to track students. As the data collection contractor for the U.S. 
            national TIMSS, Westat was fully acquainted with the TIMSS procedures, 
            and applied them in each of the Benchmarking jurisdictions in the 
            same way as in the national data collection.
          Each country was responsible for conducting quality control procedures 
and describing this effort in the NRC's report documenting procedures
            used in the study. In addition, the International Study Center considered 
            it essential to monitor compliance with standardized procedures through 
an international program of quality control site visits. NRCs were
            asked to nominate one or more persons unconnected with their national 
            center, such as retired school teachers, to serve as quality control 
            monitors for their countries. The International Study Center developed 
            manuals for the monitors and briefed them in two-day training sessions 
            about TIMSS, the responsibilities of the national centers in conducting 
            the study, and their own roles and responsibilities. In all, 71 international 
            quality control monitors participated in this training.
          The international quality control monitors interviewed 
the NRCs about data collection plans and procedures. They also visited
            a sample of 15 schools where they observed testing sessions and interviewed 
            school coordinators.(7) Quality control monitors 
            interviewed school coordinators in all 38 countries, and observed 
            a total of 550 testing sessions. The results of the interviews conducted 
            by the international quality control monitors indicated that, in general, 
NRCs had prepared well for data collection and, despite the heavy
            demands of the schedule and shortages of resources, were able to conduct 
            the data collection efficiently and professionally. Similarly, the 
            TIMSS tests appeared to have been administered in compliance with 
            international procedures, including the activities before the testing 
            session, those during testing, and the school-level activities related 
            to receiving, distributing, and returning material from the national 
            centers.
          As a parallel quality control effort for the Benchmarking project, 
            the International Study Center recruited and trained a team of 18 
            quality control observers, and sent them to observe the data collection 
            activities of the Westat test administrators in a sample of about 
            10 percent of the schools in the study (98 schools in all).(8) 
            In line with the experience internationally, the observers reported 
            that the data collection was conducted successfully according to the 
            prescribed procedures, and that no serious problems were encountered.
          Scoring the Free-Response Items
          Because about one-third of the written test time was devoted to free-response 
            items, TIMSS needed to develop procedures for reliably evaluating 
            student responses within and across countries. Scoring used two-digit 
            codes with rubrics specific to each item. The first digit designates 
            the correctness level of the response. The second digit, combined 
            with the first, represents a diagnostic code identifying specific types 
            of approaches, strategies, or common errors and misconceptions. Although 
            not used in this report, analyses of responses based on the second 
            digit should provide insight into ways to help students better understand 
            mathematics concepts and problem-solving approaches.
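As an illustration of the coding scheme, the sketch below (Python) separates the two digits; the example code value is hypothetical and does not reproduce the actual TIMSS rubrics.

    def decompose_code(code):
        """Split a two-digit free-response code: the first digit is the
        correctness level (which determines the score points), and both
        digits together form the diagnostic code identifying the
        approach, strategy, or error type."""
        correctness, diagnostic = divmod(code, 10)
        return correctness, diagnostic

    # Hypothetical example: a code of 21 would indicate a response at
    # correctness level 2 (fully correct on an extended-response item)
    # reached via the strategy labeled 1 in that item's rubric.
    level, strategy = decompose_code(21)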
          To ensure reliable scoring procedures based on the TIMSS rubrics, 
            the International Study Center prepared detailed guides containing 
            the rubrics and explanations of how to implement them, together with 
            example student responses for the various rubric categories. These 
            guides, along with training packets containing extensive examples 
            of student responses for practice in applying the rubrics, were used 
            as a basis for intensive training in scoring the free-response items. 
            The training sessions were designed to help representatives of national 
            centers who would then be responsible for training personnel in their 
            countries to apply the two-digit codes reliably. In the United States, 
            the scoring was conducted by National Computer Systems (NCS) under 
            contract to Westat. To ensure that student responses from the Benchmarking 
            participants were scored in the same way as those from the U.S. national 
            sample, NCS had both sets of data scored at the same time and by the 
            same scoring staff.
          To gather and document empirical information about the within-country 
            agreement among scorers, TIMSS arranged to have systematic subsamples 
of at least 100 students' responses to each item coded independently
            by two readers. Exhibit 
            A.7 shows the average and range of the within-country percent 
            of exact agreement between scorers on the free-response items in the 
            mathematics test for 37 of the 38 countries. A high percentage of 
            exact agreement was observed, with an overall average of 99 percent 
            across the 37 countries. The TIMSS data from the reliability studies 
            indicate that scoring procedures were robust for the mathematics items, 
            especially for the correctness score used for the analyses in this 
            report. In the United States, the average percent exact agreement 
            was 99 percent for the correctness score and 96 percent for the diagnostic 
            score. Since the Benchmarking data were combined with the U.S. national 
            TIMSS sample for scoring purposes, this high level of scoring reliability 
            applies to the Benchmarking data also.
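The agreement statistic itself is simple to state; the sketch below (Python, with hypothetical variable names) computes the percent of exact agreement from two independent codings of the same responses.

    def percent_exact_agreement(codes_a, codes_b):
        """Percent of responses to which two independent scorers
        assigned exactly the same code."""
        if len(codes_a) != len(codes_b):
            raise ValueError("both scorers must code the same responses")
        matches = sum(a == b for a, b in zip(codes_a, codes_b))
        return 100.0 * matches / len(codes_a)

    # For the correctness score, compare only the first digit of each
    # two-digit code; for the diagnostic score, compare both digits.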
          Test Reliability
          Exhibit 
            A.8 displays the mathematics test reliability coefficient for 
            each country and Benchmarking participant. This coefficient is the 
median KR-20 reliability across the eight test booklets. Among countries,
            median reliabilities ranged from 0.76 in the Philippines to 0.94 in 
            Chinese Taipei. The international median, 0.89, is the median of the 
            reliability coefficients for all countries. Reliability coefficients 
            among Benchmarking participants were generally close to the international 
            median, ranging from 0.88 to 0.91 across states, and from 0.84 to 
            0.91 across districts and consortia.
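KR-20 is Cronbach's alpha specialized to dichotomously scored items. A minimal sketch of the coefficient for a single booklet follows (Python with NumPy), assuming a complete students-by-items matrix of 0/1 scores; the exhibit reports the median of this coefficient across the eight booklets.

    import numpy as np

    def kr20(responses):
        """KR-20 reliability for a students-by-items array of 0/1 scores."""
        n_students, k = responses.shape
        item_var = responses.var(axis=0, ddof=1).sum()  # sum of item variances
        total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1.0 - item_var / total_var)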
          Data Processing
          To ensure the availability of comparable, high-quality 
            data for analysis, TIMSS took rigorous quality control steps to create 
            the international database.(9) TIMSS prepared manuals 
            and software for countries to use in entering their data, so that 
            the information would be in a standardized international format before 
            being forwarded to the IEA Data Processing Center in Hamburg for creation 
            of the international database. Upon arrival at the Data Processing 
            Center, the data underwent an exhaustive cleaning process. This involved 
            several iterative steps and procedures designed to identify, document, 
            and correct deviations from the international instruments, file structures, 
            and coding schemes. The process also emphasized consistency of information 
            within national data sets and appropriate linking among the many student, 
            teacher, and school data files. In the United States, the creation 
            of the data files for both the Benchmarking participants and the U.S. 
            national TIMSS effort was the responsibility of Westat, working closely 
            with NCS. After the data files were checked carefully by Westat, they 
            were sent to the IEA Data Processing Center, where they underwent 
            further validity checks before being forwarded to the International 
            Study Center.
          IRT Scaling and Data Analysis
          The general approach to reporting the TIMSS 
            achievement data was based primarily on item response theory (IRT) 
            scaling methods.(10) The mathematics results were 
            summarized using a family of 2-parameter and 3-parameter IRT models 
for dichotomously scored items (right or wrong), and generalized partial
            credit models for items with 0, 1, or 2 available score points. The 
            IRT scaling method produces a score by averaging the responses of 
            each student to the items that he or she took in a way that takes 
            into account the difficulty and discriminating power of each item. 
            The methodology used in TIMSS includes refinements that enable reliable 
            scores to be produced even though individual students responded to 
            relatively small subsets of the total mathematics item pool. Achievement 
scales were produced for each of the five mathematics content areas (fractions and number sense; measurement; data representation, analysis, and probability; geometry; and algebra), as well as for mathematics overall.
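For reference, the item response function of the three-parameter logistic model is sketched below in Python; the parameterization shown, without the scaling constant sometimes included, is one common convention and is offered as an illustration rather than the exact TIMSS specification.

    import math

    def p_correct_3pl(theta, a, b, c):
        """Probability of a correct response given proficiency theta,
        with discrimination a, difficulty b, and lower asymptote
        ("guessing") c. The 2-parameter model is the case c = 0; the
        0/1/2-point items used generalized partial credit models."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))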
          The IRT methodology was preferred for developing comparable estimates 
            of performance for all students, since students answered different 
            test items depending upon which of the eight test booklets they received. 
            The IRT analysis provides a common scale on which performance can 
            be compared across countries. In addition to providing a basis for 
            estimating mean achievement, scale scores permit estimates of how 
            students within countries vary and provide information on percentiles 
            of performance. To provide a reliable measure of student achievement 
            in both 1999 and 1995, the overall mathematics scale was calibrated 
            using students from the countries that participated in both years. 
            When all countries participating in 1995 at the eighth grade are treated 
            equally, the TIMSS scale average over those countries is 500 and the 
            standard deviation is 100. Since the countries varied in size, each 
            country was weighted to contribute equally to the mean and standard 
            deviation of the scale. The average and standard deviation of the 
            scale scores are arbitrary and do not affect scale interpretation. 
Once the metric of the scale had been established, students from the
            countries that tested in 1999 but not 1995 were assigned scores on 
            the basis of the new scale. IRT scales were also created for each 
            of the five mathematics content areas for the 1999 data. Students from 
            the Benchmarking samples were assigned scores on the overall mathematics 
            scale as well as in each of the five mathematics content areas using 
            the same item parameters and estimation procedures as for TIMSS internationally.
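A minimal sketch of the equal-country standardization described above follows (Python with NumPy; the function and variable names are hypothetical). Each country's sampling weights are first normalized so that every country contributes equally, and the calibrated scores are then transformed linearly to a mean of 500 and a standard deviation of 100.

    import numpy as np

    def set_scale_metric(scores, weights, country_ids):
        """Rescale calibrated scores so that the weighted mean over the
        trend countries is 500 and the standard deviation is 100."""
        scores = np.asarray(scores, dtype=float)
        w = np.asarray(weights, dtype=float).copy()
        country_ids = np.asarray(country_ids)
        for c in np.unique(country_ids):
            mask = country_ids == c
            w[mask] /= w[mask].sum()   # every country's weight sums to one
        mean = np.average(scores, weights=w)
        sd = np.sqrt(np.average((scores - mean) ** 2, weights=w))
        return 500.0 + 100.0 * (scores - mean) / sd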
          To allow more accurate estimation of summary statistics for student 
            subpopulations, the TIMSS scaling made use of plausible-value technology, 
whereby five separate estimates of each student's score were generated on each scale, based on the student's responses to the items in his or her booklet and on the student's background characteristics. The five score estimates are known as plausible values, and the variability among them encapsulates the uncertainty inherent in the score estimation process.
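In analysis, a statistic is computed once per plausible value and the five results are combined. The sketch below (Python with NumPy) shows the usual combination rule for the imputation component of the variance; it illustrates the principle and is not the TIMSS analysis code.

    import numpy as np

    def combine_plausible_values(estimates):
        """Combine a statistic computed separately on each plausible
        value: the point estimate is the average, and the variance among
        the estimates contributes an imputation component that is added
        to the sampling variance."""
        m = len(estimates)                   # five plausible values in TIMSS
        point = np.mean(estimates)
        between = np.var(estimates, ddof=1)  # variance among the estimates
        imputation_variance = (1.0 + 1.0 / m) * between
        return point, imputation_variance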
          Estimating Sampling Error
          Because the statistics presented in this report are estimates of 
            performance based on samples of students, rather than the values that 
            could be calculated if every student in every country or Benchmarking 
            jurisdiction had answered every question, it is important to have 
            measures of the degree of uncertainty of the estimates. The 
            jackknife procedure was used to estimate the standard error associated 
            with each statistic presented in this report.(11) 
            The jackknife standard errors also include an error component due 
            to variation between the five plausible values generated for each 
            student. The use of confidence intervals, based on the standard errors, 
            provides a way to make inferences about the population means and proportions 
in a manner that reflects the uncertainty associated with the sample
            estimates. An estimated sample statistic plus or minus two standard 
            errors represents a 95 percent confidence interval for the corresponding 
            population result.
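The variance formula for the jackknife procedure is compact; the sketch below (Python with NumPy) shows it, assuming the replicate estimates (one per sampling zone, each computed with reweighted replicate weights) have already been formed.

    import numpy as np

    def jackknife_se(full_estimate, replicate_estimates):
        """Standard error from jackknife repeated replication: the
        sampling variance is the sum of squared deviations of the
        replicate estimates from the full-sample estimate."""
        reps = np.asarray(replicate_estimates, dtype=float)
        return float(np.sqrt(np.sum((reps - full_estimate) ** 2)))

    # A 95 percent confidence interval as used in this report:
    # (estimate - 2 * se, estimate + 2 * se)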
          Making Multiple Comparisons
          This report makes extensive use of statistical hypothesis-testing 
            to provide a basis for evaluating the significance of differences 
            in percentages and in average achievement scores. Each separate test 
            follows the usual convention of holding to 0.05 the probability that 
            reported differences could be due to sampling variability alone. However, 
            in exhibits where statistical significance tests are reported, the 
            results of many tests are reported simultaneously, usually at least 
            one for each country and Benchmarking participant in the exhibit. 
The significance tests in these exhibits are based on a Bonferroni procedure for multiple comparisons that holds to 0.05 the probability
            of erroneously declaring a statistic (mean or percentage) for one 
            entity to be different from that for another entity. In the multiple 
            comparison charts (Exhibit 1.2 and those in Appendix B), the Bonferroni 
            procedure adjusts for the number of entities in the chart, minus one. 
            In exhibits where a country or Benchmarking participant 
            statistic is compared to the international average, the adjustment 
            is for the number of entities.(12) 
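The adjustment amounts to dividing the 0.05 significance level by the number of comparisons made. A sketch of the resulting two-tailed critical value follows (Python with SciPy); the entity count in the comment is illustrative.

    from scipy.stats import norm

    def bonferroni_critical_value(n_entities, alpha=0.05):
        """Two-tailed critical value when one entity is compared with
        each of the others, i.e., n_entities - 1 comparisons."""
        n_comparisons = n_entities - 1
        return norm.ppf(1.0 - alpha / (2.0 * n_comparisons))

    # With 38 entities in a chart, each test is carried out at the
    # 0.05/37 level, giving a critical value of about 3.2 rather than
    # the unadjusted 1.96.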
          Setting International Benchmarks of Student Achievement
          International benchmarks of student achievement were computed at 
            each grade level for both mathematics and science. The benchmarks 
are points in the weighted international distribution of achievement scores that mark off the top 10 percent of students, the top 25 percent, the top 50 percent, and the bottom 25 percent of the distribution. The percentage of students in each country
            and Benchmarking jurisdiction meeting or exceeding the international 
            benchmarks is reported. The benchmarks correspond to the 90th, 75th, 
            50th, and 25th percentiles of the international distribution of achievement. 
            When computing these percentiles, each country contributed as many 
            students to the distribution as there were students in the target 
population in the country. That is, each country's contribution
            to setting the international benchmarks was proportional to the estimated 
            population enrolled at the eighth grade. 
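Computationally, each benchmark is a weighted percentile of the pooled score distribution. A minimal sketch follows (Python with NumPy), with the weights assumed proportional to each country's eighth-grade enrollment as described above.

    import numpy as np

    def weighted_percentile(scores, weights, pct):
        """Smallest score at which the cumulative weight reaches the
        given percentage of the total weight."""
        order = np.argsort(scores)
        s = np.asarray(scores, dtype=float)[order]
        w = np.asarray(weights, dtype=float)[order]
        cum = np.cumsum(w) / w.sum()
        return s[np.searchsorted(cum, pct / 100.0)]

    # The four international benchmarks:
    # [weighted_percentile(scores, weights, p) for p in (25, 50, 75, 90)]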
          In order to interpret the TIMSS scale scores and analyze achievement 
            at the international benchmarks, TIMSS conducted a scale anchoring 
            analysis to describe achievement of students at those four points 
on the scale. Scale anchoring is a way of describing students' performance at different points on a scale in terms of what they know
            and can do. It involves a statistical component, 
            in which items that discriminate between successive points on the 
            scale are identified, and a judgmental component in which subject-matter 
experts examine the items and generalize to students' knowledge
            and understandings.(13)
          Mathematics Curriculum Questionnaire
          In an effort to collect information about the content of the intended 
            curriculum in mathematics, TIMSS asked National Research Coordinators 
            and Coordinators from the Benchmarking jurisdictions to complete a 
            questionnaire about the structure, organization, and content coverage 
of their curricula. Coordinators reviewed 56 mathematics topics and reported the percentage of their eighth-grade students for whom each topic was intended in the curriculum. Although most topic descriptions
            were used without modification, there were occasions when Coordinators 
            found it necessary to expand on or qualify the topic description to 
            describe their situation accurately. The country-specific adaptations 
            to the mathematics curriculum questionnaire are presented in Exhibit 
            A.9. No adaptations to the list of topics were necessary for the 
            U.S. national version, nor were any adaptations made by any Benchmarking 
            participants.
         
        
           
Notes

1. The TIMSS 1999 results for mathematics and science, respectively, are reported in Mullis, I.V.S., Martin, M.O., Gonzalez, E.J., Gregory, K.D., Garden, R.A., O'Connor, K.M., Chrostowski, S.J., and Smith, T.A. (2000), TIMSS 1999 International Mathematics Report: Findings from IEA's Repeat of the Third International Mathematics and Science Study at the Eighth Grade, Chestnut Hill, MA: Boston College, and in Martin, M.O., Mullis, I.V.S., Gonzalez, E.J., Gregory, K.D., Smith, T.A., Chrostowski, S.J., Garden, R.A., and O'Connor, K.M. (2000), TIMSS 1999 International Science Report: Findings from IEA's Repeat of the Third International Mathematics and Science Study at the Eighth Grade, Chestnut Hill, MA: Boston College.

2. The complete TIMSS curriculum frameworks can be found in Robitaille, D.F., et al. (1993), TIMSS Monograph No. 1: Curriculum Frameworks for Mathematics and Science, Vancouver, BC: Pacific Educational Press.

3. For a full discussion of the TIMSS 1999 test development effort, see Garden, R.A., and Smith, T.A. (2000), "TIMSS Test Development" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

4. The 1999 TIMSS test design is identical to the design for 1995, which is fully documented in Adams, R., and Gonzalez, E. (1996), "TIMSS Test Design" in M.O. Martin and D.L. Kelly (eds.), Third International Mathematics and Science Study Technical Report, Volume I, Chestnut Hill, MA: Boston College.

5. More details about the translation verification procedures can be found in O'Connor, K., and Malak, B. (2000), "Translation and Cultural Adaptation of the TIMSS Instruments" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

6. The sample design for TIMSS is described in detail in Foy, P., and Joncas, M. (2000), "TIMSS Sample Design" in M.O. Martin, K.D. Gregory, and S.E. Stemler (eds.), TIMSS 1999 Technical Report, Chestnut Hill, MA: Boston College. Sampling for the Benchmarking project is described in Fowler, J., Rizzo, L., and Rust, K. (2001), "TIMSS Benchmarking Sampling Design and Implementation" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

7. Steps taken to ensure high-quality data collection in TIMSS internationally are described in detail in O'Connor, K., and Stemler, S. (2000), "Quality Control in the TIMSS Data Collection" in M.O. Martin, K.D. Gregory, and S.E. Stemler (eds.), TIMSS 1999 Technical Report, Chestnut Hill, MA: Boston College.

8. Quality control measures for the Benchmarking project are described in O'Connor, K., and Stemler, S. (2001), "Quality Control in the TIMSS Benchmarking Data Collection" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

9. These steps are detailed in Hastedt, D., and Gonzalez, E. (2000), "Data Management and Database Construction" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

10. For a detailed description of the TIMSS scaling, see Yamamoto, K., and Kulick, E. (2000), "Scaling Methods and Procedures for the TIMSS Mathematics and Science Scales" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

11. Procedures for computing jackknifed standard errors are presented in Gonzalez, E., and Foy, P. (2000), "Estimation of Sampling Variance" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

12. The application of the Bonferroni procedures is described in Gonzalez, E., and Gregory, K. (2000), "Reporting Student Achievement in Mathematics and Science" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College.

13. The scale anchoring procedure is described fully in Gregory, K., and Mullis, I. (2000), "Describing International Benchmarks of Student Achievement" in M.O. Martin, K.D. Gregory, K.M. O'Connor, and S.E. Stemler (eds.), TIMSS 1999 Benchmarking Technical Report, Chestnut Hill, MA: Boston College. An application of the procedure to the 1995 TIMSS data may be found in Kelly, D.L., Mullis, I.V.S., and Martin, M.O. (2000), Profiles of Student Achievement in Mathematics at the TIMSS International Benchmarks: U.S. Performance and Standards in an International Context, Chestnut Hill, MA: Boston College.