    Mark Reckase

    Testing programs face a continuing tension between equating forms and maintaining score scales, on the one hand, and allowing for changing conditions in the educational system, such as curriculum shifts or practical limits on testing time, on the other. When such changes occur, psychometric staff members are challenged to develop linking methods that allow for comparable reporting but still meet requirements for psychometric rigor. This article describes a method addressing such shifts in testing programs. The application of the method is demonstrated on a large-scale educational testing program that had changes in test length, content distribution, and decision-making process. To accomplish the linkage, a pseudo test designed to mimic the test after the change was developed from the items included in the longer test administered before the change. The linking of the tests using the pseudo test process resulted in a percentage of successful students that was similar to the percentages obtained prior to the changes. The linked scores were treated as comparable rather than equated scores.
    Computerized Adaptive Testing (CAT) is gaining wide acceptance with the ready availability of computer technology. The general intent of CAT is to adapt the difficulty of the test to the capabilities of the examinee so that measurement accuracy is improved over fixed tests and the entire testing process is more efficient. However, many computer administration designs, such as two-stage tests, stratified adaptive tests, and those with content balancing and exposure control, are called adaptive, but the amount of adaptation varies greatly. In this paper, several measures of the amount of adaptation for a CAT are presented along with information about their sensitivity to item pool size, distribution of item difficulty, and exposure control. A real data application is presented to show the level of adaptation of a mature, operational CAT. Some guidelines are provided for how much adaptation should take place to merit the label of an “adaptive test.”
    The MIRT models presented in this book are useful from a theoretical perspective because they provide a model for the interaction between persons and test items. The different kinds of models represent different theoretical perspectives. For example, the compensatory and partially compensatory models provide two different conceptions of how levels on hypothetical constructs combine when applied to items that require some level on the constructs to determine the correct response. Although the theoretical models are interesting in their own right, the practical applications of the models require a means of estimating the item and person parameters for the models. Without practical procedures for parameter estimation, the usefulness of the models is very limited.
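The contrast between the two model families can be illustrated with a small sketch. The abstract does not give formulas, so the functional forms below are assumptions: a logistic function of a weighted sum of abilities for the compensatory case, and a product of per-dimension logistic terms for the partially compensatory case. The function names and parameter values are hypothetical.

```python
import math


def compensatory(theta, a, d):
    """Compensatory form (assumed): logistic of a weighted sum of abilities.

    A high level on one dimension can offset a low level on another,
    because the abilities are added before the logistic transform.
    """
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def partially_compensatory(theta, a, b):
    """Partially compensatory form (assumed): product of logistic terms.

    Each dimension contributes its own probability, so a low level on
    any single dimension caps the overall probability.
    """
    p = 1.0
    for t_k, a_k, b_k in zip(theta, a, b):
        p *= 1.0 / (1.0 + math.exp(-a_k * (t_k - b_k)))
    return p


# Hypothetical examinee: strong on dimension 1, weak on dimension 2.
theta = (2.0, -2.0)
p_comp = compensatory(theta, a=(1.0, 1.0), d=0.0)
p_part = partially_compensatory(theta, a=(1.0, 1.0), b=(0.0, 0.0))
print(f"compensatory: {p_comp:.3f}, partially compensatory: {p_part:.3f}")
```

Under these assumed forms, the strong dimension fully offsets the weak one in the compensatory model, while the partially compensatory probability stays low because the weak dimension limits it.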
    This chapter describes a property of estimates of points in a multidimensional space that is labeled by some as paradoxical, shows when this property of the estimates is present, and also shows that the paradoxical result is not a flaw in estimation because estimates improve with additional information even when the paradox occurs. The paradox is that when a correct response to a test item is added to an examinee's string of responses to previous items, at least one of the coordinates of the new estimated θ-point decreases compared to the estimate based on the initial string of responses. The information presented in the chapter shows that this can occur whenever the likelihood function for the estimates has a particular form. This form is present in many cases when the item responses for a test cannot be described by simple structure. Results are presented to show that the additional response improves the estimate of the θ-point even though the paradoxical result occurs.
    The design and development of international sampling procedures for FIRSTMATH was directed at resolving issues related to definitions, sampling, and methods before launching a larger comparative study of beginning mathematics teachers. Through the process of planning and enacting the FIRSTMATH study the research team was able to (a) develop a common definition of a “beginning teacher,” (b) determine feasible stratified sampling plans, (c) gain access to teachers and classrooms to collect data, and (d) achieve high response rates.
    The present study extended the p-optimality method to the multistage computerized adaptive test (MST) context in developing optimal item pools to support different MST panel designs under different test configurations. Using the Rasch model, simulated optimal item pools were generated with and without the practical constraint of exposure control. A total of 72 simulated optimal item pools were generated and evaluated by an overall sample and a conditional sample using various statistical measures. Results showed that the optimal item pools built with the p-optimality method provide sufficient measurement accuracy under all simulated MST panel designs. Exposure control affected the item pool size, but not the item distributions and item pool characteristics. This study demonstrated that the p-optimality method can adapt to MST item pool design, facilitate the MST assembly process, and improve its scoring accuracy.
    This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16 algorithms covering a variety of mathematics content and cognitive categories. The majority of the algorithms yielded items that were very homogeneous in their statistical characteristics. Those algorithms that did not yield homogeneous items were analyzed to determine whether causes for the differences could be identified. A possible innovative application of the algorithms is the computer generation of new test forms with specific content and statistical specifications and without the need for a preexisting item bank. Includes two tables and three figures. (Contains 16 references.)
    In this article, we report on the challenges entailed in the development of concepts, methods, and strategies for designing and implementing a cross-national research study of the first years of school mathematics teaching, including an exploration of how beginning mathematics teachers differ in their preparation, knowledge for teaching, teaching practice, working conditions, and pupil characteristics. The study was designed as a proof of concept for a study of teaching and teacher education to be implemented by educationalists, teacher educators, and early career teachers as an ongoing professional endeavor. Primary among the challenges was the development of the sampling design and the construction of measures.
    Defining what teachers need to know to teach algebra successfully is important for informing teacher preparation and professional development efforts. Based on prior research, an analysis of video, interviews with teachers, and an analysis of textbooks, the authors define categories of knowledge and practices of teaching for understanding and assessing teachers' knowledge for teaching algebra. They argue that the combination of categories and practices must be covered in assessments of teacher knowledge, if the assessments are to be used in research that investigates the presumed links among teachers' content preparation, their knowledge, their practice, and student learning.
    Determining a correct response to many test items frequently requires more than one ability. This paper describes the characteristics of items of this type by proposing generalizations of the item response theory concepts of discrimination and information. The conceptual framework for these statistics is presented, and the formulas for the statistics are derived for the multidimensional extension of the two-parameter logistic model. Use of the statistics is demonstrated for a form of the ACT Mathematics Usage Test.
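As a rough illustration of what a multidimensional generalization of discrimination can look like, the sketch below computes three item statistics for a multidimensional two-parameter logistic item with slope vector a and intercept d. The formulas are not stated in the abstract; the forms used here (discrimination as the length of the slope vector, difficulty as the signed distance from the origin to the point of steepest slope, and direction cosines for the direction of best measurement) are assumptions based on how these statistics are commonly defined. The function name and the example values are hypothetical.

```python
import math


def item_statistics(a, d):
    """Summary statistics for a multidimensional 2PL item (assumed forms).

    a: sequence of slope (discrimination) parameters, one per dimension
    d: intercept parameter
    Returns (mdisc, mdiff, direction_cosines).
    """
    # Multidimensional discrimination: Euclidean length of the slope vector.
    mdisc = math.sqrt(sum(a_k ** 2 for a_k in a))
    # Multidimensional difficulty: signed distance from the origin to the
    # point of steepest slope, measured along the item's direction.
    mdiff = -d / mdisc
    # Direction cosines: how strongly the item aligns with each axis.
    direction_cosines = tuple(a_k / mdisc for a_k in a)
    return mdisc, mdiff, direction_cosines


# Hypothetical two-dimensional item.
mdisc, mdiff, cosines = item_statistics(a=(3.0, 4.0), d=-5.0)
print(mdisc, mdiff, cosines)
```

Reporting a single discrimination value together with a direction keeps the familiar unidimensional interpretation while still showing which combination of abilities the item measures best.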
    We investigate whether commonly used value-added estimation strategies produce accurate estimates of teacher effects under a variety of scenarios. We estimate teacher effects in simulated student achievement data sets that mimic plausible types of student grouping and teacher assignment scenarios. We find that no one method accurately captures true teacher effects in all scenarios, and the potential for misclassifying teachers as high- or low-performing can be substantial. However, a dynamic OLS estimator is more robust across scenarios than other estimators. Misspecifying dynamic relationships can exacerbate estimation problems. (Title: Can Value-Added Measures of Teacher Performance Be Trusted? Corresponding author: Cassandra M. Guarino, Associate Professor of Educational Leadership and Policy Studies, Indiana University, 4220 W. W. Wright Education Building, 201 N. Rose Avenue, Bloomington, IN 47405-1006, USA. Phone: (812) 856-2927. Email: guarino@indi...)
    The Teacher Education and Development Study (TEDS-M), a cross-national study of teacher education programs that prepare future primary and secondary mathematics teachers, included a series of measures of mathematics achievement designed to determine what prospective teachers knew and could do concerning the mathematics that they would likely teach. One of the goals of the study was to report the information about prospective teachers’ knowledge and skills in a way that the numerous audiences for the results of the study can easily understand. The values that are reported are often obtained using an item response theory (IRT) model that gives estimates of a location on the latent scale for a hypothetical construct. For any of these numerical values, it is difficult to interpret what a person knows or can do. At best, the numbers can be used to order persons or groups according to the magnitude of what they know or can do, but not to determine whether they have particular capabilities such as b...
    Computerized adaptive testing requires a well-designed item pool containing an appropriate number of items to build individualized tests that match the examinees’ ability levels. An optimal item pool should also contain well-balanced items that will achieve optimal item usage and lower the cost of item creation. One of the methods for designing the blueprint for an item pool is Reckase’s method (2003), which is a Monte Carlo method to determine the properties of an optimal item pool. This study extended the method to the design of item pools calibrated with the three-parameter logistic model and applied it to situations where the Sympson-Hetter procedure is used to control the item exposure rate. The procedures for designing the item pool and two approaches for simulating test items are presented. The performance of the simulated item pools is evaluated along with that of an operational item pool.
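The exposure-control step mentioned above can be sketched as a probabilistic filter applied after item selection. This is a minimal illustration of the general Sympson-Hetter idea, not the study's implementation: each item carries an exposure-control parameter between 0 and 1, and a selected item is administered only if a uniform random draw falls below that parameter. Otherwise the next-best candidate is tried. The function name, the item identifiers, the parameter values, and the fallback rule for an exhausted candidate list are all hypothetical.

```python
import random


def administer_with_exposure_control(ranked_items, k, rng):
    """Pick an item from candidates ranked best-first (illustrative sketch).

    ranked_items: item ids ordered from most to least suitable
    k: dict mapping item id -> exposure-control parameter in [0, 1]
    rng: a random.Random instance supplying the uniform draws
    """
    for item in ranked_items:
        # Administer the item only if the draw passes its exposure filter.
        if rng.random() < k[item]:
            return item
    # Assumed fallback: if every candidate is blocked, give the last one.
    return ranked_items[-1]


# Hypothetical candidates: the most informative item is heavily restricted.
k = {"item_a": 0.2, "item_b": 0.9, "item_c": 1.0}
rng = random.Random(0)
counts = {item: 0 for item in k}
for _ in range(10_000):
    counts[administer_with_exposure_control(["item_a", "item_b", "item_c"], k, rng)] += 1
print(counts)
```

With these assumed values the best item should be administered in roughly one fifth of the simulated tests, which is the point of the filter: popular items are held back so that usage spreads across the pool.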