Similar Documents
Retrieved 20 similar documents (search time: 31 ms)
1.
In test development, item response theory (IRT) provides a method for determining the amount of information that each item (i.e., the item information function) and each combination of items (i.e., the test information function) contribute to the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated it. Implications for test development are discussed.

2.
An important part of test development is ensuring alignment between test forms and content standards. One common way of measuring alignment is the Webb (1997, 2007) alignment procedure. This article investigates (a) how well item writers understand components of the definition of Depth of Knowledge (DOK) from the Webb alignment procedure and (b) how consistent their DOK ratings are with ratings provided by committees of educators across grade levels, content areas, and alternate assessment levels in a Midwestern state alternate assessment system. Results indicate that many item writers understood key features of DOK; however, some struggled to articulate what DOK means and held some misconceptions. Additional analyses suggested some lack of consistency between the item writer DOK ratings and the committee DOK ratings, with notable differences across alternate assessment levels and content areas. Implications for future item-writer training and alignment studies are provided.

3.
Educational Assessment, 2013, 18(4), 333–356
Alignment has taken on increased importance given the current high-stakes nature of assessment. To make well-informed decisions about student learning on the basis of test results, assessment items need to be well aligned with standards. Project 2061 of the American Association for the Advancement of Science (AAAS) has developed a procedure for analyzing the content and quality of assessment items. The authors of this study used this alignment procedure to closely examine 2 mathematics assessment items. Student work on these 2 items was analyzed to determine whether the conclusions reached through the use of the alignment procedure could be validated. The Project 2061 procedure proved effective both as a tool for in-depth analysis of the mathematical content of each item against a set of standards and for identifying the particular content standard most closely aligned with each item. Through analyzing student work samples and student interviews, it was also found that students' thinking may not correspond to the standard identified as best aligned with the learning goals of the item. This finding highlights the potential usefulness of analyzing student work to reveal deficiencies of an assessment item not exposed by an alignment procedure alone.

4.
During the development of large‐scale curricular achievement tests, recruited panels of independent subject‐matter experts use systematic judgmental methods—often collectively labeled “alignment” methods—to rate the correspondence between a given test's items and the objective statements in a particular curricular standards document. High disagreement among the expert panelists may indicate problems with training, feedback, or other steps of the alignment procedure. Existing procedural recommendations for alignment reviews have been derived largely from single‐panel research studies; support for their use during operational large‐scale test development may be limited. Synthesizing data from more than 1,000 alignment reviews of state achievement tests, this study identifies features of test–standards alignment review procedures that impact agreement about test item content. The researchers then use their meta‐regression results to propose some practical suggestions for alignment review implementation.

5.

Research related to the “teacher characteristics” dimension of teacher quality has proven inconclusive and only weakly related to student success; attending to teaching contexts may be crucial for furthering this line of inquiry. International large-scale assessments are well positioned to address such questions due to their systematic sampling of students, schools, and education systems. However, researchers are frequently prevented from answering them by measurement invariance issues. This study uses traditional multiple group confirmatory factor analysis (MGCFA) and an alignment optimization method to examine measurement invariance in several constructs from the teacher questionnaires of the Trends in International Mathematics and Science Study (TIMSS) 2015 across 46 education systems. Constructs included mathematics teachers' Job satisfaction, School emphasis on academic success, School condition and resources, Safe and orderly school, and teachers' Self-efficacy. The MGCFA results show that just three constructs achieve invariance at the metric level. When the alignment optimization method is applied, however, all five constructs fall within the threshold of acceptable measurement non-invariance. The study therefore argues that these constructs can be validly compared across education systems, and a subsequent comparison of latent factor means examines differences across the groups. Future research may utilize the estimated factor means from the aligned models to further investigate the role of teacher characteristics and contexts in student outcomes.


6.
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.

7.
In this research, the author addresses whether the application of unidimensional item response models provides valid interpretations of test results when administering items sensitive to multiple latent dimensions. Overall, the present study found that unidimensional models are quite robust to the violation of the unidimensionality assumption due to secondary dimensions from sensitive items. When secondary dimensions are highly correlated with the main construct, unidimensional models generally fit, and the accuracy of ability estimation is comparable to that of strictly unidimensional tests. In addition, longer tests are more robust to the violation of the essential unidimensionality assumption than shorter ones. The author also shows that, in tests with secondary dimensions, unidimensional item response theory models estimate the item difficulty parameter better than the item discrimination parameter.
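The data structure the study describes can be sketched with a compensatory two-dimensional model in which "sensitive" items also load on a secondary dimension correlated with the main construct. All parameter values below are illustrative, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
n, corr = 2000, 0.8

# Main ability plus a correlated secondary dimension
theta = rng.multivariate_normal([0.0, 0.0], [[1.0, corr], [corr, 1.0]], size=n)

def p_2d(th, a1, a2, d):
    """Compensatory two-dimensional logistic response probability."""
    return 1.0 / (1.0 + np.exp(-(a1 * th[:, 0] + a2 * th[:, 1] + d)))

# A 'pure' item loads only on the main dimension; a 'sensitive' item
# also loads on the secondary dimension.
p_pure = p_2d(theta, 1.2, 0.0, 0.0)
p_sensitive = p_2d(theta, 1.2, 0.8, 0.0)
responses = (rng.random(n) < p_sensitive).astype(int)
```

Fitting a unidimensional model to `responses` generated this way, while varying `corr` and test length, is the kind of robustness check the abstract summarizes.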

8.
Although the Expectancy-Value Model offers one of the most influential accounts of motivation, one component of this model, cost, has been largely ignored in empirical research. Recent research on cost is emerging, but no clear consensus has formed on how to operationalize and measure it. To address this shortcoming, we outline a comprehensive scale development process that builds on and extends prior work. We conducted a literature review of theory and existing measures, a qualitative study with students, a content alignment with experts, exploratory and confirmatory factor analyses, and a correlational study. In the literature and across our studies, we found that cost was salient to students, separate from the expectancy and value components, contained multiple dimensions, and related to student outcomes. This work led to a proposed new 19-item cost scale with four dimensions: task effort cost, outside effort cost, loss of valued alternatives cost, and emotional cost. In addition, to extend existing cost measures, careful attention was taken to operationalize the cost dimensions such that the scale can be easily used with a wide variety of students in various contexts. Directions for future research and implications for the study of motivation are discussed.

9.
Applied Measurement in Education, 2013, 26(2), 125–141
Item parameter instability can threaten the validity of inferences about changes in student achievement when using item response theory (IRT)-based test scores obtained on different occasions. This article illustrates a model-testing approach for evaluating the stability of IRT item parameter estimates in a pretest-posttest design. Stability of item parameter estimates was assessed for a random sample of pretest and posttest responses to a 19-item math test. Using MULTILOG (Thissen, 1986), IRT models were estimated in which item parameter estimates were constrained to be equal across samples (reflecting stability) or free to vary across samples (reflecting instability). These competing models were then compared statistically in order to test the invariance assumption. The results indicated a moderately high degree of stability in the item parameter estimates for a group of children assessed on two different occasions.
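The model-testing step amounts to a likelihood-ratio comparison of the constrained (invariant) and free models. A sketch with hypothetical deviance values; the fitting program itself is not needed, only the resulting -2 log-likelihoods:

```python
from scipy.stats import chi2

def invariance_lr_test(neg2ll_constrained, neg2ll_free, df_diff):
    """Likelihood-ratio test of item parameter invariance.

    Compares the model with item parameters constrained equal across
    occasions against the model where they are free to vary. A
    non-significant result is consistent with parameter stability.
    """
    lr = neg2ll_constrained - neg2ll_free   # constrained model fits no better
    return lr, chi2.sf(lr, df_diff)

# Hypothetical -2 log-likelihoods; freeing 2 parameters for each of
# 19 items adds 38 parameters to the unconstrained model.
lr, p = invariance_lr_test(10423.7, 10381.2, df_diff=38)
```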

10.
Strategic alignment in learning and talent development (LTD) is a means to make connections between the work of LTD and the business in order to demonstrate strategic value. Feedback about LTD's strategic performance may be derived from the degree to which this alignment is achieved and the strategic value delivered to stakeholders. Three studies were performed with the goal of confirming a valid and reliable measure of perceived strategic alignment in LTD functions. The studies produced a two‐factor, 15‐item factor structure for a learning and talent development strategic alignment (LDSA) scale, explaining 58.143% of total variance, and describe the type of relationship and skills stakeholders require in order to perceive the LTD function as aligned to the business. The article also addresses how LTD can proactively design and manage that relationship to improve its perceived connection to the business and its ability to demonstrate strategic value.

11.
This paper illustrates the usefulness of structurally incomplete designs as an approach to reducing the length of educational questionnaires. In structurally incomplete test designs, each respondent fills out only a subset of the total item set, while all items are still administered across the whole sample. The scores on the unadministered items are subsequently handled with methods for the estimation of missing data. Two structurally incomplete test designs — one recording two thirds, and the other recording half, of the potentially complete data — were applied to the complete item scores on 8 educational psychology scales. The incomplete item scores were estimated with the Data Augmentation missing-data method. Complete and estimated test data were compared on estimates of total scores, reliability, and predictive validity against an external criterion. The reconstructed data yielded estimates very close to the values in the complete data. As expected, statistical uncertainty was higher in the design that recorded fewer item scores. It was concluded that applying incomplete test designs and subsequently estimating the missing values is a fruitful procedure for reducing questionnaire length.
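The design logic can be sketched as follows: rotate booklets so each respondent skips one third of the items while every item is still observed for two thirds of the sample. All data below are simulated, and simple per-item mean imputation stands in for the Data Augmentation method used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n_resp, n_items = 300, 12
full = rng.integers(0, 2, size=(n_resp, n_items)).astype(float)  # stand-in scores

# Three booklets, each omitting a different third of the items, so every
# item is still administered to two thirds of the sample.
blocks = np.array_split(np.arange(n_items), 3)
observed = full.copy()
for r in range(n_resp):
    observed[r, blocks[r % 3]] = np.nan   # rotate booklets across respondents

# Stand-in for the Data Augmentation step: per-item mean imputation.
col_means = np.nanmean(observed, axis=0)
completed = np.where(np.isnan(observed), col_means, observed)
```

Total scores, reliability, and validity coefficients computed on `completed` can then be compared against the same statistics on `full`, which is the comparison the abstract reports.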

12.
The alignment of test items to content standards is critical to the validity of decisions made from standards‐based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the content‐matter experts are broadly distributed geographically, panel methods present significant challenges. This article illustrates the use of an online methodology for gauging item alignment that does not require that raters convene in person, reduces the overall cost of the study, increases time flexibility, and offers an efficient means for reviewing large item banks. Latent trait methods are applied to the data to control for between‐rater severity, evaluate intrarater consistency, and provide item‐level diagnostic statistics. Use of this methodology is illustrated with a large pool (1,345) of interim‐formative mathematics test items. Implications for the field and limitations of this approach are discussed.

13.
This study examined visual cueing methods in relation to reading aptitude, based on a sample of 256 fourth-grade subjects. The methods, based on displays of 12 pictures and accompanying passages, included (a) no cues; (b) pictorial cues (i.e., arrows and labels); (c) textual cues (i.e., underlined type and colored type); and (d) a combination of pictorial and textual cues. Subjects were of average reading ability (ARA) or low reading ability (LRA). One major finding was that both ARA and LRA subjects achieved significantly higher scores with the combined cues than with no cues. The results also revealed that ARA subjects outscored LRA subjects on all four methods. With regard to the pictorial cues, both the ARA and LRA groups scored higher on the label cues than on the arrow cues, and the ARA group surpassed the LRA group on both the label and arrow cues.

14.
A Monte Carlo simulation technique for generating dichotomous item scores is presented that implements (a) a psychometric model with different explicit assumptions than traditional parametric item response theory (IRT) models, and (b) item characteristic curves without restrictive assumptions concerning mathematical form. The four-parameter beta compound-binomial (4PBCB) strong true score model (with two-term approximation to the compound binomial) is used to estimate and generate the true score distribution. The nonparametric item-true score step functions are estimated by classical item difficulties conditional on proportion-correct total score. The technique performed very well in replicating inter-item correlations, item statistics (point-biserial correlation coefficients and item proportion-correct difficulties), first four moments of total score distribution, and coefficient alpha of three real data sets consisting of educational achievement test scores. The technique replicated real data (including subsamples of differing proficiency) as well as the three-parameter logistic (3PL) IRT model (and much better than the 1PL model) and is therefore a promising alternative simulation technique. This 4PBCB technique may be particularly useful as a more neutral simulation procedure for comparing methods that use different IRT models.
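The generating step can be sketched as drawing proportion-correct true scores from a rescaled (four-parameter) beta distribution and then drawing number-correct scores conditionally. This simplified sketch uses a plain binomial rather than the paper's two-term compound-binomial approximation, and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_scores_4pbeta(n, alpha, beta, lo, hi):
    """Proportion-correct true scores from a four-parameter beta:
    a Beta(alpha, beta) variate rescaled to the interval [lo, hi]."""
    return lo + (hi - lo) * rng.beta(alpha, beta, size=n)

def number_correct(tau, n_items):
    """Simplified generating step: draw a binomial number-correct score
    from each examinee's true proportion-correct tau. (The paper uses a
    two-term approximation to the compound binomial instead.)"""
    return rng.binomial(n_items, tau)

tau = true_scores_4pbeta(n=500, alpha=2.0, beta=3.0, lo=0.2, hi=0.95)
scores = number_correct(tau, n_items=40)
```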

15.
States have moved rapidly over the past 20 years to institute systems of standards and assessments. State assessments in particular take on added importance at the high school level as they are required for graduation by an increasing number of states. Federal legislation mandating testing in high school also serves to increase the stakes and impact of state exams. Many states are also using high school exams for postsecondary purposes, although the content and criterion validity of these exams in relation to students' post-high school pursuits is not well documented. Though no state exam was developed with the express intent of aligning specifically with postsecondary education, it is nonetheless important to understand this linkage given the wide-ranging use of high school exams across the country. This study analyzed the content of state tests relative to a set of standards that identify knowledge and skills necessary for success in entry-level university courses. A total of 60 math and English assessments from 20 states were analyzed along a number of alignment dimensions. Exams were found to be moderately aligned with a subset of the university standards, but in an uneven fashion. English exams were somewhat more aligned than math exams, but math exams had high alignment in some specific standard areas, and English exams aligned poorly or not at all in areas requiring higher order thinking. In the future, states using high school exams for postsecondary purposes may want to examine the content of state standards and exams to determine their relationship to college-readiness criteria.

16.
As part of a research project examining relationships between instructional practices and student cognitive and social outcomes in middle-school mathematics classes, external observers and students reported perceptions of teachers’ instructional practices. The extent to which students in classrooms identified by external raters as reform-oriented actually perceive instruction in ways aligned with reform principles has not been established. A 25-item observation protocol aligned with the reform practices called for in the Standards of the National Council of Teachers of Mathematics (NCTM) was used to develop a quantitative profile of instructional practices across two lessons in each of 28 classes of 15 participating teachers. Students in each of the observed classes completed a 49-item survey of their perceptions of instructional practices. As items for both the observation protocol and Student Survey were designed to measure alignment with the same dimensions of reform practice, the convergence of these two data sets was examined as a means to confirm the observation ratings. The findings show moderately strong correlations between ratings of external observers and perceptions of sixth-grade students across three dimensions (pedagogy, tasks and mathematical interactions) of reform-oriented teacher practice in mathematics classrooms. Implications of these findings for future research are discussed. The research in this article was supported by the National Science Foundation (NSF) under grant REC 0125868. The opinions expressed here are those of the authors and do not necessarily reflect the view of NSF. The research was also supported by the Roysters’ Fellowship to Mark Ellis from the Graduate School at the University of North Carolina at Chapel Hill.

17.
Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or “alignment,” and test performance is usually inferred from highly distal evidence, rather than directly examined. Utilizing mathematics standards content analysis data and achievement test item data from ten U.S. states, we examine the relationship between topic-specific alignment and test item performance. When a particular item’s content type is emphasized by the standards, we find evidence of a positive relationship between the alignment measure and proportion-correct test item difficulty, although this effect is not consistent across samples. Implications of the results for curricular achievement test development and score interpretation are discussed.

18.
This article examines whether the way that PISA models item outcomes in mathematics affects the validity of its country rankings. As an alternative to PISA methodology a two-parameter model is applied to PISA mathematics item data from Canada and Finland for the year 2012. In the estimation procedure item difficulty and dispersion parameters are allowed to differ across the two countries and samples are restricted to respondents who actually answered items in a mathematics cluster. Different normalizations for identifying the distribution parameters are also considered. The choice of normalization is shown to be crucial in guaranteeing certain invariance properties required by item response models. The ability scores obtained from the methods employed here are significantly higher for Finland, in sharp contrast to PISA results, which gave both countries very similar ranks in mathematics.
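The normalization issue arises from the indeterminacy of the latent scale in the two-parameter model: shifting and rescaling abilities while compensating in the item parameters leaves every response probability unchanged, so some normalization must be imposed for identification. A minimal sketch with hypothetical parameter values:

```python
import numpy as np

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# The latent scale is arbitrary: transforming theta' = (theta - m) / s
# while setting a' = a * s and b' = (b - m) / s leaves every response
# probability unchanged, so a normalization (e.g., fixing the ability
# mean and variance) is needed for identification.
theta, a, b = 0.8, 1.3, -0.4
m, s = 0.5, 2.0
p_original = p_2pl(theta, a, b)
p_renormed = p_2pl((theta - m) / s, a * s, (b - m) / s)
```

Which group (or pooled sample) supplies the `m` and `s` of the normalization is exactly the choice the article shows to matter for cross-country comparisons.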

19.
The experience of fluency while learning might bias students’ metacognitive judgments of learning (JOLs) and impair the efficacy of their study behaviors. In the present experiments, we examined whether perceptual fluency affects JOLs (1) when people only experience one level of fluency, (2) when item relatedness is also available as a cue, and (3) across study-test trials. Participants studied a list of paired associates over two study-test trials and made JOLs for each item after studying it. We varied the perceptual fluency of the memory materials by making the font easy (fluent) or difficult (disfluent) to read. We also varied whether we manipulated the perceptual fluency of the items between-participants or within-participants and whether other memory factors—item relatedness and study time—were available for participants to use to inform their JOLs. We were only able to obtain effects of perceptual fluency on JOLs when we manipulated fluency within-participants and eliminated item relatedness as a cue for JOLs. The present results indicate that some effects of perceptual fluency on JOLs are not robust and might only occur under limited—and somewhat contrived—conditions. Therefore, these effects might be unlikely to bias students’ JOLs in actual learning situations.  相似文献   

20.
Norm‐referenced measurement tools — such as reliability, validity, and item analysis — are commonly used to reach and verify conclusions about criteria. Similar tools for criterion‐referenced testing situations are scant. This study examined faculty planning and testing decisions and applied formulas to arrive at numerical indices that serve as analytical tools for use with criterion‐referenced tests. The research documents the effects of applying the concept of platform unity, which has its roots in curriculum alignment theory. Alignment of curriculum occurs if the planned, the delivered, and the tested curricula are congruent. Specifically, platform unity aligns planned, domain‐referenced content with appropriate test types. Mathematical formulas were created to determine numerically if planned and tested content were congruent. In addition, four other constructs were examined. They included effectiveness and efficiency of test‐item type selection and overtesting and undertesting of course content. A chi‐square goodness‐of‐fit test was used to compare faculty planning and testing decisions. Data indicated significant differences (p < .01) between content plans and the test types used to test content. On the basis of the analysis, it was determined that faculty do not plan and test content congruently across three levels of cognitive content. Also, faculty tended to overtest content; they were effective in their selection of test types, but not efficient.
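The chi-square goodness-of-fit comparison of planned versus tested content can be sketched as follows. The category labels and counts are illustrative, not the study's data:

```python
from scipy.stats import chisquare

# Hypothetical item counts across three cognitive levels
planned = [40, 35, 25]   # emphasis in faculty course plans
tested = [55, 30, 15]    # items that actually appeared on the tests

# Goodness-of-fit of tested counts against the planned distribution;
# a significant result indicates planned and tested content diverge.
stat, p = chisquare(f_obs=tested, f_exp=planned)
```

Note that `scipy.stats.chisquare` expects the observed and expected counts to have (approximately) equal totals, which the rotation to common totals above guarantees.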


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号