Similar articles
20 similar articles found (search time: 15 ms)
1.
Due to recent research in equating methodologies indicating that some methods may be more susceptible to the accumulation of equating error over multiple administrations, the sustainability of several item response theory methods of equating over time was investigated. In particular, the paper is focused on two equating methodologies: fixed common item parameter scaling (with two variations, FCIP‐1 and FCIP‐2) and the Stocking and Lord characteristic curve scaling technique in the presence of nonequivalent groups. Results indicated that the improvements made to fixed common item parameter scaling in the FCIP‐2 method were sustained over time. FCIP‐2 and Stocking and Lord characteristic curve scaling performed similarly in many instances and produced more accurate results than FCIP‐1. The relative performance of FCIP‐2 and Stocking and Lord characteristic curve scaling depended on the nature of the change in the ability distribution: Stocking and Lord characteristic curve scaling captured the change in the distribution more accurately than FCIP‐2 when the change was different across the ability distribution; FCIP‐2 captured the changes more accurately when the change was consistent across the ability distribution.
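As a rough illustration of the characteristic curve scaling named in abstract 1, the sketch below computes a Stocking and Lord style loss: the squared difference between the test characteristic curves of the common items on the old scale and on the rescaled new scale. It is only a sketch under stated assumptions, not the study's procedure: it assumes a two-parameter logistic model, an equally weighted ability grid, and a coarse grid search in place of a numerical optimizer. All item parameters and the names `stocking_lord_loss` and `grid_search` are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def stocking_lord_loss(A, B, old_items, new_items, thetas):
    """Sum of squared TCC differences after placing new_items on the old scale."""
    # Linear scale transformation: a* = a / A, b* = A * b + B.
    rescaled = [(a / A, A * b + B) for a, b in new_items]
    return sum((tcc(t, old_items) - tcc(t, rescaled)) ** 2 for t in thetas)

def grid_search(old_items, new_items, thetas, A_grid, B_grid):
    """Return the (A, B) pair on the grid with the smallest loss."""
    return min(
        ((A, B) for A in A_grid for B in B_grid),
        key=lambda ab: stocking_lord_loss(ab[0], ab[1], old_items, new_items, thetas),
    )

# Hypothetical common-item parameters (a, b) on the old scale.
old_items = [(1.0, -1.0), (1.4, 0.0), (0.8, 0.7), (1.2, 1.5)]

# Simulate the same items calibrated on a new scale,
# where theta_old = TRUE_A * theta_new + TRUE_B (assumed values).
TRUE_A, TRUE_B = 1.2, 0.5
new_items = [(a * TRUE_A, (b - TRUE_B) / TRUE_A) for a, b in old_items]

thetas = [-4.0 + 0.5 * i for i in range(17)]   # ability grid from -4 to 4
A_grid = [0.8 + 0.1 * i for i in range(9)]     # candidate slopes 0.8 to 1.6
B_grid = [-1.0 + 0.1 * i for i in range(21)]   # candidate intercepts -1.0 to 1.0

best_A, best_B = grid_search(old_items, new_items, thetas, A_grid, B_grid)
print(best_A, best_B)
```

Because the toy data are generated without estimation error, the search recovers the assumed slope and intercept. With real calibrations the loss would not reach zero, and a continuous optimizer with weights from the ability distribution would normally replace the grid.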

2.
This paper illustrates that the psychometric properties of scores and scales that are used with mixed‐format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is on mixed‐format tests in situations for which raw scores are integer‐weighted sums of item scores. Four associated real‐data examples include (a) effects of weights associated with each item type on reliability, (b) comparison of psychometric properties of different scale scores, (c) evaluation of the equity property of equating, and (d) comparison of the use of unidimensional and multidimensional procedures for evaluating psychometric properties. Throughout the paper, and especially in the conclusion section, the examples are related to issues associated with test interpretation and test use.

3.
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method based on a genetic algorithm (GA) to construct parallel test forms effectively. The sums of squared errors of the TIFs generated by GA were compared with those of the Swanson and Stocking method and the Wang and Ackerman method. Experimental results show that tests constructed using GA yielded lower error, with an average improvement ratio above 90%.
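The error measure mentioned in abstract 3 can be sketched as follows: compute each form's TIF on an ability grid and sum the squared differences. This is not the article's genetic algorithm, only the kind of fitness value such a search could minimize. It assumes two-parameter logistic items, and the item parameters and function names are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def tif(form, thetas):
    """Test information function of a form, evaluated on a grid of abilities."""
    return [sum(item_information(t, a, b) for a, b in form) for t in thetas]

def sse_between_forms(form_x, form_y, thetas):
    """Sum of squared TIF differences; smaller means more nearly parallel forms."""
    return sum((x - y) ** 2 for x, y in zip(tif(form_x, thetas), tif(form_y, thetas)))

thetas = [-3.0 + 0.5 * i for i in range(13)]           # ability grid from -3 to 3

# Hypothetical forms, each a list of (a, b) item parameters.
reference = [(1.0, -0.5), (1.2, 0.0), (0.9, 0.6)]
close_match = [(1.0, -0.4), (1.2, 0.1), (0.9, 0.5)]
poor_match = [(0.5, -2.0), (0.6, 2.0), (0.4, 0.0)]

print(sse_between_forms(reference, close_match, thetas))
print(sse_between_forms(reference, poor_match, thetas))
```

A search procedure would evaluate this value for each candidate form and keep the candidates with the lowest error, subject to whatever content constraints apply.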

4.
This paper describes a procedure for automated test forms assembly based on Classical Test Theory (CTT). The procedure uses stratified random content sampling and test form pre-equating to ensure both content and psychometric equivalence in generating virtually unlimited parallel forms. The procedure extends the usefulness of CTT in automated test form construction, yielding classical item statistics based on representative sample distributions and pre-equated test forms with known psychometric characteristics. A rationale for the procedure is presented followed by an example application and discussion of psychometric considerations related to its use.

5.
A mixed‐effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed‐effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis‐Hastings Robbins‐Monro (MH‐RM) stochastic imputation algorithm to accommodate the increased dimensionality that results from modeling multiple design‐ and trait‐based random effects. As a consequence of using this algorithm, more flexible explanatory IRT models, such as the multidimensional four‐parameter logistic model, are easily organized and efficiently estimated for unidimensional and multidimensional tests. Rasch versions of the linear latent trait and latent regression model, along with their extensions, are presented and discussed; Monte Carlo simulations are conducted to determine the efficiency of parameter recovery of the MH‐RM algorithm; and an empirical example using the extended mixed‐effects IRT model is presented.

6.
The authors conducted a 3‐phase investigation into the credible standards for phenomenological research practices identified in the literature and endorsed by a sample of counselor education qualitative research experts. Utilizing a mixed‐methods approach, the findings offer evidence that professional counseling has a distinctive format in which phenomenological research is produced.

7.
This article used the Wald test to evaluate the item‐level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G‐DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A‐CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.

8.
Kathleen M. Goodman, Marcia Baxter Magolda, Tricia A. Seifert, and Patricia M. King review both quantitative and qualitative data to understand students' college experiences and provide powerful information to guide educators.

9.
The current study examined children's identification and reasoning about their subjective social status (SSS), their beliefs about social class groups (i.e., the poor, middle class, and rich), and the associations between the two. Study participants were 117 10‐ to 12‐year‐old children of diverse racial, ethnic, and socioeconomic backgrounds attending a laboratory elementary school in Southern California. Results indicated that children's SSS ratings correlated with indicators of family socioeconomic status and were informed by material possessions, lifestyle characteristics, and social and societal comparisons. Children rated the poor as having fewer positive attributes and more negative attributes than the middle class, and fewer positive attributes than the rich. Lower SSS children held less positive attitudes toward the poor than children with middle SSS ratings.

10.
The purpose of this research was to explore some of the most prevalent methods for conducting Levels 4 and 5 of technical training evaluation among large organizations with a preponderance of technical talent. The researchers collected data through a survey and conducted interviews with select study participants. The study sample (n = 26) comprised predominantly large, global organizations in technical industries. While a larger percentage of organizations than in previous research were found to conduct Level 4 evaluations, few conduct Level 5 evaluations for their technical training, and most of the participant organizations struggle with advanced analytical techniques for technical training evaluation. The article summarizes some of the most prevalent training evaluation models reported in the literature since 2000, and provides useful examples from study participants of how they evaluated their technical training at Levels 3 and 4, along with their advice to fellow technical training and performance improvement professionals. Although the study was exploratory in nature and utilized a small sample, it is only the second study since 2000 to specifically explore the evaluation practices of large organizations with a focus on technical training as opposed to general training.

11.
Test accommodations for English learners (ELs) are intended to reduce the language barrier and level the playing field, allowing ELs to better demonstrate their true proficiencies. Computer-based accommodations for ELs show promising results for leveling that field while also providing us with additional data to more closely investigate the validity and effectiveness of those accommodations. In this study, we evaluate differences across non-ELs and two EL groups in their decision to use either of two computer-based accommodations on high school history and math assessments. We also evaluate differences in response times across these groups. Results showed that ELs used accommodations more than non-ELs; however, many students did not use any accommodations, and use decreased as the assessment progressed. In addition, students had longer response times for items with accommodations in history but not in mathematics. Recommendations for future research in accommodations for ELs are discussed.

12.
Given the growing emphasis on supervision in the counseling field, mental health professionals need updated training on the varied and complex issues in supervision and introduction to emerging theoretical models. The authors present a didactic‐theoretical‐experiential model of supervision training to be used in a workshop format. Curriculum information is included, along with ideas for exercises and class discussion. The authors discuss common themes and potential challenges that have emerged from their experiences as well as from feedback from evaluations. Readers are provided with information to help develop and enrich their training skills in supervision.

13.
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common‐item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common‐item equating methodology to standard setting ratings to account for systematic differences between standard setting panels) has received almost no attention in the literature. Identity equating was also examined to provide context. Data from a standard setting form of a large national certification test (N examinees = 4,397; N panelists = 13) were split into content‐equivalent subforms with common items, and resampling methodology was used to investigate the error introduced by each approach. Common‐item equating (circle‐arc and nominal weights mean) was evaluated at samples of size 10, 25, 50, and 100. The standard setting approaches (resetting and rescaling the standard) were evaluated by resampling (N = 8) and by simulating panelists (N = 8, 13, and 20). Results were inconclusive regarding the relative effectiveness of resetting and rescaling the standard. Small‐sample equating, however, consistently produced new form cut scores that were less biased and less prone to random error than new form cut scores based on resetting or rescaling the standard.

14.
Zhang Yanjun (张燕君). Overseas English (《海外英语》), 2012, (21): 280-282
Test paper evaluation is an important part of test management, and its results are a significant basis for scientifically summarizing teaching and learning. Taking an English test paper from a high school students' monthly examination as its object, this paper focuses on interpreting SPSS output for item-level and whole-paper quantitative analysis. Analyzing and evaluating the papers gives teachers feedback for checking students' progress and adjusting their teaching process.

15.
The purpose of this mixed‐methods article was to report two studies exploring the relationships between academic procrastination and motivation in 208 undergraduates with (n = 101) and without (n = 107) learning disabilities (LD). In Study 1, the results from self‐report surveys found that individuals with LD reported significantly higher levels of procrastination, coupled with lower levels of metacognitive self‐regulation and self‐efficacy for self‐regulation than those without LD. Procrastination was most strongly (inversely) related to self‐efficacy for self‐regulation for both groups, and the set of motivation variables reliably predicted group membership with regard to LD status. In Study 2, individual interviews with 12 students with LD resulted in five themes: LD‐related problems, self‐beliefs and procrastination, outcomes of procrastination, antecedents of procrastination, and support systems. The article concludes with an integration of quantitative and qualitative results, with attention paid to implications for service providers working with undergraduates with LD.

16.
Testing organizations need large numbers of high‐quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time‐consuming and expensive because each item is written, edited, and reviewed by a subject‐matter expert. One promising approach that may address this challenge is automatic item generation. Automatic item generation combines cognitive and psychometric modeling practices to guide the production of items that are generated with the aid of computer technology. The purpose of this study is to describe and illustrate a process that can be used to review and evaluate the quality of the generated items by focusing on the content and logic specified within the item generation procedure. We illustrate our process using an item development example from mathematics drawn from the Common Core State Standards and from surgical education drawn from the health sciences domain.

17.
Management science professors who teach large classes often assess students with multiple‐choice questions (MCQs) because it is efficient. However, traditional MCQ formats are ill‐fitted for constructive feedback. We propose the reward for omission with confidence in knowledge (ROCK) format as an original formative assessment technique to help guide feedback associated with MCQs in an introductory undergraduate management science course. Our study contributes to theory by empirically showing that students can self‐assess their state of knowledge, signal it to the professor, and use proper answering options. In practice, ROCK is an easily implementable MCQ format that allows professors to gain information on student learning based on answers selected. ROCK identifies lack of knowledge or misinformation at both individual and collective levels thus providing opportunities for better feedback in class and during office hours. Limitations of the application of ROCK are also discussed.

18.
Neuroanatomical localization (NL) is a key skill in neurology, but learners often have difficulty with it. This study aims to evaluate a concise NL tool (NLT) developed to help teach and learn NL. To evaluate the NLT, an extended‐matching questions (EMQ) test to assess NL was designed and validated. The EMQ was validated with fourth‐year medical students and internal medicine and neurology residents. The NLT's usability was evaluated with third‐ and fourth‐year students, and the effectiveness was evaluated with an experimental study of second‐year students, using the EMQ as the outcome measure. Students were taught how to use both the NLT and textbook algorithms (control) to perform NL, then randomized into either group, and only allowed to use their assigned tool to complete the EMQ. Primary outcome was the difference in mean EMQ scores expressed as a percentage of total score. For EMQ validation, students (n = 56) scored lower than residents (n = 50) (76.7% ± 1.7 vs. 83.0% ± 1.6; mean ± standard error of mean, P < 0.009). The EMQ demonstrated good reliability (Cronbach's α 0.85) and generalizability (G‐coefficient 0.85). Third‐ (n = 77) and fourth‐year (n = 42) students found the NLT user‐friendly and helpful in their learning of NL. In the experimental study, scores were significantly higher for NLT group (n = 94) than for controls (n = 101) (42.5 vs. 37.0%, P = 0.014); the effect size (Cohen's d) was 0.36. The EMQ is validated to reliably assess NL and is generalizable, feasible, practical, and of low cost. The concise and user‐friendly NLT for NL was effective in aiding medical student performance of NL. Anat Sci Educ 11: 262–269. © 2017 American Association of Anatomists.
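Abstract 18 reports a Cronbach's α and a Cohen's d. For readers unfamiliar with the two statistics, the sketch below shows the standard textbook formulas on toy data. It does not reproduce the study's analysis: the scores are invented, and the use of sample variances and a pooled standard deviation is an assumption about the variant used.

```python
from statistics import mean, variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per examinee and one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Invented toy data: four examinees answering three perfectly consistent items.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]

print(cronbach_alpha(toy_scores))
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

On the toy data the items move in lockstep, so α reaches its upper bound; real test data give lower values.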

19.
Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three IRT models (three- and two-parameter logistic models, and generalized partial credit model) used in a mixed-format test. The statistical properties of the proposed fit statistic were examined and compared to S-X² and PARSCALE’s G². Overall, RISE (Root Integrated Square Error) outperformed the other two fit statistics under the studied conditions in that the Type I error rate was not inflated and the power was acceptable. A further advantage of the nonparametric approach is that it provides a convenient graphical inspection of the misfit.

20.
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at their answers only through the first two processes. This assumption leads to a distribution for the number of matched incorrect alternatives between the examinee suspected of copying and the examinee believed to be the source that belongs to a family of "shifted binomials." Power functions for the tests for several sets of parameter values are analyzed. An extension of the test to include matched numbers of correct alternatives would lead to improper statistical hypotheses.
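Abstract 20 does not give the form of its shifted binomial family, so the sketch below uses an ordinary binomial as a stand-in rather than the article's statistic. It computes the upper-tail probability of observing at least a given number of matched incorrect alternatives when matches occur independently with a fixed probability. The function name, the match probability, and the counts are all hypothetical.

```python
from math import comb

def binomial_upper_tail(m_observed, n_items, p_match):
    """P(M >= m_observed) for M ~ Binomial(n_items, p_match)."""
    return sum(
        comb(n_items, m) * p_match ** m * (1.0 - p_match) ** (n_items - m)
        for m in range(m_observed, n_items + 1)
    )

# Hypothetical case: 12 matched incorrect alternatives on 20 items the source
# answered incorrectly, with an assumed chance match probability of 0.25.
p_value = binomial_upper_tail(12, 20, 0.25)
print(p_value)
```

A small tail probability would indicate more matches than chance alone makes plausible under these assumptions; it is not in itself evidence of copying.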
