There has been an increased interest in the impact of unmotivated test taking on test performance and score validity. This has led to the development of new ways of measuring test-taking effort based on item response time. In particular, Response Time Effort (RTE) has been shown to provide an assessment of effort down to the level of individual item responses. A limitation of RTE, however, is that it is intended for use with selected response items that must be answered before a test taker can move on to the next item. The current study outlines a general process for measuring item-level effort that can be applied to an expanded set of item types and test-taking behaviors (such as omitted or constructed responses). This process, which is illustrated with data from a large-scale assessment program, should improve our ability to detect non-effortful test taking and perform individual score validation.

This study investigated the performance of four widely used data-collection designs in detecting test-mode effects (i.e., computer-based versus paper-based testing). The experimental conditions included four data-collection designs, two test-administration modes, and the availability of an anchor assessment. The test-level and item-level results were analyzed with inferential statistics and multidimensional scaling, respectively. The test-level results supported the superiority of the single-group counterbalanced design and the random-groups design over the single-group design without counterbalancing and the anchor test design in the recovery of the actual test-mode effects. Analysis at the item level revealed the presence of a two-dimensional solution, with the data-collection design contributing significant variability over and beyond the test mode.

The objective of the present investigation was to examine the comparability of writing prompts for different gender groups in the context of the computer-based Test of English as a Foreign Language™ (TOEFL®-CBT). A total of 87 prompts administered from July 1998 through March 2000 were analyzed. An extended version of logistic regression for polytomous items was used to investigate both uniform and non-uniform gender effects. An English Language Ability variable was developed from the multiple-choice components of the TOEFL®-CBT examination and used as a matching variable. Initially, most of the prompts were flagged because of statistically significant uniform gender effects, with some prompts displaying non-uniform effects as well. Nevertheless, the effect sizes were too small for any of those flagged prompts to be classified as having an important group effect. These findings are discussed in relation to prompt content review, gender format differences, and second language learning theories.

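Large-scale data analysis shows that girls clearly outperform boys on academic achievement tests such as the gaokao (college entrance examination) and the zhongkao (senior high school entrance examination), whereas boys clearly outperform girls on tests of higher-order thinking such as the International Olympiads in mathematics, physics, and chemistry. Girls score higher in the humanities and history, while boys score higher in mathematics and the natural sciences. Possible causes include individual physiological and psychological development; the educational beliefs, teaching models, and expectations of schools, families, and society; and the content and methods of the examinations. The deeper causes require further investigation. These findings suggest that girls can be encouraged to strengthen their study of mathematics and the natural sciences, that boys can be encouraged to pay more attention to language and the social sciences, and that both the cultivation and the testing of students' higher-order thinking skills should be strengthened.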

Are there important aspects of human ability that we have not been measuring? What are the purposes and types of audio that are possible in computerized tests? Will the use of audio in computer‐based tests lead to more valid and reliable measurement?

When low-stakes assessments are administered, the degree to which examinees give their best effort is often unclear, complicating the validity and interpretation of the resulting test scores. This study introduces a new method, based on item response time, for measuring examinee test-taking effort on computer-based test items. This measure, termed response time effort (RTE), is based on the hypothesis that when administered an item, unmotivated examinees will answer too quickly (i.e., before they have time to read and fully consider the item). Psychometric characteristics of RTE scores were empirically investigated and supportive evidence for score reliability and validity was found. Potential applications of RTE scores and their implications are discussed.
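
The abstract above does not give a scoring rule, so the following is only a minimal sketch of how a response-time-based effort index might be computed. It assumes that each item has a time threshold below which a response counts as a rapid guess, and that the index is the proportion of items answered at or above that threshold. The function name, the thresholds, and the sample times are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def response_time_effort(times: Sequence[float],
                         thresholds: Sequence[float]) -> float:
    """Proportion of items whose response time reaches the item threshold.

    Assumption (not stated in the abstract): a response at or above the
    threshold is treated as effortful; anything faster is a rapid guess.
    """
    if len(times) != len(thresholds):
        raise ValueError("times and thresholds must have the same length")
    if not times:
        raise ValueError("at least one item is required")
    effortful = [t >= th for t, th in zip(times, thresholds)]
    return sum(effortful) / len(effortful)


# Hypothetical examinee: response times in seconds on four items.
times = [12.0, 1.5, 30.2, 2.0]
# Hypothetical per-item thresholds in seconds.
thresholds = [5.0, 5.0, 8.0, 3.0]

print(response_time_effort(times, thresholds))  # 0.5
```

Here two of the four responses fall below their thresholds, so the index is 0.5. How the thresholds themselves should be set is a separate question that this sketch does not address.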

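On the Influence of Gender on Vocabulary Memory
This article examines whether gender differences exist in two variables, short-term and long-term memory scores, and how the two variables are related, and it analyzes the statistical data. The results show significant gender differences in both short-term and long-term memory scores, with female students clearly outperforming male students. Short-term and long-term memory scores are highly positively correlated, and short-term memory scores strongly predict long-term memory scores.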

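Taboo Customs Concerning Ethnic Minority Women in Xinjiang from a Gender Perspective
As a form of traditional culture, taboo customs pervade everyday life and are commonly observed. Taboos concerning women are likewise widespread among ethnic minority women in Xinjiang; in essence, they reflect and reproduce gender theory in the real life of Xinjiang's ethnic groups. This article attempts to explore, from a gender perspective, the causes and effects of taboo customs concerning ethnic minority women in Xinjiang.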

Researchers in education are often interested in determining whether independent groups are equivalent on a specific outcome. Equivalence tests for 2 independent populations have been widely discussed, whereas testing for equivalence with more than 2 independent groups has received little attention. The authors discuss alternatives for testing the equivalence of more than 2 independent populations, and they use a Monte Carlo study to demonstrate and compare the performance of these alternatives under several conditions. The results indicate that a 1-way test (e.g., Wellek's F test) is recommended for assessing the equivalence of more than 2 independent groups because approaches based on conducting pairwise tests of equivalence are overly conservative.

The present study focused on gender differences in the tendency to omit items and to guess in multiple-choice tests. It was hypothesized that males would show greater guessing tendencies than females and that the use of formula scoring rather than the use of number of correct answers would result in a relative advantage for females. Two samples were examined: ninth graders and applicants to Israeli universities. The teenagers took a battery of five or six aptitude tests used to place them in various high schools, and the adults took a battery of five tests designed to select candidates to the various faculties of the Israeli universities. The results revealed a clear male advantage in most subtests of both batteries. Four measures of item-omission tendencies were computed for each subtest, and a consistent pattern of greater omission rates among females was revealed by all measures in most subtests of the two batteries. This pattern was observed even in the few subtests that did not show male superiority and even when permissive instructions were used. Correcting the raw scores for guessing reduced the male advantage in all cases (and in the few subtests that showed female advantage the difference increased as a result of this correction), but this effect was small. It was concluded that although gender differences in guessing tendencies are robust they account for only a small fraction of the observed gender differences in multiple-choice tests. The results were discussed, focusing on practical implications.
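
The abstract above does not say which correction for guessing was applied. As an illustration only, the sketch below uses the conventional formula score, R - W/(k - 1), where R is the number of right answers, W the number of wrong answers (omits excluded), and k the number of options per item. The function name and the two examinees are hypothetical.

```python
def formula_score(n_right: int, n_wrong: int, n_options: int) -> float:
    """Conventional correction for guessing: R - W / (k - 1).

    Omitted items count as neither right nor wrong. Whether the study
    used exactly this formula is an assumption.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return n_right - n_wrong / (n_options - 1)


# Two hypothetical examinees who both know 30 answers on five-option items.
# Examinee A omits the 10 remaining items.
# Examinee B guesses on those 10 and happens to get 2 right and 8 wrong.
print(formula_score(30, 0, 5))  # 30.0
print(formula_score(32, 8, 5))  # 30.0
```

Under number-right scoring the guesser leads 32 to 30; under the formula score both examinees receive 30.0. That is the mechanism by which a correction for guessing can narrow a gap between groups that differ in their willingness to guess.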

The integration of modern methods for causal inference with latent class analysis (LCA) allows social, behavioral, and health researchers to address important questions about the determinants of latent class membership. In this article, 2 propensity score techniques, matching and inverse propensity weighting, are demonstrated for conducting causal inference in LCA. The different causal questions that can be addressed with these techniques are carefully delineated. An empirical analysis based on data from the National Longitudinal Survey of Youth 1979 is presented, where college enrollment is examined as the exposure (i.e., treatment) variable and its causal effect on adult substance use latent class membership is estimated. A step-by-step procedure for conducting causal inference in LCA, including multiple imputation of missing data on the confounders, exposure variable, and multivariate outcome, is included. Sample syntax for carrying out the analysis using SAS and R is given in an appendix.
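
The article's own syntax is in SAS and R and is not reproduced here. The sketch below shows only the generic weighting step of inverse propensity weighting, under stated assumptions: the propensity scores are taken as already estimated, the weights are the usual 1/p for exposed and 1/(1 - p) for unexposed cases, and a simple binary outcome stands in for latent class membership. All names and numbers are hypothetical, and the latent class model and the multiple imputation step are omitted.

```python
from typing import List, Sequence


def ipw_weights(exposed: Sequence[int],
                propensity: Sequence[float]) -> List[float]:
    """Inverse propensity weights: 1/p if exposed, 1/(1 - p) otherwise."""
    weights = []
    for a, p in zip(exposed, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / p if a == 1 else 1.0 / (1.0 - p))
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: exposure indicator (1 = enrolled in college),
# assumed propensity scores, and a binary stand-in outcome.
exposed = [1, 1, 0, 0]
propensity = [0.5, 0.25, 0.5, 0.2]
outcome = [1, 0, 1, 0]

weights = ipw_weights(exposed, propensity)

# Weighted outcome proportion within each exposure group.
for group in (1, 0):
    ys = [y for y, a in zip(outcome, exposed) if a == group]
    ws = [w for w, a in zip(weights, exposed) if a == group]
    print(group, weighted_mean(ys, ws))
```

The difference between the two weighted proportions is a simple weighted contrast between exposure groups. In an LCA setting the weights would presumably enter the latent class model itself rather than a plain mean.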

How the use of computers in mathematics classrooms was viewed by students in two middle years mathematics classrooms was the focus of the research described in this paper. The primary data sources consisted of questionnaires, classroom observations supported by videotaping of mathematics lessons, and interviews with two girls and two boys from each class. Thus both qualitative and quantitative methods were used. Girls viewed the computer-based lessons less favourably than did boys. In general, the boys were likely to believe that computers contributed to their experiencing pleasure in these lessons, and to making mathematics more relevant to them. Girls were typically more concerned about whether computers facilitated learning and enabled success in mathematics. The attitudes of students to computer-based mathematics were related to their views of computers.

This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did the simultaneous item bias procedure, particularly multiple-choice items. For both reading and mathematics tests, multiple-choice items generally favored males while constructed-response items generally favored females. Content analyses showed that flagged reading items typically measured text interpretations or implied meanings; males tended to benefit from items that asked them to identify reasonable interpretations and analyses of informational text. Most items that favored females asked students to make their own interpretations and analyses, of both literary and informational text, supported by text-based evidence. Content analysis of mathematics items showed that items favoring males measured geometry, probability, and algebra. Mathematics items favoring females measured statistical interpretations, multistep problem solving, and mathematical reasoning.

Structured means analysis is a very useful approach for testing hypotheses about population means on latent constructs. In such models, a z test is most commonly used for testing the statistical significance of the relevant parameter estimates or of the differences between parameter estimates, where a z value is computed based on the asymptotic standard error estimate associated with the parameter of interest. In the current article, a series of population analyses demonstrate that the z tests for latent mean structure parameters or, more directly, the standard error estimates upon which those z tests are based, are not invariant to how factors are scaled. As such, circumstances exist in which latent mean inference is compromised solely as a result of scaling decisions. This problem is illustrated in the context of between-subjects (i.e., multisample) latent means models and within-subjects latent means models. Recommendations for practice are also offered.

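Research indicates that many factors are related to the washback effect of achievement oral tests on teaching. Using questionnaires and interviews, this study surveyed 80 English teachers at two universities that had added an oral test to their end-of-term achievement tests. It investigated 11 factors related to teachers' beliefs about the washback of achievement oral testing on teaching and used correlation analysis to study that washback. The results show that five factors are related to teachers' beliefs about washback. The author discusses the reasons and offers suggestions for promoting positive washback from college English oral testing.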

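Computer-based assessment has gradually become the main mode of the PISA mathematical literacy assessment, and it shows pronounced computerized features in the assessment framework, item design, response environment and mode, scoring process, and results. The PISA 2021 mathematics assessment will integrate more deeply with computer technology, using more interactive, intelligent, and adaptive approaches to assess mathematical literacy more effectively.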

The present study was designed to examine whether coaching affects the predictive validity and fairness of scholastic aptitude tests. Two randomly allocated groups, coached and uncoached, were compared, and the results revealed that although coaching enhanced scores on the Israeli Psychometric Entrance Test by about 25% of a standard deviation, it did not affect predictive validity and did not create a prediction bias. These results refute claims that coaching reduces predictive validity and creates a bias against the uncoached examinees in predicting the criterion. The results are consistent with the idea that score improvement due to coaching does not result strictly from learning specific skills that are irrelevant to the criterion.

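Gender has an important influence on people's choice and use of language. Asymmetric language, interruptions in conversation, and the conversational strategies used by different genders all illustrate this influence from different theoretical angles. An analysis of dialogue excerpts involving several main characters in the film Titanic reveals the following: gender affects language choice, producing contrasts such as male firmness and female gentleness, male combativeness and female tenderness, and male directness and female emotionality. The distinctive patterns of language use that arise from gender differences are instructive for understanding the regularities of social life. In particular, women's harmony-seeking, low-profile language style and men's assertive, aggressive language style both offer insights for our time.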

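In recent years, with rapid economic and social development, the number of divorces and the divorce rate in China have risen steadily, while gender attitudes have been reverting to the traditional model. Greater satisfaction with one's status in the family contributes to marital stability and harmony. How, then, will the return to traditional gender attitudes affect spouses' relative family status? Does a match between an individual's actual family status and his or her gender attitudes increase subjective satisfaction with that status? Using data from the Third Survey on the Social Status of Women in China, this empirical study of the effects of gender attitudes on family status and on satisfaction with it finds that gender attitudes do affect spouses' relative family status: as gender attitudes in China revert to tradition, spouses' actual family status is reverting to tradition as well. As for gender attitudes themselves, egalitarian beliefs about gender ability raise satisfaction with family status, but attitudes toward the gendered division of labor have no significant effect. Spouses' satisfaction with their family status after marriage is shaped by the interaction of gender attitudes and family status. When a spouse's gender attitudes are consistent with actual family status, satisfaction with family status is higher; when they are inconsistent, satisfaction is lower. Thus, amid the trend toward traditional gender attitudes, a match between spouses' actual family status and their gender attitudes will help maintain family harmony and stability.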

