Similar articles
 20 similar articles found
1.
Studies suggest that people who cheat on a test overestimate their performance on future tests. Given that erroneous monitoring of one's own cognitive processes impairs learning and memory, this study investigated whether cheating on a test would harm monitoring accuracy on future tests. Participants had the incentive and opportunity to cheat on one (Experiments 1, 2, and 3, with N = 90, 88, and 102, respectively) or two (Experiment 4, N = 214) of four general-knowledge tests. Cheating produced overconfidence in global-level performance predictions in Experiment 2 (Cohen's d ≥ 0.35) but not in Experiments 1 or 4. Also, cheating did not affect the absolute or relative accuracy of item-level performance predictions in Experiments 3 or 4. A Bayesian meta-analysis of all experiments provided evidence against cheating-induced overconfidence in global- and item-level predictions. Overall, our results demonstrate that people who cheat on tests accurately predict their performance on future tests.

2.
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the “effective length” of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.
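
The idea of borrowing collateral information from a student's group can be illustrated with a minimal sketch. It uses a simple beta-binomial model, which is a stand-in chosen for illustration and not necessarily the procedure proposed in the paper; the prior parameters, the scores, and the cutoff are all hypothetical.

```python
def posterior_mastery(correct, n_items, prior_a, prior_b):
    """Posterior mean of the true proportion-correct under a Beta prior.

    prior_a and prior_b summarize the group (collateral information);
    they are assumed values, not estimates from any real data set.
    """
    return (prior_a + correct) / (prior_a + prior_b + n_items)

# Hypothetical student: 6 of 10 items correct, in a group whose scores
# are summarized by a Beta(8, 2) prior (group mean 0.8).
raw_score = 6 / 10
bayes_score = posterior_mastery(6, 10, prior_a=8, prior_b=2)

# The prior acts like 8 + 2 = 10 extra items, so the estimate behaves
# as if the test were 20 items long.
effective_length = 10 + 8 + 2

# The decision is still criterion referenced: compare to a fixed standard.
cutoff = 0.65  # hypothetical mastery standard
is_master = bayes_score >= cutoff
```

In this toy case the raw score (0.6) falls below the standard while the shrunken estimate (0.7) exceeds it, which shows how the group information can change a mastery decision even though the comparison is to a standard rather than to other students.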

3.
Achieving fluency in important mathematical procedures is fundamental to students’ mathematical development. The usual way to develop procedural fluency is to practise repetitive exercises, but is this the only effective way? This paper reports three quasi-experimental studies carried out in a total of 11 secondary schools involving altogether 528 students aged 12–15. In each study, parallel classes were taught the same mathematical procedure before one class undertook traditional exercises while the other worked on a “mathematical etude” (Foster, International Journal of Mathematical Education in Science and Technology, 44(5), 765–774, 2013b), designed to be a richer task involving extensive opportunities for practice of the relevant procedure. Bayesian t tests on the gain scores between pre- and post-tests in each study provided evidence of no difference between the two conditions. A Bayesian meta-analysis of the three studies gave a combined Bayes factor of 5.83, constituting “substantial” evidence (Jeffreys, 1961) in favour of the null hypothesis that etudes and exercises were equally effective, relative to the alternative hypothesis that they were not. These data support the conclusion that the mathematical etudes trialled are comparable to traditional exercises in their effects on procedural fluency. This could make etudes a viable alternative to exercises, since they offer the possibility of richer, more creative problem-solving activity, with comparable effectiveness in developing procedural fluency.

4.
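The topological structure of Bayesian networks is analyzed, and a fault tree for poor engine idling is given. A Bayesian network model is then built using an algorithm that converts the fault tree into a Bayesian network. Fault-simulation test data are used to determine the prior probability of each node in the Bayesian network for fault diagnosis of the electronically controlled gasoline injection system under poor engine idling, and the network is then used to find the fault probability of each component of that system when the engine idles poorly.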
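
The conversion from a fault tree to a Bayesian network can be sketched on a toy case: basic events become root nodes with prior probabilities, and an OR gate becomes a deterministic child node. The component names and prior values below are hypothetical and are not taken from the article; exact inference is done by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical prior fault probabilities for three basic events.
priors = {"injector": 0.05, "idle_valve": 0.10, "air_flow_sensor": 0.02}

def top_event(states):
    """Deterministic OR gate: poor idling occurs if any component is faulty."""
    return any(states.values())

def posterior_given_top(priors):
    """Return P(component faulty | top event) and P(top event) by enumeration."""
    names = list(priors)
    p_top = 0.0
    joint = {name: 0.0 for name in names}
    for combo in product([False, True], repeat=len(names)):
        states = dict(zip(names, combo))
        # Probability of this joint state (root nodes are independent).
        p = 1.0
        for name in names:
            p *= priors[name] if states[name] else 1.0 - priors[name]
        if top_event(states):
            p_top += p
            for name in names:
                if states[name]:
                    joint[name] += p
    return {name: joint[name] / p_top for name in names}, p_top

posteriors, p_top = posterior_given_top(priors)
most_likely = max(posteriors, key=posteriors.get)
```

With these assumed priors, observing the top event raises each component's fault probability above its prior, and the component with the largest prior remains the most likely cause.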

5.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches: Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts.
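
The contrast between MLE and EAP ability estimates can be sketched on a small example. The sketch assumes a Rasch model, a standard normal prior, and a grid approximation; the item difficulties and response patterns are hypothetical and do not come from the study.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    ll = 0.0
    for u, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll

def eap_estimate(responses, difficulties, grid):
    """Posterior mean and SD on a grid, with an assumed N(0, 1) prior."""
    weights = [math.exp(log_likelihood(t, responses, difficulties) - 0.5 * t * t)
               for t in grid]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def mle_estimate(responses, difficulties, grid):
    """Grid point with the highest likelihood (no prior)."""
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

grid = [-4.0 + 0.01 * i for i in range(801)]   # theta from -4 to 4
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]     # hypothetical items

mixed = [1, 1, 1, 0, 1]
eap, posterior_sd = eap_estimate(mixed, difficulties, grid)
mle = mle_estimate(mixed, difficulties, grid)

perfect = [1, 1, 1, 1, 1]
eap_perfect, _ = eap_estimate(perfect, difficulties, grid)
mle_perfect = mle_estimate(perfect, difficulties, grid)
```

For the mixed pattern the EAP estimate is pulled toward the prior mean relative to the MLE. For the perfect pattern the likelihood has no finite maximum, so the grid MLE lands on the upper boundary while the EAP stays finite, which is one reason Bayesian estimators are attractive early in an adaptive test.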

6.
Learning from printed text is a central academic task that may be challenging for students. Two ways to improve learning from text are to encourage learners to engage in generative learning strategies while reading, such as constructing an outline, or for instructors to include effective instructional design features, such as providing an outline with the text. A meta-analysis of studies comparing a group that was asked to generate an outline while reading a text to a control group that was not asked to outline found an average effect size of g+ = 0.59 on memory tests, g+ = 0.59 on comprehension tests, and g+ = 0.52 on writing assignments favoring learner-generated outlining. A meta-analysis of studies comparing a group that read a text containing an outline with a control group that read the same text without an outline found an effect size of g+ = 0.61 for memory tests and g+ = 0.34 for comprehension tests favoring instructor-provided outlining. Overall, there is encouraging evidence for the effectiveness of outlining as a generative learning strategy and for the effectiveness of outlining as an instructional design feature based on signaling, consistent with generative learning theory.

7.
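This article reviews the literature on testing the mean-variance efficiency of portfolios using classical statistics and Bayesian inference, and raises the possibility of applying these methods in China.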

8.
In this article, we propose using Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach, we further use it to evaluate person fit in simulated and empirical data, and compare the results with those of HT and the infit and outfit statistics. We found that overall BF performed as well as HT statistics and better than the infit and outfit statistics when detecting aberrant responses. Given the flexibility of BF in handling data sets with a small number of examinees, we suggest that BF can be used as person fit statistics, especially in computerized adaptive tests.

9.
Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person‐fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person‐fit tests take the correlation between ability and speed into account, as well as the correlation between item characteristics. They are posited as Bayesian significance tests, which have the advantage that the extremeness of a test statistic value is quantified by a posterior probability. The person‐fit tests can be computed as by‐products of a Markov chain Monte Carlo algorithm. Simulation studies were conducted in order to evaluate their performance. For all person‐fit tests, the simulation studies showed good detection rates in identifying aberrant patterns. A real data example is given to illustrate the person‐fit statistics for the evaluation of the joint model.

10.
In a series of simultaneous two-choice preference tests, water snakes (Natrix r. rhombifera) displayed a significant preference for a clean area of a test chamber vs an area soiled by a conspecific. No differential responsiveness was found for a clean area as compared to an area soiled by either a sympatric species of garter snake (Thamnophis sirtalis) or by the individual water snake Ss. A similar series of tests with individual garter snakes (Thamnophis radix) revealed significant preferences for areas soiled either by the Ss themselves or by conspecifics as compared to clean areas. No preferences were found for a clean area of the test chamber vs an area soiled by a sympatric water snake (Natrix r. r.). The possible role of chemical cues in the mediation of dispersion and social responsiveness was discussed.

11.
A problem central to structural equation modeling is measurement model specification error and its propagation into the structural part of nonrecursive latent variable models. Full-information estimation techniques such as maximum likelihood are consistent when the model is correctly specified and the sample size is large enough; however, any misspecification within the model can affect parameter estimates in other parts of the model. The goals of this study included comparing the bias, efficiency, and accuracy of hypothesis tests in nonrecursive latent variable models with indirect and direct feedback loops. We compare the performance of maximum likelihood, two-stage least-squares, and Bayesian estimators in these models under various degrees of misspecification in small to moderate sample size conditions.

12.
Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT.

13.
Automated Essay Scoring (AES) has garnered a great deal of attention from the rhetoric and composition/writing studies community since the Educational Testing Service began using e-rater® and the Criterion® Online Writing Evaluation Service as products in scoring writing tests, and most of the responses have been negative. While the criticisms leveled at AES are reasonable, the more important, underlying issues relate to the aspects of the writing construct of the tests AES can rate. Because these tests underrepresent the construct as it is understood by the writing community, such tests should not be used in writing assessment, whether for admissions, placement, formative, or achievement testing. Instead of continuing the traditional, large-scale, commercial testing enterprise associated with AES, we should look to well-established, institutionally contextualized forms of assessment as models that yield fuller, richer information about the student's control of the writing construct. Such tests would be more valid, as reliable, and far fairer to the test-takers, whose stakes are often quite high.

14.
Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used information criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture model (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.
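
As a rough illustration of how the Bayesian Information Criterion is used for class enumeration, the sketch below applies the standard formula BIC = -2 log L + k ln(n) to a set of candidate models. The log-likelihoods and parameter counts are invented for the example and do not come from the simulation study.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fitted models: number of classes -> (log-likelihood, free parameters).
fits = {1: (-2650.0, 5), 2: (-2540.0, 11), 3: (-2531.0, 17)}
n_obs = 500  # one of the sample sizes mentioned in the abstract

bic_by_k = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_k = min(bic_by_k, key=bic_by_k.get)
```

Here the third class improves the log-likelihood only slightly, so the ln(n) penalty on its six extra parameters outweighs the gain and the two-class model has the lowest BIC.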

15.
Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive tests. This study attempts to provide a solution that would allow examinees to revise their answers, without jeopardizing the quality and efficiency of the test. The purpose of this study is to test the efficiency of a “rearrangement procedure” that rearranges and skips certain items in order to better estimate the examinees' abilities, without allowing them to cheat on the test. This was examined through a simulation study. The results show that the rearrangement procedure is effective in reducing the standard error of the Bayesian ability estimates and in increasing the reliability of the same estimates.

16.
Standard comparative tests are meant to provide a reference system within the framework of educational reforms for development of school lessons. Compulsory comparative testing was introduced in Baden-Württemberg at the end of the school year 2005/2006. The teachers carried out the tests and corrected them themselves. Afterwards they received feedback at school and class levels, which was based on a prior statewide pilot study. The repeated survey had the aim of investigating which intended effects (reference attainment for orientation) and non-intended effects (narrowing-the-curriculum, pre-testing and exercises) teachers associated with this new instrument. Teachers of technical secondary schools (Realschulen) in Baden-Württemberg were surveyed. In the survey before the introduction of standard comparative testing (2004; n_t1 = 914), teachers expected both intended and non-intended effects. Four years after their introduction (2009; n_t1 = 734), respondents were asked to estimate the effects of standard comparative tests. The effects, in every dimension, were judged to be significantly less than had been expected before their introduction. It is pleasing that the teachers did not judge the anticipated narrowing-the-curriculum effects to be significant. However, they also did not see the instrument as a noteworthy orientation help for planning and assessing lessons. Standard comparative tests were not seen by the respondents to provide a reference for new lesson developments.

17.
Pore pressure dissipation during piezocone testing provides a unique tool for estimating the hydraulic properties of in-situ backfills in soil-bentonite (SB) slurry trench cutoff walls. Six tests were performed in an SB slurry trench cutoff wall located in Jiangsu Province, China. The pore pressure dissipation curves obtained are non-monotonic, which, as far as the authors are aware, is reported for the first time in SB cutoff walls. The non-monotonic dissipation curves are attributed to the redistribution of excess pore pressures between the base soil clods and the rest of the backfill around the cone. Four existing interpretation methods are adopted to analyze the measured non-monotonic piezocone dissipation curves. The horizontal coefficients of consolidation (c_h) of the backfills obtained by three methods are close to each other and in agreement with the results of fixed-ring consolidometer tests, while the other method gives a high overestimate. The hydraulic conductivities (k_h) of the backfills are also estimated by four methods, three based on dissipation test results and one based on piezocone penetration data. The k_h values estimated by consolidation theory are close to the results of flexible wall permeameter tests. Two empirical expressions for dissipation tests give relatively low k_h, but the method based on penetration gives k_h much larger than the laboratory test results.

18.
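Principles and review of several automated English essay scoring systems   Total citations: 8, self-citations: 0, citations by others: 8
This article introduces the basic principles of several automated essay scoring systems that are currently the most popular in large-scale testing and English teaching in the United States, and briefly reviews these systems. The systems covered are Project Essay Grader (PEG), Intelligent Essay Assessor (IEA), E-rater and Criterion, IntelliMetric and MY Access!, and the Bayesian Essay Test Scoring System (BETSY).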

19.
This article compares maximum likelihood and Bayesian estimation of the correlated trait–correlated method (CT–CM) confirmatory factor model for multitrait–multimethod (MTMM) data. In particular, Bayesian estimation with minimally informative prior distributions (that is, prior distributions that prescribe equal probability across the known mathematical range of a parameter) is investigated as a source of information to aid convergence. Results from a simulation study indicate that Bayesian estimation with minimally informative priors produces admissible solutions more often than maximum likelihood estimation (100.00% for Bayesian estimation, 49.82% for maximum likelihood). Extra convergence does not come at the cost of parameter accuracy; Bayesian parameter estimates showed comparable bias and better efficiency compared to maximum likelihood estimates. The results are echoed via 2 empirical examples. Hence, Bayesian estimation with minimally informative priors enables admissible solutions of the CT–CM model for MTMM data.

20.
We provide new evidence about the effect of testing language on test scores using data from two rounds (conducted approximately six years apart) of the New Immigrants Survey. In each round, U.S.-born and foreign-born children of Hispanic origin were randomly assigned to take the Woodcock-Johnson achievement (two reading and two math) tests, either in Spanish or in English. U.S.-born children of Hispanic immigrants perform better in reading tests (but not in math tests) when they are assigned to take tests in English. The size of the testing-language effect remains stable across rounds. Foreign-born children of Hispanic immigrants perform better in both reading and math tests when they are assigned to take tests in Spanish in the first round. However, the size of the testing-language effect declines in reading tests and completely disappears in math tests by the second round. Our results suggest that the depreciation of Spanish skills is an essential factor (and, in some cases, more important than the accumulation of English skills) in explaining the decline in the testing-language effect among foreign-born children. We also explore how age at immigration and years spent in the U.S. affect language assimilation.
