1.

Marianne Klem Jan-Eric Gustafsson Bente Hagtvet 《Scandinavian Journal of Educational Research》2013,57(2):195-213

The Norwegian government recommends a systematic language assessment of all four-year-olds as part of the general health surveillance program for the purpose of identifying children at risk of language delay. This study aimed to investigate the construct validity of the recommended language screening tool called LANGUAGE4 [SPRÅK4] by first examining the dimensionality of the underlying construct of the tool, after which the concurrent convergent validity was established by regressing an external language factor, defined by four standardized language tests, on a single higher-order factor. The findings provide support for a higher-order model with one general language factor, suggesting that a large amount of the variance in LANGUAGE4 is attributable to a single common factor at the second-order level. Furthermore, this single factor explained a considerable amount of the variance in the external language factor. Our findings are interpreted as support for satisfactory construct validity of LANGUAGE4.

2.

《Learning and Instruction》2020

Developments concerning report cards have led to a potential shift from reporting traditional grades to reporting multiple competencies within and across subjects. In this study, we analyzed the dimensional structure of the teacher judgments on a competency-based report card on fourth-grade elementary school students (N = 469). With a methodologically innovative approach of combining exploratory structural equation modeling (ESEM) and confirmatory factor analysis (CFA), we found one learning-oriented and one social-oriented generic subject-unspecific factor of competency judgments and single factors for each included subject. All subject factors showed relatively high correlations with the respective traditional grades. Second-order commonalities further indicated a general factor represented almost perfectly by the learning-oriented generic judgments. Our analyses generally justified the use of competency-based report cards in terms of the dimensional structure and the association with traditional grades. Further, generic subject-unspecific competency judgments contribute to disentangling the multidimensionality of teacher judgments.

3.

Xiaoting Huang Mark Wilson Lei Wang 《Educational Psychology》2016,36(2):378-390

In recent years, large-scale international assessments have been increasingly used to evaluate and compare the quality of education across regions and countries. However, measurement variance between different versions of these assessments often poses threats to the validity of such cross-cultural comparisons. In this study, we investigated the cross-language, cross-cultural validity of the Programme for International Student Assessment 2006 Science assessment via three differential item functioning (DIF) analyses between the USA and Canada, Chinese Hong Kong and mainland China, and between the USA and mainland China. Furthermore, we explored three plausible causes of DIF via content analysis, namely language, curriculum and cultural differences. Our results revealed that differential curriculum coverage was the most serious cause of DIF among the three factors we investigated in this study, and differential content familiarity also contributed to DIF here. We discussed the implications of the findings for future international assessment development, and for how to best define ‘scientific literacy’ for students around the world.

4.

Dubravka Svetina Roy Levy 《Journal of Experimental Education》2016,84(2):398-420

This study investigated the effect of complex structure on dimensionality assessment in compensatory multidimensional item response models using DETECT- and NOHARM-based methods. The performance was evaluated via the accuracy of identifying the correct number of dimensions and the ability to accurately recover item groupings using a simple matching similarity (SM) coefficient. The DETECT-based methods yielded higher proportion correct than the NOHARM-based methods in two- and three-dimensional conditions, especially when correlations were ≤.60, data exhibited ≤30% complexity, and sample size was 1,000. As the complexity increased and the sample size decreased, the performance of the methods typically diminished. The NOHARM-based methods were either equally successful or better in recovering item groupings than the DETECT-based methods and were mostly affected by complexity levels. The DETECT-based methods were affected largely by the test length, such that with the increase of the number of items, SM coefficients would decrease substantially.

5.

Using Dimensionality-Based DIF Analyses to Identify and Interpret Constructs That Elicit Group Differences

Mark J. Gierl 《Educational Measurement》2005,24(1):3-14

In this paper I describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis paradigm, with emphasis on its implication for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a failure to display DIF. By contrast, the multidimensional DIF paradigm emphasizes a substantively-informed selection of items for both the matching and studied subtest based on the dimensions suspected of underlying the test data. Using two examples, I demonstrate that these two approaches lead to different interpretations about the occurrence of DIF in a test. It is argued that selecting a matching and studied subtest, as identified using the DIF analysis paradigm, can lead to a more informed understanding of why DIF occurs.

6.

A Study of the Construct Validity of the Self-Monitoring Scale Total citations: 2, self-citations: 0, citations by others: 2

肖崇好《韩山师范学院学报》2005,26(1):72-77

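The Self-Monitoring Scale developed by Snyder (1974) is widely used and also quite controversial. It reflects individual differences in the control of facial expression and in self-presentation. However, a considerable number of studies abroad have concluded that the scale contains multiple factors. This study, conducted with Chinese participants, confirmed that the scale has three factors: extraversion, acting, and other-directedness. The three factors differ considerably from the original construct, indicating that the construct validity of the scale is indeed low.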

7.

Roy Levy Yan Xia Samuel B. Green 《Educational and psychological measurement》2021,81(3):466

A number of psychometricians have suggested that parallel analysis (PA) tends to yield more accurate results in determining the number of factors in comparison with other statistical methods. Nevertheless, all too often PA can suggest an incorrect number of factors, particularly in statistically unfavorable conditions (e.g., small sample sizes and low factor loadings). Because of this, researchers have recommended using multiple methods to make judgments about the number of factors to extract. Implicit in this recommendation is that, when the number of factors is chosen based on PA, uncertainty nevertheless exists. We propose a Bayesian parallel analysis (B-PA) method to incorporate the uncertainty with decisions about the number of factors. B-PA yields a probability distribution for the various possible numbers of factors. We implement and compare B-PA with a frequentist approach, revised parallel analysis (R-PA), in the contexts of real and simulated data. Results show that B-PA provides relevant information regarding the uncertainty in determining the number of factors, particularly under conditions with small sample sizes, low factor loadings, and less distinguishable factors. Even if the indicated number of factors with the highest probability is incorrect, B-PA can show a sizable probability of retaining the correct number of factors. Interestingly, when the mode of the distribution of the probabilities associated with different numbers of factors was treated as the number of factors to retain, B-PA was somewhat more accurate than R-PA in a majority of the conditions.

8.

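The Importance, Meaning, and Evaluation of Quality in Early Education Institutions Total citations: 6, self-citations: 0, citations by others: 6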

高敬《学前教育研究》2011,(7)

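Since the end of the 20th century, research on educational quality has become an important topic in preschool education. A large body of research abroad has shown that high-quality preschool education benefits children's cognitive, social, and emotional development, and is especially helpful for disadvantaged children. This has raised awareness of the importance of quality in early education institutions and has prompted researchers to study, from different perspectives and in considerable breadth and depth, the nature of early education quality, its constituent elements, and its performance indicators. They have established that quality should be a philosophical rather than a technical term and that attention should be paid comprehensively to the structural, process, and outcome elements of quality. At the same time, they have actively carried out evaluations of preschool institution quality in practice, producing several influential rating scales. Researchers still need to strengthen work on the diversity of evaluation methods, the handling of the relationship between internal and external evaluation, and the development and objective use of evaluation instruments. Drawing on these findings, the development of early education institutions in China should foreground mainstream quality values, emphasize process evaluation, and, in light of local conditions, build a scientific and pluralistic quality monitoring mechanism and evaluation system to promote the healthy and sustainable development of China's preschool institutions.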

9.

Ove Østerlie Audhild Løhre Gørill Haugan 《Scandinavian Journal of Educational Research》2013,57(6):869-883

One of the main aims of the school subject physical education (PE) is to promote a lifelong healthy lifestyle. The expectancy-value theory represents an essential theoretical perspective to examine and understand adolescents’ learning and motivation in PE. Based on this theory, the Expectancy-Value Questionnaire (EVQ) measures students’ expectancy-related beliefs and perceived task values related to a subject like PE. The aim of the present study was to examine the dimensionality, reliability, and construct validity of the Norwegian version of the EVQ among adolescents in PE. In total, 338 students from six schools completed the EVQ in their PE classes during the spring of 2016. Explorative and confirmatory factor analyses were conducted, suggesting the four-dimensional construct of the EVQ to be superior to the two-factor model. The EVQ measurement model of adolescents’ expectancy-related beliefs and subjective task values in PE demonstrated satisfying reliability and construct validity.

10.

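Several Issues to Clarify When Formulating Kindergarten Evaluation Standards Total citations: 4, self-citations: 0, citations by others: 4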

康建琴 刘焱《学前教育研究》2011,(1)

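Grading evaluation of kindergartens is an important means by which provincial and municipal education authorities manage kindergartens professionally, and many provinces and municipalities have successively issued or revised kindergarten evaluation standards. Given China's vast territory and the marked regional differences in natural conditions and in social, cultural, and economic development, localities do need to formulate regional standards suited to local conditions. Gradually unifying standards within each province is nonetheless appropriate: it facilitates tiered management, supports cross-sectional comparison, and is consistent with trends in the construction and use of evaluation systems in China and abroad. Evaluation standards should always reflect the mainstream values of current early childhood education so as to eliminate the adverse effects of the marketization of early childhood education; for example, they can explicitly specify and restrict kindergarten size, teaching materials, and curricula in order to counter the tendency to turn kindergartens into primary schools and to curb commercial behavior. The standards should center on four basic elements of education (teachers, facilities, curriculum, and children) and build an indicator system that truly reflects the quality of kindergarten education, including structural and process indicators, with the weight of each indicator considered scientifically. Particular emphasis should be placed on the idea that facilities, equipment, and other material resources serve education and teaching; indicators concerning operating conditions and facilities should carry an appropriate weight so as to avoid pointless competition among kindergartens over environmental design. Consideration should also be given to which indicators can be observed and understood within a short evaluation period. In particular, the kindergarten's work on assessing children's development should be included in the evaluation, the evaluation of classroom-level educational work should be the focus, and children's observable behavior during the educational process should be included among the indicators. As for the final evaluation results, it is advisable to first...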

11.

The Background, Content, Implementation, and Implications of the U.S. Preschool Curriculum Evaluation Research Project

钱雨《学前教育研究》2011,(7)

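Curriculum evaluation is an important means of developing preschool curricula and an important route to improving the quality of preschool education. China currently has a great variety of preschool curricula, but theoretical and practical research on curriculum evaluation lags far behind, making it impossible to assess systematically the quality and implementation effects of the various curricula and seriously constraining the quality of preschool curriculum development in China. The Preschool Curriculum Evaluation Research project launched by the U.S. Institute of Education Sciences in 2002 offers valuable inspiration and lessons for developing preschool curriculum evaluation in China. The project arose as governments at all levels in the United States provided more and more early education opportunities for low-income or disadvantaged children, and it aimed to explore and verify the specific effects of various early education curricula on these children's school readiness. To this end, dedicated curriculum implementation teams were formed and 14 fairly representative preschool curricula were put into practice, after which independent evaluation teams assessed their effects. They made extensive use of quantitative and qualitative evaluation methods, including child assessments, teacher reports, classroom observations, teacher interviews and questionnaires, and parent interviews. Although the conclusions were not encouraging and were questioned from many quarters, with critics arguing that the project had many limitations in its evaluation goals, sample selection, consideration of influencing factors, and depth of data analysis, the seven-year study still offers much that is worth learning from. We should raise awareness of the role and significance of preschool curriculum evaluation and, guided by this understanding, strive to build a scientific and professional preschool curriculum evaluation system suited to China's realities. At the same time, we should fully mobilize governments, academic research institutions, social organizations, and preschool institutions to participate actively in curriculum evaluation, and encourage government to provide strong policy and financial support, so as to promote flourishing research and practice in preschool curriculum evaluation and thereby provide a strong guarantee for the high-quality development of preschool education.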

12.

Julia E. Strait Emma Kate C. Wright Scott L. Decker 《Psychology in the schools》2019,56(1):148-158

Performance on figure copying tasks is empirically linked to school readiness, learning, cognition, and neuropsychological functioning. These nonverbal tasks are frequently used to evaluate children from diverse backgrounds to minimize bias due to factors such as language, ethnicity, culture, or socioeconomic status on test performance. The current study examined the possible Differential Item Functioning across African American and Caucasian groups, ages 4 to 7 years, in Bender Motor Gestalt Test, Second Edition (BG‐II) visual‐motor scores. Results indicated that in general the BG‐II can be considered invariant across these ethnic groups in this age range.

13.

Commentary: Evaluating the Validity of Formative and Interim Assessment Total citations: 1, self-citations: 0, citations by others: 1

Lorrie A. Shepard 《Educational Measurement》2009,28(3):32-37

In many school districts, the pressure to raise test scores has created overnight celebrity status for formative assessment. Its powers to raise student achievement have been touted, however, without attending to the research on which these claims were based. Sociocultural learning theory provides theoretical grounding for understanding how formative assessment works to increase student learning. The articles in this special issue bring us back to underlying first principles by offering separate validity frameworks for evaluating formative assessment (Nichols, Meyers, & Burling) and newly-invented interim assessments (Perie, Marion, & Gong). The article by Heritage, Kim, Vendlinski, and Herman then offers the most important insight of all; that is, formative assessment is of little use if teachers don't know what to do when students are unable to grasp an important concept. While it is true that validity investigations are needed, I argue that the validity research that will tell us the most (about how formative assessment can be used to improve student learning) must be embedded in rich curriculum and must at the same time attempt to foster instructional practices consistent with learning research.

14.

A Framework for Evaluating and Planning Assessments Intended to Improve Student Achievement

Paul D. Nichols Jason L. Meyers Kelly S. Burling 《Educational Measurement》2009,28(3):14-23

Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled formative, both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked to the use of assessment information. Our goal in this article is to support the construction of such an argument by offering a framework within which to consider evidence-based claims that assessment information can be used to improve student achievement. We describe this framework and then illustrate its use with an example of one-on-one tutoring. Finally, we explore the framework's implications for understanding when the use of assessment information is likely to improve student achievement and for advising test developers on how to develop assessments that are intended to offer information that can be used to improve student achievement.

15.

Sally Clare Howell Coral Rae Kemp 《Educational Psychology》2010,30(4):411-429

Components of early number sense, as identified in two Delphi studies and in the number sense literature related to mathematics difficulties, were assessed for 176 children in preschools and childcare centres across one local government area in Sydney, Australia, using tasks or modifications of tasks reported in the number sense literature. In addition, the children’s receptive vocabulary was measured using The Peabody Picture Vocabulary Test (third edition) and math reasoning was measured using Woodcock‐Johnson III Tests of Achievement. Although the children demonstrated a broad range of skills, there were no significant differences between children attending childcare and preschools for any of the measures. However, boys performed significantly better than girls in quantitative concepts and girls performed better than boys in subitising. In discussing the data, a comparison is made of the skills demonstrated by children and skills that were highlighted in the two Delphi studies and in the early number sense literature as being essential components of number sense prior to school entry. Implications for kindergarten mathematics curricula and approaches to the teaching of early number skills are discussed.

16.

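An Effective Method for Evaluating Inclusive Preschool Curricula: Curriculum-Based Assessment Total citations: 1, self-citations: 2, citations by others: 1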

钱文《中国特殊教育》2004,65(4):39-42

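As an effective assessment method for inclusive preschool curricula, curriculum-based assessment takes curriculum objectives as the standard for instructional assessment and uses that standard to assess children's ability levels and progress. Curriculum-based assessment has very clear goals and, depending on the curriculum model, can be divided into the developmental milestones model, the functional/adaptive model, and the interactive model. Assessment is carried out through its distinctive linkage system, and its effectiveness is ensured by four criteria: authenticity, equity, convergence, and sensitivity. This article reviews the current state of research in China and abroad on curriculum-based assessment in inclusive preschool curricula, to serve as a theoretical basis for designing inclusive preschool curricula in China.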

17.

The Quality of Local District Assessments Used in Nebraska's School-Based Teacher-Led Assessment and Reporting System (STARS)

Susan M. Brookhart 《Educational Measurement》2005,24(2):14-21

A sample of 293 local district assessments used in the Nebraska STARS (School-based Teacher-led Assessment and Reporting System), 147 from 2004 district mathematics assessment portfolios and 146 from 2003 reading assessment portfolios, was scored with a rubric evaluating their quality. Scorers were Nebraska educators with background and training in assessment. Raters reached an agreement criterion during a training session; however, analysis of a set of 30 assessments double-scored during the main scoring session indicated that the math ratings remained reliable during scoring, while the reading ratings did not. Therefore, this article presents results for the 147 mathematics assessments only. The quality of local mathematics assessments used in the Nebraska STARS was good overall. The majority were of high quality on characteristics that go to validity (alignment with standards, clarity to students, appropriateness of content). Professional development for Nebraska teachers is recommended on aspects of assessment related to reliability (sufficiency of information and scoring procedures).

18.

Chad W. Buckendahl James C. Impara Barbara S. Plake 《Educational Measurement》2002,21(4):6-16

Most states have adopted assessment and accountability systems that involve common measures of student performance. A state assessment system that allows school districts to choose the specific strategies they use to measure student performance on state-adopted content standards presents a unique state accountability challenge. The authors propose an accountability model that addresses this challenge using a combination of student performance, technical quality, and noncognitive indicators of performance. They also describe a study that evaluated the proposed model using data from all school districts in a southern state.

19.

Making the term ‘validity’ useful

Daniel Koretz 《Assessment in Education: Principles, Policy & Practice》2016,23(2):290-292


20.

Predicting children’s academic achievement from early assessment scores: a validity generalization study

Juhu Kim Hoi K. Suen 《Early childhood research quarterly》2003,18(4):547-566

Although there have been numerous studies investigating the predictive validity of early assessment, observed predictive validity coefficients across studies are not stable. A validity generalization study was conducted in order to answer the question of whether the relationship between early assessment of children and later achievement is generalizable or situation-specific. This study examined 716 predictive correlation coefficients from 44 studies using Hierarchical Linear Modeling (HLM). The findings of this study revealed that predictive validity of early assessment is not generalizable. Additional analyses indicated that predictive validity differs across assessments as a function of test type, specific construct being assessed, length of prediction, and administration procedures. The most impressive finding in this study was the variability of effect sizes across different test administration types. In particular, tests that were scored through ratings were found to be most effective. These findings suggest that instead of addressing a broad predictive validity between a test and a criterion measure, it is necessary to understand early assessment procedures as a whole system by including considerations of various variables related to testing conditions.