Similar literature
 20 similar documents found (search time: 46 ms)
1.

Summated rating scales to measure attitudes (and other human characteristics) commonly consist of numerous items whose scores are summed to yield a total score. A central assumption underlying the use of this technique is that the items in the scale reflect a common construct. If this assumption is not met, the scoring procedure produces largely meaningless, uninterpretable data. Although this important psychometric principle has been known for a long time, numerous studies in the research literature demonstrate a neglect of this principle. Some studies make no attempt at all to conceptualise the construct to be measured; others conceptualise the construct but then ignore the possibility that it may be multi‐dimensional; still others actually contain evidence which indicates that the construct is multi‐dimensional and then proceed to ignore that evidence. A possible contributor to the confusion is the widespread misunderstanding about the related yet distinct concepts of internal consistency and uni‐dimensionality. This paper presents case studies of poor and good instrument design, in the (forlorn?) hope that clarification of the issues might make a difference in the future.

2.
Parental anxiety in children’s education is closely related to children’s developmental and educational outcomes. The current study reported the development and validation of a self-report instrument to evaluate the Sources of Parental Anxiety in Children’s Education (SPACEs). Qualitative analyses suggested that the construct of parental anxiety in children’s education was multidimensional, representing learning performance anxiety, educational environment anxiety, educational input anxiety, and educational outcome anxiety as four primary sources. The results from exploratory and confirmatory factor analyses supported this four-factor structure comprising 17 items to capture this multidimensional construct. The scale also demonstrated adequate internal consistencies, convergent validity, discriminant validity, criterion-related validity, and test-retest reliability. A series of multi-group tests across age, locality, and children’s grades provided evidence of measurement invariance. Overall, the SPACE scale appears to be a reliable and valid tool to measure educational anxiety in parents in the Chinese context.

3.
Many teachers and curriculum specialists claim that the reading demand of many mathematics items is so great that students do not perform well on mathematics tests, even though they have a good understanding of mathematics. The purpose of this research was to test this claim empirically. This analysis was accomplished by considering examinees that differed in reading ability within the context of a multidimensional DIF framework. Results indicated that student performance on some mathematics items was influenced by their level of reading ability so that examinees with lower proficiency classifications in reading were less likely to obtain correct answers to these items. This finding suggests that incorrect proficiency classifications may have occurred for some examinees. However, it is argued that rather than eliminating these mathematics items from the test, which would seem to decrease the construct validity of the test, attempts should be made to control the confounding effect of reading that is measured by some of the mathematics items.

4.
Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses (from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically) are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.

5.
This research derived information functions and proposed new scalar information indices to examine the quality of multidimensional forced choice (MFC) items based on the RANK model. We also explored how GGUM‐RANK information, latent trait recovery, and reliability varied across three MFC formats: pairs (two response alternatives), triplets (three alternatives), and tetrads (four alternatives). As expected, tetrad and triplet measures provided substantially more information than pairs, and MFC items composed of statements with high discrimination parameters were most informative. The methods and findings of this study will help practitioners to construct better MFC items, make informed projections about reliability with different MFC formats, and facilitate the development of MFC triplet‐ and tetrad‐based computerized adaptive tests.

6.
In the lead article, Davenport, Davison, Liou, & Love demonstrate the relationship among homogeneity, internal consistency, and coefficient alpha, and also distinguish among them. These distinctions are important because too often coefficient alpha (a reliability coefficient) is interpreted as an index of homogeneity or internal consistency. We argue that factor analysis should be conducted before calculating internal consistency estimates of reliability. If factor analysis indicates the assumptions underlying coefficient alpha are met, then it can be reported as a reliability coefficient. However, to the extent that items are multidimensional, alternative internal consistency reliability coefficients should be computed based on the parameter estimates of the factor model. Assuming a bifactor model evidenced good fit, and the measure was designed to assess a single construct, omega hierarchical (the proportion of variance of the total scores due to the general factor) should be presented. Omega (the proportion of variance of the total scores due to all factors) also should be reported in that it represents a more traditional view of reliability, although it is computed within a factor analytic framework. By presenting both these coefficients and potentially other omega coefficients, the reliability results are less likely to be misinterpreted.
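
The two omega coefficients described in this abstract can be illustrated with a minimal sketch. It assumes an orthogonal bifactor model in which each item loads on the general factor and on exactly one group factor; the function name `omega_coefficients` and the loadings below are hypothetical values chosen for illustration, not estimates from the article.

```python
def omega_coefficients(general, groups, uniquenesses):
    """Return (omega_total, omega_hierarchical) for an orthogonal bifactor model.

    general      -- general-factor loading of every item
    groups       -- one list per group factor, holding the group-factor
                    loadings of the items assigned to that factor
    uniquenesses -- unique (error) variance of every item
    """
    # Total-score variance due to the general factor: (sum of loadings)^2.
    general_var = sum(general) ** 2
    # Total-score variance due to each group factor, summed over factors.
    group_var = sum(sum(loadings) ** 2 for loadings in groups)
    # Model-implied variance of the total score.
    total_var = general_var + group_var + sum(uniquenesses)
    omega_total = (general_var + group_var) / total_var
    omega_hierarchical = general_var / total_var
    return omega_total, omega_hierarchical


# Hypothetical standardized solution: six items, two group factors.
general = [0.6] * 6
groups = [[0.4] * 3, [0.4] * 3]
uniquenesses = [1 - 0.6 ** 2 - 0.4 ** 2] * 6

omega_total, omega_h = omega_coefficients(general, groups, uniquenesses)
print(f"omega = {omega_total:.3f}, omega hierarchical = {omega_h:.3f}")
```

With these illustrative loadings, omega is clearly larger than omega hierarchical, which is the gap the abstract warns about when a total score is interpreted as measuring a single construct.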

7.
Large-scale international comparative studies and cross-ethnic studies have revealed that Chinese students, living either in China or overseas, consistently outperform their counterparts in mathematics. Empirical research has discussed psychological, educational, and cultural reasons behind Chinese students’ better mathematics performance. However, there is scant sociological investigation of this phenomenon. The current mixed methods study aims to make a contribution in this regard. The study conceptualises Chineseness through Bourdieu’s sociological notion of habitus and considers this habitus of Chineseness a generating, but not determining, mechanism that underpins commitment to mathematics learning. The study firstly analyses the responses of 230 Chinese Australian participants to a set of questionnaire items. Results indicate that the habitus of Chineseness significantly mediates the relationship between participants’ commitment to mathematics learning and their mathematics achievement. The study then reports on the interviews with five participants to add nuances and dynamics to the mediating role of habitus of Chineseness. The study complements the existing literature by providing sociological insight into the better mathematics achievement of Chinese students.

8.
This article attempts to do three things. First, it explores the ways in which Islam is presented in an essentialist way (with a focus on religious education (RE) in England and Wales), leading to stereotypes and unsubstantiated generalisations that are then embedded in resources and agreed syllabi. Second, it provides a critique of essentialism. Finally, it makes a case for the role of hermeneutics in the teaching and learning of Islam. We argue that a hermeneutical approach is both a sound way to conceptualise the phenomenon of Islam and a pedagogical opening to make sense of it, one that may help overcome some of the weaknesses of the current ways of teaching about Islam.

9.
Many researchers have suggested that the main cause of item bias is the misspecification of the latent ability space, where items that measure multiple abilities are scored as though they are measuring a single ability. If two different groups of examinees have different underlying multidimensional ability distributions and the test items are capable of discriminating among levels of abilities on these multiple dimensions, then any unidimensional scoring scheme has the potential to produce item bias. It is the purpose of this article to provide the testing practitioner with insight about the difference between item bias and item impact and how they relate to item validity. These concepts will be explained from a multidimensional item response theory (MIRT) perspective. Two detection procedures, the Mantel-Haenszel (as modified by Holland and Thayer, 1988) and Shealy and Stout's Simultaneous Item Bias (SIB; 1991) strategies, will be used to illustrate how practitioners can detect item bias.
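
As a rough illustration of the Mantel-Haenszel procedure named in this abstract, the sketch below computes the common odds ratio across matched score levels and converts it to the ETS delta scale (MH D-DIF = -2.35 ln α). The function names and the counts are hypothetical, and the sketch omits the accompanying chi-square significance test.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched strata (e.g. total-score levels).

    Each stratum is a tuple (a, b, c, d):
      a -- reference group, item correct    b -- reference group, item incorrect
      c -- focal group, item correct        d -- focal group, item incorrect
    Raises ZeroDivisionError if no stratum has both b > 0 and c > 0.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """ETS delta-scale index; negative values favour the reference group."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for one item at two matched score levels.
strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
alpha_mh = mantel_haenszel_odds_ratio(strata)
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif(alpha_mh):.3f}")
```

An odds ratio of 1 (MH D-DIF of 0) indicates that, among examinees matched on total score, both groups have the same odds of answering the item correctly.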

10.
Studies on identity in general and mathematical identity in particular have gained much interest over the last decades. However, although measurements have been proven to be potent tools in many scientific fields, a lack of consensus on ontological, epistemological, and methodological issues has complicated measurements of mathematical identities. Specifically, most studies conceptualise mathematical identity as something multidimensional and situated, which obviously complicates measurement, since these aspects violate basic requirements of measurement. However, most concepts that are measured in scientific work are both multidimensional and situated, even in physics. In effect, these concepts are being conceptualised as sufficiently uni-dimensional and invariant for measures to be meaningful. We assert that if the same judgements were to be made regarding mathematical identity, that is, whether identity can be measured with one instrument alone, whether one needs multiple instruments, or whether measurement is meaningless, it would be necessary to know how much of the multidimensionality can be captured by one measure and how situated mathematical identity is. Accordingly, this paper proposes a theoretical perspective on mathematical identity that is consistent with basic requirements of measurement. Moreover, characteristics of students’ mathematical identities are presented and the problem of “situatedness” is discussed.

11.
Constructing explanations of complex phenomena is an important part of doing science and it is also an important component of learning science. Students need opportunities to make claims based on available evidence and then use science concepts to justify why evidence supports the claim. But what happens when new evidence emerges for the same phenomenon? The “claim” portion of the claim, evidence, and reasoning explanation framework is viewed as the most accessible to students. When new evidence suggests that students should adjust their current thinking, however, do students incorporate this new information and modify their claims? This research utilized a time series research design to explore how students modify their claim over four iterations of one explanation, termed an evolving explanation. As new data were collected and analyzed to provide additional evidence, students needed to evaluate their current claim to see if it took into account all available evidence. This research explores that process, including the supports that the teacher provided and the challenges that students faced in developing one claim over time. The findings indicate that many students face challenges adjusting their claims when new, conflicting evidence emerges, even with class discussion, teacher feedback, and written scaffolds. Several possible reasons exist to account for this challenge. Students may (1) ignore new evidence, (2) find “undoing” their initial idea too cognitively demanding, or (3) simply not have any similar experience from which to build. Providing students with experiences of writing evolving explanations reflects what scientists do, while simultaneously preparing students to become more scientifically proficient.

12.
This article considers potential problems that can arise in estimating a unidimensional item response theory (IRT) model when some test items are multidimensional (i.e., show a complex factorial structure). More specifically, this study examines (1) the consequences of model misfit on IRT item parameter estimates due to unintended minor item‐level multidimensionality, and (2) whether a Projection IRT model can provide a useful remedy. A real‐data example is used to illustrate the problem and also is used as a base model for a simulation study. The results suggest that ignoring item‐level multidimensionality might lead to inflated item discrimination parameter estimates when the proportion of multidimensional test items to unidimensional test items is as low as 1:5. The Projection IRT model appears to be a useful tool for updating unidimensional item parameter estimates of multidimensional test items for a purified unidimensional interpretation.

13.
The ability to recognize when it is warranted to be uncertain was developed effectively in 167 elementary school students through daily 15-min lessons over a 5-week period. This initial training did not deal with drug use in any way. A 3-year follow-up evaluation measured retention of these warranted-uncertainty skills, and assessed the effects of these skills on students' use of hard and soft drugs. In comparison with the matched control subjects, there was evidence of some retention after 3 years: the trained subjects were slightly better able to recognize when it is warranted to be uncertain about the effects of drugs. There was also strong evidence that many of the controls had acquired, somehow, the ability to generate warranted uncertainty, and that warranted uncertainty functioned as a stable construct (irrespective of how it was acquired) and was related to drug use. It appeared to produce a skepticism or an analytic attitude that allowed the student to ignore peer and parental dogma about drugs.

14.
In this contribution we concentrate on the features of a particular item format: items having as the last option “none of the above” (NOTA items). There is considerable dispute on the advisability of the usage of NOTA items in testing. Some authors come to the conclusion that NOTA items should be avoided, some come to neutral conclusions, while others argue that NOTA items are optimal test items. In this article, we contribute evidence to this discussion by conducting protocol analysis on written statements of examinees while answering NOTA items. In our investigation, a test containing 30 multiple-choice items was administered to 169 university students. The results show that NOTA options appear to be more attractive than options with specified solutions in those cases where a problem solver fails. Also, a relationship is found between the quality of (incorrect) problem solving and the choice of the NOTA option: the higher the quality of the incorrect problem-solving process, the more likely the student is to choose the NOTA option. Overall, our research supports the statement that ‘the more confidence an examinee has in his worked solution, which is inconsistent with one of the specified solutions, the more eager he seems to choose “none of the above”’.

15.
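Difficulty is not an inherent property of a test item but the result of interaction between examinee factors and item characteristics. Many item analysts tend to attribute excessive item difficulty solely to students' failure to master the relevant knowledge or skills, overlooking the characteristics of the items themselves. This study analyses 60 National College Entrance Examination (gaokao) English items with difficulty values below 0.6 to explore the sources of their difficulty. The results show that, apart from examinee factors, the difficulty of hard or relatively hard items is also related to item-writing technique, including problems with the uniqueness and acceptability of answers, content that goes beyond the syllabus, and poorly chosen testing points and scoring criteria. Accordingly, the study proposes that testing agencies improve their item writing and strengthen item quality monitoring to ensure that large-scale examinations select talent on a sound scientific basis.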

16.
One of the key characteristics of effective opinion leaders is that they are highly connected; they know many people and have numerous weak-tie relationships. Two studies were conducted that found evidence consistent with construct validity. The first (N = 35 and N = 57) found that connectors knew more people from a randomly selected list of names. A second study, with two surveys, was created (N = 561 and N = 189) such that the connectedness scores of some of the subjects in the first survey could be linked to how many subjects knew them in the second. Results indicated that those with higher connection scores were more likely to be known by others. Moreover, in the second survey, measures of Facebook use and bridging social capital were found to be associated substantially with connector scores.

17.
This article proposes two multidimensional IRT model-based methods of selecting item bundles (clusters of not necessarily adjacent items chosen according to some organizational principle) suspected of displaying DIF amplification. The approach embodied in these two methods is inspired by Shealy and Stout's (1993a, 1993b) multidimensional model for DIF. Each bundle selected by these methods constitutes a DIF amplification hypothesis. When SIBTEST (Shealy & Stout, 1993b) confirms DIF amplification in selected bundles, differential bundle functioning (DBF) is said to occur. Three real data examples illustrate the two methods for suspect bundle selection. The effectiveness of the methods is argued on statistical grounds. A distinction between benign and adverse DIF is made. The decision whether flagged DIF items or DBF bundles display benign or adverse DIF/DBF must depend in part on nonstatistical construct validity arguments. Conducting DBF analyses using these methods should help in the identification of the causes of DIF/DBF.

18.
Ambivalence is a psychological state in which a person holds mixed feelings (positive and negative) towards some psychological object. Standard methods of attitude measurement, such as Likert and semantic differential scales, ignore the possibility of ambivalence; ambivalent responses cannot be distinguished from neutral ones. This neglect arises out of an assumption that positive and negative affects towards a particular psychological object are bipolar, i.e., unidimensional in opposite directions. This assumption is frequently untenable. Conventional item statistics and measures of test internal consistency are ineffective as checks on this assumption; it is possible for a scale to be multidimensional and still display apparent internal consistency. Factor analysis is a more effective procedure. Methods of measuring ambivalence are suggested, and implications for research are discussed.
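
The claim that a multidimensional scale can still display apparent internal consistency can be checked with a small worked example. The sketch below assumes a hypothetical 20-item scale made of two completely uncorrelated 10-item clusters and computes standardized coefficient alpha from the mean inter-item correlation; the helper names and the within-cluster correlation of 0.6 are illustrative assumptions, not values from the article.

```python
def two_cluster_correlations(items_per_cluster, within_r):
    """Correlation matrix for two uncorrelated clusters of items.

    Items in the same cluster correlate at `within_r`;
    items in different clusters correlate at 0.
    """
    k = 2 * items_per_cluster
    return [
        [
            1.0 if i == j
            else within_r if i // items_per_cluster == j // items_per_cluster
            else 0.0
            for j in range(k)
        ]
        for i in range(k)
    ]


def standardized_alpha(corr):
    """Standardized alpha: k * r_bar / (1 + (k - 1) * r_bar)."""
    k = len(corr)
    off_diagonal = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    r_bar = sum(off_diagonal) / len(off_diagonal)
    return k * r_bar / (1 + (k - 1) * r_bar)


corr = two_cluster_correlations(items_per_cluster=10, within_r=0.6)
alpha = standardized_alpha(corr)
print(f"standardized alpha = {alpha:.3f}")
```

Under these assumptions alpha comes out at roughly 0.89 even though the scale measures two unrelated dimensions, which is why the abstract recommends factor analysis rather than internal consistency statistics as the check on dimensionality.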

19.
Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high-stakes testing and accountability frameworks, content-related validity evidence is typically gathered via alignment studies, with panels of experts providing qualitative judgments on the degree to which test items align with the representative content standards. Various summary statistics are then calculated (e.g., categorical concurrence, balance of representation) to aid in decision-making. In this paper, we propose an alternative approach for gathering content-related validity evidence that capitalizes on the overlap in vocabulary used in test items and the corresponding content standards, which we define as textual congruence. We use a text-based, machine learning model, specifically topic modeling, to identify clusters of related content within the standards. This model then serves as the basis from which items are evaluated. We illustrate our method by building a model from the Next Generation Science Standards, with textual congruence evaluated against items within the Oregon statewide alternate assessment. We discuss the utility of this approach as a source of triangulating and diagnostic information and show how visualizations can be used to evaluate the overall coverage of the content standards across the test items.

20.