Similar literature
20 similar articles found
1.
Previous research has established that SAT scores and high school grade point average (HSGPA) differ in their predictive power and in the size of mean differences across racial/ethnic groups. However, the SAT is scaled nationally across all test takers while HSGPA is scaled locally within a school. In this study, the researchers propose that this difference in how SAT scores and HSGPA are scaled partially explains differences in validity and subgroup differences. Using a large data set consisting of 170,390 students each of whom matriculated at one of 114 separate colleges, the researchers find that awarding SAT scores by ranking SAT within a high school generally results in substantial reduction in the size of subgroup mean differences for this predictor. However, validity for predicting first‐year GPA is also reduced by a small amount. Conversely, placing HSGPA onto a nationally normed metric through the use of multiple regression procedures results in a moderate increase in the size of subgroup mean differences, while also producing a small increase in validity. Taken together, these findings suggest that differences in predictor scaling can partially explain differences in the size of subgroup mean differences between HSGPA and SAT scores and have implications for predictive power.
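To make the within-school rescaling concrete, here is a minimal sketch of ranking scores inside each high school. The abstract does not say which ranking formula was used; the mid-rank percentile below, the function name `within_school_percentile`, and the `(school_id, sat_score)` record layout are all assumptions made for illustration.

```python
from collections import defaultdict


def within_school_percentile(records):
    """Return a 0-100 percentile rank for each (school_id, sat_score) record.

    Ranks are computed only against students from the same school
    (an assumed mid-rank definition: tied scores share the midpoint).
    """
    by_school = defaultdict(list)
    for school, score in records:
        by_school[school].append(score)

    ranks = []
    for school, score in records:
        scores = by_school[school]
        below = sum(s < score for s in scores)
        equal = sum(s == score for s in scores)
        ranks.append(100.0 * (below + 0.5 * equal) / len(scores))
    return ranks
```

Under this scheme a 1200 that tops one school and a 1500 that tops another receive the same locally scaled value, which is the sense in which the predictor is "scaled locally", as HSGPA is.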

2.
This study examines the utility of the Bannatyne recategorization system in discriminating among three groups of handicapped students. A stepwise discriminant functions analysis was performed on the subtest scaled scores from the WISC-R for 294 learning disabled (LD), 36 educable mentally retarded (EMR), and 71 emotionally disturbed (ED) students. The results of this analysis revealed that 100 percent of the EMR and ED students were predicted to be labeled LD on the basis of this recategorization, while 99.7 percent of the LD students were predicted to be LD. These findings are examined in relation to the use of alternative statistical methods and different diagnostic procedures to identify and classify students.

3.
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can lead to questionable results in setting cut scores. In this study, data collected from a previous standard-setting study are used to deduce panelists’ conceptions of profiles of borderline performance. These profiles are then used to predict cut scores on a test of algebra readiness. The results indicate that these profiles can predict a very wide range of cut scores both within and between panelists. Modifications are proposed to existing training procedures for test-centered methods that can account for the variation in borderline profiles.

4.
The study examined two approaches for equating subscores: (1) using internal common items as the anchor, and (2) using equated and scaled total scores as the anchor. Since equated total scores are comparable across the new and old forms, they can be used as an anchor to equate the subscores. Both chained linear and chained equipercentile methods were used. Data from two tests were used to conduct the study, and results showed that when more internal common items were available (i.e., 10–12 items), using common items to equate the subscores is preferable. However, when the number of common items is very small (i.e., five to six items), using total scaled scores to equate the subscores is preferable. For both tests, not equating (i.e., using raw subscores) is not reasonable, as it resulted in a considerable amount of bias.
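The chained linear method mentioned above can be sketched as two mean-and-standard-deviation links composed through the anchor: new-form subscore to anchor in the new-form group, then anchor to old-form subscore in the old-form group. This is a textbook-style illustration, not the study's implementation; the function names and the use of population standard deviations are assumptions.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Linear function matching the mean and SD of from_scores to to_scores."""
    m_from, s_from = mean(from_scores), pstdev(from_scores)
    m_to, s_to = mean(to_scores), pstdev(to_scores)
    return lambda x: m_to + (s_to / s_from) * (x - m_from)


def chained_linear(x_new, anchor_new, anchor_old, y_old):
    """Chain two links: new subscore -> anchor (new group), anchor -> old subscore (old group)."""
    x_to_anchor = linear_link(x_new, anchor_new)
    anchor_to_y = linear_link(anchor_old, y_old)
    return lambda x: anchor_to_y(x_to_anchor(x))
```

The anchor arguments can hold either common-item scores or equated total scaled scores, which is the only point where the two approaches compared in the abstract differ in this sketch.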

5.
6.
Few adequately normed drawing tests are available for current practice. Two subtests of the McCarthy Scales, Draw-A-Design and Draw-A-Child, are the best normed of all drawing tests for children aged 2½ to 8½ years; however, no age-corrected deviation scaled scores are available for interpretation, only raw scores and age equivalents. This paper presents scaled scores for use in interpretation of these two drawing tests.

7.

Overcoming the potential dilemma of awarding the same grade to a group of students for group work assignments, regardless of the contribution made by each group member, is a problem facing teachers who ask their students to work collaboratively on assessed group tasks. In this paper, we report on the procedures to factor in the contributions of individual group members engaged in an integrated group project using peer assessment procedures. Our findings demonstrate that the method we used resulted in a substantially wider spread of marks being given to individual students. Almost every student was awarded a numerical score which was higher or lower than a simple group project mark would have been. When these numerical scores were converted into the final letter grades, approximately one-third of the students received a grade for the group project that was different from the grade that they would have received if the same grade had been awarded to all group members. Based on these preliminary findings, we conclude that peer assessment can be usefully and meaningfully employed to factor individual contributions into the grades awarded to students engaged in collaborative group work.

8.
Educational Assessment, 2013, 18(4): 317-340
A number of methods for scoring tests with selected-response (SR) and constructed-response (CR) items are available. The selection of a method depends on the requirements of the program, the particular psychometric model and assumptions employed in the analysis of item and score data, and how scores are to be used. This article compares three methods: unweighted raw scores, Item Response Theory pattern scores, and weighted raw scores. Student score data from large-scale end-of-course high school tests in Biology and English were used in the comparisons. In the weighted raw score method evaluated in this study, the CR items were weighted so that SR and CR items contributed the same number of points toward the total score. The scoring methods were compared for the total group and for subgroups of students in terms of the resultant scaled score distributions, standard errors of measurement, and proficiency-level classifications. For most of the student ability distribution, the three scoring methods yielded similar results. Some differences in results are noted. Issues to be considered when selecting a scoring method are discussed.
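As a small illustration of the weighted raw score idea described above, the sketch below scales the CR points so that the CR section can contribute as many points as the SR section. The abstract gives no formula, so the single-ratio weight, the function name, and the parameter names are assumptions.

```python
def weighted_raw_score(sr_points, cr_points, sr_max, cr_max):
    """Total score in which CR points are rescaled to the SR section's maximum.

    Assumed weighting: one weight for all CR items, equal to sr_max / cr_max,
    so both sections have the same maximum contribution.
    """
    cr_weight = sr_max / cr_max
    return sr_points + cr_weight * cr_points
```

For example, with 40 possible SR points and 20 possible CR points the CR weight is 2, so a student with 30 SR points and 15 CR points moves from an unweighted 45 to a weighted 60.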

9.
This article proposes procedures for assessing and controlling acquiescence in personality scales when acquiescence is related to the content that the scale intends to measure. Our proposal is comprehensive in that it can be applied to different item response formats fitted with response models that can be parameterized as factor-analytic models. In the calibration stage, our proposal makes joint use of a balanced scale and a set of markers for acquiescence, and consists of 2 sequential procedures: a direct semirestricted solution, and a restricted solution with minimal identification constraints. In the scoring stage, we discuss how the information given by the acquiescence–content relation can be used to obtain Bayes expected a posteriori scores. The robustness of the direct procedure is assessed both analytically and by simulation. A free, user-friendly program that implements the procedures proposed is made available. Practical issues of use and interpretation are discussed and illustrated with an empirical application.

10.
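Starting from the Laplace equation and its boundary conditions, scaled boundary finite element formulas are derived for the case in which the scaling center is placed on a straight boundary with a prescribed uniform potential, and the formulas are applied to electrostatic field analysis in both bounded and unbounded domains. When the scaling center lies on a straight boundary with a prescribed uniform potential or on an insulated straight boundary, only the remaining boundaries need to be discretized, which greatly reduces both the data preparation effort and the amount of computation. A comparison of numerical and theoretical solutions shows that the scaled boundary finite element method is highly accurate, requires little data preparation, and handles unbounded-domain electrostatic field problems conveniently.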

11.
This study explored the use of kernel equating for integrating and extending two procedures proposed for assessing item order effects in test forms that have been administered to randomly equivalent groups. When these procedures are used together, they can provide complementary information about the extent to which item order effects impact test scores, in overall score distributions and also at specific test scores. In addition to detecting item order effects, the integrated procedures also suggest the equating function that most adequately adjusts the scores to mitigate the effects. To demonstrate, the statistical equivalences of alternate versions of two large-volume advanced placement exams were assessed.

12.
The authors report the planning, implementation, and evaluation of a training program for foster grandparents employed at a residential diagnostic and treatment center for disturbed children and youth. The training needs were assessed by task and performance analysis procedures, including the use of critical incident technique. A scoring procedure derived values for various responses to a set of problem situations presented to each foster grandparent. The training objectives focused on the role of the foster grandparents in helping their charges to develop socially acceptable behavior. A cognitive problem‐solving strategy was taught primarily through role‐modeling and role‐playing methods. The effectiveness of training was evaluated by comparing pretraining scores on a problem situation test with posttraining scores. Foster grandparents who had been highly rated by supervisors on an independent criterion improved with training; the others did not.

13.
The purpose of this study was to examine how different scoring procedures affect interpretation of maze curriculum‐based measurements. Fall and spring data were collected from 199 students receiving supplemental reading instruction. Maze probes were scored first by counting all correct maze choices, followed by four scoring variations designed to reduce the effect of random guessing. Pearson's r correlation coefficients were calculated among scoring procedures and between maze scores and a standardized measure of reading. In addition, t tests were conducted to compare fall to spring growth for each scoring procedure. Results indicated that scores derived from the different procedures are highly correlated, demonstrate criterion‐related validity, and show fall‐to‐spring growth. Educators working with struggling readers may use any of the five scoring procedures to obtain technically sound scores.

14.
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly drawn from an undifferentiated universe of items, and therefore might not be suitable for tests developed according to a table of specifications. To address this issue, four interval estimation procedures that use category subscores for the computation of confidence intervals are presented in this article. All four estimation procedures assume that subscores instead of test scores follow a binomial distribution (i.e., compound binomial error model). The relative performance of the four compound binomial–based interval estimation procedures is compared to each other and to the better known normal approximation and Wilson score procedures based on the binomial error model.
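For orientation, the two baseline procedures named at the end of the abstract, the normal approximation and the Wilson score interval, can be sketched for a proportion-correct score under the simple binomial error model. This does not implement the four compound binomial procedures, which the abstract does not specify; the function names and the default z = 1.96 (roughly 95% coverage) are assumptions.

```python
import math


def normal_interval(correct, n_items, z=1.96):
    """Normal-approximation (Wald) interval for proportion correct, clipped to [0, 1]."""
    p = correct / n_items
    half = z * math.sqrt(p * (1 - p) / n_items)
    return (max(0.0, p - half), min(1.0, p + half))


def wilson_interval(correct, n_items, z=1.96):
    """Wilson score interval for proportion correct."""
    p = correct / n_items
    denom = 1 + z ** 2 / n_items
    center = (p + z ** 2 / (2 * n_items)) / denom
    half = z * math.sqrt(p * (1 - p) / n_items + z ** 2 / (4 * n_items ** 2)) / denom
    return (center - half, center + half)
```

One practical difference shows up at the extremes: for a perfect score the normal approximation collapses to a zero-width interval, while the Wilson interval still has a lower bound below 1.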

15.
This paper illustrates that the psychometric properties of scores and scales that are used with mixed‐format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is on mixed‐format tests in situations for which raw scores are integer‐weighted sums of item scores. Four associated real‐data examples include (a) effects of weights associated with each item type on reliability, (b) comparison of psychometric properties of different scale scores, (c) evaluation of the equity property of equating, and (d) comparison of the use of unidimensional and multidimensional procedures for evaluating psychometric properties. Throughout the paper, and especially in the conclusion section, the examples are related to issues associated with test interpretation and test use.

16.
Two methods of constructing equal-interval scales for educational achievement are discussed: Thurstone's absolute scaling method and Item Response Theory (IRT). Alternative criteria for choosing a scale are contrasted. It is argued that clearer criteria are needed for judging the appropriateness and usefulness of alternative scaling procedures, and more information is needed about the qualities of the different scales that are available. In answer to this second need, some examples are presented of how IRT can be used to examine the properties of scales: It is demonstrated that for observed score scales in common use (i.e., any scores that are influenced by measurement error), (a) systematic errors can be introduced when comparing growth at selected percentiles, and (b) normalizing observed scores will not necessarily produce a scale that is linearly related to an underlying normally distributed true trait.

17.
Standard errors of measurement of scale scores by score level (conditional standard errors of measurement) can be valuable to users of test results. In addition, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) recommends that conditional standard errors be reported by test developers. Although a variety of procedures are available for estimating conditional standard errors of measurement for raw scores, few procedures exist for estimating conditional standard errors of measurement for scale scores from a single test administration. In this article, a procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores. This method is illustrated using a strong true score model. Practical applications of this methodology are given. These applications include a procedure for constructing score scales that equalize standard errors of measurement along the score scale. Also included are examples of the effects of various nonlinear raw-to-scale score transformations on scale score reliability and conditional standard errors of measurement. These illustrations examine the effects on scale score reliability and conditional standard errors of measurement of (a) the different types of raw-to-scale score transformations (e.g., normalizing scores), (b) the number of scale score points used, and (c) the transformation used to equate alternate forms of a test. All the illustrations use data from the ACT Assessment testing program.

18.
WISC-R subtest scaled scores for 192 learning disabled Navajo Indian children were recategorized according to the system recommended by Bannatyne (1974), and subsequently analyzed using a one-way repeated measures analysis of variance. A Newman-Keuls Multiple Range Test was also conducted to determine significant pairwise comparisons. Results indicated that, as a group, the subjects failed to demonstrate the Spatial>Conceptual>Sequential pattern predicted by Bannatyne (1974). Implications for use of Bannatyne's system with learning disabled minority children are discussed.

19.
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical procedures that counteract this problem by adjusting p values for effect estimates upward. Although MTPs are increasingly used in impact evaluations in education and other areas, an important consequence of their use is a change in statistical power that can be substantial. Unfortunately, researchers frequently ignore the power implications of MTPs when designing studies. Consequently, in some cases, sample sizes may be too small, and studies may be underpowered to detect effects as small as a desired size. In other cases, sample sizes may be larger than needed, or studies may be powered to detect smaller effects than anticipated. This paper presents methods for estimating statistical power for multiple definitions of statistical power and presents empirical findings on how power is affected by the use of MTPs.
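To show what "adjusting p values upward" looks like in practice, here is a minimal sketch of two widely used MTPs, Bonferroni and Holm. The abstract does not name the procedures it studies, so the choice of these two is an assumption, as are the function names.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

Every adjusted p value is at least as large as the raw one, so fewer tests clear a fixed significance threshold, which is the source of the power loss the abstract discusses. Holm's adjusted values never exceed Bonferroni's, so it gives up less power for the same familywise error control.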

20.
Learner-generated drawing is a strategy that can improve learning from expository text. In this paper, a model of drawing construction is proposed and the experimental design tests hypotheses derived from this model. Fourth and sixth grade participants used drawing under three experimental conditions with two conditions including varying degrees of support. On a problem solving posttest, both supported drawing groups scored higher than the non-drawing Control group. Although the grade by condition interaction was not significant, there was a strong trend in this direction. When sixth grade participants were considered independently, participants in the most supported drawing condition also obtained higher problem solving scores than those who drew without support. There were no significant condition effects for fourth grade nor were there any significant effects on a multiple-choice recognition posttest. Results were consistent with hypotheses and are discussed in light of the proposed theoretical framework.
