Academic Journal
Scoring with the Computer: Alternative Procedures for Improving the Reliability of Holistic Essay Scoring
Title: | Scoring with the Computer: Alternative Procedures for Improving the Reliability of Holistic Essay Scoring |
---|---|
Language: | English |
Authors: | Attali, Yigal; Lewis, Will; Steier, Michael |
Source: | Language Testing. Jan 2013 30(1):125-141. |
Availability: | SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: http://sagepub.com |
Peer Reviewed: | Y |
Page Count: | 17 |
Publication Date: | 2013 |
Document Type: | Journal Articles; Reports - Evaluative |
Education Level: | Higher Education; Postsecondary Education |
Descriptors: | Scoring, Essay Tests, Reliability, High Stakes Tests, College Entrance Examinations, Scoring Rubrics, Interrater Reliability, Automation, Correlation |
Assessment and Survey Identifiers: | Graduate Record Examinations |
DOI: | 10.1177/0265532212452396 |
ISSN: | 0265-5322 |
Abstract: | Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This study experimentally evaluated several alternative procedures for eliciting distinct human scores and improving their reliability. Essays written in response to the argument and issue tasks of the Analytical Writing measure of the GRE General Test were scored by experienced raters under different conditions. Criteria for evaluation included inter-rater agreement, agreement with machine scores, and cross-task reliability. First, the use of a modified scoring rubric that focused on higher-order writing skills increased the reliability for one type of task but decreased it for another. Second, scoring in batches of similar-length essays did not have any effect on scores. Third, scoring with available automated essay scores increased the reliability of human scores, but also increased their similarity with automated scores. Finally, the use of a more refined 18-point scoring scale significantly increased reliability. (Contains 6 tables, 2 figures and 1 note.) |
Abstractor: | As Provided |
Number of References: | 34 |
Entry Date: | 2013 |
Accession Number: | EJ1005785 |
Database: | ERIC |