Academic Journal

Combining Machine Translation and Automated Scoring in International Large-Scale Assessments

Bibliographic Details
العنوان: Combining Machine Translation and Automated Scoring in International Large-Scale Assessments
Language: English
Authors: Ji Yoon Jung (ORCID 0009-0009-5995-219X), Lillian Tyack, Matthias von Davier
Source: Large-scale Assessments in Education, v12, 2024.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 18
Publication Date: 2024
Document Type: Journal Articles; Reports - Research
Education Level: Elementary Secondary Education
Descriptors: Artificial Intelligence, Automation, Scoring, International Assessment, Measurement, Translation, Multilingualism, Achievement Tests, Mathematics Achievement, Foreign Countries, Elementary Secondary Education, Mathematics Tests, Science Achievement, Science Tests, Technology Uses in Education
Assessment and Survey Identifiers: Trends in International Mathematics and Science Study
DOI: 10.1186/s40536-024-00199-7
ISSN: 2196-0739
Abstract: Background: Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation and is also being used more frequently in education. Despite these advancements, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored due to the scoring challenges associated with processing large amounts of multilingual responses. However, because of their low-stakes nature, ILSAs are an ideal ground for innovation and for exploring new methodologies. Methods: This study proposes combining state-of-the-art machine translation (i.e., Google Translate and ChatGPT) with artificial neural networks (ANNs) to mitigate two key concerns of human scoring: inconsistency and high expense. We applied AI-based automated scoring to multilingual student responses from eight countries and six different languages, using six constructed-response items from TIMSS 2019. Results: Automated scoring displayed performance comparable to human scoring, especially when the ANNs were trained and tested on ChatGPT-translated responses. Furthermore, psychometric characteristics derived from machine scores generally exhibited similarity to those obtained from human scores. These results can be considered supportive evidence for the validity of automated scoring for survey assessments. Conclusions: This study highlights that automated scoring integrated with recent machine translation holds great promise for consistent and resource-efficient scoring in ILSAs.
Abstractor: As Provided
Entry Date: 2024
Accession Number: EJ1420403
Database: ERIC