Academic Journal
Combining Machine Translation and Automated Scoring in International Large-Scale Assessments
Title: | Combining Machine Translation and Automated Scoring in International Large-Scale Assessments |
---|---|
Language: | English |
Authors: | Ji Yoon Jung (ORCID |
Source: | Large-scale Assessments in Education. 2024 12. |
Availability: | Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/ |
Peer Reviewed: | Y |
Page Count: | 18 |
Publication Date: | 2024 |
Document Type: | Journal Articles; Reports - Research |
Education Level: | Elementary Secondary Education |
Descriptors: | Artificial Intelligence, Automation, Scoring, International Assessment, Measurement, Translation, Multilingualism, Achievement Tests, Mathematics Achievement, Foreign Countries, Elementary Secondary Education, Mathematics Tests, Science Achievement, Science Tests, Technology Uses in Education |
Assessment and Survey Identifiers: | Trends in International Mathematics and Science Study |
DOI: | 10.1186/s40536-024-00199-7 |
ISSN: | 2196-0739 |
Abstract: | Background: Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation and is also being used more frequently in education. Despite these advancements, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored due to the scoring challenges associated with processing large amounts of multilingual responses. However, due to their low-stakes nature, ILSAs are an ideal ground for innovations and exploring new methodologies. Methods: This study proposes combining state-of-the-art machine translations (i.e., Google Translate & ChatGPT) and artificial neural networks (ANNs) to mitigate two key concerns of human scoring: inconsistency and high expense. We applied AI-based automated scoring to multilingual student responses from eight countries and six different languages, using six constructed response items from TIMSS 2019. Results: Automated scoring displayed comparable performance to human scoring, especially when the ANNs were trained and tested on ChatGPT-translated responses. Furthermore, psychometric characteristics derived from machine scores generally exhibited similarity to those obtained from human scores. These results can be considered as supportive evidence for the validity of automated scoring for survey assessments. Conclusions: This study highlights that automated scoring integrated with the recent machine translation holds great promise for consistent and resource-efficient scoring in ILSAs. |
Abstractor: | As Provided |
Entry Date: | 2024 |
Accession Number: | EJ1420403 |
Database: | ERIC |