Topic modelling on archive documents from the 1970s: global policies on refugees

Bibliographic Details
Title: Topic modelling on archive documents from the 1970s: global policies on refugees
Authors: Philip Grant, Sara Cosemans, Ratan Sebastian, Marc Allassonnière-Tang
Contributors: Dynamique Du Langage (DDL), Université Lumière - Lyon 2 (UL2)-Centre National de la Recherche Scientifique (CNRS), Sebastian, Ratan
Source: Digital Scholarship in the Humanities
Digital Scholarship in the Humanities, Oxford University Press, 2021, 36 (4), pp.886-904. ⟨10.1093/llc/fqab018⟩
Publisher Information: HAL CCSD, 2021.
Publication Year: 2021
Subject Terms: Topic model, Linguistics and Language, Refugee, 0211 other engineering and technologies, 02 engineering and technology, computer.software_genre, Language and Linguistics, Personalization, 060104 history, Political science, [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], 0601 history and archaeology, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, 021110 strategic, defence & security studies, business.industry, 06 humanities and the arts, Optical character recognition, Public relations, Computer Science Applications, National archives, Close reading, business, computer, Information Systems, Theme (narrative), Qualitative research
Description: This study conducts a historical analysis of global policies on refugees within typewritten and digitally born documents (c. 55,000 pages) from international and national archives. The data originate from the 1970s and are stored in archives from the UK and US governments, plus the United Nations High Commissioner for Refugees (UNHCR). The overarching theme is to analyse the involvement of the UK, the USA, and the UNHCR in different refugee cases that occurred during the 1970s. To do so, we (1) identify the main topics in each document; (2) investigate the transmission of topics horizontally (between organizations) and vertically (through time); and (3) suggest targeted areas of the document set for further close reading by historians. Standard Optical Character Recognition and object detection are used to extract information from documents and categorize them. Then, natural language processing (NLP) methods like topic modelling and clustering are used to identify topics and the relationships between them across time. The results identify several main themes covered by different organizations and how the focus of each organization changes diachronically. Besides its academic contribution, this study also demonstrates how, through the use of existing techniques with limited customization, digital technologies in the hands of the historian can augment and complement qualitative methods in bringing to light the themes and trends demonstrated in large bodies of historical documents.
Language: English
ISSN: 2055-7671
2055-768X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1a9b957ca6a2150463bbbe3997bf90ad
https://hal.archives-ouvertes.fr/hal-03435806
Rights: OPEN
Accession Number: edsair.doi.dedup.....1a9b957ca6a2150463bbbe3997bf90ad
Database: OpenAIRE