Academic Journal

Probability estimate and the optimal text size

Bibliographic Details
Title: Probability estimate and the optimal text size
Authors: Kostić Aleksandar, Ilić Svetlana, Milin Petar
Source: Psihologija, Vol 41, Iss 1, Pp 35-51 (2008)
Publication Information: Drustvo Psihologa Srbije, 2008.
Publication Year: 2008
Collection: LCC:Psychology
Subject Terms: corpus linguistics, reliability of text sample, Psychology, BF1-990
Description: A reliable language corpus implies a text sample of size n that provides stable probability distributions of linguistic phenomena. The question is what the minimal (i.e. the optimal) text size is at which the probabilities of linguistic phenomena become stable. Specifically, we were interested in the probabilities of grammatical forms. We started with an a priori assumption that a text size of 1,000,000 words is sufficient to provide stable probability distributions. We treated a text of this size as a "quasi-population". The probability distribution derived from the "quasi-population" was then correlated with the probability distribution obtained from a minimal sample (32 items) for a given linguistic category (e.g. nouns). The correlation coefficient was treated as a measure of similarity between the two probability distributions. The minimal sample was then increased by geometric progression, up to the size at which the correlation between the distribution derived from the quasi-population and the one derived from the increased sample reached its maximum (r=1). The optimal sample size was established for grammatical forms of nouns, adjectives and verbs. A general formalism is proposed that allows the optimal sample size to be estimated from the minimal sample (i.e. 32 items).
Document Type: article
File Description: electronic resource
Language: English
Serbian
ISSN: 0048-5705
Relation: https://doaj.org/toc/0048-5705
DOI: 10.2298/PSI0801035K
Access URL: https://doaj.org/article/ecbca457bfc349a0994a316a84c8a45c
Accession Number: edsdoj.bca457bfc349a0994a316a84c8a45c
Database: Directory of Open Access Journals
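
Illustrative note: the sampling procedure summarized in the abstract can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation. The record does not give the progression ratio, the sampling scheme, or a stopping tolerance, so the doubling ratio, random sampling without replacement, and the `threshold` parameter are assumptions, and all function names and the synthetic case labels are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def relative_frequencies(items, categories):
    """Relative frequency of each category, in a fixed category order."""
    counts = Counter(items)
    total = len(items)
    return [counts[c] / total for c in categories]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def stabilization_curve(population, categories, start=32, ratio=2,
                        threshold=0.999, seed=0):
    """Correlate growing samples with the quasi-population distribution.

    Assumptions (not stated in the record): samples are drawn at random
    without replacement, sizes grow by `ratio`, and growth stops once the
    correlation reaches `threshold`.
    """
    rng = random.Random(seed)
    reference = relative_frequencies(population, categories)
    curve = []
    size = start
    while size <= len(population):
        sample = rng.sample(population, size)
        r = pearson(reference, relative_frequencies(sample, categories))
        curve.append((size, r))
        if r >= threshold:
            break
        size *= ratio
    return curve


if __name__ == "__main__":
    # Synthetic stand-in for a tagged corpus: hypothetical case labels
    # with made-up weights, used only to exercise the procedure.
    cases = ["nom", "gen", "dat", "acc", "ins", "loc", "voc"]
    weights = [30, 25, 5, 20, 8, 10, 2]
    population = random.Random(1).choices(cases, weights=weights, k=100_000)
    for size, r in stabilization_curve(population, cases):
        print(size, round(r, 4))
```

The first sample size at which the correlation reaches the chosen threshold would play the role of the "optimal" size in this sketch. The article's general formalism for predicting that size from the 32-item sample is not reproduced here.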