Significant Sequences in World Englishes: A Data-driven Approach to Variationist Models

Bibliographic Details
Main Author: Koch, Christopher
Contributors: Kreyer, Rolf (Prof. Dr.) (Thesis advisor)
Format: Doctoral Thesis
Language: English
Published: Philipps-Universität Marburg 2021
Online Access: PDF Full Text
Description
Summary: The present study attempts a strictly data-driven, bottom-up evaluation of models of World Englishes. Methodologically, diverging degrees of association within lexical and grammatical n-grams are chosen as the linguistic basis on which to estimate similarities and differences between 15 varieties of World Englishes as represented by the International Corpus of English. To this end, sequences of both dynamic and static lengths are generated from homogenized components of the corpus, both in its regular lexical format and in a POS-annotated version. Analysis of collocational preference is carried out by applying five association measures, both traditional (MI-score, t-score, log-likelihood) and more innovative (lexical gravity, Delta P), to the respective datasets. On the basis of these association patterns within the different datasets, groups of varieties exhibiting similar association profiles can be established through the application of various clustering techniques. In essence, these methods identify the pair of varieties that displays the least difference and merge it, repeating the process until all varieties are accounted for within the cluster structure. Since clustering methods tend to discover patterns even in random data, however, results from several clustering techniques (hierarchical clustering, k-means, phylogenetic clustering) are triangulated for the present study, and segmentations within the hierarchical structures are empirically substantiated through random resampling of the data. The variety clusters thus obtained are in turn contrasted with expectations derived from extra-linguistic assessments informed by major language-externally grounded models.
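As a rough illustration of the procedure summarized above, the following Python sketch computes three of the named association measures for a single bigram and then merges toy association profiles pairwise. It is not the thesis's implementation: the function names, the toy labels A to D and their values are hypothetical, the formulas are common textbook definitions of MI-score, t-score and Delta P, log-likelihood and lexical gravity are omitted, and Euclidean distance with average linkage is an assumed choice.

```python
import math


def association_scores(o11, f1, f2, n):
    """Scores for a bigram (w1, w2).

    o11: observed bigram frequency, f1: frequency of w1,
    f2: frequency of w2, n: total number of bigrams (all assumed > 0).
    """
    expected = f1 * f2 / n                     # frequency expected under independence
    mi = math.log2(o11 / expected)             # MI-score
    t = (o11 - expected) / math.sqrt(o11)      # t-score
    # Delta P (w2 given w1): P(w2 | w1) - P(w2 | not w1)
    delta_p = o11 / f1 - (f2 - o11) / (n - f1)
    return {"mi": mi, "t": t, "delta_p": delta_p}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering (assumed linkage).

    profiles: dict mapping a variety label to its vector of association scores.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = {(name,): [vec] for name, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
                d = sum(euclidean(x, y) for x, y in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the two least different clusters and record the step.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Toy profiles with hypothetical labels and values, for illustration only.
toy_profiles = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0], "D": [5.0, 6.5]}
for left, right, dist in agglomerate(toy_profiles):
    print(left, right, round(dist, 2))
```

In a real setting, each profile vector would hold the association scores of many n-grams for one variety, and the resulting merge order would be checked against other clustering methods and resampled data, as the summary describes.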
This particularly concerns three types of models for the description of World Englishes: 1) traditional tripartite distinctions into English as a native/second/foreign language, 2) models of regional standardization and epicentral effects (Hundt 2013), and 3) evolutionary models of language and identity formation in postcolonial settings (Schneider 2007, 2014). Each of these models would suggest different patterns of similarity within the data and is thus contrasted with the empirical findings based on variety-specific association profiles. Results of the study most strongly support an interpretation along regional criteria. In particular, the African varieties are commonly found to differentiate clearly from the remaining data, while exhibiting internal regional separation. While there is some separation between traditional ENL/ESL varieties within the spoken data, a regional explanation emerges most strongly from the more comprehensive written dataset. The Asian data as a whole support this interpretation least and commonly fragment into smaller and more fluid groups, but on a more fine-grained level, pairs based on regional proximity still recur frequently. Support for groups based on Schneider’s dynamic model is generally low: some clusters emerging from the data match those expected on the basis of the model, but convergence is generally lower than in an analysis based on regional proximity. In the latter case, the data frequently mirror not only large-scale but also more fine-grained patterns, while several groups of varieties based on similar degrees of exo- or endonormative stabilization fail to emerge reliably from the data. Thus, the analysis concludes by favoring regional and cultural proximity over other explanatory approaches for the description of association patterns in World Englishes.
Physical Description: 333 Pages
DOI: 10.17192/z2023.0077