Sun 01 August 2021

Welcome to Week 1 of my exploration of the languages of South Asia! This week we'll look at Khasi, an Austro-Asiatic language spoken in the state of Meghalaya in Northeast India, particularly in the 3 Khasi Hills districts (East, West, and South West). The biggest city and center for Khasi speakers in this region is of course Shillong, which I do recommend you visit if you haven't been. Try the doh snam (blood sausage) if you're a meat-eater (on that note, if you're a vegetarian you might have some trouble in this region in general).

Khasi Language Distribution

Within the Austro-Asiatic languages, Khasi is generally classified in the Mon-Khmer group, though it is quite distantly related to the two most famous members of that group: Vietnamese and Khmer. Within India, Khasi is related to the Munda languages of West Bengal, Jharkhand, Odisha, and Chhattisgarh, though again quite distantly, as Munda is generally classified as a branch distinct from Mon-Khmer.

Khasi is an SVO (Subject-Verb-Object) language, with an extensive case system, a classifier system for number phrases, two genders (Feminine - Ka, Masculine - U), and a productive prefixation morphology that results in many complex consonant clusters at word onset (e.g., td, pn, kt). Written Khasi uses the Latin script. To give you a feel for the language, here is the poem U Phlang Jyrngam ("The Green Grass") by Soso Tham (excerpted from Nongkynrih, 2006):

Jar-jar hapoh ki dieng ha khlaw,
U san hapdeng ki ñiut;
U syntiew pher, u tiew-dohmaw,-
Laiphew-na-ar jingmut.

Jar-Jar harud ki wah ba tngen,
Ban iwbih ynda stai;
U tiew tyrkhang ba ai jingkmen,
U jyrngam khadar bnai.

Iathuh, premmiet ba ieit ki blei,
Bad phi ki lyoh bun rong;
Iathuh ia nga u don haei
U khlur ba paw nyngkong.

Jar-Jar u im, jar-jar u jah,
Hapoh rai-eh rai-dam;
Jar-Jar ha jingtep ai un thiah,-
Hapoh u phlang jyrngam.

Work on the Khasi language, as with much of this region, is sparse; considerably greater attention has been paid to the social, religious, and political/historical characteristics of the Khasi people. I have no expertise in those areas, so I'll leave them for you to explore.

The following is a bibliography showcasing the present state of research on Khasi, organized according to linguistic subfield. In this you'll see a few key names and centers of research. Lili Rabel-Heymann was perhaps the first major authority on Khasi, releasing a grammar in 1961 and several follow-up articles in the decades after. Around the same time, Eugénie Henderson began publishing on Khasi phonology, and to this day Henderson is responsible for the vast majority of work on Khasi phonology. K.S. Nagaraja, who has worked for the Central Institute of Indian Languages (CIIL) for decades, has published numerous books and papers on Khasi, focused primarily on morphology and syntax, but also general grammars and historical/typological considerations. Finally, more recently Medari Tham has been producing a lot of computational research on Khasi. This work is mostly focused on part-of-speech (POS) tagging and corpus construction, though there is likely much more to come since Tham only began working in this area around a decade ago.

Some of the key places you'll find linguists working on Khasi are the North Eastern Hill University in Shillong; the Shillong campus of the English and Foreign Languages University (EFLU); some of the major colleges in Shillong, such as St. Edmund's, St. Anthony's, St. Mary's, and Lady Keane College, though these colleges are focused mostly on teaching; and some of the major universities in the nearby city of Guwahati in Assam, namely IIT Guwahati and Gauhati University.

Grammars and general descriptions

  • Rabel (1961). Khasi, a language of Assam
  • Roberts (2005). A Grammar of the Khasi Language
  • Bareh, C. (2007). Descriptive analysis of the Jowai and Rymbai dialects of Khasi
  • Nagaraja (2015). "Standard Khasi"

Historical linguistics

  • Diffloth (2008). "Shafer's 'parallels' between Khasi and Sino-Tibetan"
  • Daladier (2011). "The group Pnaric-War-Lyngngam and Khasi as a branch of Pnaric"
  • Sidwell (2011). "Proto Khasian and Khasi-Palaungic"
  • Nagaraja, Sidwell, & Greenhill (2013). "A lexicostatistical study of the Khasian languages: Khasi, Pnar, Lyngngam, and War"
  • Daladier (2015). "The counting unit system of Pnar, War, Khasi, Lyngam and its traces in Austroasiatic composite cardinal systems"

Dialectology and typology

  • Nagaraja (1993). "Khasi dialects: A typological consideration"
  • Sharma (1996). "A contrastive study of Manipuri (Meiteilon) and Khasi"
  • Nagaraja (2008). "Korku-Khasi: A typological study"
  • Diengdoh (2014). Comparative study of Sohkha (War) and Wahiajer (Pnar) varieties of Khasi language

Phonetics and phonology

  • Henderson (1965). "Final-k in Khasi: A secondary phonological pattern"
  • Henderson (1967). "Vowel length and vowel quality in Khasi"
  • Henderson (1976). "Khasi initial clusters"
  • Nagaraja (1990). Khasi phonetic reader
  • Henderson (1992). "Khasi clusters and Greenberg's universals"
  • Syiem, Rynjah, & Singh (2018). "Temporal representation of vowels in Khasi dialect"

Morphology and the lexicon

  • Ehrenfels (1953). "Khasi kinship terms in four dialects"
  • Henderson (1976). "Vestiges of morphology in Modern Standard Khasi"
  • Rabel-Heymann (1976). "Analysis of loanwords in Khasi"
  • Rabel-Heymann (1976). "Sound symbolism and Khasi adverbs"
  • Rabel-Heymann (1977). "Gender in Khasi nouns"
  • Nagaraja (1979). "Contraction of Khasi nouns in compounds"
  • Nagaraja (1984). "Compounding in Khasi"
  • Nagaraja (1984). "Reduplication in Khasi"
  • Rabel-Heymann (1989). "Khasi kinship terminology"
  • Abbi & Victor (1997). "Expressive morphology and manner adverbs in Khasi, Tangkhul Naga and Kuki-Chin languages"
  • Nagaraja (2001). "Word formation in Khasi"
  • Baishya & Shabong (2012). "Indo-Aryan loan words in Khasi: An introductory note"
  • Singh (2015). Khasi English dictionary
  • War (2020). "Changes in Khasi kinship terminology"

Syntax

  • Jyrwa (1989). A descriptive study of the noun phrase in Khasi
  • War (1992). The personal pronouns and their related clitics in six Khasi dialects: A grammatical and sociolinguistic study
  • Nagaraja (1993). "Agreement in Khasi and Munda languages"
  • Sharma (1999). "A comparison between Khasi and Manipuri word order"
  • Bedell (2011). "Agreement in Khasi relative clauses"
  • Shandilya (2016). "Spatial deixis: A typological study in Kharia, Santhali, Khasi, and Pnar"
  • Rynjah (2019). "A comparative study of the tense and aspect in Standard Khasi and its varieties"

Computational linguistics

  • Tham (2012). "Design considerations for developing a Parts-of-Speech tagset for Khasi"
  • Wahlang (2015). Construction of corpus and detection of communities in Khasi language network
  • Tham (2015). "Experimental observations on applying BIS standard annotation to Khasi language"
  • Hujon (2016). "Finite state automata as analyzers of certain grammar class rules in the Khasi language"
  • Tham (2018). "Challenges and issues in developing an annotated corpus and HMM POS tagger for Khasi"
  • Tham (2018). "Khasi shallow parser"
  • Warjri, Pakray, Lyngdoh, & Maji (2018). "Khasi language as dominant Part-of-Speech (POS) ascendant in NLP"
  • Warjri, Pakray, Lyngdoh, & Maji (2019). "Identification of POS tag for Khasi language based on Hidden Markov Model POS tagger"
  • Rynjah, Syiem, & Singh (2020). "Khasi speech recognition using Hidden Markov Model with different spectral features: A comparison"
  • Singh & Hujon (2020). "Low resource and domain specific English to Khasi SMT and NMT systems"
  • Tham (2020). "A hybrid POS tagger for Khasi, an under resourced language"

Orthography and sociolinguistics

  • Henderson (1991). "Problems and pitfalls in the phonetic interpretation of Khasi orthography"
  • Bhattacharya & Bhattacharjee (2003). "Khasi-Bengali to Roman: The colonial transformation of Khasi-Jaintia Hills"
  • Gruessner (2004). "Khasi: A minority language of North-East India, from an unwritten to a written language"
  • Marwein (2014). Protection of Khasi language under the Indian constitution: Need and challenges
  • Talang-Rao (2020). "Factors influencing code-mixing and code-switching in Khasi and Jaiñtia Hills"

In the Linguistic Survey of India (LSI), Grierson next moves west to survey the Munda languages, but we'll stay in Meghalaya for a bit, as there are several languages that were described at the time as dialects of Khasi but have since been shown to be more distinct than previously thought. Next week we'll start with Pnar.