Nepali Phonetics and Character Mapping for Building Speech Technology

Nepali Phoneme Mapping

Script to Phoneme Mapping

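| Script | Phoneme (IPA) | Script | Phoneme (IPA) |
| अ | /ʌ/ | आ | /aː/ |
| इ | /i/ | ई | /iː/ |
| उ | /u/ | ऊ | /uː/ |
| ए | /e/ | ऐ | /ẽ/ |
| ओ | /o/ | औ | /au/ |
| अं | /ʌ̃/ | अँ | /ã/ |
| क | /k/ | ख | /kʰ/ |
| ग | /ɡ/ | घ | /ɡʱ/ |
| ङ | /ŋ/ | च | /t͡s/ |
| छ | /t͡sʰ/ | ज | /d͡z/ |
| झ | /d͡zʱ/ | ञ | /ɲ/ |
| ट | /ʈ/ | ठ | /ʈʰ/ |
| ड | /ɖ/ | ढ | /ɖʱ/ |
| ण | /ɳ/ | त | /t̪/ |
| थ | /tʰ/ | द | /d̪/ |
| ध | /dʱ/ | न | /n/ |
| प | /p/ | फ | /pʰ/ |
| ब | /b/ | भ | /bʱ/ |
| म | /m/ | य | /j/ |
| र | /r/ | ल | /l/ |
| व | /w/ | स | /s/ |
| ष | /ʂ/ | ह | /ɦ/ |
| श्र | /ʃr/ | ज्ञ | /d͡zɲ/ |
| क्ष | /kʃ/ | | |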

JSON Representation

{
  "vowels": {
    "अ": "/ʌ/",
    "आ": "/aː/",
    "इ": "/i/",
    "ई": "/iː/",
    "उ": "/u/",
    "ऊ": "/uː/",
    "ऋ": "/r̩/",
    "ॠ": "/r̩ː/",
    "ए": "/e/",
    "ऐ": "/ẽ/",
    "ओ": "/o/",
    "औ": "/au/",
    "अं": "/ʌ̃/",
    "अँ": "/ã/",
    "अ:": "/ʌh/" 
  },
  "diacritics": {
    "ा": "/aː/",
    "ि": "/i/",
    "ी": "/iː/",
    "ु": "/u/",
    "ू": "/uː/",
    "ृ": "/r̩/",
    "ॆ": "/e/",
    "े": "/e/",
    "ै": "/ẽ/",
    "ॉ": "/o/",
    "ो": "/o/",
    "ौ": "/au/",
    "्": "virama",
    "ः": "/h/"  
  },
  "consonants": {
    "क": "/k/",
    "ख": "/kʰ/",
    "ग": "/ɡ/",
    "घ": "/ɡʱ/",
    "ङ": "/ŋ/",
    "च": "/t͡s/",
    "छ": "/t͡sʰ/",
    "ज": "/d͡z/",
    "झ": "/d͡zʱ/",
    "ञ": "/ɲ/",
    "ट": "/ʈ/",
    "ठ": "/ʈʰ/",
    "ड": "/ɖ/",
    "ढ": "/ɖʱ/",
    "ड़": "/ɽ/",
    "ढ़": "/ɽʱ/",
    "ण": "/ɳ/",
    "त": "/t̪/",
    "थ": "/tʰ/",
    "द": "/d̪/",
    "ध": "/dʱ/",
    "न": "/n/",
    "प": "/p/",
    "फ": "/pʰ/",
    "ब": "/b/",
    "भ": "/bʱ/",
    "म": "/m/",
    "य": "/j/",
    "र": "/r/",
    "ल": "/l/",
    "व": "/w/",
    "श": "/ʃ/",
    "ष": "/ʂ/",
    "स": "/s/",
    "ह": "/ɦ/",
    "श्र": "/ʃr/",
    "ज्ञ": "/d͡zɲ/",
    "क्ष": "/kʃ/"
  },
  "digits": {
    "०": "0",
    "१": "1",
    "२": "2",
    "३": "3",
    "४": "4",
    "५": "5",
    "६": "6",
    "७": "7",
    "८": "8",
    "९": "9"
  },
  "punctuation": {
    "!": "exclamation",
    "\"": "quotation",
    "'": "apostrophe",
    ",": "comma",
    ".": "period",
    ":": "colon",
    "?": "question_mark",
    "।": "danda",  
    "_": "underscore"
  }
}
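
A minimal sketch of how the mapping could be consumed from Python. To keep the example runnable on its own, it inlines a small excerpt of the JSON rather than reading a file; the helper names `load_mapping` and `lookup` are illustrative assumptions, not part of any existing tool.

```python
import json

# Excerpt of the mapping above, inlined so the example runs without a file.
MAPPING_JSON = """
{
  "vowels": {"अ": "/ʌ/", "आ": "/aː/"},
  "consonants": {"क": "/k/", "ख": "/kʰ/"},
  "digits": {"५": "5"},
  "punctuation": {"।": "danda"}
}
"""

def load_mapping(text):
    """Parse the JSON and strip the surrounding slashes from IPA values."""
    raw = json.loads(text)
    return {
        category: {symbol: value.strip("/") for symbol, value in table.items()}
        for category, table in raw.items()
    }

def lookup(mapping, symbol):
    """Return (category, value) for a symbol, or None if it is unmapped."""
    for category, table in mapping.items():
        if symbol in table:
            return category, table[symbol]
    return None

mapping = load_mapping(MAPPING_JSON)
print(lookup(mapping, "ख"))  # ('consonants', 'kʰ')
print(lookup(mapping, "।"))  # ('punctuation', 'danda')
```

Returning the category together with the value lets a caller treat digits and punctuation differently from phonemes, for example by expanding digits to words before synthesis.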

  

What Were the Rules?

The rules are based on the Wikipedia article on Nepali phonology: https://en.wikipedia.org/wiki/Nepali_phonology

Consonants

Spoken Nepali has 30 consonants in its native system, though some have tried to limit the number to 27.

Nepali consonant phonemes
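| | Bilabial | Dental | Alveolar | Retroflex | Dorsal | Glottal |
| Nasal | m (म) | | n (न/ञ) | (ɳ (ण)) | ŋ (ङ) | |
| Plosive/Affricate, voiceless unaspirated | p (प) | t̪ (त) | t͡s (च) | ʈ (ट) | k (क) | |
| Plosive/Affricate, voiceless aspirated | pʰ (फ) | t̪ʰ (थ) | t͡sʰ (छ) | ʈʰ (ठ) | kʰ (ख) | |
| Plosive/Affricate, voiced unaspirated | b (ब) | d̪ (द) | d͡z (ज) | ɖ (ड) | ɡ (ग) | |
| Plosive/Affricate, voiced aspirated | bʱ (भ) | d̪ʱ (ध) | d͡zʱ (झ) | ɖʱ (ढ) | ɡʱ (घ) | |
| Fricative | | | s (स/श/ष) | | | ɦ (ह) |
| Trill | | | r (र) | | | |
| Approximant | (w (व)) | | l (ल) | | (j (य)) | |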
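
The consonant table groups some letters under a single phoneme (ञ with न under /n/, and श and ष with स under /s/), while the JSON mapping keeps them as separate symbols. If a smaller phoneme set is wanted, the two could be reconciled along these lines. This is only a sketch: the `PHONEMIC_MERGES` table and the function name are assumptions read off the table, not an established convention.

```python
# Assumption: merges read off the consonant table, where ञ is listed with न
# under /n/, and श and ष are listed with स under /s/.
PHONEMIC_MERGES = {"ɲ": "n", "ʃ": "s", "ʂ": "s"}

def merge_to_inventory(phonemes):
    """Replace script-level symbols with the phoneme the table groups them under."""
    return [PHONEMIC_MERGES.get(p, p) for p in phonemes]

print(merge_to_inventory(["ʃ", "ʌ", "ɲ", "ʌ"]))  # ['s', 'ʌ', 'n', 'ʌ']
```

Symbols without an entry, including the marginal /ɳ/, pass through unchanged.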

Vowels

Nepali has 11 phonologically distinctive vowels, including 6 oral vowels and 5 nasal vowels. In some contexts, intervocalic "h" leads to breathy-voiced vowels.

Nepali vowel phonemes
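| | Front oral | Front nasal | Central oral | Central nasal | Back oral | Back nasal |
| Close | i (इ) | ĩ (ई) | | | u (उ) | ũ (ऊ) |
| Close-mid | e (ए) | ẽ (ऐ) | | | o (ओ) | |
| Open-mid | | | | | ʌ (अ) | ʌ̃ (अँ) |
| Open | | | ä (आ) | ã (आँ) | | |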

Normalization Rules

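  1. Handling halanta (्): A halanta after a consonant suppresses that consonant's inherent vowel "अ" (/ʌ/), so क् is /k/ rather than /kʌ/.
  2. Combining matras: A matra attached to a consonant replaces the inherent vowel "अ" with the matra's vowel, so कि is /ki/.
  3. Clusters and special cases: Map the conjuncts क्ष, ज्ञ, and श्र to their dedicated entries (/kʃ/, /d͡zɲ/, /ʃr/) before falling back to letter-by-letter mapping.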
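
The three rules above can be sketched as a single left-to-right pass over a word. The code below is an illustration under stated assumptions rather than a complete grapheme-to-phoneme converter: it uses only a small subset of the mapping, the function name `to_phonemes` is hypothetical, and it always emits the inherent vowel when no halanta or matra follows, so it does not attempt schwa deletion, which the rules above do not cover.

```python
VIRAMA = "\u094d"  # halanta sign
INHERENT = "ʌ"     # inherent vowel "अ"

# Subsets of the JSON mapping, with the slashes removed.
CONSONANTS = {"क": "k", "म": "m", "ल": "l", "न": "n", "प": "p", "र": "r"}
MATRAS = {"ा": "aː", "ि": "i", "ी": "iː", "ु": "u", "ू": "uː", "े": "e", "ो": "o"}
VOWELS = {"अ": "ʌ", "आ": "aː", "इ": "i", "उ": "u", "ए": "e", "ओ": "o"}
# Each cluster is three code points: consonant + halanta + consonant.
CLUSTERS = {"क्ष": ["k", "ʃ"], "ज्ञ": ["d͡z", "ɲ"], "श्र": ["ʃ", "r"]}

def to_phonemes(word):
    """Convert a Devanagari word to a list of phonemes using rules 1 to 3."""
    phonemes = []
    i = 0
    while i < len(word):
        if word[i:i + 3] in CLUSTERS:          # rule 3: special clusters first
            base = CLUSTERS[word[i:i + 3]]
            i += 3
        elif word[i] in CONSONANTS:
            base = [CONSONANTS[word[i]]]
            i += 1
        elif word[i] in VOWELS:                # independent vowel letter
            phonemes.append(VOWELS[word[i]])
            i += 1
            continue
        else:
            raise ValueError(f"unmapped character: {word[i]!r}")

        phonemes.extend(base)
        following = word[i] if i < len(word) else ""
        if following == VIRAMA:                # rule 1: halanta drops the vowel
            i += 1
        elif following in MATRAS:              # rule 2: matra replaces the vowel
            phonemes.append(MATRAS[following])
            i += 1
        else:                                  # otherwise keep the inherent vowel
            phonemes.append(INHERENT)
    return phonemes

print(to_phonemes("नेपाली"))  # ['n', 'e', 'p', 'aː', 'l', 'iː']
print(to_phonemes("क्षमा"))   # ['k', 'ʃ', 'ʌ', 'm', 'aː']
```

Checking the three-character clusters before single consonants is what keeps क्ष from being read as क, halanta, ष. A conjunct without a dedicated entry, such as the म्र in नम्र, falls through to rule 1 and yields /m r/.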