India

Language: Spoken Languages

The official language of India is Hindi, spoken by 41 percent of the population. English, spoken by 10 percent of the population, functions as a secondary official language on the national level. Various other languages are official in their respective regions: Bengali (8 percent), Telugu (7 percent), Marathi (7 percent), Tamil (5 percent), Urdu (5 percent), Gujarati (4 percent), Kannada (3 percent), Malayalam (3 percent), Odia, also called Oriya (3 percent), Punjabi (2 percent), Assamese (1 percent), Maithili (1 percent), Santali, Kashmiri, Nepali, Konkani, Dogri, Sindhi, Manipuri, and Sanskrit. Because large portions of the population are not conversant in either Hindi or English, language barriers can arise when traveling between Indian states.

Many indigenous languages are spoken in addition to the official languages. Some of the more commonly spoken indigenous languages are Bhojpuri, Rajasthani, Magahi, Chhattisgarhi, Haryanvi, Marwari, Malvi, and Mewari.

With the exception of Hindi, Marathi, and Angika, which all use the Devanagari script, each language typically uses its own script, most of which are Brahmi-derived. However, Urdu uses an Arabic-derived script, and some languages use scripts that are neither Brahmi nor Arabic.

Hindi / हिन्दी

नमस्कार

( Hello )

History and Evolution

Hindi, the official language of India, is based on Khariboli, the language of Delhi and the surrounding region of western Uttar Pradesh and southern Uttarakhand. Originally a dialect in the 17th-century Mughal Empire, it became known as Urdu, or the “language of the court.” By the late 19th century, a movement to standardize written Hindi separated it from Urdu, and it was accepted as the official language of the powerful state of Bihar. After independence, the government of India ratified A Basic Grammar of Modern Hindi, along with a standardized orthography using the Devanagari script. Despite its standardization and official status, Hindi is essentially the same spoken language as Urdu; the main differences are that Urdu is written in the Perso-Arabic script and uses fewer Sanskrit words and more Arabic and Persian words.

क्षमा करें

( Excuse me )

Geographic Distribution

In addition to being the official language of the Republic of India, Hindi is an official language of nine Indian states and five Union Territories. A 2009 report counted 180 million native speakers of Hindustani, although the 2001 census reported 258 million speakers of Hindi and related languages considered to be dialects of Hindi. (Urdu, which is all but identical to Hindi, is the most widely spoken language in neighboring Pakistan.) Outside India, there are large communities of Hindi speakers in South Africa and Nepal, as well as pockets of Hindi speakers in the Middle East, Asia, and North America, among other places.

सुप्रभात

( Good morning )

Prominence in Society

Hindi is the official language of India, and is used in government, education, business, and the media.

कृपया

( Please )

Unique Characteristics

Hindi has 10 vowels (a, ā, i, ī, u, ū, e, o, ai, au) and 28 consonants.

Hindi uses compound verbs, where an auxiliary verb is tacked on to a stem verb to change its meaning. For example, jānā means "to go" and, as an auxiliary, adds a sense of completion, finality, or change of state; the verb ānā means "to come," while the compound verb ā jānā means "to arrive."

Word order is relatively free form. Typically, though not always, nouns are followed by postpositions rather than preceded by prepositions, and adjectives come before the words they modify.  The most common sentence structure is subject-object-verb. 

The Hindi script, Devanagari, is an abugida, a writing system in which each consonant has an inherent vowel that can be changed with diacritics, or vowel signs. Consonants are often joined to one or two others so that the inherent vowel is suppressed.
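
The abugida principle can be sketched in Python with the Unicode code points for Devanagari. This is a minimal illustration rather than a full account of the script: the helper name `code_point_names` is hypothetical, and the sample syllables (कि and the conjunct क्ष, which opens क्षमा) were chosen here for illustration.

```python
import unicodedata

KA = "\u0915"            # क, a consonant carrying the inherent vowel "a"
SSA = "\u0937"           # ष, a second consonant
VOWEL_SIGN_I = "\u093F"  # ि, a vowel sign that replaces the inherent vowel with "i"
VIRAMA = "\u094D"        # ्, the sign that suppresses the inherent vowel


def code_point_names(text):
    """Return the Unicode name of each code point in text (hypothetical helper)."""
    return [unicodedata.name(ch) for ch in text]


ki = KA + VOWEL_SIGN_I    # कि: consonant followed by a vowel sign
ksha = KA + VIRAMA + SSA  # क्ष: the virama joins two consonants into a conjunct

for label, syllable in (("ka", KA), ("ki", ki), ("ksha", ksha)):
    print(label, syllable, code_point_names(syllable))
```

Each written syllable is stored as a consonant plus optional signs, which is why a single visible cluster such as क्ष is three code points long.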

Much of the vocabulary of formal Standard Hindi is drawn from Sanskrit, although many words are pronounced or spelled differently.

While Sanskrit spelling is phonetic (sounds correlate to symbols), Hindi written in the Devanagari alphabet is only partly phonetic. A given word has only one correct pronunciation, but not all pronunciations are perfectly clear from the spelling alone.

आप कैसे हैं?

( How are you? )

Loanwords in English

There are many Hindi words in common English usage; however, most derive from Sanskrit origins and not from Modern Hindi per se. Many of these words migrated into English during the long period of British rule over the Indian subcontinent.

  • bangle (बांगड़ी, bāngṛī; a type of bracelet)
  • bungalow (बंगला, bangla; a house in the Bengal style)
  • cot (खाट, khāt; a portable bed)
  • jungle (जंगल, jangal; “wilderness” or “forest”)
  • thug (ठग, thag; “thief” or “con man”)
  • toddy (from ताड़ी, tārī; juice of the palmyra palm that is often served hot)
  • verandah (बरामदा, baramdaa; “porch”)

धन्यवाद

( Thank you )

Say Whaaat?

There are 48 official dialects of Hindi, and many more unofficial ones. By some counts, there are more than 422 million speakers of languages that are considered dialects of Hindi, although this includes many dialects whose ties to Modern Standard Hindi are tenuous at best.

Fijian Hindi has some 380,000 speakers on the islands of Fiji, primarily people of Indian descent, although the language is not derived from Hindustani. Rather, it is a variety of Awadhi that was influenced by Bhojpuri and other Bihari languages.

Most Bollywood films produced in Mumbai use dialects of Hindi for spoken lines, and two different scripts for titles: Latin and Devanagari. Depending on context, some films will include Perso-Arabic script.

अलविदा

( Good-bye )

Writer: Bruce Falstein

Hindi Quick Facts

Origin

North India

Native Speakers

180–260 million

Second-language Speakers

50–70 million

Official Language

 India

Recognized Language

N/A

Language Family

Indo-European

  • Indo-Iranian
    • Indo-Aryan
      • Sanskrit
        • Central Zone (Hindi)
          • Western Hindi
            • Hindustani
              • Khariboli
                • Hindi

Standard Form

Modern Standard Hindi

Dialects

Western Hindi

Eastern Hindi

Script

Devanagari script

Alphabet

Hindi alphabet

Regulated by

Central Hindi Directorate

ISO Codes

ISO 639-1 (hi)

ISO 639-2 (hin)

ISO 639-3 (hin)

English

Hello

History and Evolution

Originally spoken in the British Isles, English is a major world language that belongs to the West Germanic branch of the Indo-European language family.

Around the 5th century CE, Germanic tribes known collectively as the Anglo-Saxons invaded the British Isles, arriving in the southeast. Over the following centuries, these settlers moved into the interior and their Germanic language replaced indigenous Celtic vernaculars. By the 7th century, the first texts in Old English (or Anglo-Saxon) emerged. In the late 9th century, King Alfred the Great of Wessex launched a program to translate many religious and secular texts from Latin in order to make the spoken Old English vernacular the language of governance and education. The West Saxon dialect thus became the standard variety of Old English. Old English is nearly impossible for modern speakers to understand. It is grammatically closer to German, as it has an extensive case system with inflected endings for both nouns and verbs.

The first waves of Norse invasions occurred in the 8th century, bringing Old English into close contact with the North Germanic Old Norse language. With the Norman Conquest of England in 1066, Norse-influenced Old English came into contact with Old Norman, the Romance predecessor of modern French.

This period of Norman occupation marked the beginning of the development of Middle English, which lost the characteristic case system and gained a substantial Romance-influenced vocabulary. While Norman rulers and elites spoke Old Norman, the lower classes continued to speak English, resulting in a socio-linguistic divide that brought thousands of Romance and Latin loanwords into English, particularly for topics relating to politics, law, and culture. Today, the elevated register of the Norman influence is seen in the difference between the Norman and Anglo-Saxon words “beverage” and “drink” or “purchase” and “buy.” In the late 14th century, Geoffrey Chaucer’s Middle English collection of stories The Canterbury Tales popularized English as a legitimate literary language.

The Early Modern period began in the 16th century and was marked by the Great Vowel Shift, a watershed change in vocalic pronunciation, along with increasing linguistic standardization. The introduction of the first printing press to London in 1476 helped establish the London dialect as the standard written form. Literature also flourished, and some of the most important texts in the English language were published during this period, including the plays of William Shakespeare and a translation of the Bible commissioned by King James I.

While it took centuries for English to dominate the British Isles, the language spread quickly throughout the world, beginning in the 18th century. British explorers and scientists, in the service of the British Empire, brought English to every continent. This linguistic spread was facilitated by increasing standardization, aided by the modern conventions for grammar and orthography established in Dr. Samuel Johnson’s influential 1755 A Dictionary of the English Language.

The decline of the British Empire after World War II did not diminish the global importance of English. As former British colonies gained independence, they established their own standards for speaking and writing English, often in addition to indigenous languages. At the same time, the United States ascended rapidly, becoming a world economic and political superpower. By the end of the 20th century, English had become a global lingua franca.

Good morning

Geographic Distribution

Worldwide, approximately 360 to 400 million people speak English as their native language, making it the third-largest language in terms of native speakers, after Mandarin and Spanish. Among the six countries of the Anglophone world, the United States has the greatest number of English speakers, followed by the United Kingdom, Canada, Australia, Ireland, and New Zealand.

Though India does not belong to the Anglophone world, it has the second-highest number of English speakers, after the United States. English serves as a lingua franca between speakers of India’s diverse languages.

English is an official language in 67 countries, spanning every continent. Several countries throughout the world use English as a de facto official language, including numerous island nations throughout the Caribbean and Pacific that were historically under British or American colonial control.

The major varieties of English include North American, British, and Australian English, each of which has numerous sub-dialects. The oft-repeated joke that the United States and the United Kingdom are “two countries divided by a common language” is somewhat misleading. While English is a pluricentric language, meaning that there is no single standard, most varieties of English are mutually intelligible and differ largely in terms of pronunciation and vocabulary. For example, Americans live in apartments, wear pants, and eat cookies, while the British live in flats, wear trousers, and eat biscuits. In written English, though, the primary differences between the two varieties are minor spelling and punctuation conventions, such as “program” versus “programme” and the use of double or single quotes to mark speech.

Major British dialects are grouped as English, Scottish, Welsh, and Irish. Received Pronunciation, sometimes referred to as “The Queen’s English” or “Oxford English,” is considered the standard, but it is only used by about two percent of the population. Major British subdialects include Cockney, which is associated with lower socioeconomic class, Estuary (southern) English, West Country, Midlands, and Northern English. Scottish and Irish accents use a characteristic rolled “r.” Most major British cities, such as Liverpool, Birmingham, and Manchester, also have distinct dialects. British dialects tend to display greater variation than American dialects, perhaps due to ethnic factors, isolated geography, or a shorter history of standardized media.

General American English distinguishes itself from most other varieties of English spoken around the world with its rhotic accent, or the “hard ‘r’.” The non-rhotic accent characteristic of modern British English’s Received Pronunciation actually emerged among the upper classes of southern England during the early 19th century as a way of distinguishing themselves from commoners, some of whom had become wealthy during the Industrial Age.

The major American dialects are regional, including the Midwest, Southern, Texan, and New England dialects, or cultural, such as African-American Vernacular English or Chicano English. The Midwestern variety is considered a neutral accent and is the favored form in broadcasting media. New York, Boston, and Southern American accents are less rhotic than most other American dialects. These regions were heavily involved in trade with England during the 19th century, and linguists believe wealthy American colonists adopted the elite British pronunciation as a marker of status.

English-based creole languages, which vary greatly in terms of morphology and lexicon, are commonly spoken in Nigeria, the Philippines, India, and Jamaica. In some cases, the pronunciation and vocabulary of these creole languages are so different from standard American and British English that they are not mutually intelligible.

Please

Prominence in Society

English is considered the world’s first truly global language and serves as the international lingua franca of business, diplomacy, transportation, entertainment, technology, and research. It is one of the official languages of the United Nations, among other prominent international organizations.

English is the most commonly taught foreign language in the world, and knowledge of it is considered essential for a wide variety of careers, particularly business, medicine, science, and tourism.

English dominates global media, from Hollywood movies and popular music to Indian Bollywood films. Twenty-eight percent of all books published in the world are written in English, and approximately one-third of all internet websites are in English.

English is the official language of the seas and skies by various international treaties; all seafarers and aviation pilots must speak English.

How are you?

Unique Characteristics

English spelling is highly irregular, and pronunciation often has little to do with spelling, as illustrated by the following sentence, in which “-ough” is pronounced nine different ways: "A rough-coated, dough-faced, thoughtful ploughman strode through the streets of Scarborough; after falling into a slough, he coughed and hiccoughed."

Punctuation is important in English to mark grammatical function. Simply compare these two sentences: “Let’s eat, grandma” and “Let’s eat grandma.” Without the comma, “grandma” becomes the direct object, rather than the addressee.

English makes extensive use of phrasal verbs, which are often quite idiomatic in meaning. For example, “to hang in,” “to hang on,” “to hang out,” and “to hang up” all have very different meanings.

As English lacks an inflected case system (with the exception of pronoun forms such as who/whom and he/him), word order is extremely important in communicating meaning. The subject-verb-object sequence is used almost exclusively and word order is quite inflexible. Compare, for example, “The dog bites the man” and “The man bites the dog.”

Neologisms (new words) are formed by the following processes: (1) conversion, using nouns as verbs or verbs as nouns (e.g., “to find” and “a find”); (2) compounding, putting two words together (e.g., homesick); or (3) adding suffixes such as “-ness,” “-hood,” or “-ing” to new or existing words.

Modern English derives approximately 29 percent of its words from Latin, 29 percent from French, 26 percent from Germanic languages, 6 percent from Greek, and another 10 percent from 120 other languages. Latin and Greek, for example, gave English much of its medical and scientific vocabulary, while French heavily influenced English’s lexicon pertaining to legal topics, military, politics, and cuisine. The motley origins of English vocabulary are a testament to its early history of foreign domination, and its emergence in the modern period as a global language.

Thank you

Say Whaaat?

With well over a million words, English is believed to have the largest lexicon of any language in the world. The Oxford English Dictionary adds approximately 4,000 new words per year, which works out to a new word roughly every two hours.
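
The two-hour figure follows from simple arithmetic, sketched here in Python. The sketch assumes a 365-day year and takes the rate of 4,000 new words per year at face value; the variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24    # assumes a non-leap year
NEW_WORDS_PER_YEAR = 4_000   # rate quoted in the text

hours_per_new_word = HOURS_PER_YEAR / NEW_WORDS_PER_YEAR
print(f"One new word roughly every {hours_per_new_word:.2f} hours")
```

The result is a little over two hours per word, so "every two hours" is a rounded figure.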

William Shakespeare alone contributed an estimated 3,000 new words, sayings, and phrases to the English language. If you bid goodbye saying “Good riddance,” find yourself “in a pickle,” or “refuse to budge an inch,” you are quoting Shakespeare.

English is believed to be the first major language in human history to have more non-native speakers than native speakers.

The oldest, shortest, and most commonly used word in English is “I,” both in written and spoken contexts.

“Go” and “I am” are the shortest complete sentences in English.

The letter “e” is the most frequently used letter in English, accounting for about 11 percent of all letters in written words.

Good-bye

Writer: Carly Ottenbreit

English Quick Facts

Origin

Early medieval England

Native Speakers

359–400 million

Second-language Speakers

470 million–1 billion

Official Language

 Akrotiri and Dhekelia

 American Samoa

 Anguilla

 Antigua and Barbuda

 Australia (de facto)

 Bahamas

 Barbados

 Belize

 Bermuda

 Botswana

 British Virgin Islands

 Cameroon

 Canada

 Cayman Islands

 Christmas Island

 Cook Islands

 Curaçao

 Dominica

 Falkland Islands

 Federated States of Micronesia

 Gambia

 Ghana

 Gibraltar

 Grenada

 Guam

 Guyana

 Hong Kong

 India

 Ireland

 Isle of Man

 Jamaica

 Jersey

 Kenya

 Kiribati

 Lesotho

 Liberia

 Malawi

 Malta

 Marshall Islands

 Namibia

 Nauru

 New Zealand (de facto)

 Nigeria

 Niue

 Norfolk Island

 Northern Mariana Islands

 Pakistan

 Palau

 Papua New Guinea

 Philippines

 Pitcairn Islands

 Puerto Rico

 Rwanda

 Saint Kitts and Nevis

 Saint Lucia

 Saint Vincent and the Grenadines

 Samoa

 Seychelles

 Sierra Leone

 Singapore

 Sint Maarten

 Solomon Islands

 Somaliland

 South Africa

 South Sudan

 Sudan

 Swaziland

 Tanzania

 Tonga

 Trinidad and Tobago

 Turks and Caicos Islands

 Tuvalu

 Uganda

 United Kingdom (de facto)

 United States (de facto)

 US Virgin Islands

 Vanuatu

 Zambia

 Zimbabwe

Recognized Language

 Bangladesh

 British Indian Ocean Territory

 Brunei

 Cocos (Keeling) Islands

 Eritrea

 Ethiopia

 Guernsey

 Israel

 Malaysia

 Montserrat

 Saint Helena, Ascension and Tristan da Cunha (UK)

 Sri Lanka

 Tokelau

Language Family

Indo-European

  • Germanic
    • West Germanic
      • North Sea Germanic
        • Anglo-Frisian
          • Anglic
            • English

Standard Forms

Received Pronunciation (UK)

General American English

Standard English (by country)

Dialects

United Kingdom/Ireland

  • British English
  • Northern (England)
  • East Midlands (England)
  • West Midlands (England)
  • East Anglian (England)
  • Southern (England)
  • West Country (England)
  • Scottish English
  • Welsh English
  • Ulster English
  • Manx English
  • Guernsey English
  • Jersey English
  • Hiberno-English (Rep. of Ireland)

United States

  • African American Vernacular English
  • Chicano English
  • New York Latino English
  • Pennsylvania Dutch English
  • Yeshivish
  • Hawaiian Pidgin
  • New England English
  • Mid-Atlantic dialects
  • Inland Northern American English
  • Upper Midwest American English
  • Midland American English
  • Miami accent
  • Southern American English
  • Southwestern dialects
  • Western American English

Canada

  • Newfoundland English
  • Maritime English
  • West-Central Canadian English

Native America/indigenous peoples

  • Mojave English
  • Isletan English
  • Tsimshian English
  • Lumbee English
  • Tohono O'odham English
  • Inupiaq English

Central/South America

  • Belizean English
  • Bay Islands English
  • Falkland Islands English
  • Caribbean English
  • Anguillan English
  • Antiguan English
  • Bahamian English
  • Bajan English (Barbados)
  • Jamaican English
  • Trinidadian English
  • Vincentian English

Asia

  • Brunei English
  • Burmese English
  • Hong Kong English
  • Pakistani English
  • Indian English
  • Nepali English
  • Malaysian English (Manglish)
  • Philippine English
  • Singapore English
  • Sri Lankan English

Africa

  • Cameroonian English
  • Kenyan English
  • Liberian English
  • Malawian English
  • Namibian English (Namlish)
  • Nigerian English
  • South African English
  • South Atlantic English
  • Ugandan English

Australia/New Zealand

  • General Australian
  • Broad Australian
  • Cultivated Australian
  • Australian Aboriginal English
  • South Australian English
  • Western Australian English
  • Torres Strait English
  • Victorian English
  • Queensland English
  • New Zealand English
  • Maori English
  • Southland accent (NZ)
  • Taranaki accent (NZ)

Script

Latin script

Alphabet

English alphabet

Regulated by

N/A

ISO Codes

ISO 639-1 (en)

ISO 639-2 (eng)

ISO 639-3 (eng)

Bengali / বাংলা

History and Evolution

Bengali, also called Bangla, is the national language of Bangladesh and the second-most spoken language in India, with about 250 million native speakers in both countries, and 300 million worldwide. The language first emerged around 1000 to 1200 CE, along with other Eastern Indo-Aryan languages. It is believed to have evolved from dialects of Vedic and Sanskrit. Modern Bengali developed during the 19th and early 20th centuries based on the dialect spoken in the west-central region of Nadia. Bengali is an example of diglossia, where the literary and standard form of the language differs greatly from the colloquially spoken form. Prior to the 1730s, there was no documented Bengali grammar; it was a Portuguese missionary, Manuel da Assumpção, who wrote the first Bengali dictionary/grammar. In the mid-20th century, in what was then the Pakistani province of East Bengal and is now the country of Bangladesh, the Bengali Language Movement coalesced around the notion of a distinct national language.

Geographic Distribution

Bengali is native to the region of Bengal, which comprises the present-day nation of Bangladesh and all or part of three Indian states, where it is the official language: West Bengal, Tripura, and southern Assam. The language is also spoken by a majority of the population of the Indian union territory of Andaman and Nicobar Islands. Additionally, there are significant Bengali-speaking communities in the Middle East, Japan, the United States, Singapore, Malaysia, Maldives, Australia, Canada, and the United Kingdom.

There are several dialects, the largest four of which—Rarh, Banga, Kamarupa and Varendra—correspond to the four principal geographic regions where the language is widely spoken. The most prevalent is the south-western dialect of Rarh, which forms the basis of standard colloquial Bengali.

Prominence in Society

Bengali is the language of government, education, and the media in Bangladesh. Bengali is one of 23 official languages recognized by the Republic of India.

Unique Characteristics

Bengali script is an abugida, in which consonants are letters and vowels are indicated by diacritics. An "inherent" vowel (অ ô) is assumed for consonants where no vowel is marked.

There are no plural nouns in Bengali. Rather, a “measure word” is inserted between the number and the noun to indicate more than one. For example, “nine cows” is written নয়টা গরু (Nôy-ṭa goru), where -ṭa is the measure word. The phrase “four or five teachers” is written চার-পাঁচজন শিক্ষক (Car-pãc-jôn shikkhôk).

About two-thirds of Bengali words are tadbhavas, or native words. One-quarter of words in common usage are borrowed from Sanskrit while the remainder are loanwords.

Bengali is a head-final language with subject–object–verb word order. It uses postpositions, as opposed to the English language’s prepositions. Determiners follow the noun, while numerals, adjectives, and possessors precede the noun.

Loanwords in English

There are many English words that share common Sanskrit origins with Bengali. Most of these words, which typically entered English during the long period of British rule over India and what is now Bangladesh, have altered meanings that are not consistent with the original Bengali or Sanskrit definition. For example, the word “cheetah” in English refers to a large, spotted jungle cat but in Sanskrit चित्रस (chitra-s) means "uniquely marked."

  • ashram (from आश्रम, āśrama; a religious hermitage)
  • bandana (from बन्धन, bandhana, "a bond")
  • dinghy (दिन्गी, "a tiny boat")
  • jute (from পাট, jhuto, "twisted hair")
  • mantra (मन्त्र, "a holy message or text")
  • nirvana (निर्वाण, "extinction, blowing out")
  • yoga (योग, "yoke, union")

Say Whaaat?

The national anthems of Bangladesh, India, and Sri Lanka, and the national song of India, were all first composed in Bengali.

February 21 is a national holiday in Bangladesh, marking the anniversary of the killing of several students at the University of Dhaka who were protesting to recognize Bengali as the national language. Bengali is likely the only language on earth with a national holiday, Language Movement Day, named for its proponents and defenders.

One of the ways that Bengali grammar is similar to Romani is the lack of a present-tense form of the verb “to be.” The sentence “He is the teacher” is written as সে শিক্ষক (se shikkhôk), literally translated as "he teacher."

Writer: Bruce Falstein

Bengali Quick Facts

Origin

Bengal region

Native Speakers

210–215 million

Second-language Speakers

20–30 million

Official Language

 Assam (India)

 Bangladesh

 India

 Jharkhand (India)

 Sierra Leone

 Tripura (India)

 West Bengal (India)

Recognized Language

 Andaman and Nicobar Islands

 Karachi (Pakistan)

Language Family

Indo-European

  • Indo-Iranian
    • Indo-Aryan
      • Eastern Zone (Magadhan)
        • Bengali-Assamese
          • Bengali

Standard Form

Standard Bengali

Dialects

Banga

Kamarupa

Rarh

Varendra

Script

Bengali script

Alphabet

Bengali alphabet

Regulated by

Bangla Academy (Bangladesh)

Paschimbanga Bangla Akademi (West Bengal)

ISO Codes

ISO 639-1 (bn)

ISO 639-2 (ben)

ISO 639-3 (ben)

Punjabi / پنجابی / ਪੰਜਾਬੀ

History and Evolution

Punjabi is an Indo-Aryan language of the Indo-Iranian branch of the Indo-European family. It evolved from the medieval Shauraseni language of northern India and emerged as an independent language by the 11th century CE. While the first literature written in Punjabi can be traced to the Sufi poet Fariduddin Ganjshakar in the late 12th century, Punjabi literature flourished with the invention of the Gurmukhī alphabet in the early 17th century.

For most of their history, Punjabi speakers lived under foreign domination, including the Persianate Ghaznavid dynasty and Mughal Empire during the medieval and early modern periods. Accordingly, Persian left a mark on Punjabi vocabulary and pronunciation. When the region came under British rule in the 19th century, English became the official language of administration.

With the Partition of British India in 1947, Punjabi was effectively divided into two main dialects: Western Punjabi in Pakistan and Eastern Punjabi in India. The cultural and religious backgrounds of speakers from these two countries also led to the institution of two different orthographic systems. Pakistan, with a Muslim majority, uses the Persian-Arabic Shahmukhī script, while India, with a Hindu majority, uses the Sanskrit-derived Gurmukhī script. Punjabi is thus one of the few languages in the world to be written with two significantly different, mutually unintelligible orthographies.

Geographic Distribution

With 130 million native speakers, Punjabi is the ninth most commonly spoken language in the world. It is the most widely spoken language in Pakistan, where approximately 70 percent of the population speaks Punjabi. It is also the third most commonly spoken language in India, and is an official language in the states of Punjab, Delhi, Haryana, and Chandigarh.

Due to historical immigration, the 10 million members of the Punjabi diaspora live largely in Canada, England, the United States, and Australia, as well as in the United Arab Emirates, Saudi Arabia, and Hong Kong. Due to their significant Punjabi populations, Punjabi is the third most commonly spoken native language in Canada and the fourth in England.

Majhi is considered to be a prestige dialect and serves as the standard in both Pakistan and India. There are numerous dialects throughout India, including Doabi, Malwai, and Powadhi. In Pakistan, the major dialects include Pothohari, Lahndi, and Multani. Linguists have identified at least 32 distinct Punjabi dialects, which are mutually intelligible despite differences in vocabulary and pronunciation.

Prominence in Society

Punjabi is used in government, business, media, and education in the Punjab region of India and Pakistan. It is also the preferred language among followers of Sikhism, a monotheistic religion that originated in the Punjab region.

Unique Characteristics

Punjabi has two alphabets: the Persian-Arabic Shahmukhī script is written from right to left, and the Gurmukhī script is written from left to right. In addition, the language is often transliterated into the Latin alphabet.

Punjabi is one of the few tonal Indo-European languages. As in other Indo-European languages, nouns are inflected (changed) to reflect number, gender, and case. The language lacks definite and indefinite articles (such as “the,” “a,” and “an”).

Sentences typically follow the subject-object-verb construction: Mainū pajābī dē nāla gala (literally, “I Punjabi speak”). Punjabi distinguishes between formal and informal modes of address; tū̃ (you) is used in informal settings, whereas tusī̃ (you) is used as a marker of respect.

Loanwords in English

Few Punjabi words have migrated into English. One exception is the term bhangra, a popular type of music combining traditional Punjabi folk music with western pop music.

Say Whaaat?

Located at the confluence of five tributaries of the Indus river, the region was named Punjab from the Persian panj āb (five waters).

The name of the Perso-Arabic script, Shahmukhī, means “from the mouth of the kings” and that of the Sanskrit-based script, Gurmukhī, means “from the mouth of the gurus.”

Writer: Carly Ottenbreit

Punjabi Quick Facts

Origin

Punjab region

Native Speakers

100–130 million

Second-language Speakers

15–30 million

Official Language

 India

Recognized Language

 Pakistan

Language Family

Indo-European

  • Indo-Iranian
    • Indo-Aryan
      • Northwestern
        • Punjabi

Standard Forms

Standard Punjabi

Hindko

Saraiki

Dialects

Majhi

Lahndi

Shahpuri

Jhangochi/Changvi

Jangli/Rachnavi

Malwai

Doabi

Powadhi

Pothohari/Pahari-Potowari-Panjistani

Multani

Kohati

Peshaweri

Hindko

Derawali

Ghebi

Riasti

Chhachi

Jandali

Thalochi/Thali

Dhani

Jafri/Khetrani

Chenavari

Videsi

Script

Gurmukhi script

Shahmukhi script

Alphabet

Gurmukhi alphabet

Shahmukhi alphabet

Regulated by

N/A

ISO Codes

ISO 639-1 (pa)

ISO 639-2 (pan)

ISO 639-3 (pan)

Tamil / தமிழ்

History and Evolution

Tamil is one of the oldest classical languages in the world still in use today. Its literary tradition dates back more than 2,000 years; inscriptions in the language have been found dating to approximately 500 BCE, and parts of its earliest known grammar, the Tolkāppiyam, were written as early as 300 BCE. The language is a descendant of Proto-Dravidian, which was prevalent around 3000 BCE near the Godavari River Basin in India.

Tamil is remarkable among related Dravidian languages because of its expansion beyond India. Tamil speakers were accomplished seafarers and traders, and inscriptions in the language have been found as far afield as Sumatra and Thailand. Through these extensive trade contacts, Tamil words such as tokai (peacock) and akil (aquilaria, a type of wood) passed into Classical Hebrew, while Tamil borrowed words such as ciini (sugar) from Chinese.

Tamil became the first Indian language to appear in print when Christian missionaries from Portugal published a Tamil prayer book in 1578. Since then, Tamil has been influenced by European languages and conventions. For example, modern Tamil uses European-style punctuation and has adopted a more rigid word order and more complex sentence structures based on English and other languages.

Throughout its long history, Tamil was also heavily influenced by Sanskrit, which was used as a lingua franca among Indian scholars, poets, and artists for many centuries. In the 20th century, the Pure Tamil Movement arose, demanding the eradication of Sanskrit loanwords and other foreign influences from the language. Because of this, many Sanskrit-influenced words were replaced by Tamil synonyms.

Geographic Distribution

Approximately 77 million people speak Tamil. It is most prominent in Sri Lanka and the Indian state of Tamil Nadu, but is also prevalent in Malaysia, Singapore, the Philippines, Mauritius, and Indonesia, among other countries. An influential population of Tamil speakers lives in Karachi, Pakistan. Immigrant groups have brought it as far afield as England, Canada, South Africa, Germany, Fiji, and the United States.

There are numerous dialects of Tamil, found throughout Tamil Nadu, Sri Lanka, and other areas of the world. Dialects vary by caste membership as well as by location. There are also different forms of Tamil: a classical style called Sankattamiḻ, based on Old Tamil; a modern formal style called Centamiḻ; and an informal, colloquial form called Koṭuntamiḻ. Centamiḻ is usually used in public speaking, in textbooks, and in modern Tamil literature. Most Tamil-language television, radio, theater, film, and other entertainment is in Koṭuntamiḻ.

Prominence in Society

Tamil is considered a sacred language in Vaishnavism and Shaivism, two branches of the Hindu religion, as well as in the Ayyavazhi religion. It is also prominent in the media; there are approximately 1,863 Tamil-language newspapers.

Unique Characteristics

Tamil was once written in the Brahmi script; modern Tamil uses the Tamil script. While the script includes symbols that indicate numbers and fractions, Arabic numerals are more commonly used.

The Tamil language relies heavily on attaching affixes (usually suffixes) to word roots to indicate the part of speech, meaning, tense, number, person, and other qualities. A word root can take many suffixes, so long words built from extensive suffix chains are common. For example, the word போகமுடியாதவர்களுக்காக, pronounced pōkamuṭiyātavarkaḷukkāka, means “for the sake of those who cannot go.”

Not every Tamil sentence has a subject, a verb, and an object. A grammatically correct sentence can consist of only one word, such as muṭintuviṭṭatu (completed), or can have a subject and object with no verb, such as atu eṉ vīṭu (that my house).

Loanwords in English

While few words have migrated from Tamil to English unchanged, there are a number of English words that are derived from Tamil. These include:

  • cheroot, from churuttu (a cigar with both ends clipped)
  • mango, from mangai
  • curry, from kari

Say Whaaat?

Ancient Tamil writers used palm leaves as paper. Because of this, the letters of the Tamil script are mostly curved, so that writers would not tear the delicate leaves while writing.

Writer: Jennifer Williamson

Tamil Quick Facts

Origin

India

Native Speakers

69–72 million

Second-language Speakers

8–9 million

Official Language

 Singapore

 Sri Lanka

 Puducherry (India)

 Tamil Nadu (India)

Recognized Language

 Canada

 Kerala (India)

 Malaysia

 Mauritius

 Réunion

 Seychelles

Language Family

Dravidian

  • Southern
    • Tamil-Kannada
      • Tamil-Kodagu
        • Tamil-Malayalam
          • Tamil languages
            • Tamil

Standard Form

Standard Koṭuntamiḻ

Dialects

Regional and local dialects

Script

Tamil script

Arwi script

Alphabet

Tamil alphabet

Arwi alphabet

Regulated by

N/A

ISO Codes

ISO 639-1 (ta)

ISO 639-2 (tam)

ISO 639-3 (tam)

ISO 639-3 (oty)

ISO 639-3 (ptq)

Urdu / اُردُو

History and Evolution

Urdu evolved from the Shauraseni language during the 6th to 13th centuries CE. Approximately 99 percent of Urdu verbs are based on Prakrit and Sanskrit roots, but the language has also been heavily influenced by Arabic and Persian.

From the 1200s, the Urdu language was referred to as “Hindustani.” The term “Urdu” was first used in 1780, by the poet Ghulam Hamadani Mushafi. In 1837, Urdu was made an official language of British-controlled India, and use of the language was encouraged by the British in order to reduce the influence of Persian.

Up until this period, Urdu was written in the Persian script. When Urdu was made an official language, conflict between Hindu and Muslim speakers resulted in a linguistic divide between “Urdu” and “Hindi.” That divide deepened when British-controlled India split into modern-day Pakistan and India after the British withdrawal.

Today, the Hindi and Urdu languages differ in their formal vocabulary, reflecting different religious and regional allegiances.

Geographic Distribution

Approximately 60 to 70 million people speak Urdu as a first language, with about 52 million in India and 10 million in Pakistan. In addition, expatriate Urdu-speaking communities exist in the United States, the United Kingdom, Bangladesh, and Saudi Arabia.

There are many local dialects of Urdu. In Pakistan, the language has evolved to include loanwords from Punjabi, Sindhi, Balti, and Pashto, as well as other regional languages. Indian Urdu has its own dialects, including Dakhini in the south and Khariboli in the Delhi region.

A combination of English and Urdu, called Urdish, is spoken mainly in Pakistan’s urban areas.

Prominence in Society

Urdu is an important lingua franca in Pakistan, where approximately 93 percent of people speak another regional language as their first language. Because of this, Urdu is a mandatory subject through secondary school. Millions of people in Pakistan speak Urdu as a second or third language, but only about 7 percent of Pakistanis, mostly Muslim immigrants from India and their descendants, speak it as a first language.

In India, Urdu is prevalent primarily among Muslim communities and in cities that were once capitals of Muslim-ruled states. Many schools and madrasahs teach Urdu as a subject.

There are many newspapers and other media publications in Urdu, both in India and in Pakistan.

Unique Characteristics

Urdu has multiple levels of politeness. There is a casual register, sometimes called the rekhtah (rough mixture), and a formal register, referred to as zabān-i Urdū-yi muʿallá (the language of the exalted camp). The name of the formal register may be a reference to the mode of speech within the imperial army.

Unlike many other languages with casual and formal modes of address, the levels of formality in Urdu are not limited to direct address, but also affect nouns and other words. For example, the words پانی (pānī) and آب (āb) both mean “water,” but the first is casual and the second is formal. Generally, ancient Sanskrit and Indic-derived words are considered “rougher,” while Arabic and Persian words have the connotation of formality.

Verbs can be conjugated to reflect many different levels of politeness in Urdu. For example, the verb bolnā (to speak) can be conjugated six different ways in the imperative alone to reflect varying levels of formality:

  • [تُو] بول! [tū] bol! (“speak”; insulting or very casual)
  • [تُم] بولو۔ [tum] bolo (“speak”; relaxed and intimate)
  • [آپ] بولو۔ [āp] bolo (“speak”; intimate, yet polite)
  • [آپ] بولیں۔ [āp] boleṉ (“speak”; intimate, yet formal)
  • [آپ] بولئے۔ [āp] boli'e (“speak”; formal and polite)
  • [آپ] فرمائیے۔ [āp] farmā'iye (“speak”; extremely formal)

Nouns in Urdu also have several layers of politeness. For example, the phrase “his mother” can be said us kī wālidah (polite), us kī wālidah-yi muḥtarmah (very polite), us kī ammī (casual), or us kī māṉ (very casual, bordering on insulting).

Urdu can be written in several writing systems. The Urdu script, a Perso-Arabic script, is read from right to left. The Devanagari script is more prevalent in India, and the Latin script is sometimes used as well; today it is most common in text messaging and online communications.

Loanwords in English

There are a number of Urdu loanwords in English. These include:

  • cushy (from khushi; happiness, ease)
  • cummerbund (from kamerband; a waist binding)
  • chutney (from chatni; to crush)
  • jungle (from jangal; an area of dense foliage)
  • verandah (from bar’aamdah; a roofed porch)

Say Whaaat?

Written Urdu is so difficult to typeset that most Urdu-language newspapers were entirely handwritten up until the 1980s. One daily newspaper in Chennai, The Musalman, is still published in handwritten form.

To many South Asians, Urdu has a grand, formalized quality that gives an impression of aristocracy. This may be due to its many levels of politeness.

Writer: Jennifer Williamson

Urdu Quick Facts

Origin

North India

Native Speakers

65–70 million

Second-language Speakers

94–96 million

Official Language

 India

 Pakistan

Recognized Language

N/A

Language Family

Indo-European

  • Indo-Iranian
    • Indo-Aryan
      • Central Zone (Hindi)
        • Western Hindi
          • Hindustani
            • Khariboli
              • Urdu

Standard Form

Modern Standard Urdu

Dialects

Dakhni

Rekhta

Modern Vernacular Urdu

Regional dialects

Script

Devanagari script

Perso-Arabic script

Alphabet

Urdu alphabet

Regulated by

National Council for Promotion of Urdu Language

National Language Authority

ISO Codes

ISO 639-1 (ur)

ISO 639-2 (urd)

ISO 639-3 (urd)