Ethnicity/Caste/Social Group Composition of Population of Nepal According to AI


Analyzing 2021 Caste and Ethnicity Data with AI

Coming from a science background, I love data. Returning to Nepal in 2013, after spending most of the preceding twenty-five years abroad, I realized I didn’t really know a lot about the country. One way I decided to get to know the country better was through numbers. So, I set about analyzing data, and a major source for this was the census reports.

Early on, something became apparent about the country. Old, hill, so-called high-caste Hindu men had a disproportionate role in many spheres of life. Also referred to as Khas-Arya, they mostly controlled and shaped cultural, social, economic, and political institutions. The more data I saw across different structures, the more obvious this became, which led me to analyze political structures in particular. What became clear was that Nepal’s long history of casteism had created a devastating legacy of structural discrimination and systemic casteism. To demonstrate this, I analyzed the composition of those structures.

Following the publication of the 2021 Census Data in March 2023, I needed the most recent demographic information disaggregated by ethnicity, caste, and social group. Specifically, I required data on the nation as a whole, on its different Provinces, and on the Ecological Zones. Enter AI!

I fed the caste/ethnicity data to various AI models: ChatGPT, DeepSeek, Gemini, and Grok 3. The AIs consistently made small mistakes, notably assigning some groups to two categories and making arithmetic errors. For the national data, ChatGPT provided the most reliable results. Here’s how it classified the different groups:

Ethnicity caste social group 2021

ChatGPT’s Classification of National Groups

Khas-Arya: Kshetri, Brahman – Hill, Thakuri, Sanyasi/Dasnami, Dev
Janajati: Magar, Tamang, Rai, Gurung, Sherpa, Yakthung/Limbu, Gharti/Bhujel, Majhi, Kumal, Sunuwar, Chepang/Praja, Danuwar, Santhal, Ghale, Kulung, Thami, Dhimal, Khawas, Tajpuriya, Darai, Yakkha, Bhote, Bantawa, Chamling, Chhantyal/Chhantel, Thakali, Bote, Pun, Hyolmo/Yholmopa, Yamphu, Baram / Baramu, Nachhiring, Bahing, Thulung, Jirel, Khaling, Aathpahariya, Dolpo, Sarbaria, Mewahang, Byasi/Sauka, Dura, Meche, Raji, Sampang, Chai/Khulaut, Chumba/Nubri, Hayu, Loharung, Mugal/Mugum, Karmarong, Kisan, Lhopa, Topkegola, Raute, Walung, Lhomi, Surel, Kusunda, Bankariya, Mijar
Newar: Newa: (Newar), Baniyan, Kayastha, Kathabaniyan
Madhesi: Yadav, Teli, Koiri/Kushwaha, Kurmi, Musahar, Dhanuk, Dusadh/Pasawan/Pasi, Brahman – Tarai, Mallaha, Kewat, Kanu, Hajam/Thakur, Kalwar, Rajbansi, Haluwai, Baraee, Kahar, Rajput, Amat, Gangai, Lodh, Gaderi/Bhediyar, Bhumihar, Rajbhar, Rauniyar, Kori, Dom, Mali, Rajdhob, Dhunia, Bangali, Oraon/Kudukh, Chidimar, Kalar, Pattharkatta/ Kushwadiya, Halkhor, Natuwa, Kewarat, Beldar, Kamar, Dhandi
Dalit: Bishwokarma, Pariyar, Chamar/Harijan/Ram, Musahar, Tatma/Tatwa, Khatwe, Nuniya, Sundi, Dhobi, Lohar, Bin, Kumhar, Sonar, Dom, Kori, Khatik, Halkhor, Natuwa, Chidimar, Bantar/Sardar, Baraee, Kalar
Tharu: Tharu, Ranatharu, Badi
Muslim: Musalman
Others: The residual categories from the census data table itself (Foreigner, Not stated, Others), plus the groups that, as far as I was concerned, can’t be classified, namely Punjabi/Sikh and Marwadi

As you can see, there are some errors, such as “Kalar,” “Musahar,” and “Dom” each appearing under both Madhesi and Dalit. This is one reason I focused on reporting and using percentages rounded to zero decimal places!
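For anyone who wants to check an AI-produced classification like the one above for groups assigned to more than one category, here is a minimal Python sketch. It is not how I originally caught the errors. The `classification` dictionary holds only an abbreviated excerpt of the lists above, and the function name `find_multiply_assigned` is my own invention for illustration.

```python
from collections import defaultdict

# Abbreviated excerpt of the classification above (not the full lists).
classification = {
    "Madhesi": ["Yadav", "Teli", "Musahar", "Dom", "Kori", "Kalar"],
    "Dalit": ["Bishwokarma", "Pariyar", "Musahar", "Dom", "Kori", "Kalar"],
    "Tharu": ["Tharu", "Ranatharu", "Badi"],
}


def find_multiply_assigned(classification):
    """Return {group: [categories]} for groups listed under more than one category."""
    seen = defaultdict(list)
    for category, groups in classification.items():
        # set() ignores a group repeated within the same category
        for group in set(groups):
            seen[group].append(category)
    return {group: cats for group, cats in seen.items() if len(cats) > 1}


duplicates = find_multiply_assigned(classification)
for group, categories in sorted(duplicates.items()):
    print(f"{group}: {', '.join(categories)}")
```

Any group the script prints would be counted twice in the category totals, so it needs to be settled into a single category before the percentages are calculated.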

For the Province data, after several attempts and refinements, DeepSeek provided the most reliable results. Here it is:

Ethnicity caste social group by Province 2021

Again, I’ve rounded off the numbers to zero decimal places. As for the ethnicity, caste, and social group composition of the three Ecological Zones (Himalaya, Hill, and Tarai), that data is unavailable.

I have categorized the Newars separately from other Janajati groups because they occupy a distinct position. Their historical ties as the indigenous community of the Kathmandu Valley, the long-standing location of Nepal’s capital, contribute to a different set of circumstances compared to other Janajati communities across the country. That’s been amply demonstrated by others’ work as well as by data I myself have analyzed.

My short initial explorations highlight the complexities of categorizing Nepal’s diverse social landscape and the challenges AI faces with such nuanced data. What are your thoughts on these initial findings?

2011 Composition

For your information, here’s what the composition looked like in 2011.

ethnic composition by province

What do you think?
