The Development of Nominal Synsets for the Saraiki Language: A Corpus-based Analysis

Madya Asgher; Musarrat Azher

doi:10.52015/numljci.v23iI.291

Published: Jun 28, 2025

DOI: https://doi.org/10.52015/numljci.v23iI.291

Keywords:

Lexical Relations, Noun Categorization, Saraiki Language, Saraiki Nominal Synsets, AntConc, NLP, Corpus, WordNet

Madya Asgher

Lecturer in English at the University of Management and Technology, Sialkot Campus, Pakistan. madya.asghar@skt.umt.edu.pk

https://orcid.org/0000-0002-6109-5969

Musarrat Azher

A Fulbright alumna, currently a professor at Government Sadiq College for Women University, Bahawalpur, Pakistan. musarratazher@gmail.com

https://orcid.org/0000-0002-6720-1259

Abstract

This paper develops nominal synsets for the Saraiki language (SL), a lesser-studied language spoken in Pakistan. Nominal synsets are sets of nouns that share a common meaning, and they are a key resource for natural language processing (NLP) tasks such as information retrieval, machine translation, and text classification. The research aims to create Saraiki Nominal Synsets (SNS) using the Gurmukhi Punjabi WordNet. The study follows a hybrid approach that combines the merge and expansion methods, and it draws its data from PDF textbooks, online sources, and the Saraiki Wikimedia Incubator. The data are limited to texts published between 2000 and 2019 and were tagged manually with the help of the AntConc 3.4.4.0 word list, because no tagger is available for Saraiki. The study builds a 2.2-million-word Saraiki corpus and a list of 750 nouns, and then categorizes and semantically organizes the SNS on the basis of this noun list. A corpus-based approach is used to identify and classify Saraiki nouns according to their semantic properties, and the nominal synsets are constructed with a combination of manual and automatic methods. The quality of the synsets is evaluated by comparing them with existing lexical resources and by a semantic similarity analysis. The results show that the approach captures semantic relations among Saraiki nouns and produces synsets that are useful for various NLP applications. Overall, the study contributes to the development of linguistic resources for lesser-studied languages and supports researchers and developers working on NLP tasks involving SL.
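The corpus-to-synset pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes simple regular-expression tokenization in place of the AntConc word list, a hand-supplied set of nouns standing in for the manual tagging, and a hypothetical mapping from Punjabi WordNet synset IDs to candidate Saraiki equivalents for the expansion step. The function names, synset ID format, and placeholder tokens are all hypothetical, and no real Saraiki data is reproduced.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


def word_list(texts):
    """Return (word, frequency) pairs sorted by descending frequency.

    Hypothetical stand-in for a concordancer word list, not AntConc's tokenizer.
    """
    counts = Counter()
    for text in texts:
        # The word pattern matches Unicode letters, so Perso-Arabic script is kept;
        # combining diacritics would still split words and need extra handling.
        counts.update(re.findall(r"\w+", text))
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


@dataclass
class Synset:
    synset_id: str  # ID of the source (Punjabi WordNet) synset; format is hypothetical
    gloss: str      # gloss carried over from the source synset
    members: list = field(default_factory=list)  # target-language (Saraiki) nouns


def expand(source_synsets, equivalents, corpus_nouns):
    """Expansion method sketch: reuse source synsets, keep equivalents attested as nouns.

    source_synsets: {synset_id: gloss}
    equivalents:    {synset_id: [candidate target words]}, e.g. from manual translation
    corpus_nouns:   set of words manually tagged as nouns in the corpus
    """
    expanded = []
    for synset_id, gloss in source_synsets.items():
        members = [w for w in equivalents.get(synset_id, []) if w in corpus_nouns]
        if members:  # drop source synsets with no attested target noun
            expanded.append(Synset(synset_id, gloss, members))
    return expanded


if __name__ == "__main__":
    # Placeholder tokens only (hypothetical), not real Saraiki words.
    corpus = ["noun_a verb_x noun_b noun_a", "noun_c noun_a"]
    print(word_list(corpus))
    # [('noun_a', 3), ('noun_b', 1), ('noun_c', 1), ('verb_x', 1)]
    nouns = {"noun_a", "noun_b", "noun_c"}  # stands in for manual tagging
    source = {"pa_0001": "placeholder gloss 1", "pa_0002": "placeholder gloss 2"}
    candidates = {"pa_0001": ["noun_a", "noun_b"], "pa_0002": ["noun_z"]}
    for synset in expand(source, candidates, nouns):
        print(synset)
```

Only the expansion side of the hybrid approach is sketched here; the merge method, in which Saraiki synsets are built independently and then linked to the source WordNet, is not shown.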

Conflict of Interest: The authors declare that there are no conflicts of interest related to the research, authorship, and/or publication of this article, and that the data presented have not been fabricated or falsified.

Funding: This research did not receive any specific grant or financial support from public, commercial, or not-for-profit funding agencies.

Participant Consent: The authors confirm that informed consent was obtained from all participants and that confidentiality was duly maintained.

Data Fabrication/Falsification Statement: The authors declare that no data have been fabricated, falsified, or manipulated in this study.

How to Cite

Asgher, M., & Azher, M. (2025). The Development of Nominal Synsets for the Saraiki Language: A Corpus-based Analysis. NUML Journal of Critical Inquiry, 23(I), 14–35. https://doi.org/10.52015/numljci.v23iI.291