International Academic Journal of Science and Engineering

ISSN 2454-3896

An Efficient Set of Parts of Speech in Persian Information Retrieval Systems

Mohammad-Ali Yaghoub-Zadeh-Fard, Saeed Rahmani, Omid Kashefi and Behrouz Minaei-Bidgoli

Abstract: Although the ultimate aim of any information retrieval system is to meet its users' expectations, reducing index storage size and improving system performance are sometimes a higher priority, especially for small companies with limited hardware resources. For such companies, it is essential to remove non-informative terms from their indices. Selecting a proper set of terms makes it possible to reduce the index storage size and, consequently, improve retrieval performance. In this paper, using part-of-speech tagging, we show how to reduce the index storage size without losing precision. Through experiments on the Hamshahri corpora, we identify the most effective parts of speech in the Persian language. Results demonstrate improvements in both the response time and the precision of retrieval.
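The general idea of POS-based term selection can be illustrated with a minimal Python sketch. It assumes the documents have already been tagged by some POS tagger (not shown), uses toy English tokens in place of Persian text, and keeps a hypothetical tag set `KEEP_TAGS`; the set of parts of speech actually found effective in the paper is not reproduced here. Only terms whose tag is in the kept set enter the inverted index, so both the vocabulary and the number of postings shrink.

```python
from collections import defaultdict

# Hypothetical set of POS tags treated as "informative"; the paper selects
# its own set experimentally on the Hamshahri corpora.
KEEP_TAGS = {"N", "ADJ"}


def build_index(docs, keep_tags=None):
    """Build an inverted index mapping term -> set of document ids.

    docs: dict of doc_id -> list of (term, pos_tag) pairs, assumed to be
    the output of an external POS tagger. If keep_tags is None, every
    term is indexed (the unfiltered baseline).
    """
    index = defaultdict(set)
    for doc_id, tagged_terms in docs.items():
        for term, tag in tagged_terms:
            if keep_tags is None or tag in keep_tags:
                index[term].add(doc_id)
    return dict(index)


def postings_count(index):
    """Total number of (term, doc_id) postings, a rough proxy for index size."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Toy pre-tagged documents (illustrative only).
docs = {
    1: [("the", "DET"), ("index", "N"), ("is", "V"), ("small", "ADJ")],
    2: [("a", "DET"), ("small", "ADJ"), ("index", "N"), ("of", "P"), ("terms", "N")],
}

full = build_index(docs)
pruned = build_index(docs, KEEP_TAGS)
print(len(full), postings_count(full))      # → 7 9
print(len(pruned), postings_count(pruned))  # → 3 5
```

Filtering at indexing time, rather than at query time, is what yields the storage saving: terms with excluded tags never produce postings at all.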

Keywords: Information Retrieval, Natural Language Processing, Part of Speech, Index storage reduction, Term selection, Stop-word detection

Pages: 63-72

Volume 2, Issue 2, 2015