Comparing the Effectiveness of the Improved ARLSTem Algorithm with Existing Arabic Light Stemmers


It is well known that light stemmers tend to produce more under-stemming errors, whereas root stemmers tend to produce more over-stemming errors. In this work, we address the Arabic light stemming problem and propose an improved version of the ARLSTem algorithm (ARLSTem v1.1). In particular, we introduce new rules that correct some of the under-stemming errors produced by ARLSTem. We then compare ARLSTem v1.1 with five existing stemming algorithms on the ARASTEM corpus, which we first corrected after finding errors in seven of its samples. The experimental results show that ARLSTem v1.1 outperforms the other algorithms in terms of under-stemming and over-stemming errors. Moreover, it achieves promising performance in an Arabic text categorization task.
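The abstract does not state how under-stemming and over-stemming errors are counted. One standard way to compare stemmers on these two error types is Paice's understemming index (UI) and overstemming index (OI). The sketch below assumes that measure and computes it in its equivalent pair-counting form. UI is the fraction of word pairs that belong to the same conceptual group but receive different stems. OI is the fraction of pairs from different groups that receive the same stem. The helper `paice_indices`, the word groups, and the stem outputs are all hypothetical illustrations. They are not taken from ARLSTem, ARASTEM, or the paper's evaluation code.

```python
from itertools import combinations


def paice_indices(groups, stems):
    """Pair-counting form of Paice's understemming (UI) and overstemming (OI) indices.

    groups: list of lists; each inner list holds words that *should* share a stem.
    stems:  dict mapping every word to the stem produced by the stemmer under test.
    Returns (UI, OI); each index is 0.0 when its denominator is empty.
    """
    # Tag each word with the index of its conceptual group.
    labelled = [(word, g) for g, words in enumerate(groups) for word in words]

    desired_merges = unachieved_merges = 0      # same-group pairs
    desired_non_merges = wrong_merges = 0       # different-group pairs
    for (w1, g1), (w2, g2) in combinations(labelled, 2):
        same_stem = stems[w1] == stems[w2]
        if g1 == g2:
            desired_merges += 1
            if not same_stem:
                unachieved_merges += 1          # under-stemming: related words kept apart
        else:
            desired_non_merges += 1
            if same_stem:
                wrong_merges += 1               # over-stemming: unrelated words conflated

    ui = unachieved_merges / desired_merges if desired_merges else 0.0
    oi = wrong_merges / desired_non_merges if desired_non_merges else 0.0
    return ui, oi


# Hypothetical conceptual groups and hypothetical stemmer output, for illustration only.
groups = [
    ["كتب", "يكتب", "الكتاب"],
    ["درس", "المدرسة"],
    ["مكتب"],
]
stems = {
    "كتب": "كتب",
    "يكتب": "كتب",
    "الكتاب": "كتاب",   # left apart from its group -> under-stemming
    "درس": "درس",
    "المدرسة": "درس",
    "مكتب": "كتب",      # merged with another group -> over-stemming
}

ui, oi = paice_indices(groups, stems)
print(f"UI={ui:.2f} OI={oi:.2f}")  # prints: UI=0.50 OI=0.18
```

Under this measure, a light stemmer that strips too little raises UI, and a stemmer that strips too much raises OI. A rule that corrects an under-stemming error should therefore lower UI without increasing OI.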

International Conference on Theoretical and Applicative Aspects of Computer Science