Topic Identification of Noisy Arabic Texts Using Graph Approaches

Abstract

This paper addresses automatic topic identification of noisy Arabic texts. Several existing works in this field rely on statistical and machine learning approaches for different text categories. Unfortunately, most of the proposed methods are effective only on clean, long texts. In this work, we use an in-house dataset of noisy Arabic texts collected from several Arabic discussion forums and covering 6 topics. We propose applying a graph approach called LIGA, which was first introduced for language identification, to the topic identification task. We also propose two extensions intended to enhance the performance of LIGA. Experiments conducted on the Arabic dataset show promising results, with accuracy reaching about 98%.

Publication
International Workshop on Text-based Information Retrieval (TIR-DEXA)
Date