Information Retrieval and Identification in Textual Documents


This doctoral thesis addresses text mining, a branch of data mining within the field of artificial intelligence. Forums are the main target of our application. Our work is divided into three parts related to information retrieval: the first is automatic language identification, the second is theme recognition, and the third is author recognition, or authorship attribution. This paper discusses the problem of automatic language identification (LID). The results show that the algorithm used is both powerful and efficient.

Doctoriales Télécommunications et Traitement de l’information, USTHB