Using Web Scraping for Automatic Generation of Structured Arabic Lexicon

doi:10.52940/ijici.v2i2.50

Abstract

Technological development is constantly increasing the amount of text data on the internet, especially Arabic text. Although this Arabic data is massive, it is of limited use because it is unstructured and cannot be used directly for natural language processing (NLP) and its applications. The growth of Arabic text on the internet has also led to an increase in Arabic lexicon web pages, but these lexicons are not ready for use by NLP applications because they are semi-structured or even unstructured. The method used in this study is web scraping: data is scraped from the internet and converted from unstructured to structured form. This study aims to build an automatic structured Arabic lexicon, ready for NLP and its applications, using web scraping. Such a lexicon increases the opportunity to use the Arabic language more widely, which is of great importance for NLP applications.
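The abstract does not show the scraper itself. As a minimal sketch of the unstructured-to-structured step, assuming a hypothetical lexicon page in which each entry is a `div` with class `entry` holding a `word` span and a `meaning` span, the Python standard library's `html.parser` can turn such markup into structured records and serialize them as JSON. The markup, the class names, `LexiconParser`, and `scrape_lexicon` are illustrative assumptions, not the authors' implementation; fetching live pages (for example with `urllib.request`) is left out so the example runs offline.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup (an assumption, not a real site's layout): each entry is
# <div class="entry"> with <span class="word"> and <span class="meaning">.
SAMPLE_PAGE = """
<html><body>
  <div class="entry"><span class="word">كتاب</span><span class="meaning">book</span></div>
  <div class="entry"><span class="word">قلم</span><span class="meaning">pen</span></div>
</body></html>
"""


class LexiconParser(HTMLParser):
    """Collects word/meaning records from the hypothetical entry markup."""

    def __init__(self):
        super().__init__()
        self.entries = []     # structured output: one dict per lexicon entry
        self._current = None  # entry currently being built, or None
        self._field = None    # "word" or "meaning" while inside that span

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "entry":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in ("word", "meaning"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # Keep only entries that actually contain a headword.
            if "word" in self._current:
                self.entries.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Text is recorded only inside a word/meaning span of an entry.
        if self._field is not None and self._current is not None:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()


def scrape_lexicon(html: str) -> list:
    """Parse one lexicon page and return its entries as a list of dicts."""
    parser = LexiconParser()
    parser.feed(html)
    parser.close()
    return parser.entries


if __name__ == "__main__":
    entries = scrape_lexicon(SAMPLE_PAGE)
    # ensure_ascii=False keeps the Arabic script readable in the JSON output.
    print(json.dumps(entries, ensure_ascii=False, indent=2))
```

Real lexicon sites differ in layout, so the tag and class checks in `handle_starttag` would need to be adapted to each site's actual markup before the records could feed an NLP pipeline.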

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.