The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks (Hybrid)
Palma, Mallorca (Spain)
11-16 May 2026
Co-located with LREC 2026
The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. Although Arabic remains resource-poor compared with English, recent years have seen the emergence of large, freely available Classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.
Now in its seventh edition, OSACT7 marks the milestone with seven shared tasks, each addressing a timely challenge in Arabic NLP and reflecting themes relevant to NLP research more broadly. The workshop builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is recognizing and supporting minority dialects and underrepresented varieties of Arabic, so that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.
The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenisation, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics. Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.