The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 7 Shared Tasks (Hybrid)
with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation
Palma, Mallorca (Spain)
11-16 May 2026.
Co-located with LREC 2026
The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. While Arabic remains comparatively resource-poor in relation to English, recent years have seen the emergence of large, freely available classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.
Now in its seventh edition, OSACT7 takes an important step forward by celebrating this milestone with seven shared tasks, each addressing timely challenges in Arabic NLP and reflecting broader themes relevant to NLP research in general. OSACT7 builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is to recognize and support minority dialects and underrepresented varieties of Arabic, ensuring that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.
The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenisation, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics. Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.
OSACT7 invites high-quality submissions written in English. Two forms of papers will be considered: long papers of up to eight (8) pages* and short papers of up to four (4) pages*.
* Excluding any number of additional pages for references, ethical considerations, conflict-of-interest statements, and data and code availability statements.
Upon acceptance, final versions of long papers will be given one additional page, for up to nine (9) pages of content plus unlimited pages for acknowledgments and references, so that reviewers' comments can be taken into account. Final versions of short papers may have up to five (5) pages, plus unlimited pages for acknowledgments and references. For both long and short papers, all figures and tables that are part of the main text must fit within these page limits.
Appendices and supplementary material are allowed ONLY in the final, camera-ready version, not at submission, as papers should be reviewable without reference to any supplementary material. Linguistic examples, if any, should be presented in the original language and glossed in English to make them accessible to a broader audience. Note that the paper type is decided independently of the eventual form of presentation (i.e., oral versus poster).
We follow the LREC 2026 standards for submission format and guidelines.