Workshop Description


The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. While Arabic remains comparatively resource-poor in relation to English, recent years have seen the emergence of large, freely available classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.

Now in its seventh edition, OSACT7 marks this milestone with seven shared tasks, each addressing timely challenges in Arabic NLP and reflecting broader themes relevant to NLP research in general. OSACT7 builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is to recognize and support minority dialects and underrepresented varieties of Arabic, ensuring that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.

The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenisation, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics. Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.

Shared Tasks

OSACT7 will host seven shared tasks, each organised by a dedicated team:

  1. QIAS 2026: Questions & Answers in Islamic Studies Assessment (Hamad Bin Khalifa University & Nazarbayev University).
  2. Nahw: Grammar Error Detection & Correction (Qatar Computing Research Institute).
  3. Arabic Politeness Detection (King Saud University, Hail & Cadi Ayyad Universities).
  4. SASA 2026: Sentiment Analysis and Swapping in Arabic (KFUPM, Lancaster University & VinUniversity).
  5. Arabic Humour Generation (King Saud University).
  6. KSAA 2026: Arabic Speech Dictation with Automatic Diacritisation (King Salman Academy and Princess Nourah University).
  7. FITRA 2026: Intuitive Thinking and Reasoning in Pan-Arab Commonsense (University of Luxembourg, UM6P, KFUPM, Sorbonne, Lancaster, Ibn Tofail Universities).

Workshop Topics

Language Resources:

  • Pre-trained Arabic language models.
  • Surveys and evaluations of existing Arabic corpora and their associated processing tools.
  • Development and release of new annotated corpora for NLP and IR tasks such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
  • Assessing the effectiveness of crowdsourcing platforms for Arabic data annotation.
  • Arabic text and speech processing toolkits.

Tools and Technologies:

  • Language education, including first (L1) and second (L2) language learning applications.
  • Pre-training & fine-tuning approaches for Arabic.
  • Tokenisation, normalisation, segmentation, morphology, and POS tagging.
  • Sentiment analysis, dialect ID, & classification.
  • Web and social media analytics.
  • Arabic LRs for text, speech, sign, gesture, image, & multimodal data.
  • Best practices for LR interoperability.
  • Construction and annotation of LRs.
  • Knowledge extraction, acquisition, and representation.
  • Ontologies, terminology, and frameworks.
  • LRs and the Semantic Web (Linked Data, Knowledge Graphs).
  • Data contamination, synthetic data, and quality issues.

Paper Types and Formats


OSACT7 invites high-quality submissions written in English. Two types of papers will be considered:

  1. Regular long papers – up to eight (8) pages*, presenting substantial, original, completed, and unpublished work.
  2. Short papers – up to four (4) pages*, describing a small focused contribution, negative results, system demonstrations, etc.

* Excluding any number of additional pages for references, ethical considerations, conflict-of-interest statements, and data and code availability statements.

Upon acceptance, final versions of long papers will be given one additional page – up to nine (9) pages of content plus unlimited pages for acknowledgments and references – so that reviewers’ comments can be taken into account. Final versions of short papers may have up to five (5) pages, plus unlimited pages for acknowledgments and references. For both long and short papers, all figures and tables that are part of the main text must fit within these page limits.

Appendices and supplementary material are allowed ONLY in the final, camera-ready version, not at submission, because papers should be reviewable without reference to any supplementary material.

Linguistic examples, if any, should be presented in the original language but also glossed into English to allow accessibility for a broader audience.

Note that the paper type (long or short) is decided independently of the final form of presentation (i.e., oral versus poster).

Important Dates

  • 10 December 2025: 1st CFP
  • 10 January 2026: 2nd CFP
  • 15 January 2026: Training set release
  • 15 February 2026: Blind test set release
  • 1 March 2026: System submission deadline
  • 10 March 2026: Release of results
  • 20 March 2026: Paper submission deadline
  • 15 April 2026: Notification of acceptance
  • 30 April 2026: Camera-ready deadline
  • 11–16 May 2026: LREC 2026 workshops (TBC)

Submission Guidelines

We follow the LREC 2026 standards for submission format and guidelines.

  • Long papers: Up to 8 pages, presenting substantial, original, completed, and unpublished work.
  • Short papers: Up to 4 pages, describing small focused contributions, negative results, system demonstrations, etc.
  • Shared task papers: Up to 4 pages, focusing on methods and results from participation in the shared tasks (including system descriptions).

Accepted Papers

Keynote Speaker

Committees

Organizing Committee

  • Hend Al-Khalifa, Professor, King Saud University, Riyadh, Saudi Arabia, hendk@ksu.edu.sa
  • Mo El-Haj, Reader, VinUniversity, Vietnam & Lancaster University, UK, elhaj.m@vinuni.edu.vn
  • Saad Ezzini, Assistant Professor, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, saad.ezzini@kfupm.edu.sa

Programme Committee

  • Nizar Habash, New York University Abu Dhabi, UAE
  • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar
  • Wassim El-Hajj, American University of Beirut, Lebanon
  • Irina Temnikova, Qatar Computing Research Institute, Qatar
  • Khaled Shaalan, The British University in Dubai, UAE
  • Fethi Bougares, Université du Maine, Avenue Laënnec, France
  • Hazem Hajj, American University of Beirut, Lebanon
  • Nadi Tomeh, LIPN University of Paris 13, Sorbonne Paris Cité, Paris, France
  • Muhammad Abdul-Mageed, The University of British Columbia, Canada
  • Lamia Hadrich Belguith, University of Sfax, Tunisia
  • Reem Suwaileh, Qatar University, Qatar
  • Maram Hasanain, Qatar University, Qatar
  • Mucahid Kutlu, TOBB University, Turkey
  • Wejdan AlKhaldi, King Saud University, Saudi Arabia
  • Manal Albahlal, King Saud University, Saudi Arabia
  • Fatemah Husain, Kuwait University, Kuwait
  • Mustafa Jarrar, Bir Zeit University, Palestine
  • Nada Ghneim, Higher Institute for Applied Sciences and Technology, Syria
  • Salam Khalifa, Stony Brook University, USA
  • Salima Harrat, École Normale Supérieure (Bouzaréah), Algeria
  • Salima Mdhaffar, Le Mans University, France
  • Maha Alamri, AlBaha University, Saudi Arabia
  • Saied Alshahrani, Clarkson University, USA
  • Lubna Alhenaki, Majmaah University, Saudi Arabia

Workshop Program