Workshop Description

The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. While Arabic remains comparatively resource-poor compared with English, recent years have seen the emergence of large, freely available classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.

Now in its seventh edition, OSACT7 marks this milestone with five shared tasks, each addressing timely challenges in Arabic NLP and reflecting broader themes relevant to NLP research in general. OSACT7 builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is to recognize and support minority dialects and underrepresented varieties of Arabic, ensuring that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.

The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenisation, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics. Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.

Shared Tasks

OSACT7 will host five shared tasks, each organised by a dedicated team:

QIAS 2026: Questions & Answers in Islamic Studies Assessment
- Organizers: Abdessalam Bouchekif, Samer Rashwani, Mutaz Al-Khatib, Emad Mohamed, Mohammed Ghaly (Hamad Bin Khalifa University & Nazarbayev University)
- For more information, please visit the shared task website: https://sites.google.com/view/qias2026/
- System paper submission link: https://softconf.com/lrec2026/OSACT7/ (choose the QIAS 2026 track)

AdabEval 2026: Arabic Politeness Detection
- Organizers: Reem Alqifari, Hend Al-Khalifa, Nadia GHEZAIEL HAMMOUDA, Maria BOUNNIT, Hend AlHazmi, Ameera Almasoud, Sharefah AlGhamdi and Noof Alfear (King Saud University, Hail & Cadi Ayyad Universities)
- For more information, please visit the shared task website: https://sites.google.com/view/adabeval2026/home
- System paper submission link: https://softconf.com/lrec2026/OSACT7/ (choose the AdabEval 2026 track)

AraSentEval 2026: A Shared Task on Sentiment Analysis and Swapping in Arabic
- Organizers: Saad Ezzini, Paul Rayson, Shadi Abudalfa, Maram Alharbi, Mo El-Haj (KFUPM, Lancaster University & VinUniversity)
- For more information, please visit the shared task website: https://ezzini.github.io/AraSentEval/
- System paper submission link: https://softconf.com/lrec2026/OSACT7/ (choose the AraSentEval 2026 track)

AraHAHA 2026: Arabic Humour Generation
- Organizers: Ameera Almasoud, Hend Al-Khalifa, Reem Alqifari, Nora Alangari, Manal Albahlal (King Saud University)
- For more information, please visit the shared task website: https://sites.google.com/view/arhaha2026/home
- System paper submission link: https://softconf.com/lrec2026/OSACT7/ (choose the AraHAHA 2026 track)

KSAA 2026: Arabic Speech Dictation with Automatic Diacritisation
- Organizers: Waad Alshammari, Asma Al Wazrah, Rawan Almatham, Afrah Altamimi, Raghad Al-Rasheed, Sawsan Alqahtani, Hanan Aldarmaki, Rufael Marew, Abdulrahman Alshehri, Mohamed Assar, Abdullah Alharbi, Abdulrahman AlOsaimy (King Salman Academy, Princess Nourah University and MBZUAI)
- For more information, please visit the shared task website: https://arai.ksaa.gov.sa/sharedTask2026/
- System paper submission link: https://softconf.com/lrec2026/OSACT7/ (choose the KSAA 2026 track)

Workshop Topics

Language Resources:

Pre-trained Arabic language models.
Surveys and evaluations of existing Arabic corpora and their associated processing tools.
Development and release of new annotated corpora for NLP and IR tasks such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
Assessing the effectiveness of crowdsourcing platforms for Arabic data annotation.
Arabic text and speech processing toolkits.

Tools and Technologies:

Language education, including first (L1) and second (L2) language learning applications.
Pre-training & fine-tuning approaches for Arabic.
Tokenisation, normalisation, segmentation, morphology, and POS tagging.
Sentiment analysis, dialect ID, & classification.
Web and social media analytics.
Arabic LRs for text, speech, sign, gesture, image, & multimodal data.
Best practices for LR interoperability.
Construction and annotation of LRs.
Knowledge extraction, acquisition, and representation.
Ontologies, terminology, and frameworks.
LRs and the Semantic Web (Linked Data, Knowledge Graphs).
Data contamination, synthetic data, and quality issues.

Paper Types and Formats

OSACT7 invites high-quality submissions written in English. Two types of papers will be considered:

Regular long papers – up to eight (8) pages*, presenting substantial, original, completed, and unpublished work.
Short papers – up to four (4) pages*, describing a small focused contribution, negative results, system demonstrations, etc.

* Excluding any additional pages for references, ethical considerations, conflict-of-interest statements, and data and code availability statements.

Upon acceptance, final versions of long papers will be given one additional page – up to nine (9) pages of content plus unlimited pages for acknowledgments and references – so that reviewers’ comments can be taken into account. Final versions of short papers may have up to five (5) pages, plus unlimited pages for acknowledgments and references. For both long and short papers, all figures and tables that are part of the main text must fit within these page limits.

Appendices and supplementary material are allowed ONLY in the final, camera-ready version, not at submission, since papers should be reviewable without reference to any supplementary material.

Linguistic examples, if any, should be presented in the original language and also glossed in English so that they are accessible to a broader audience.

Note that the paper type is decided independently of the final form of presentation (i.e., oral versus poster).

Important Dates

Paper submission deadline: ~~February 18, 2026~~ → February 25, 2026
Notification of acceptance: ~~March 12, 2026~~ → March 25, 2026
Camera-ready deadline: March 30, 2026
Workshop date: May 11, 2026 (morning session, 09:00 – 13:00, Room 10)

Submission guidelines

We follow the LREC 2026 standards for submission format and guidelines.

Long papers: up to 8 pages (excluding references). Substantial, original, and completed work.
Short papers: up to 4 pages (excluding references). Suitable for focused contributions, negative results, datasets, or system descriptions.
Shared task papers: up to 4 pages (excluding references). Suitable for focused contributions, negative results, datasets, or system descriptions.

Submission Link: https://softconf.com/lrec2026/OSACT7/

Accepted Papers

Hidden Sentiments: The Impact of Low-level Adversarial Perturbations on Arabic Sentiment Analysis Services
Abdelrahman Abdelkader
SHEINfer: Implicit Product Category Inference from Arabic E-commerce Reviews
Hend Al-Khalifa
On LLM Prompting Techniques for Arabic Language Arithmetic Reasoning
Reem Alenezi and Ayed Atallah Salman
DIA2 - A Comprehensive and Diverse Diacritized Arabic Corpus for NLP Research
Fatima Dekmak, Shady Elbassuoni, Khaled Shaban, Hazem Hajj, Wassim El-Hajj, Yasmine Abu Adla and Buthaina Alabrash
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets
Mona H. Albaqawi, Eman M. Albalkhi, Joud A. Albaiti and Enrico Lopedoto
AlignAR: Generative Sentence Alignment for Arabic–English Parallel Corpora of Legal and Literary Texts
Baorong Huang and Ali Asiri
Does Translation Preserve Sentiment? An Analysis of Arabic-English Cross-Lingual Classification
Nour Aldin Al Mubarak and Noura Al Moubayed
Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights
Ahmed Farouk Zakaria Elshabrawy, Go Inoue, Muhammed AbuOdeh and Nizar Habash
Helpful or Harmful? The Dual Role of Linguistic Features in LLM-Based Dialectal Machine Translation
Abdelhalim Hafedh Dahou and Mohamed Amine Cheragui
GATE-Reranker: A Strong Arabic Cross-Encoder for Document Reranking
Omer Nacar, Omar Elshehy, Mohamed Zaytoon and Khloud Al Jallad
ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation
Serry Sibaee, Khloud Al Jallad, Zineb Yousfi, Israa Elsayed Mohamed Elhosiny, Yousra Yousra El-Ghawi, Batool Balah and Omer Nacar
NAJD-MT: High-Fidelity Saudi Najdi–English Training Data for Bidirectional Neural Machine Translation
Nour Qandos, Samar Essa Ahmed, Omer Nacar, Ahmad Alrabghi, Rahaf Saeed Al Hallay, Aya Hamod and Shaden Alsuhaim
How Foundation Models behave for Arabic Image Captioning?
Khaoula Dahimi, Amel BELABBACI, Hadda Cherroun and Abdelhamid Haouhat
CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech
Youssef Saidi, Haroun Elleuch and Fethi Bougares
When Bigger Isn’t Better: Evaluating LLMs for Arabic Sentiment Analysis
Mohamed Ibrahim, Abdullah Makki, Youssef Barakat, Nour Samy and Sarah AlHumoud

Committees

Organizing Committee

Hend Al-Khalifa, Professor, King Saud University, Riyadh, Saudi Arabia, hendk@ksu.edu.sa

Mo El-Haj, Reader, VinUniversity, Vietnam, Lancaster University, UK, elhaj.m@vinuni.edu.vn

Saad Ezzini, Assistant Professor, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, saad.ezzini@kfupm.edu.sa

Programme Committee

Mohammed Alliheedi, Al-Baha University, Saudi Arabia

Mamoun Abu Helou, Al-Istiqlal University, Palestine

Maria Bounnit, FLAM, Cadi Ayyad University, Morocco

Hani M. Iwidat, Al-Istiqlal University, Palestine

Noorhan Abbas, University of Leeds, United Kingdom

Sultan Alrowili, IBM Research, Saudi Arabia

Abdulaziz Alhamadani, Florida Polytechnic University, United States

Amal Haddad Haddad, University of Granada, Spain

Dima Taji, Charles University, Czech Republic

Hamada Nayel, Prince Sattam Bin Abdulaziz University, Saudi Arabia

Salmane Chafik, Mohammed VI Polytechnic University, Morocco

Abdessalam Bouchekif, Hamad Bin Khalifa University (HBKU), Qatar

Sharefah Al-Ghamdi, King Saud University, Saudi Arabia

Akram Mohammed Ahmed Al-Rumaim, Independent Researcher, Netherlands

Ashraf Elnagar, University of Sharjah, United Arab Emirates

Ameera Almasoud, King Saud University, Saudi Arabia

Imed Zitouni, Meta, United States

Mohammad Abuoudeh, Al-Hussein Bin Talal University, Jordan

Wajdi Zaghouani, Northwestern University in Qatar, Qatar

Rana Malhas, Qatar University, Qatar

Ehsan Lotfi, University of Antwerp, Belgium

Abdelkader El Mahdaouy, Mohammed VI Polytechnic University, Morocco

Khalid Choukri, ELRA/ELDA, France

Khalid Al-Khatib, University of Groningen, Netherlands

Amany Fashwan, Alexandria University, Egypt

Salima Harrat, École Normale Supérieure de Bouzaréah (ENSB), Algiers, Algeria

Manal Albahlal, King Saud University, Saudi Arabia

Wafa Aissa, UCLouvain, Belgium

Bassam Haddad, University of Petra, Jordan

Noof Alfear, King Saud University, Saudi Arabia

Amr Keleg, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), United Arab Emirates

Sara Nabhani, University of Groningen, Netherlands

Ali Al-Laith, University of Copenhagen, Denmark

Fethi Bougares, Elyadata, LIA – Université d'Avignon, France

Amal Htait, Aston University, United Kingdom

Nourah Alangari, King Saud University, Saudi Arabia

Nadia Ghezaiel, University of Ha'il, Saudi Arabia

Khloud Al Jallad, Syrian Society for Startups and Research (SySSR), Syria

Paul Rayson, Lancaster University, United Kingdom

Hamza Alami, L3IA Laboratory, USMBA, Morocco

Abdulhamid Abubakar, Center for Cyberspace Studies, NSU Keffi, Nigeria

Kamel Gaanoun, INSEA, Morocco

Mariam M. Biltawi, Al Hussein Technical University, Jordan

Abdullah I. Alharbi, King Salman Global Academy for Arabic, Saudi Arabia

Marwah Alian, The World Islamic Sciences and Education University, Jordan

Sarah Alnefaie, King Abdulaziz University, Saudi Arabia

Hatim Derrouz, Ibn Tofail University, Morocco

Nada Ghneim, Damascus University, Syria

Taha Zerrouki, Bouira University, Algeria

Fouzi Takelait, University of North Dakota, United States

Serry Sibaee, Prince Sultan University, Saudi Arabia

Yassine Saoudi, Al-Ahliyya Amman University, Jordan

Hadeel Saadany, Birmingham City University, United Kingdom

Salima Mdhaffar, University of Avignon, France

Nouran Khallaf, University of Leeds, United Kingdom

Amir Hazem, The University of Tokyo, Japan

Osama Hamed, Palestine Technical University, Palestine

Workshop Program

09:00 – 09:05 | Opening

Opening Remarks

09:05 – 10:25 | Session 1: Sentiment Analysis, Retrieval & Machine Translation

Sentiment Analysis

Hidden Sentiments: The Impact of Low-level Adversarial Perturbations on Arabic Sentiment Analysis Services
Abdelrahman Abdelkader
LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets
Mona H. Albaqawi, Eman M. Albalkhi, Joud A. Albaiti and Enrico Lopedoto
When Bigger Isn’t Better: Evaluating LLMs for Arabic Sentiment Analysis
Mohamed Ibrahim, Abdullah Makki, Youssef Barakat, Nour Samy and Sarah AlHumoud
Does Translation Preserve Sentiment? An Analysis of Arabic-English Cross-Lingual Classification
Nour Aldin Al Mubarak and Noura Al Moubayed

Retrieval & Multimodal

GATE-Reranker: A Strong Arabic Cross-Encoder for Document Reranking
Omer Nacar, Omar Elshehy, Mohamed Zaytoon and Khloud Al Jallad
How Foundation Models behave for Arabic Image Captioning?
Khaoula Dahimi, Amel BELABBACI, Hadda Cherroun and Abdelhamid Haouhat

Machine Translation

AlignAR: Generative Sentence Alignment for Arabic–English Parallel Corpora of Legal and Literary Texts
Baorong Huang and Ali Asiri
Helpful or Harmful? The Dual Role of Linguistic Features in LLM-Based Dialectal Machine Translation
Abdelhalim Hafedh Dahou and Mohamed Amine Cheragui

10:30 – 11:00 | Coffee Break

11:00 – 12:15 | Session 2: Machine Translation, LLM Applications & Language Resources

Machine Translation

ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation
Serry Sibaee, Khloud Al Jallad, Zineb Yousfi, Israa Elsayed Mohamed Elhosiny, Yousra Yousra El-Ghawi, Batool Balah and Omer Nacar
NAJD-MT: High-Fidelity Saudi Najdi–English Training Data for Bidirectional Neural Machine Translation
Nour Qandos, Samar Essa Ahmed, Omer Nacar, Ahmad Alrabghi, Rahaf Saeed Al Hallay, Aya Hamod and Shaden Alsuhaim

LLM Applications & Language Resources

On LLM Prompting Techniques for Arabic Language Arithmetic Reasoning
Reem Alenezi and Ayed Atallah Salman
Parsing Arabic Dialects Revisited: New Benchmarks, Models, and Insights
Ahmed Farouk Zakaria Elshabrawy, Go Inoue, Muhammed AbuOdeh and Nizar Habash
DIA2 - A Comprehensive and Diverse Diacritized Arabic Corpus for NLP Research
Fatima Dekmak, Shady Elbassuoni, Khaled Shaban, Hazem Hajj, Wassim El-Hajj, Yasmine Abu Adla and Buthaina Alabrash
CV-18 NER: Augmented Common Voice for Named Entity Recognition from Arabic Speech
Youssef Saidi, Haroun Elleuch and Fethi Bougares

12:15 – 12:55 | Session 3: Arabic NLP Shared Tasks

ARHAHA 2026 — Arabic Humor Automatic Generation

(Overview) The Shared Task on Arabic Humor Automatic Generation
Ameera Masoud Almasoud, Hend Al-Khalifa, Reem Fahad Alqifari, Nourah Alangari and Manal M. Albahlal

AdabEval 2026 — Arabic Politeness Detection

(Overview) The AdabEval 2026 Shared Task on Arabic Politeness Detection
Reem Fahad Alqifari, Hend Al-Khalifa, Nadia Ghezaiel, Maria Bounnit, Hend Hamed Alhazmi, Ameera Masoud Almasoud, Sharefah Ahmed Al-Ghamdi and Noof Abdullah Alfear
(Lightning Talk) GHAD NLP at AdabEval2026: Transformer-Based Approach for Arabic Politeness and Pragmatic Category Classification
Ghada Alfattni and Ghader Kurdi

QIAS 2026 — Islamic Inheritance Reasoning

(Overview) QIAS 2026: Overview of the Shared Task on Islamic Inheritance Reasoning
Abdessalam Bouchekif, Somaya Eltanbouly, Shahd Gaben, Mohammed Ghaly, Samer Rashwani, Emad Mohamed and Heba Sbahi
(Lightning Talk) QU-NLP at QIAS 2026: Multi-Stage QLoRA Fine-Tuning for Arabic Islamic Inheritance Reasoning
Mohammad ALSmadi

KSAA-2026 — Arabic Speech Dictation with Automatic Diacritization

(Overview) KSAA-2026 Shared Task on Arabic Speech Dictation with Automatic Diacritization
Asma Ali Al Wazrah, Waad Alshammari, Rawan Almatham, Raghad Al-rasheed, Afrah Abdulaziz Altamimi, Rufael Marew, Sawsan Alqahtani, Hanan Aldarmaki, Abdullah I. Alharbi, Abdulrahman Saeed Alshehri, Mohamed Assar, Amal Almazrua and Abdulrahman Alosaimy
(Lightning Talk) Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
Meshal Abdullah Alamr, Hassan Rshed Alqaeri and Abdullah Aldahlawi

AraSentEval 2026 — Sentiment Analysis and Swapping

(Overview) AraSentEval 2026: A Shared Task on Sentiment Analysis and Swapping in Arabic
Saad Ezzini, Shadi Abudalfa, Maram I. Alharbi, Salmane Chafik, Hamzah Luqman, Mo El-Haj, Paul Rayson and Reem Alotaibi
(Lightning Talk) TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic
Ali Abusaleh, Bhuvanesh Verma and Alexander Mehler

12:55 – 13:00 | Closing

Closing Remarks
