Abstract: Global problems such as disease detection and control, terrorism, immigration and border control, illicit drug trafficking, etc. require information sharing, coordination and collaboration among government agencies within a country and across national boundaries. This paper presents a prototype of a transnational information system which aims at achieving information sharing, process coordination and enforcement of policies, constraints, regulations, and security and privacy rules by...
Reading in a foreign language is a difficult task, especially if the texts presented to readers are chosen without taking into account the reader’s skill level. Foreign language learners need to be presented with reading material suitable to their reading capacities. A basic tool for determining if a text is appropriate to a reader’s level is the assessment of its readability, a measure that aims to represent the human capacities required to comprehend a given text. Readability prediction for a text is an important aspect in the process of teaching and learning, for reading in a foreign language as well as in one’s native language, and continues to be a central area of research and practice. In this paper, we present our approach to readability assessment for Modern Standard Arabic (MSA) as a foreign language. Readability prediction is carried out using the Global Language Online Support System (GLOSS) corpus, which was developed for independent learners to improve their foreign language skills and was annotated with the Interagency Language Roundtable (ILR) scale. In this study, we introduce a frequency dictionary, which was developed to calculate frequency-based features. The approach gives results that surpass the state-of-the-art results for Arabic.
Text readability assessment is a well-known problem that has acquired even more importance in today’s information-rich world. In this article, we survey various approaches to measuring and assessing the readability of texts. Our specific goal is to provide a perspective on the state-of-the-art in readability assessment research for Arabic, which differs significantly from other languages on which readability studies have tended to focus. We provide background on readability assessment research and tools for English, for which readability studies are the most advanced. We then survey approaches adopted for Arabic, both classical formula-based approaches and studies that combine Machine Learning (ML) with Natural Language Processing (NLP) techniques. The works we cover target text corpora for different audiences: school-age first language readers (L1), foreign language learners (L2), and adult readers in non-academic contexts. Therefore, we explore differences between reading in L1 and L2 and consider how they play out specifically in Arabic after describing language characteristics that may impact readability. Finally, we highlight challenges for Arabic readability research and propose multiple future directions to improve readability assessment and related applications that would benefit from more attention.
Today there is a large amount of valuable research on corpora, and the availability of corpora has increased significantly in recent years. Unfortunately, this is not the case for all types of corpora. Research in the field of Arabic language processing suffers from a severe lack of annotated educational corpora. In this work, we have built a new educational corpus by drawing from Moroccan primary school books. This corpus will help education researchers and computational linguists provide appropriate tools to support school students who are learning formal Arabic. We annotated the corpus with morphosyntactic information that can be used in several natural language processing applications. We also added a text difficulty measure, linked to the Moroccan primary school levels, so that the corpus can be used in the development of readability measurement applications. The result is a Modern Standard Arabic corpus dedicated to young learners of Arabic as a first language (L1). The corpus is manually labeled with seven levels, namely the primary levels of the Moroccan educational system from 1st to 6th grade, plus a more basic level that we call level 0.
We present a computational approach to Arabic morphology description that draws from Lexeme-Based Morphology (Aronoff, 1994; Beard, 1995), giving priority to stems and granting a subordinate status to inflectional prefixes and suffixes. Although the morphology of Arabic is non-concatenative, we make the process of generating inflected forms concatenative by separating the generation of stems from that of other inflectional affixes.
Global problems such as disease detection and control, terrorism, immigration and border control, illicit drug trafficking, etc. require information sharing, coordination and collaboration among government agencies within a country and across national boundaries. This paper presents a prototype of a transnational information system which aims at achieving information sharing, process coordination and enforcement of policies, constraints, regulations, and security and privacy rules by integrating a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The Web-services infrastructure is used to achieve the interoperation of these heterogeneous component systems.
Abstract. We describe ongoing efforts towards and challenges in using an Example-Based Machine Translation (EBMT) system in the context of a multi-national, multi-university and multi-agency transnational digital government project. The project is aimed at applying information technology to the problem of collecting and sharing information securely in a multilingual context. We report on a number of issues encountered in obtaining and using language data for the EBMT system, discuss our current solutions, and briefly describe ongoing enhancements to the system to meet some of the technical and practical challenges posed by using this machine translation approach in the project domain.
1. Background. We describe ongoing efforts towards and challenges in adapting and using an Example-Based Machine Translation (EBMT) system in the context of a transnational digital government project (Cavalli-Sforza, et al., 2003; Su et al., under
New members entering productive organizations require considerable training. Computer tools can support such training by providing an opportunity to learn while engaging in authentic activities and receiving appropriate coaching. We describe two tools that incorporate this approach. Sherlock, an existing computer coach, is an effective environment for learning how to troubleshoot complex electronic devices. A newer research effort focuses on tools for supporting knowledge-building argumentation and scientific theory evaluation in post-elementary school science education. Both tools offer users opportunities for reflecting on their own performance and support individual as well as collaborative learning.
Solutions to global problems such as disease detection and control, terrorism, immigration and border control, and illicit drug trafficking require sharing and coordinating information and collaboration among government agencies within a country and across national boundaries. This paper presents an approach to achieve information sharing, event notification, enforcement of policies, constraints, regulations, security and privacy rules, and process coordination. The proposed system, designed in collaboration with stakeholders and end users in two Latin American countries, achieves the desired capabilities by integrating a distributed query processor (DQP) that provides form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The Web-services infrastructure is used to achieve the interoperation of these heterogeneous component systems. A prototype of the integrated transnation...
We study how a series of texts designed for learners of Arabic as a second language evolves over the course of a curriculum, considering their lexical content in terms of the vocabulary that is supposedly acquired, or in the process of being acquired, by the learners for whom the texts are intended. We also examine the evolution of other text variables commonly used to measure the readability of a text. The objective is to determine which text features can be used to build a predictive model of a text's suitability for a learner at a given stage of learning, as defined primarily by the vocabulary learned. We conclude by examining whether the approach and the results can be applied to Amazigh.
Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation. In this paper we describe an approach to reducing the complexity of Arabic morphology generation using discrimination trees and transformational rules. By decoupling the problem of stem changes from that of prefixes and suffixes, we gain a significant reduction in the number of rules required, as much as a factor of three for certain verb types. We focus on hollow verbs but discuss the wider applicability of the approach.
We describe ongoing efforts towards and challenges in using an Example-Based Machine Translation (EBMT) system in the context of a multinational, multi-university and multi-agency transnational digital government project. The project is aimed at applying information technology to the problem of collecting and sharing information securely in a multilingual context. We report on a number of issues encountered in obtaining and using language data for the EBMT system, discuss our current solutions, and briefly describe ongoing enhancements to the system to meet some of the technical and practical challenges posed by using this machine translation approach in the project domain.
We have completed a three-year project, sponsored by the Korean International Cooperation Agency (KOICA), to develop competencies, resources and tools for mathematics and science education for use by middle school students and teachers in Morocco. The results have been encouraging, both in terms of increased student performance and in terms of teacher involvement in, and adoption of, the ICT-based approach.
1 Background. In general, the level in mathematics and science of a large proportion of Moroccan students in middle school and high school has been below expectations. Moroccan eighth-graders, for example, have performed below the international average [1, p. 18]. Statistics also show that a large portion of Moroccan students do not make it to higher education. Together, these facts led to the research grant sponsored by the Korean International Cooperation Agency (KOICA) [2], which aims to improve overall education in Morocco, and in particular the use of Information and Communications Technol...
We describe and discuss the results of ongoing experiments that use morphological analysis in the context of Example-Based Machine Translation. The goal is to increase the coverage of our training examples so as to capture things that are not directly seen in the training text. This is done through a two-stage process of generalization and filtering.
@Book{CASL2007:2007, editor = {Violetta Cavalli-Sforza and Imed Zitouni}, title = {Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources}, month = {June}, year = {2007}, address = {Prague, Czech Republic}, publisher = {Association for Computational Linguistics}, url = {http://www.aclweb.org/anthology/W/W07/W07- 08} } @InProceedings{smrz:2007:CASL2007, author = {Smrz, Otakar}, title = {ElixirFM -- Implementation of Functional Arabic Morphology}, booktitle = {Proceedings of the ...
The teaching and learning of a foreign language normally target, with variations depending on the language and the learner's goals, four main skills: reading, writing, speaking, and listening. In computer-assisted language learning, the reading of texts plays an important role for two reasons. On the practical side, natural language processing systems are currently well equipped for this task (much more so than for speech processing). On the pedagogical side, reading allows the learner to develop vocabulary in the target language and to understand the nuances of words and their proper use in context, as well as the relationship between the realization of words and the syntactic structure in which they occur.
IMORPHĒ is a significantly extended version of MORPHĒ, a morphology description compiler. MORPHĒ’s morphology description language is based on two constructs: 1) a morphological form hierarchy, whose nodes relate and differentiate surface forms in terms of the common and distinguishing inflectional features of lexical items; and 2) transformational rules, attached to leaf nodes of the hierarchy, which generate the surface form of an item from the base form stored in the lexicon. While MORPHĒ’s approach to morphology description is intuitively appealing and was successfully used for generating the morphology of several European languages, its application to Modern Standard Arabic yielded morphological descriptions that were highly complex and redundant. Previous modifications and enhancements attempted to capture more elegantly and concisely different aspects of the complex morphology of Arabic, finding theoretical grounding in Lexeme-Based Morphology. Those extensions are being inco...
This paper describes a novel Arabic Reading Enhancement Tool (ARET) for classroom use, which has been built using corpus-based Natural Language Processing in combination with expert linguistic annotation. The NLP techniques include a widely used morphological analyzer for Modern Standard Arabic to provide word-level grammatical details, and a relational database index of corpus texts to provide word concordances. ARET also makes use of a commercial Arabic text-to-speech (TTS) system to add a speech layer (with male and female voices) to the Al-Kitaab language textbook resources. The system generates test questions and distractors, offering teachers and students an interesting computer-aided language learning tool. We describe the background and the motivation behind the building of ARET, presenting the various components and the method used to build the tools.
We describe efforts towards, and results of, a transnational collaboration between universities, government agencies, and an international organization in applying information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational movement of illicit drugs. Starting from a general vision of how IT could assist in achieving this objective, we have agreed on likely user scenarios, infrastructure requirements and specifications for a prototype system whose initial focus is on collecting and sharing information related to the migration of individuals across borders. Albeit on a reduced scale, this system concretely exemplifies how information about potentially drug-related activities can be collected and accessed in different languages by privileged users through natural dialogue communication, and how it can be shared in a timely fashion to promote regional and international cooperation on solving the drug problem.
Various research works have tried to connect Natural Language Processing (NLP) to computer graphics, as this connection would lay the ground for the automatic generation of computer animations. In this research we aim to provide a novel approach for connecting graphics to NLP by using OpenNLP and the Unity 3D game engine. We rely on two linguistic approaches, Vendler’s verb classification and Jackendoff’s Lexical Conceptual Structure (LCS), and present how the enabling technologies and the chosen linguistic approaches work together to provide the animation generation capability. We describe the overall architecture of AUI Story Maker, a system built to illustrate the feasibility of our approach, and discuss the future work required to make it a reliable tool in a modern classroom setting. We also present some writing samples gathered during field work with 1st graders at Al Akhawayn School in Ifrane (ASI), and provide sample outputs of AUI Story Maker.
We describe an enhanced version of the MORPHE tool, a morphological analyzer/generator designed to interface with a knowledge-based machine translation system. MORPHE uses a hierarchy (tree structure) to relate various morphological forms to each other based on common and distinctive features. Transformational rules are attached to the leaf nodes of the hierarchy. In generation, MORPHE takes as input a feature structure and pushes it through the hierarchy, which acts as a discrimination net. When a leaf node is reached, MORPHE applies the attached rule. Each rule may contain several mutually exclusive clauses, each of which attempts to match a pattern against the base string contained in the feature structure and, if the match is successful, applies operators to the string to produce a transformed string. Our enhancements to MORPHE were motivated by attempting to use the tool to generate Arabic morphology. The non-concatenative morphology typical of Semitic languages has spurred the d...
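The generation procedure described in this abstract can be illustrated with a minimal Python sketch. It is not MORPHE's actual rule language or implementation: the names `Node`, `generate` and `ROOT`, the feature names, and the toy English rules are all assumptions chosen only to show a feature structure being pushed through a discrimination net, with the first matching clause of a leaf rule transforming the base string.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# A clause pairs a regex over the base string with an operator that rewrites it.
Clause = tuple[str, Callable[[str], str]]


@dataclass
class Node:
    """One node of the (hypothetical) morphological form hierarchy."""
    test: dict[str, str]                                  # features this node requires
    children: list["Node"] = field(default_factory=list)
    rule: list[Clause] = field(default_factory=list)      # used only at leaf nodes

    def matches(self, fs: dict[str, str]) -> bool:
        return all(fs.get(k) == v for k, v in self.test.items())


def generate(node: Node, fs: dict[str, str]) -> Optional[str]:
    """Push a feature structure through the net; apply the rule at the leaf reached."""
    if not node.matches(fs):
        return None
    if not node.children:                                 # leaf node
        base = fs["base"]
        for pattern, op in node.rule:                     # clauses are mutually exclusive:
            if re.search(pattern, base):                  # the first match wins
                return op(base)
        return base                                       # no clause applies
    for child in node.children:
        out = generate(child, fs)
        if out is not None:
            return out
    return None


# Toy English hierarchy, for illustration only.
ROOT = Node({}, children=[
    Node({"cat": "n"}, children=[
        Node({"num": "sg"}),
        Node({"num": "pl"}, rule=[
            (r"(s|x|z|ch|sh)$", lambda b: b + "es"),
            (r"[^aeiou]y$", lambda b: b[:-1] + "ies"),
            (r"", lambda b: b + "s"),
        ]),
    ]),
    Node({"cat": "v"}, children=[
        Node({"tense": "past"}, rule=[
            (r"e$", lambda b: b + "d"),
            (r"[^aeiou]y$", lambda b: b[:-1] + "ied"),
            (r"", lambda b: b + "ed"),
        ]),
    ]),
])

print(generate(ROOT, {"cat": "n", "num": "pl", "base": "city"}))
print(generate(ROOT, {"cat": "v", "tense": "past", "base": "walk"}))
```

In this sketch the inner nodes only discriminate on features, and all string changes live in the leaf rules, which mirrors the division of labour the abstract describes. A purely suffixing toy like this one does not capture stem-internal changes of the kind found in Arabic.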