Mai Oudah

New York University Abu Dhabi, Computer Science, Faculty Member

Address: UAE

Papers by Mai Oudah

Machine Learning Based Screening Tool for Alzheimer’s Disease via Gut Microbiome

Lecture notes in networks and systems, 2023

Identifying Heat-Resilient Corals Using Machine Learning and Microbiome

Lecture notes in networks and systems, 2023

Microbiome Classification for Heart Disease Detection

2022 IEEE 22nd International Conference on Bioinformatics and Bioengineering (BIBE)

sj-pdf-1-smx-10.1177_00811750211053370 – Supplemental material for Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity

Supplemental material, sj-pdf-1-smx-10.1177_00811750211053370 for Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity by Jeffrey L. Jensen, Daniel Karell, Cole Tanigawa-Lau, Nizar Habash, Mai Oudah and Dhia Fairus Shofia Fani in Sociological Methodology

Taxonomy-aware feature engineering for microbiome classification

BMC Bioinformatics, 2018

Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition

Language Resources and Evaluation, 2016

Global transcriptome analysis of salt acclimated Prochlorococcus AS9601

Microbiological Research, 2015

Person name recognition using the hybrid approach

A hybrid approach to Arabic named entity recognition

Journal of Information Science, 2013

In this paper, we propose a hybrid named entity recognition (NER) approach that takes the advantages of rule-based and machine learning-based approaches in order to improve the overall system performance and overcome the knowledge elicitation bottleneck and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic. The complexity of Arabic poses special challenges to researchers of Arabic NER, which is essential for both monolingual and multilingual applications. We used the hybrid approach to develop an Arabic NER system that is capable of recognizing 11 types of Arabic named entities: Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments were conducted using decision trees, Support Vector Machines and logistic regression classifiers to evaluate the system performance. The empirical results indicate that the hybrid approach outperforms both the rule-based and th...

A Pipeline Arabic Named Entity Recognition Using a Hybrid Approach

Additional file 1: of Taxonomy-aware feature engineering for microbiome classification

Figure S1. The PCoA plot of The Human Microbiome Project Consortium (2012) dataset, generated via the beta_diversity_through_plots.py script available in QIIME
Figure S2. The PCoA plot provided in the meta-analysis of environmental microbiomes conducted by Henschel et al. (2015)
Figure S3. The PCoA plot of the combined CRC dataset
Figure S4. Comparison between the baseline and HFE confusion matrices when applied on CRC1 dataset (Zeller et al., 2014) for Cancer vs. Normal classification
Figure S5. Comparison between the baseline and HFE confusion matrices when applied on CRC2 dataset (Zackular et al., 2014) for Cancer vs. Normal classification
Figure S6. Comparison between the baseline and HFE confusion matrices when applied on CRC1 + 2 dataset for Cancer vs. Normal classification
Figure S7. Comparison between the baseline and HFE confusion matrices when applied on CRC1 + 2
Figure S8. The taxonomic tree of all the informative features extracted by the HFE method for Cancer v...

NERA 2.0: Improving coverage and performance of rule-based named entity recognition for Arabic

Natural Language Engineering, 2016

Named Entity Recognition (NER) is an essential task for many natural language processing systems, which makes use of various linguistic resources. NER becomes more complicated when the language in use is morphologically rich and structurally complex, such as Arabic. This language has a set of characteristics that makes it particularly challenging to handle. In a previous work, we have proposed an Arabic NER system that follows the hybrid approach, i.e. integrates both rule-based and machine learning-based NER approaches. Our hybrid NER system is the state-of-the-art in Arabic NER according to its performance on standard evaluation datasets. In this article, we discuss a novel methodology for overcoming the coverage drawback of rule-based NER systems in order to improve their performance and allow for automated rule update. The presented mechanism utilizes the recognition decisions made by the hybrid NER system in order to identify the weaknesses of the rule-based component and deriv...

Human Gut Microbes Associated with Systolic Blood Pressure

by Mai Oudah and Dibyayan Deb

International Journal of Hypertension | Research Article, 2022

Emerging studies have revealed a strong link between the gut microbiome and several human diseases. Since the human gut microbiome mirrors variations in lifestyle and environment, whether associations between disease conditions and the gut microbiome are consistent across populations, particularly in communities practicing traditional subsistence strategies whose microbiomes differ markedly from industrialists, remains unknown. Cardiovascular diseases are the leading cause of mortality in India affecting 55 million people, and high blood pressure is one of the primary risk factors for cardiovascular diseases. We examined associations between gut microbiome and blood pressure along with 14 other variables associated with lifestyle, dietary habits, disease conditions, and clinical blood markers in the three Assamese populations. Our analysis reveals a robust link between the gut microbiome diversity and composition and systolic blood pressure. Moreover, several genera previously associated with hypertension in non-Indian populations were also associated with systolic blood pressure in this cohort and these genera were predictors of elevated blood pressure in these populations. These findings confer opportunities to design personalized, preventative, and targeted interventions harnessing the gut microbiome to tackle the burden of cardiovascular diseases in India.

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages. While linguistically-motivated tokenization techniques were shown to have significant effects on the performance of statistical MT, it remains unclear if those techniques are well suited for neural MT. In this paper, we systematically compare neural and statistical MT models for Arabic-English translation on data preprocessed by various prominent tokenization schemes. Furthermore, we consider a range of data and vocabulary sizes and compare their effect on both approaches. Our empirical results show that the best choice of tokenization scheme is largely based on the type of model and the size of data. We also show that we can gain significant improvements using a system selection that combines the output from neural and statistical MT.

Identification of discriminatory taxa for anthropogenic environments through taxonomy-aware feature engineering

F1000Research, 2015

Integrating rule-based approach and machine learning approach for Arabic named entity recognition

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains. Alternative solutions such as zero-shot models or pivoting techniques are successful in getting a strong baseline, but are often below the more supported language-pair systems. In this paper, we focus on Arabic-Japanese machine translation, a less studied language pair; and we work with a unique parallel corpus of Arabic news articles that were manually translated to Japanese. We use this parallel corpus to adapt a state-of-the-art domain/genre agnostic neural MT system via a simple automatic post-editing technique. Our results and detailed analysis suggest that this approach is quite viable for less supported language pairs in specific domains.

CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python. CAMeL Tools currently provides utilities for pre-processing, morphological modeling, Dialect Identification, Named Entity Recognition and Sentiment Analysis. In this paper, we describe the design of CAMeL Tools and the functionalities it provides.
