Komminist Weldemariam

    Software defect data provide an invaluable source of information for developers, testers, and others. A concise view of a software profile, its development process, and their relationships can be systematically extracted and analyzed to deduce adequate corrective measures based on previously discovered weaknesses. This kind of approach is widely used in various projects to improve the quality of a software system. This paper builds on top of the orthogonal defect classification (ODC) scheme to provide a structured security-specific defect classification. We perform a detailed analysis on the classified data and obtain in-process feedback so that the next version of the software can be more secure and reliable. We applied our customized methodology to the Firefox and Chrome defect repositories using six consecutive versions and milestones, respectively. We found that in-process feedback can help development teams take corrective actions as early as possible. We also studied the correlations between software defect types and the software development lifecycle to understand development improvement.
    This study aimed at identifying the factors associated with neonatal mortality. We analyzed the Demographic and Health Survey (DHS) datasets from 10 Sub-Saharan countries. For each survey, we trained machine learning models to identify women who had experienced a neonatal death within the 5 years prior to the survey being administered. We then inspected the models by visualizing the features that were important for each model, and how, on average, changing the values of the features affected the risk of neonatal mortality. We confirmed the known positive correlation between birth frequency and neonatal mortality and identified an unexpected negative correlation between household size and neonatal mortality. We further established that mothers living in smaller households have a higher risk of neonatal mortality compared to mothers living in larger households; and that factors such as the age and gender of the head of the household may influence the association between household size...
    This work views neural networks as data generating systems and applies anomalous pattern detection techniques on that data in order to detect when a network is processing an anomalous input. Detecting anomalies is a critical component for multiple machine learning problems including detecting adversarial noise. More broadly, this work is a step towards giving neural networks the ability to recognize an out-of-distribution sample. This is the first work to introduce "Subset Scanning" methods from the anomalous pattern detection domain to the task of detecting anomalous input of neural networks. Subset scanning treats the detection problem as a search for the most anomalous subset of node activations (i.e., highest scoring subset according to non-parametric scan statistics). Mathematical properties of these scoring functions allow the search to be completed in log-linear rather than exponential time while still guaranteeing the most anomalous subset of nodes in the network i...
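The subset-scanning search described above can be made concrete. A minimal sketch (assuming per-node empirical p-values have already been computed against a background of clean activations; the Berk-Jones statistic stands in here for the paper's family of non-parametric scan statistics):

```python
import math

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones statistic: KL divergence between the observed fraction
    of significant p-values (n_alpha / n) and the expected fraction alpha."""
    p = n_alpha / n
    alpha = min(max(alpha, 1e-12), 1 - 1e-12)
    if p <= alpha:
        return 0.0
    score = n_alpha * math.log(p / alpha)
    if p < 1.0:
        score += (n - n_alpha) * math.log((1 - p) / (1 - alpha))
    return score

def most_anomalous_subset(pvalues):
    """Scan only the 'k smallest p-values' prefixes: the linear-time subset
    scanning (LTSS) property guarantees the optimum is among them, giving
    an O(n log n) search instead of an O(2^n) one."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    n = len(pvalues)
    best_score, best_subset = 0.0, []
    for k in range(1, n + 1):
        alpha = pvalues[order[k - 1]]     # threshold = k-th smallest p-value
        score = berk_jones(k, n, alpha)   # all k prefix nodes are <= alpha
        if score > best_score:
            best_score, best_subset = score, order[:k]
    return best_score, best_subset

# Nodes 0, 1 and 5 have suspiciously small p-values
score, subset = most_anomalous_subset([0.001, 0.002, 0.8, 0.9, 0.5, 0.003])
print(sorted(subset))  # [0, 1, 5]
```

The prefix-only search is what turns the exponential subset enumeration mentioned in the abstract into a log-linear one.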
    Clinical records capture the temporal, participatory, and interventional details of the care provision process. The exchange of these records plays a critical role in care continuity. Recently, there has been increasing attention on health data privacy and confidentiality, which translates to questions on ownership and accessibility of clinical records. Traditional approaches to remedy this run the risk of reducing the accessibility of these records, making care continuity across facilities more difficult. This poses a need for mechanisms that would enable the secure exchange of health data without adversely affecting access to clinical records. This paper presents the Digital Health Wallet (DHW), a blockchain-enabled system that allows seamless clinical workflow orchestration and patient-mediated data exchange through consent management in a privacy-preserving manner. We conducted a preliminary test to benchmark the performance of DHW in resource-constrained healthcare facilities in developing countries.
    This document describes the details of the BON Egocentric vision dataset. BON denotes the initials of the locations where the dataset was collected: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). BON comprises first-person video recorded while subjects were conducting common office activities. The preceding version of this dataset, FPV-O, has fewer subjects and covers only a single location (Barcelona). To develop a location-agnostic framework, data from multiple locations and/or office settings is essential. Thus, BON comprises videos from an increased number of participants and office settings, resulting in a six-fold increase in the number of video segments, i.e., 2639 (BON) vs. 464 (FPV-O). In the following sections, we describe the details of the dataset: data collection, stratification across activities, duration, locations, and participants (genders).
    Data-driven approaches can provide enhanced insights for domain experts in addressing critical global health challenges, such as newborn and child health, using surveys (e.g., the Demographic and Health Survey). Though there are multiple surveys on the topic, data-driven insight extraction and analysis are often applied to these surveys separately, with limited effort to exploit them jointly, which results in poor prediction performance for critical events, such as neonatal death. Existing machine learning approaches to utilise multiple data sources are not directly applicable to surveys that are disjoint in collection time and location. In this paper, we propose, to the best of our knowledge, the first detailed work that automatically links multiple surveys for improved prediction of newborn and child mortality and achieves cross-study impact analysis of covariates.
    Reliably detecting attacks in a given set of inputs is of high practical relevance because of the vulnerability of neural networks to adversarial examples. These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics, and financial services. We propose an unsupervised method for detecting adversarial attacks in the inner layers of autoencoder (AE) networks by maximizing a non-parametric measure of anomalous node activations. Previous work in this space has shown that AE networks can detect anomalous images by thresholding the reconstruction error produced by the final layer. Furthermore, other detection methods rely on data augmentation or specialized training techniques which must be put in place before training time. In contrast, we use subset scanning methods from the anomalous pattern detection domain to enhance detection power without labeled examples of the noise, retraining, or data augmentation. In addition to an anom...
    Existing datasets available to address crucial problems, such as child mortality and family planning discontinuation in developing countries, are not sufficient for data-driven approaches. This is partly due to disjoint data collection efforts employed across locations, times, and variations of modalities. On the other hand, state-of-the-art methods for the small-data problem are confined to image modalities. In this work, we propose a data-level linkage of disjoint surveys across Sub-Saharan African countries to improve prediction performance for neonatal death and provide cross-domain explainability.
    We investigate the effect of variational autoencoder (VAE) based data anonymization and its ability to preserve anomalous subgroup properties. We present a Utility Guaranteed Deep Privacy (UGDP) system which casts existing anomalous pattern detection methods as a new utility measure for data synthesis. UGDP's approach shows that properties of an anomalous subset of records, identified in the original data set, are preserved through VAE-based anonymization. This is despite the newly generated records being completely synthetic. More specifically, the Bias-Scan algorithm identifies a subgroup of records that are consistently over- (or under-) risked by a black-box classifier as an area of 'poor fit'. This scanning process is applied to both pre- and post-VAE-synthesized data. The areas of poor fit (i.e., anomalous records) persist in both settings. We evaluate our approach using publicly available datasets from the financial industry. Our evaluation confirmed that the approach is able to produce synthetic datasets that preserve a high level of subgroup differentiation as identified initially in the original dataset. Such a distinction was maintained while having distinctly different records between the synthetic and original datasets.
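The Bias-Scan idea of comparing a subgroup's predicted and observed odds can be sketched as follows (a simplified, grid-searched version for illustration; the function names and the q grid are hypothetical, not the authors' implementation):

```python
import math

def bias_score(y, p, q):
    """Bernoulli log-likelihood ratio for multiplying the subgroup's
    predicted odds by q (q = 1 means the classifier is well calibrated).
    y: observed binary outcomes; p: the classifier's predicted risks."""
    return sum(yi * math.log(q) - math.log(1 - pi + q * pi)
               for yi, pi in zip(y, p))

def subgroup_bias(y, p):
    """Grid-search the odds multiplier q; best_q > 1 means the subgroup is
    under-risked by the model, best_q < 1 means it is over-risked."""
    grid = [0.1 * k for k in range(1, 51)]          # q in (0, 5]
    return max((bias_score(y, p, q), q) for q in grid)

# A subgroup the model under-risks: 3 of 4 events observed, ~0.24 predicted
best_score, best_q = subgroup_bias([1, 1, 1, 0], [0.2, 0.3, 0.25, 0.2])
print(best_q > 1.0)  # True: observed outcomes exceed the predicted risk
```

In the full algorithm this score is maximized over subgroups as well as over q, which is what identifies the most anomalous "area of poor fit".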
    In this paper, we address the problem of improving data collection in the education system by presenting the School Census Hub (SCH). The SCH concept emerged from field studies with stakeholders in Kenya. The goal of these studies was to elicit three key high-level requirements for the design of SCH: i) budget allocation, where allocating budget should be based on a verifiable number of active students and teachers; ii) spending, where spending on assets should be transparent and verifiable; and iii) improving the learning environment, by unlocking the limited insight into the statistical relationship between school effectiveness and demographic variables. We present the overall architecture and design of SCH based on the findings from the field studies. The first version, supporting a core set of capabilities for school data collection, has been implemented. To evaluate the system, we conducted a large-scale pilot in 97 schools. We report on a usability study of SCH that demonstrates user awareness and support for data acquisition and reporting in education management information systems in Sub-Saharan Africa.
    Bias in data can have unintended consequences which propagate to the design, development, and deployment of machine learning models. In the financial services sector, this can result in discrimination in access to certain financial instruments and services. At the same time, data privacy is of paramount importance, and recent data breaches have caused reputational damage for large institutions. Presented in this paper is a trusted model-lifecycle management platform that attempts to ensure consumer data protection, anonymization, and fairness. Specifically, we examine how datasets can be reproduced using deep learning techniques to effectively retain important statistical features whilst simultaneously protecting data privacy and enabling safe and secure sharing of sensitive personal information beyond the current state of practice.
    In an effort to provide optimal inputs to downstream modeling systems (e.g., a hydrodynamics model that simulates the water circulation of a lake), we hereby strive to enhance the resolution of precipitation fields from a weather model by up to 9x. We test two super-resolution models: the enhanced super-resolution generative adversarial network (ESRGAN), proposed in 2017, and the content adaptive resampler (CAR), proposed in 2020. Both models outperform simple bicubic interpolation, with the ESRGAN exceeding expectations for accuracy. We make several proposals for extending the work to ensure it can be a useful tool for quantifying the impact of climate change on local ecosystems while removing reliance on energy-intensive, high-resolution weather model simulations.
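As a toy illustration of the evaluation setup above (the field values, the upsampling factor, and the function names are hypothetical; the paper's baseline is bicubic interpolation, replaced here by nearest-neighbour replication to keep the sketch dependency-free):

```python
import numpy as np

def upsample_nearest(field, factor):
    """Naive baseline: replicate each coarse cell factor x factor times
    (a stand-in for bicubic interpolation in this sketch)."""
    return np.kron(field, np.ones((factor, factor)))

def psnr(truth, estimate, data_range):
    """Peak signal-to-noise ratio in dB; higher means a closer match."""
    mse = np.mean((truth - estimate) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range ** 2 / mse)

coarse = np.array([[0.0, 2.0], [4.0, 6.0]])  # toy 2x2 precipitation field (mm/h)
fine = upsample_nearest(coarse, factor=3)    # 6x6 field, a 3x enhancement
print(fine.shape)  # (6, 6)
```

A learned super-resolver would replace `upsample_nearest` and be scored with `psnr` (or a similar metric) against held-out high-resolution fields.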
    In this paper, we investigate the effect of machine learning based anonymization on anomalous subgroup preservation. In particular, we train a binary classifier to discover the most anomalous subgroup in a dataset by maximizing the bias between the group's predicted odds ratio from the model and observed odds ratio from the data. We then perform anonymization using a variational autoencoder (VAE) to synthesize an entirely new dataset that would ideally be drawn from the distribution of the original data. We repeat the anomalous subgroup discovery task on the new data and compare it to what was identified pre-anonymization. We evaluated our approach using publicly available datasets from the financial industry. Our evaluation confirmed that the approach was able to produce synthetic datasets that preserved a high level of subgroup differentiation as identified initially in the original dataset. Such a distinction was maintained while having distinctly different records between th...
    Gaining insight into how deep convolutional neural network models perform image classification, and how to explain their outputs, has been a concern to computer vision researchers and decision makers. These deep models are often referred to as black boxes due to low comprehension of their internal workings. In an effort to develop explainable deep learning models, several methods have been proposed, such as finding gradients of the class output with respect to the input image (sensitivity maps), class activation maps (CAM), and gradient-based class activation maps (Grad-CAM). These methods underperform when localizing multiple occurrences of the same class and do not work for all CNNs. In addition, Grad-CAM does not capture the entire object when used on single-object images, which affects performance on recognition tasks. With the intention to create an enhanced visual explanation in terms of visual sharpness, object localization and explaining multiple occurrences of objects ...
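The Grad-CAM computation referred to above reduces to a gradient-weighted sum of convolutional feature maps. A minimal NumPy sketch (assuming the activations and gradients have already been extracted from a network; the array shapes are illustrative):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap: activations are a conv layer's feature maps with
    shape (K, H, W); gradients are d(class score)/d(activations), same shape."""
    weights = gradients.mean(axis=(1, 2))              # global-average-pooled grads
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum over channels
    cam = np.maximum(cam, 0)                           # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                          # normalise to [0, 1]
    return cam

# Toy layer: channel 0 fires at the object's location and gets positive gradient
acts = np.zeros((2, 3, 3)); acts[0, 1, 1] = 5.0
grads = np.zeros((2, 3, 3)); grads[0] = 1.0
print(grad_cam(acts, grads)[1, 1])  # 1.0
```

The final heatmap is typically upsampled to the input resolution and overlaid on the image, which is where the localization weaknesses discussed in the abstract become visible.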
    Several initiatives have been proposed to collect, report, and analyze data about school systems for supporting decision-making. These initiatives rely mostly on self-reported and summarized data collected irregularly and rarely. They also lack a single independent and systematic process to validate the collected data during its entire lifecycle. Furthermore, schools in developing countries still do not maintain complete and up-to-date school records. Due to these and other factors, addressing the education challenges in those countries remains a high priority for local and international governments, and donor and non-governmental agencies across the world. In this paper, we discuss our initial design, implementation, and evaluation of a blockchain-enabled School Information Hub (SIH) using Kenya's school system as a case study.
    Many intelligent transportation systems (ITS) in cities with developed economies are making use of mobile technology as data sources (e.g., many crowd-sourced traffic-related applications) to improve the quality and efficiency of transportation networks. Often, these data sources are used to supplement existing traffic monitoring equipment (e.g., ground-loop detectors, traffic cameras) to provide greater insights into roadway infrastructure and traffic dynamics. For cities with emerging economies where traditional traffic monitoring equipment is cost prohibitive, the rise in mobile technology presents a unique opportunity to leverage smartphone sensors as an alternative data source for ITS. There are, however, challenges to using these sensors, particularly the cost of mobile data, network consistency, and on-device resources. In this paper, we present a mobile system that instruments roads under resource constraints while a vehicle is in motion. It determines when and what data to collect and/or upload using a number of on-device valuation and optimisation functions, by prioritising data collection over uploading or vice versa. We deployed our mobile system on a fleet of heavy-duty waste-collection trucks in Nairobi, Kenya to collect a large volume of real-world road infrastructure and mobility data. Results show that a 42% reduction in wireless transmission costs can be achieved with minimal impact on the time in which important data are collected, uploaded, and harmonized into a frequently updated map of road infrastructure and traffic.
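The on-device valuation-and-prioritisation step can be illustrated with a simple greedy value-per-byte heuristic (a sketch under assumed inputs; the identifiers and the valuation scores are hypothetical, not the paper's actual optimisation functions):

```python
def select_uploads(samples, byte_budget):
    """Greedy value-per-byte selection under a transmission budget.
    samples: (sample_id, value, size_bytes) triples, where value is
    whatever the on-device valuation function assigned to the reading."""
    ranked = sorted(samples, key=lambda s: s[1] / s[2], reverse=True)
    chosen, used = [], 0
    for sample_id, value, size in ranked:
        if used + size <= byte_budget:    # skip anything that busts the budget
            chosen.append(sample_id)
            used += size
    return chosen, used

readings = [("pothole", 10.0, 100), ("smooth", 1.0, 100), ("bump", 5.0, 25)]
chosen, used = select_uploads(readings, byte_budget=150)
print(chosen, used)  # ['bump', 'pothole'] 125
```

Deferring the low-value `"smooth"` reading until a cheaper network window is available is the kind of trade-off that yields the transmission-cost savings reported in the abstract.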
    In sub-Saharan Africa, lack of useful information for the public good is one obstacle to the development of public services (public safety, education, healthcare, etc.). This makes the extraction of data from digital archives (e.g., analog sources such as printed newspaper archives and born-digital sources like native PDF) an interesting alternative source of data to increase the amount and diversity of potentially useful information. Printed newspapers contain various multiarticle page layouts, wherein articles in the newspaper are designed to allow readers to define their own reading. The title of an article, the introductory story of the title, and related images are mostly grouped together. However, subsequent paragraphs and images are spread across various pages of the newspaper in a somewhat unpredictable manner. This, together with the poor quality of existing archives, makes extracting data from archived newspapers a daunting research problem. To address these challenges, we present a system that extracts, detects, and clusters articles in newspapers from digital archives (mainly containing scanned newspaper archives from which the information is extracted). Finally, we also describe our proof-of-concept service using the extracted data.
    In this paper, we study the engagement and performance of students in a classroom using a system called the Cognitive Learning Companion (CLC). CLC is designed to keep track of the relationship between the student, content interaction, and learning progression. It also provides evidence-based, engagement-oriented actionable insights to teachers by assessing information from a sensor-rich instrumented learning environment in order to infer a learner's cognitive and affective states. Data captured from the instrumented environment is aggregated and analyzed to create interlinked insights helping teachers identify how students engage with learning content and view their performance records on selected assignments. We conducted a one-month pilot with 27 learners in a primary school in Nairobi, Kenya during their maths and science instructional periods. We present our primary analysis of content-level interactions and engagement at the individual student and classroom level.
    Farm yields and crop quality are closely linked to environmental exposures during growth. Stresses can occur when too much or too little water is delivered. These nuances of farm production are often overlooked by the typical small-scale farmer in sub-Saharan Africa. The result is that small-scale farms, on average, underproduce by more than forty percent. In this paper, we describe the development of a small-scale precision farming approach where fast soil moisture sensing via wireless sensor networks provides a low-cost, low-power option to reduce the potential for water-induced plant stresses and increase yields. The solution is particularly suited to resource-constrained environments with no access to grid power and poor network connectivity. By monitoring water intake by plants, we demonstrate the potential for fast data collection from wireless soil moisture sensors in the farm. Finally, we show that the developed wireless sensor nodes can run for more than five years with limited human intervention.
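The multi-year battery life claimed above typically follows from duty cycling: the node sleeps at microamp currents and wakes briefly each hour to sample and transmit. A back-of-the-envelope sketch (all component figures here are hypothetical, not measurements from the paper's nodes):

```python
def battery_lifetime_years(capacity_mah, sleep_ua, active_ma, active_s_per_hour):
    """Lifetime of a duty-cycled node: average the active and sleep currents
    by the fraction of each hour spent awake, then divide into the capacity."""
    duty = active_s_per_hour / 3600.0
    avg_ma = active_ma * duty + (sleep_ua / 1000.0) * (1.0 - duty)
    return (capacity_mah / avg_ma) / (24 * 365)

# Hypothetical node: 2400 mAh cells, 10 uA sleep, 20 mA awake for 5 s per hour
years = battery_lifetime_years(2400, 10, 20, 5)
print(years > 5)  # True: roughly seven years on this budget
```

The calculation shows why sleep current, not radio current, dominates: the node is awake for only a fraction of a percent of each hour.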
    As the technology for e-voting changes day by day, along with an evolution of the regulatory environment, many questions emerge. One of these questions is how to allow voters or any third party to verify that votes are correctly captured, stored, and counted. This underlines the fact that an e-voting system is not only responsible for ensuring its technical and procedural security, but must also provide a mechanism by which voters can verify their votes during and after casting, and third parties are able to verify the correctness and ...
    Most existing work to thwart malicious web pages captures maliciousness via discriminative artifacts, learns a model, and detects by leveraging static and/or dynamic analysis. Unfortunately, there is a two-sided evolution of the artifacts of web pages. On one hand, cybercriminals constantly revamp attack payloads in malicious web pages. On the other hand, benign web pages evolve to improve content rendering and interaction with users. Consequently, the once precise detection techniques suffer from limitations to cope ...
    Malicious websites, when visited by an unsuspecting victim, infect her machine to steal invaluable information, redirect her to malicious targets, or compromise her system to mount future attacks. While the existing approaches have promising prospects in detecting malicious websites, there are still open issues in effectively and efficiently addressing: filtering of web pages from the wild, coverage of a wide range of malicious characteristics to capture the big picture, continuous evolution of web page features, systematic ...
    Formal analysis techniques can deliver important support during ICT-based innovation (or redesign) efforts in e-government services. This paper discusses a formal methodology for assessing the procedural security of an organization. We do so by explicitly reasoning on critical information flows, named asset flows. With this it is possible to understand how critical assets are modified in an unlawful manner, which can trigger security and privacy violations, thereby (automatically) detecting security weaknesses within an ...
    Deploying a system in a safe and secure manner requires ensuring the technical and procedural levels of assurance, also with respect to social and regulatory frameworks. This is because threats and attacks may not only derive from pitfalls in a complex security-critical system, but also from ill-designed procedures. However, existing methodologies are not mature enough to embrace procedural implications and the need for a multidisciplinary approach to the safe and secure operation of a system. This is particularly common in ...
    A Survey: Electronic Voting Development and Trends. Komminist Weldemariam and Adolfo Villafiorita, Fondazione Bruno Kessler, Center for Scientific and Technological Research (FBK-IRST), via Sommarive 18, I-38050 Trento, Italy. (sisai, adolfo.villafiorita)@fbk.eu ...
    Contraceptive use improves the health of women and children in several ways, yet data show high rates of discontinuation, which are not well understood. We introduce an AI-based decision platform capable of analyzing event data to identify patterns of contraceptive uptake that are unique to a subpopulation of interest. These discriminatory patterns provide valuable, interpretable insights to policymakers. The sequences then serve as a hypothesis for downstream causal analysis to estimate the effect of specific variables on discontinuation outcomes. Our platform presents a way to visualize, stratify, compare, and perform causal analysis on covariates that determine contraceptive uptake behavior, and yet is general enough to be extended to a variety of applications. Family Planning (FP) has emerged as a crucial component of sustainable global development [Osotimehin, 2015]. Effective use of contraceptives can significantly improve the nutritio...

    And 70 more