Alejandro Jara
  • Santiago, Region Metropolitana, Chile
The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous biomarker. Incorporating covariates in the analysis can potentially enhance the information gathered from the biomarker, as its discriminatory ability may depend on them. In this paper we propose a dependent Bayesian nonparametric model for conditional ROC estimation. Our model is based on dependent Dirichlet processes, where the covariate-dependent ROC curves are indirectly modeled using probability models for related probability distributions in the diseased and healthy groups. Our approach allows the entire distribution in each group to change as a function of the covariates, provides exact posterior inference up to a Monte Carlo error, and can easily accommodate multiple continuous and categorical predictors. Simulation results suggest that, in terms of mean squared error, our approach performs better than its competitors for small sample sizes and nonlinear scenarios. The proposed model is applied to data concerning the diagnosis of diabetes.
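The abstract defines covariate-specific ROC curves indirectly, through the biomarker distributions in the diseased and healthy groups. The link between the two is the identity ROC(p) = 1 - F_D(F_H^{-1}(1 - p)), assuming larger values indicate disease. The sketch below is a plain empirical plug-in of that identity on simulated data, not the dependent Dirichlet process model; the function names are hypothetical.

```python
import numpy as np


def empirical_roc(healthy, diseased, fprs):
    """Plug-in estimate of ROC(p) = 1 - F_D(F_H^{-1}(1 - p)).

    Assumes larger biomarker values indicate disease. `fprs` is an increasing
    grid of false-positive rates p in [0, 1].
    """
    healthy = np.asarray(healthy, dtype=float)
    diseased = np.sort(np.asarray(diseased, dtype=float))
    # Threshold with false-positive rate p: the (1 - p) quantile of the healthy group
    thresholds = np.quantile(healthy, 1.0 - np.asarray(fprs, dtype=float))
    # True-positive rate: share of diseased values strictly above each threshold
    n_at_or_below = np.searchsorted(diseased, thresholds, side="right")
    return 1.0 - n_at_or_below / diseased.size


def auc_trapezoid(fprs, tprs):
    """Area under the ROC curve by the trapezoidal rule."""
    fprs = np.asarray(fprs, dtype=float)
    tprs = np.asarray(tprs, dtype=float)
    return float(np.sum(np.diff(fprs) * (tprs[1:] + tprs[:-1]) / 2.0))


# Illustration with simulated groups (hypothetical data, not from the paper)
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)
healthy = rng.normal(0.0, 1.0, size=5000)
diseased = rng.normal(1.5, 1.0, size=5000)
tpr = empirical_roc(healthy, diseased, grid)
auc = auc_trapezoid(grid, tpr)  # binormal theory gives roughly 0.86 for these groups
```

For covariate-specific curves, the same identity would be applied to the conditional distributions at each covariate value; in the paper those distributions come from dependent Dirichlet process mixtures rather than from raw samples.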
We introduce an autoregressive model for responses that are restricted to lie on the unit interval, with beta-distributed marginals. The model includes strict stationarity as a special case, and is based on the introduction of a series of latent random variables with a simple hierarchical specification that achieves the desired dependence while being amenable to posterior simulation schemes. We discuss the construction, study some of its main properties, and compare it with alternative models using simulated data. Finally, we illustrate the use of our proposal by modelling a yearly series of unemployment rates.
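One well-known latent-variable route to beta marginals (assumed here for illustration; the paper's own hierarchy may differ) is a beta-binomial chain: draw Y_t | X_t ~ Binomial(n, X_t) and then X_{t+1} | Y_t ~ Beta(a + Y_t, b + n - Y_t). Because the second step is the conjugate posterior of the first, X_{t+1} keeps the Beta(a, b) marginal, and the lag-one correlation is n / (n + a + b). The function name is hypothetical.

```python
import numpy as np


def simulate_beta_ar(a, b, n_latent, length, rng):
    """Simulate a stationary chain with Beta(a, b) marginals via latent binomial counts.

    Hypothetical illustration, not necessarily the paper's construction:
      X_0 ~ Beta(a, b)
      Y_t | X_t ~ Binomial(n_latent, X_t)
      X_{t+1} | Y_t ~ Beta(a + Y_t, b + n_latent - Y_t)
    Beta-binomial conjugacy keeps every X_t ~ Beta(a, b); the lag-one
    correlation is n_latent / (n_latent + a + b), so larger n_latent means
    stronger serial dependence.
    """
    x = np.empty(length)
    x[0] = rng.beta(a, b)
    for t in range(1, length):
        y = rng.binomial(n_latent, x[t - 1])          # latent count
        x[t] = rng.beta(a + y, b + n_latent - y)      # conjugate update
    return x


# Usage: a long series with mean 2/7 and lag-one correlation 10/17
rng = np.random.default_rng(1)
series = simulate_beta_ar(2.0, 5.0, 10, 20000, rng)
```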
Base calling is a critical step in the Solexa next-generation sequencing procedure. It compares the position-specific intensity measurements that reflect the signal strength of the four possible bases (A, C, G, T) at each genomic position, and outputs estimates of the true sequences for short reads of DNA or RNA. We present a Bayesian method of base calling, BM-BC, for Solexa-GA sequencing data. The method builds on a hierarchical model that accounts for three sources of noise in the data known to affect the accuracy of base calls: fading, phasing, and cross-talk between channels. We show that the new method improves the precision of base calling compared with currently leading methods. Furthermore, the proposed method provides a probability score that measures the confidence of each base call. This score can be used to estimate the false discovery rate of the base calls or to rank the estimated DNA sequences by precision, which in turn can be useful for downstream analyses such as sequence alignment.
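To make the role of a per-call probability score concrete, the sketch below computes a toy posterior for a single sequencing cycle. It assumes a known 4 × 4 cross-talk matrix, a known fading factor, Gaussian noise, and a uniform prior over A, C, G, T, and it ignores phasing; BM-BC's hierarchical model is considerably richer, and all names and values here are hypothetical.

```python
import numpy as np

BASES = "ACGT"

# Hypothetical cross-talk matrix: column k is the channel signal produced by base k
CROSSTALK = np.array([[1.0, 0.3, 0.0, 0.0],
                      [0.2, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.3],
                      [0.0, 0.0, 0.2, 1.0]])


def call_base(intensity, crosstalk, fading, sigma):
    """Toy Bayesian base call for one cycle (simplified, phasing ignored).

    Model assumption: intensity ~ Normal(fading * crosstalk[:, k], sigma^2 I)
    when the true base is k, with a uniform prior over the four bases.
    Returns the called base and its posterior probability (confidence score).
    """
    y = np.asarray(intensity, dtype=float)
    expected = fading * np.asarray(crosstalk, dtype=float)
    sq_dist = np.sum((y[:, None] - expected) ** 2, axis=0)  # one distance per base
    log_post = -sq_dist / (2.0 * sigma**2)
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    k = int(np.argmax(post))
    return BASES[k], float(post[k])


# Usage: a faded, noise-free signal from base C
base, score = call_base(0.8 * CROSSTALK[:, 1], CROSSTALK, 0.8, 0.1)
```

Under a model of this kind, averaging 1 - score over a set of accepted calls estimates the expected fraction of false calls among them, which is one way such scores can feed a false discovery rate estimate.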
Incorporating temporal and spatial variation could potentially enhance the information gathered from survival data. This paper proposes a Bayesian semiparametric model for capturing spatio-temporal heterogeneity within the proportional hazards framework. Spatial correlation is introduced in the form of county-level frailties. The temporal effect is introduced by stratifying the proportional hazards model, where the time-dependent hazards are indirectly modeled using a probability model for related probability distributions. To this end, an autoregressive dependent tailfree process is introduced, and its full Kullback-Leibler support is established. The approach is illustrated using simulated data and data from the Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute on patients in Iowa diagnosed with breast cancer.
We study the support properties of Dirichlet process-based models for sets of predictor-dependent probability distributions. Exploiting the connection between copulas and stochastic processes, we provide an alternative definition of MacEachern's dependent Dirichlet processes. Based on this definition, we provide sufficient conditions for the full weak support of different versions of the process. In particular, we show that, under mild conditions on the copula functions, the versions in which only the support points or only the weights depend on predictors have full weak support. We also characterize the Hellinger and Kullback-Leibler support of mixtures induced by the different versions of the dependent Dirichlet process. A generalization of the results to the general class of dependent stick-breaking processes is also provided.
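MacEachern's dependent Dirichlet process starts from Sethuraman's stick-breaking representation G = sum_k w_k δ(θ_k), with w_k = v_k ∏_{j<k} (1 - v_j) and v_k ~ Beta(1, α). The sketch below draws a truncated version of the "single-weights" variant, in which the weights are shared across predictors and only the atoms move with x. The linear atom paths and the truncation level are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights w_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def single_weights_ddp(alpha, n_atoms, rng):
    """Draw a truncated 'single-weights' DDP: shared weights, predictor-dependent atoms.

    Atoms follow hypothetical linear paths theta_k(x) = b0_k + b1_k * x with
    standard normal coefficients. Returns a function mapping x to (atoms, weights).
    """
    w = stick_breaking_weights(alpha, n_atoms, rng)
    b0 = rng.normal(size=n_atoms)
    b1 = rng.normal(size=n_atoms)

    def at(x):
        return b0 + b1 * x, w

    return at


# Usage: the random distributions at x = 0 and x = 1 share weights but not atoms
rng = np.random.default_rng(2)
G = single_weights_ddp(1.0, 50, rng)
atoms0, w0 = G(0.0)
atoms1, w1 = G(1.0)
mean_at_1 = float(np.sum(w1 * atoms1))  # mean of the random distribution at x = 1
```

Swapping the roles (shared atoms, predictor-dependent weights) gives the other single-component variant discussed in the abstract.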
Motivated by a longitudinal oral health study, the Signal–Tandmobiel® study, we propose a multivariate binary inhomogeneous Markov model in which unobserved correlated response variables are subject to an unconstrained misclassification process and have a monotone behavior. The multivariate baseline distributions and Markov transition matrices of the unobserved processes are defined as a function of covariates through the specification of compatible full conditional distributions. Distinct misclassification models are discussed. In all cases, the possibility that different examiners were involved in the scoring of the responses of a given subject across time is taken into account. A full Bayesian implementation of the model is described and its performance is evaluated using simulated data. We provide theoretical and empirical evidence that the parameters can be estimated without any external information about the misclassification parameters. Finally, the analyses of the motivating study are presented. Appendices 1–7 are available in the online supplementary materials.
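A stripped-down version of the latent process helps show why misclassification parameters can be learned: with a monotone hidden state, an observed 1 followed by a 0 can only arise through scoring error. The sketch below computes the likelihood of one subject's scores with the forward algorithm, for a single binary response, one examiner, and fixed hypothetical parameters; the paper's model is multivariate, covariate-dependent, and examiner-specific.

```python
import numpy as np


def forward_likelihood(obs, p_initial, p_onset, sens, spec):
    """P(Y_1, ..., Y_T) for a monotone two-state hidden Markov chain with misclassification.

    Hidden Z_t in {0, 1}: P(Z_1 = 1) = p_initial, P(Z_{t+1} = 1 | Z_t = 0) = p_onset,
    and state 1 is absorbing (monotone behaviour). Observed scores follow
    P(Y = 1 | Z = 1) = sens and P(Y = 0 | Z = 0) = spec (single-examiner simplification).
    `obs` is a sequence of 0/1 integers.
    """
    transition = np.array([[1.0 - p_onset, p_onset],
                           [0.0, 1.0]])
    # emission[z, y] = P(Y = y | Z = z)
    emission = np.array([[spec, 1.0 - spec],
                         [1.0 - sens, sens]])
    alpha = np.array([1.0 - p_initial, p_initial]) * emission[:, obs[0]]
    for y in obs[1:]:
        alpha = (alpha @ transition) * emission[:, y]  # propagate, then weight by the score
    return float(alpha.sum())


# With perfect scoring, the non-monotone sequence (1, 0) is impossible;
# with misclassification it receives positive probability.
exact = forward_likelihood([1, 0], 0.1, 0.2, 1.0, 1.0)
noisy = forward_likelihood([1, 0], 0.1, 0.2, 0.85, 0.9)
```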
We present a simple, efficient, and computationally cheap sampling method, called the Polya tree sampler, for exploring an unnormalized multivariate density on ℝ^d, such as a posterior density. The algorithm constructs an independent proposal based on an approximation of the target density. The approximation is built from the predictive density of a finite multivariate Polya tree and a set of (initial) support points, that is, data that act as parameters for the approximation. In an initial “warming-up” phase, the support points are iteratively relocated to regions of higher support under the target distribution to minimize the distance between the target distribution and the Polya tree predictive distribution. In the “sampling” phase, samples from the final approximating mixture of finite Polya trees are used as candidates, which are accepted with a standard Metropolis–Hastings acceptance probability. Several illustrations are presented, including comparisons of the proposed approach with Metropolis-within-Gibbs and the delayed rejection adaptive Metropolis algorithm. This article has supplementary material online.
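The sampling phase is an independence Metropolis–Hastings step: a candidate y from a fixed proposal q replaces the current state x with probability min{1, π(y) q(x) / (π(x) q(y))}, where π may be unnormalized. In the sketch below a wide Gaussian proposal stands in for the fitted mixture of finite Polya trees (a deliberate simplification), so it shows only the acceptance mechanics, not the warming-up phase; names are hypothetical.

```python
import numpy as np


def independence_mh(log_target, sample_proposal, log_proposal, x0, n_iter, rng):
    """Independence Metropolis-Hastings with a fixed proposal q.

    Accepts candidate y with probability min(1, pi(y) q(x) / (pi(x) q(y))),
    computed on the log scale; log_target and log_proposal may omit constants.
    Returns the chain and the acceptance rate.
    """
    x = np.asarray(x0, dtype=float)
    log_w_x = log_target(x) - log_proposal(x)  # log importance weight of the current state
    draws = np.empty((n_iter, x.size))
    accepted = 0
    for i in range(n_iter):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        if np.log(rng.uniform()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            accepted += 1
        draws[i] = x
    return draws, accepted / n_iter


# Usage: unnormalised N((1, -1), I) target; N(0, 4 I) proposal as a stand-in
# for the Polya tree approximation (constants cancel in the acceptance ratio).
target_mean = np.array([1.0, -1.0])
scale = 2.0
draws, rate = independence_mh(
    log_target=lambda z: -0.5 * np.sum((z - target_mean) ** 2),
    sample_proposal=lambda r: r.normal(0.0, scale, size=2),
    log_proposal=lambda z: -0.5 * np.sum((z / scale) ** 2),
    x0=np.zeros(2),
    n_iter=20000,
    rng=np.random.default_rng(3),
)
```

The closer the proposal is to the target, the closer the acceptance rate gets to one, which is the motivation for relocating the support points before sampling.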
We study the identification and consistency of Bayesian semiparametric IRT-type models, where the uncertainty about the abilities’ distribution is modeled using a prior distribution on the space of probability measures. We show that for the semiparametric Rasch Poisson counts model, simple restrictions ensure the identification of a general distribution generating the abilities, even for a finite number of probes. For the semiparametric Rasch model, only a finite number of properties of the general abilities’ distribution can be identified by a finite number of items; these properties are completely characterized. Full identification of the semiparametric Rasch model can only be achieved when an infinite number of items is available. The results are illustrated using simulated data.
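The need for restrictions can be seen directly in the likelihoods. Both the Rasch model, P(Y_ij = 1) = exp(θ_i - b_j) / (1 + exp(θ_i - b_j)), and a log-linear Rasch Poisson counts model, E[Y_ij] = exp(θ_i - b_j), depend on abilities and difficulties only through θ_i - b_j, so shifting every ability (or, in the semiparametric case, the whole abilities' distribution) and every difficulty by the same constant leaves the data distribution unchanged. The sketch below checks this numerically; the log-linear parametrization of the Poisson model and the function names are assumptions.

```python
import numpy as np


def rasch_prob(theta, b):
    """Rasch model: P(Y_ij = 1) = exp(theta_i - b_j) / (1 + exp(theta_i - b_j))."""
    eta = np.subtract.outer(np.asarray(theta, float), np.asarray(b, float))
    return 1.0 / (1.0 + np.exp(-eta))


def rasch_poisson_mean(theta, b):
    """Rasch Poisson counts model in an assumed log-linear form: E[Y_ij] = exp(theta_i - b_j)."""
    return np.exp(np.subtract.outer(np.asarray(theta, float), np.asarray(b, float)))


# A common shift of abilities and difficulties is invisible in the data,
# which is why a location restriction (e.g. fixing one difficulty) is needed.
theta = np.array([-1.0, 0.0, 2.0])
b = np.array([0.5, -0.3])
same_rasch = np.allclose(rasch_prob(theta, b), rasch_prob(theta + 3.0, b + 3.0))
same_poisson = np.allclose(rasch_poisson_mean(theta, b), rasch_poisson_mean(theta - 1.0, b - 1.0))
```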
The ubiquitous use of Dirichlet process models should not discourage researchers from considering interesting features of alternative models. In particular, the Polya tree model turns out to be an attractive choice for some applications. In this chapter we discuss the use of the Polya tree prior and its variations for density estimation. We define the model, introduce computationally efficient methods for posterior inference, and identify its relative advantages and limitations compared with Dirichlet process models.
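As an illustration of the density estimation use, the sketch below evaluates the posterior predictive density of a finite Polya tree on [0, 1). It assumes dyadic partitions, a uniform centring measure, and Beta(c m², c m²) branch probabilities at level m (a common default, not necessarily the chapter's choice). For x in the level-J set B(ε_1⋯ε_J), the predictive density is f(x) = 2^J ∏_{m=1}^{J} (c m² + n(ε_1⋯ε_m)) / (2 c m² + n(ε_1⋯ε_{m-1})), where n(·) counts observations in each set. The function name is hypothetical.

```python
import numpy as np


def polya_tree_predictive(x, data, depth=6, c=1.0):
    """Posterior predictive density at x of a finite Polya tree on [0, 1).

    Assumptions: dyadic partitions, Uniform(0, 1) centring measure, and
    Beta(c m^2, c m^2) branch probabilities at level m. Each level contributes
    2 * (c m^2 + n_child) / (2 c m^2 + n_parent), where n_child counts data in
    the level-m set containing x and n_parent counts data in its parent set.
    """
    data = np.asarray(data, dtype=float)
    dens = 1.0
    for m in range(1, depth + 1):
        a = c * m**2
        width = 2.0**-m
        k = int(np.floor(x / width))  # index of the level-m set containing x
        parent = k // 2               # index of its parent at level m - 1
        n_child = np.sum(np.floor(data / width) == k)
        n_parent = np.sum(np.floor(data / (2.0 * width)) == parent)
        dens *= 2.0 * (a + n_child) / (2.0 * a + n_parent)
    return float(dens)


# Usage: the predictive is piecewise constant on the 2^J finest sets
rng = np.random.default_rng(4)
sample = rng.beta(2.0, 5.0, size=200)
J = 6
mids = (np.arange(2**J) + 0.5) / 2**J
values = np.array([polya_tree_predictive(t, sample, depth=J) for t in mids])
```

Because the two children's factors at each level add up to one within their parent, the predictive density integrates to one for any data set.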
An important byproduct of inference in discrete mixture models is an implied random partition of experimental units. In fact, such random partitions are the main inference targets for many recently published applications of nonparametric Bayesian discrete mixture models. In this chapter we systematically consider the use of nonparametric Bayesian priors for inference on such random partitions. Many scientific inference problems are formalized as the related, more general problem of feature allocation, that is, inference on possibly overlapping random subsets of experimental units. We present some examples from the analysis of bioinformatics data and introduce the Polya urn model, product partition models, model-based clustering, and the Indian buffet process prior.
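The Polya urn scheme is the predictive rule of a Dirichlet process with concentration α: the (i + 1)-th unit joins an existing cluster of size n_j with probability n_j / (i + α) and starts a new cluster with probability α / (i + α). The sketch below samples a random partition from this rule (the Chinese restaurant process); the function name is hypothetical.

```python
import numpy as np


def sample_crp_partition(n_items, alpha, rng):
    """Sample cluster labels from the Polya urn / Chinese restaurant process.

    Item i (0-based) joins existing cluster j with probability n_j / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    Returns integer labels (in order of first appearance) and cluster sizes.
    """
    labels = np.empty(n_items, dtype=int)
    sizes = []
    for i in range(n_items):
        probs = np.array(sizes + [alpha], dtype=float) / (i + alpha)
        j = int(rng.choice(len(probs), p=probs))
        if j == len(sizes):
            sizes.append(1)   # new cluster
        else:
            sizes[j] += 1     # join existing cluster
        labels[i] = j
    return labels, sizes


# Usage: the expected number of clusters is sum_{i<n} alpha / (i + alpha)
rng = np.random.default_rng(5)
labels, sizes = sample_crp_partition(30, 1.0, rng)
```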
In this final chapter we briefly discuss some more specialized applications of nonparametric Bayesian inference, including the analysis of spatio-temporal data, model validation, and causal inference. These themes are introduced to show by example the nature of the many application areas of nonparametric Bayesian inference that we did not include in earlier chapters.
We propose a fully nonparametric modelling approach for time-to-event regression data, when the response of interest can only be determined to lie in an interval obtained from a sequence of examination times and the determination of the occurrence of the event is subject to misclassification. The covariate-dependent time-to-event distributions are modelled using a linear dependent Dirichlet process mixture model. A general misclassification model is discussed, considering the possibility that different examiners were involved in the assessment of the occurrence of the events for a given subject across time. An advantage of the proposed model is that the underlying time-to-event distributions and the misclassification parameters can be estimated without any external information about the latter parameters.
INTRODUCTION
The discovery of the deep freezing method of cattle semen preservation allowed a worldwide exchange of genetic material, and the importation of semen from improved breeds has often been used in attempts to increase local livestock productivity. Frequently, this has been done without systematic evaluation of the introduced stocks in the new environment. In 1998, Argentina started a genetic evaluation procedure for yield traits in the Holando Argentino Dairy Cattle population through an animal model. This mixed-model genetic evaluation procedure uses the simplifying assumptions of equal genetic and residual variances across herds and of a genetic correlation equal to one between genetic expressions in different environments. However, several studies in which variance components of milk yield were estimated from herds grouped by production and by variability levels have indicated a positive relationship between production level and variability level, and est...
We discuss the use of nonparametric Bayesian models in density estimation, arguably one of the most basic statistical inference problems. In this chapter we introduce the Dirichlet process prior and variations of it, which are by far the most commonly used nonparametric Bayesian models in this context. Variations include the Dirichlet process mixture and the finite Dirichlet process. One critical reason for the extensive use of these models is the availability of computationally efficient methods for posterior simulation. We discuss several such methods.
Copula-based models provide a great deal of flexibility in modelling multivariate distributions, allowing the marginal distributions to be modelled separately from the dependence structure (copula) that links them to form a joint distribution. Choosing a class of copula models is not a trivial task, and its misspecification can lead to wrong conclusions. We introduce a novel class of grid-uniform copula functions, which is dense in the space of all continuous copula functions in a Hellinger sense. We propose a Bayesian model based on this class and develop an automatic Markov chain Monte Carlo algorithm for exploring the corresponding posterior distribution. The methodology is illustrated by means of simulated data and compared with the main existing approach.
¹ Nicolás Kuschinski is a Postdoctoral Researcher at the ANID – Millennium Science Initiative Program – Millennium Nucleus Center for the Discovery of Structures in Complex Data, Casilla 306, Correo 22, Santi...
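The abstract does not spell out the parametrization, so the sketch below rests on an assumption: a grid-uniform copula is taken to have a density that is constant on each cell of a k × k grid of [0, 1]², c(u, v) = k² P_ij, where the non-negative cell probabilities P_ij have every row and column summing to 1/k, so both margins are uniform. Turning an arbitrary positive matrix into such a P by Sinkhorn balancing is an illustrative choice, not the paper's prior; function names are hypothetical.

```python
import numpy as np


def grid_uniform_copula(weights, n_iter=500):
    """Rescale a positive k x k matrix so every row and column sums to 1/k.

    Uses Sinkhorn balancing (an illustrative choice). The resulting copula
    density is c(u, v) = k^2 * P[i, j] on the cell [i/k, (i+1)/k) x [j/k, (j+1)/k).
    """
    P = np.asarray(weights, dtype=float)
    k = P.shape[0]
    for _ in range(n_iter):
        P = P / (k * P.sum(axis=1, keepdims=True))  # rows sum to 1/k
        P = P / (k * P.sum(axis=0, keepdims=True))  # columns sum to 1/k
    return P


def copula_density(u, v, P):
    """Evaluate the piecewise-constant copula density at (u, v) in [0, 1)^2."""
    k = P.shape[0]
    i = min(int(u * k), k - 1)
    j = min(int(v * k), k - 1)
    return k * k * P[i, j]


# Usage: integrating c(u, v) over v within row i gives k * sum_j P[i, j] = 1,
# i.e. a uniform margin for U (and likewise for V).
rng = np.random.default_rng(6)
P = grid_uniform_copula(rng.uniform(0.1, 1.0, size=(5, 5)))
```

As k grows, such piecewise-constant densities can approximate smoother copulas, which is in the spirit of the density result stated in the abstract.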
The study of racial/ethnic inequalities in health is important to reduce the uneven burden of disease. In the case of colorectal cancer (CRC), disparities in survival between non-Hispanic Whites and Blacks are well documented, and the mechanisms leading to these disparities need to be studied formally. It has also been established that body mass index (BMI) is a risk factor for developing CRC, and recent literature shows that BMI at diagnosis of CRC is associated with survival. Since BMI varies by racial/ethnic group, a question that arises is whether disparities in BMI are partially responsible for the observed racial/ethnic disparities in CRC survival. This paper presents new methodology to quantify the impact on racial/ethnic disparities in survival of a hypothetical intervention that matches the BMI distribution in the Black population to the potentially complex distributional form observed in the White population. We perform a simulation that shows our proposed Bayesian density regression appr...
Early case detection and isolation of infected individuals are critical to controlling coronavirus disease 2019 (COVID-19). Reverse transcription polymerase chain reaction (RT-PCR) is considered the gold standard for the diagnosis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, but false negatives do occur. We built a user-friendly online tool to estimate the probability of having COVID-19 despite a negative RT-PCR result, and thus to help avoid preventable transmission.
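The quantity such a tool reports is a post-test probability from Bayes' theorem: with pre-test probability π, sensitivity Se, and specificity Sp, P(infected | negative) = π(1 - Se) / (π(1 - Se) + (1 - π) Sp). The inputs below are illustrative only; the tool's own model and parameter values are not described in the abstract, and the function name is hypothetical.

```python
def prob_infected_given_negative(pretest, sensitivity, specificity):
    """Post-test probability of infection after a negative test, via Bayes' theorem."""
    false_neg = pretest * (1.0 - sensitivity)     # infected, but the test is negative
    true_neg = (1.0 - pretest) * specificity      # not infected, and the test is negative
    return false_neg / (false_neg + true_neg)


# Illustrative inputs only (not values from the tool): 50% pre-test probability,
# 70% sensitivity, 95% specificity.
print(round(prob_infected_given_negative(0.5, 0.70, 0.95), 3))  # prints 0.24
```

With a high pre-test probability and an imperfect sensitivity, a negative result can still leave a substantial probability of infection, which is the situation the tool is meant to flag.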
We propose a novel class of probability models for sets of predictor-dependent probability distributions with bounded domain. The proposal extends the Dirichlet-Bernstein prior for single density estimation, by using dependent stick-breaking processes. A general model class and two simplified versions are discussed in detail. Appealing theoretical properties such as continuity, association structure, marginal distribution, large support and consistency of the posterior distribution are established for all models. The behavior of the models is illustrated using simulated and real-life data. The simulated data are also used to compare the proposed methodology to existing methods.
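For reference, the single-density Bernstein–Dirichlet prior, as usually formulated, models a density on [0, 1] as f(x) = sum_{j=1}^{k} w_j Beta(x | j, k - j + 1), with weights w_j = G(((j - 1)/k, j/k]) for a random distribution G. The sketch below evaluates such a density for given weights; in the dependent versions described above the weights would instead come from predictor-dependent stick-breaking processes, which are not shown. Function names are hypothetical.

```python
import math

import numpy as np


def bernstein_density(x, weights):
    """Bernstein polynomial density on [0, 1]: sum_j w_j * Beta(x | j, k - j + 1).

    The Beta(j, k - j + 1) density equals k * C(k - 1, j - 1) x^(j-1) (1 - x)^(k-j).
    `weights` must be non-negative and sum to one; k = len(weights).
    """
    x = np.asarray(x, dtype=float)
    k = len(weights)
    dens = np.zeros_like(x)
    for j, w in enumerate(weights, start=1):
        basis = k * math.comb(k - 1, j - 1) * x ** (j - 1) * (1.0 - x) ** (k - j)
        dens += w * basis
    return dens


def weights_from_measure(cdf, k):
    """Bernstein weights w_j = G(((j-1)/k, j/k]) from a distribution function G on [0, 1]."""
    grid = np.linspace(0.0, 1.0, k + 1)
    return np.diff([cdf(t) for t in grid])


# Usage: a uniform G gives equal weights, and the mixture is the uniform density
w_uniform = weights_from_measure(lambda t: t, 8)
flat = bernstein_density(np.linspace(0.0, 1.0, 11), w_uniform)
```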
We discuss Bayesian nonparametric procedures for the regression analysis of compositional responses, that is, data supported on a multivariate simplex. The procedures are based on a modified class of multivariate Bernstein polynomials and on the use of dependent stick-breaking processes. A general model and two simplified versions of it are discussed. Appealing theoretical properties such as continuity, association structure, support, and consistency of the posterior distribution are established. Additionally, we use spike-and-slab priors to choose the version of the model that best adapts to the complexity of the underlying true data-generating distribution. The performance of the proposed model is illustrated in a simulation study and in an application to solid waste data from Colombia.
$p(\,\cdot \mid a, b, m, S)\, p(a)\, p(b \mid a)\, p(m)\, p(S)\, p(\,\cdot\,)$, where $p(T^{O}_{i}, T^{E}_{i} \mid z_i)$ is a deterministic mapping and $p(\,\cdot \mid a, b, m, S)$ arises by exploiting the
The study aimed to explore the association between parental smoking behavior and caries experience in young children, taking into account socioeconomic status and oral health-related behavior. Cross-sectional data from 1250 3-year-old and 1283 5-year-old children from four geographical areas in Flanders (Belgium) were analyzed. Children were examined at school by trained dentist-examiners, using standard criteria and a calibrated examination methodology. Data on oral hygiene and dietary habits, oral health behavior, sociodemographic variables, and parental smoking behavior were obtained through structured questionnaires completed by the parents. Visible caries experience (i.e. d₃mft > 0) was seen in 7% of 3-year-olds and 31% of 5-year-olds. In both age groups, 30% of the parents reported smoking. Univariable logistic regression analysis with caries prevalence as the dependent variable revealed that parental smoking was a significant independent variable. After controlling for age, gender, sociodemographic characteristics, oral hygiene, and dietary habits, the effect of family smoking status was no longer significant in 3-year-old children (OR = 1.98; 95% CI: 0.68-5.76). In 5-year-olds, the significant relationship between parental smoking behavior and caries experience persisted after adjusting for the other evaluated variables (OR = 3.36; 95% CI: 1.49-7.58). The results of this study illustrate the existence of a significant association between parental smoking behavior and caries experience in 5-year-old children.
More than 18 types of human papillomavirus (HPV) are associated with cervical cancer, and the relative importance of the HPV types may vary across populations. We aimed to investigate the types of HPV, their age distribution, and risk factors for HPV infection in women from Santiago, Chile. We interviewed and obtained two cervical specimens from a population-based random sample of 1,038 sexually active women (age range, 15-69 years). Specimens were tested for the presence of HPV DNA using GP5+/6+ primer-mediated PCR and for cervical cytologic abnormalities by Papanicolaou smears. A total of 122 women tested positive for HPV DNA: 87 with high-risk (HR) types and 35 with low-risk (LR) types only. The standardized prevalence of HPV DNA was 14.0% [95% confidence interval (95% CI), 11.5-16.4]. By age, HR HPV showed a reverse J-shaped curve, whereas LR HPV showed a U-shaped curve, both statistically significant in comparison with no effect or with a linear effect. We found 34 HPV types (13 HR and 21 LR); HPV 16, 56, 31, 58, 59, 18, ...
The study aimed to describe oral hygiene habits, oral hygiene status, and gingival health in Flemish pre-school children and to explore factors associated with these clinical oral health variables. Cross-sectional data from 1,071 3-year-old and 1,119 5-year-old children from four geographical areas in Flanders (Belgium) were analysed. Buccal plaque accumulation and gingival health were assessed on six index teeth. Data on oral hygiene and dietary habits, oral health behaviour, and socio-demographic variables were obtained through questionnaires. Brushing had started before the age of one in 34% of 3-year-olds and 25% of 5-year-olds; 17% of 3-year-olds and 23% of 5-year-olds brushed twice a day. Roughly 30% of 3-year-olds and 37% of 5-year-olds presented with visible plaque accumulation. In both age groups, only 3 to 4% of children presented with signs of gingival inflammation. Multiple logistic regression models revealed that in both age groups, children whose mothers had a college or university degree had a smaller chance of presenting with visible plaque than children whose mothers had a lower educational level. With gingival health as the dependent variable, multiple logistic regression analysis confirmed the major association between bacterial plaque accumulation and the presence of gingivitis. In the older age group, children's former exposure to passive smoking was also significantly associated with gingivitis. Parents should be motivated to start brushing their children's teeth at an early age and to brush thoroughly in order to maintain good oral health in their offspring. Special attention should go to children raised by mothers with a lower educational level.
Understanding the factors that explain differences in survival times is an important issue for establishing policies to improve national health systems. Motivated by breast cancer data arising from the Surveillance, Epidemiology, and End Results (SEER) program, we propose a covariate-adjusted proportional hazards frailty model for the analysis of clustered right-censored data. Rather than incorporating exchangeable frailties in the linear predictor of commonly used survival models, we allow the frailty distribution to change flexibly with both continuous and categorical cluster-level covariates and model it using a dependent Bayesian nonparametric model. The resulting process is flexible and easy to fit using an existing R package. The application of the model to our motivating example showed that, contrary to intuition, those diagnosed during a period of time in the 1990s in more rural and less affluent Iowan counties survived breast cancer better. Additional analyses showed the opposite ...
