Journal Description
Stats
is an international, peer-reviewed, open access journal on statistical science published quarterly online by MDPI. The journal focuses on methodological and theoretical papers in statistics, probability, stochastic processes and innovative applications of statistics in all scientific disciplines including biological and biomedical sciences, medicine, business, economics and social sciences, physics, data science and engineering.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within ESCI (Web of Science), Scopus, RePEc, and other databases.
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 15.8 days after submission; the time from acceptance to publication is 3.8 days (median values for papers published in this journal in the second half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 1.3 (2022)
5-Year Impact Factor: 1.2 (2022)
Latest Articles
On Non-Occurrence of the Inspection Paradox
Stats 2024, 7(2), 389-401; https://doi.org/10.3390/stats7020024 - 24 Apr 2024
Abstract
The well-known inspection paradox or waiting time paradox states that, in a renewal process, the inspection interval is stochastically larger than a common interarrival time having a distribution function F, where the inspection interval is given by the particular interarrival time containing the specified time point of process inspection. The inspection paradox may also be expressed in terms of expectations, where the order is strict, in general. A renewal process can be utilized to describe the arrivals of vehicles, customers, or claims, for example. As the inspection time may also be considered a random variable T with a left-continuous distribution function G independent of the renewal process, the question arises as to whether the inspection paradox inevitably occurs in this general situation, apart from in some marginal cases with respect to F and G. For a random inspection time T, it is seen that non-trivial choices lead to non-occurrence of the paradox. In this paper, a complete characterization of the non-occurrence of the inspection paradox is given with respect to G. Several examples and related assertions are shown, including the deterministic time situation.
(This article belongs to the Section Applied Stochastic Models)
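The expectation form of the paradox is easy to see numerically. The following sketch (not from the paper; the distribution choices are illustrative) simulates a renewal process with Gamma interarrivals of mean 1 and compares the mean length of the interval covering a fixed inspection time against the mean interarrival time:

```python
import numpy as np

rng = np.random.default_rng(0)

def inspection_interval(t, shape=0.5, mean=1.0, n_arrivals=200):
    """Length of the interarrival interval that covers inspection time t
    in a renewal process with Gamma(shape) interarrivals of the given mean."""
    gaps = rng.gamma(shape, mean / shape, size=n_arrivals)
    arrivals = np.cumsum(gaps)          # renewal epochs
    k = np.searchsorted(arrivals, t)    # index of the first arrival after t
    return gaps[k]

t = 50.0                                # deterministic inspection time
samples = np.array([inspection_interval(t) for _ in range(5000)])
# The mean interarrival time is 1.0; for Gamma(0.5) interarrivals the
# limiting mean of the inspection interval is E[X^2]/E[X] = 3.
print(samples.mean())
```

With a deterministic inspection time the paradox occurs whenever the interarrival distribution is non-degenerate; the paper characterizes when a random inspection time T can make it disappear.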
Open Access Article
New Goodness-of-Fit Tests for the Kumaraswamy Distribution
by
David E. Giles
Stats 2024, 7(2), 373-388; https://doi.org/10.3390/stats7020023 - 22 Apr 2024
Abstract
The two-parameter distribution known as the Kumaraswamy distribution is a very flexible alternative to the beta distribution with the same (0,1) support. Originally proposed in the field of hydrology, it has subsequently received a good deal of positive attention in both the theoretical and applied statistics literatures. Interestingly, the problem of testing formally for the appropriateness of the Kumaraswamy distribution appears to have received little or no attention to date. To fill this gap, in this paper, we apply a “biased transformation” methodology to several standard goodness-of-fit tests based on the empirical distribution function. A simulation study reveals that these (modified) tests perform well in the context of the Kumaraswamy distribution, in terms of both their low size distortion and respectable power. In particular, the “biased transformation” Anderson–Darling test dominates the other tests that are considered.
(This article belongs to the Section Statistical Methods)
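The Kumaraswamy CDF inverts in closed form, so sampling and a basic EDF test take only a few lines. This sketch (parameter values are hypothetical; it applies a plain Kolmogorov–Smirnov test with fully known parameters, not the paper's "biased transformation" tests) shows the setup:

```python
import numpy as np
from scipy import stats

# Kumaraswamy(a, b): F(x) = 1 - (1 - x**a)**b on (0, 1).
def kumaraswamy_cdf(x, a, b):
    return 1.0 - (1.0 - x**a)**b

def kumaraswamy_rvs(a, b, size, rng):
    u = rng.uniform(size=size)                      # inverse-CDF sampling
    return (1.0 - (1.0 - u)**(1.0 / b))**(1.0 / a)

rng = np.random.default_rng(1)
a, b = 2.0, 3.0
x = kumaraswamy_rvs(a, b, 500, rng)

# EDF test with known parameters; the paper's point is that plugging in
# *estimated* parameters distorts the size of such tests, motivating the
# "biased transformation" correction.
stat, pvalue = stats.kstest(x, lambda t: kumaraswamy_cdf(t, a, b))
print(stat, pvalue)
```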
Open Access Article
Bayesian Mediation Analysis with an Application to Explore Racial Disparities in the Diagnostic Age of Breast Cancer
by
Wentao Cao, Joseph Hagan and Qingzhao Yu
Stats 2024, 7(2), 361-372; https://doi.org/10.3390/stats7020022 - 19 Apr 2024
Abstract
A mediation effect refers to the effect transmitted by a mediator intervening in the relationship between an exposure variable and a response variable. Mediation analysis is widely used to identify significant mediators and to make inferences on their effects. The Bayesian method allows researchers to incorporate prior information from previous knowledge into the analysis, deal with the hierarchical structure of variables, and estimate the quantities of interest from the posterior distributions. This paper proposes three Bayesian mediation analysis methods to make inferences on mediation effects. Our proposed methods are the following: (1) the function of coefficients method; (2) the product of partial difference method; and (3) the re-sampling method. We apply these three methods to explore racial disparities in the diagnostic age of breast cancer patients in Louisiana. We found that African American (AA) patients are diagnosed on average 4.37 years younger than Caucasian (CA) patients (57.40 versus 61.77 years; p < 0.0001). We also found that the racial disparity can be explained by patients’ insurance (12.90%), marital status (17.17%), cancer stage (3.27%), and residential environmental factors, including the percent of the population under age 18 (3.07%) and the environmental factor of intersection density (9.02%).
(This article belongs to the Section Bayesian Methods)
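The mediation quantities at issue can be seen in the classical linear product-of-coefficients decomposition, sketched below with ordinary least squares on simulated data (a frequentist illustration of the estimand only; the paper's three Bayesian methods are not reproduced, and all variable names and effect sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.binomial(1, 0.5, n).astype(float)   # exposure (e.g., group indicator)
m = 0.8 * x + rng.normal(size=n)            # mediator depends on exposure
y = 1.5 * m + 0.5 * x + rng.normal(size=n)  # outcome depends on both

def ols(X, y):
    """OLS coefficients with an intercept prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

total = ols(x, y)[1]                                 # total effect of x on y
alpha = ols(x, m)[1]                                 # x -> m path
beta, direct = ols(np.column_stack([m, x]), y)[1:3]  # m -> y path, direct x -> y
indirect = alpha * beta                              # mediated (indirect) effect
print(total, direct, indirect)                       # total = direct + indirect
```

In this linear setting the decomposition total = direct + indirect holds exactly in sample; the Bayesian methods in the paper estimate the same quantities from posterior distributions.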
Open Access Article
Combined Permutation Tests for Pairwise Comparison of Scale Parameters Using Deviances
by
Scott J. Richter and Melinda H. McCann
Stats 2024, 7(2), 350-360; https://doi.org/10.3390/stats7020021 - 28 Mar 2024
Abstract
Nonparametric combinations of permutation tests for pairwise comparison of scale parameters, based on deviances, are examined. Permutation tests for comparing two or more groups based on the ratio of deviances have been investigated, and a procedure based on Higgins’ RMD statistic was found to perform well, but two other tests were sometimes more powerful. Thus, combinations of these tests are investigated. A simulation study shows a combined test can be more powerful than any single test.
(This article belongs to the Section Statistical Methods)
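A single deviance-ratio permutation test of the kind combined in the paper can be sketched as follows (an illustrative RMD-style statistic on simulated data; the paper's exact statistics and the combination procedure are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(3)

def rmd(x, y):
    """Ratio of mean absolute deviances from the group medians (larger/smaller)."""
    dx = np.abs(x - np.median(x)).mean()
    dy = np.abs(y - np.median(y)).mean()
    return max(dx, dy) / min(dx, dy)

def perm_test(x, y, n_perm=1000):
    """Two-sample permutation p-value for a difference in scale."""
    observed = rmd(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rmd(pooled[:len(x)], pooled[len(x):]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

x = rng.normal(0, 1, 30)
y = rng.normal(0, 3, 30)   # three times the scale of x
stat, p = perm_test(x, y)
print(stat, p)
```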
Open Access Article
A Note on Simultaneous Confidence Intervals for Direct, Indirect and Synthetic Estimators
by
Christophe Quentin Valvason and Stefan Sperlich
Stats 2024, 7(1), 333-349; https://doi.org/10.3390/stats7010020 - 20 Mar 2024
Abstract
Direct, indirect and synthetic estimators have a long history in official statistics. While model-based or model-assisted approaches have become very popular, direct and indirect estimators remain the predominant standard and are therefore important tools in practice. This is mainly due to their simplicity, including low data requirements, assumptions and straightforward inference. With the increasing use of domain estimates in policy, the demands on these tools have also increased. Today, they are frequently used for comparative statistics. This requires appropriate tools for simultaneous inference. We study devices for constructing simultaneous confidence intervals and show that simple tools like the Bonferroni correction can easily fail. In contrast, uniform inference based on max-type statistics in combination with bootstrap methods appropriate for finite populations works reasonably well. We illustrate our methods with frequently applied estimators of totals and means.
(This article belongs to the Section Statistical Methods)
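The contrast between a Bonferroni calibration and a bootstrap max-type calibration can be sketched for K domain means (simulated independent domains with hypothetical sizes; under dependence or the finite-population sampling studied in the paper, the gap between the two critical values widens):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
K, n = 10, 200
data = [rng.normal(mu, 1.0, n) for mu in np.linspace(0, 1, K)]  # K domains
ses = np.array([d.std(ddof=1) / np.sqrt(n) for d in data])

# Bonferroni critical value for K simultaneous 95% intervals
c_bonf = stats.norm.ppf(1 - 0.05 / (2 * K))

# Bootstrap critical value of the max-type statistic max_k |m*_k - m_k| / se_k
B = 1000
max_stats = np.empty(B)
for b in range(B):
    t = [np.abs(rng.choice(d, n).mean() - d.mean()) / se
         for d, se in zip(data, ses)]
    max_stats[b] = max(t)
c_max = np.quantile(max_stats, 0.95)

# For independent domains the two values are close; the max-type calibration
# adapts automatically when the domain estimates are dependent.
print(c_bonf, c_max)
```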
Open Access Article
The Flexible Gumbel Distribution: A New Model for Inference about the Mode
by
Qingyang Liu, Xianzheng Huang and Haiming Zhou
Stats 2024, 7(1), 317-332; https://doi.org/10.3390/stats7010019 - 13 Mar 2024
Abstract
A new unimodal distribution family indexed via the mode and three other parameters is derived from a mixture of a Gumbel distribution for the maximum and a Gumbel distribution for the minimum. Properties of the proposed distribution are explored, including model identifiability and flexibility in capturing heavy-tailed data that exhibit different directions of skewness over a wide range. Both frequentist and Bayesian methods are developed to infer parameters in the new distribution. Simulation studies are conducted to demonstrate satisfactory performance of both methods. By fitting the proposed model to simulated data and data from an application in hydrology, it is shown that the proposed flexible distribution is especially suitable for data containing extreme values in either direction, with the mode being a location parameter of interest. Using the proposed unimodal distribution, one can easily formulate a regression model concerning the mode of a response given covariates. We apply this model to data from an application in criminology to reveal interesting data features that are obscured by outliers.
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)
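The two-component construction can be sketched directly: mix a Gumbel-for-maxima with a reflected Gumbel-for-minima sharing a common location (the mixing weight and scales below are hypothetical, and the paper's exact mode-indexed parameterization is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(5)

def flexible_gumbel_rvs(size, p=0.4, loc=0.0, s1=1.0, s2=2.0):
    """Mixture: with probability p a Gumbel-for-maxima (right-skewed),
    otherwise a reflected Gumbel-for-minima (left-skewed), sharing `loc`."""
    z = rng.uniform(size=size) < p
    g_max = loc + rng.gumbel(0.0, s1, size)   # right-skewed component
    g_min = loc - rng.gumbel(0.0, s2, size)   # left-skewed component
    return np.where(z, g_max, g_min)

x = flexible_gumbel_rvs(50000)
# With unequal scales the mixture is skewed; here the heavier left
# component pulls the mean below the shared location 0.
print(x.mean())
```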
Open Access Article
Wilcoxon-Type Control Charts Based on Multiple Scans
by
Ioannis S. Triantafyllou
Stats 2024, 7(1), 301-316; https://doi.org/10.3390/stats7010018 - 07 Mar 2024
Abstract
In this article, we establish new distribution-free Shewhart-type control charts based on rank sum statistics with signaling multiple scans-type rules. More precisely, two Wilcoxon-type chart statistics are considered in order to formulate the decision rule of the proposed monitoring scheme. In order to enhance the performance of the new nonparametric control charts, multiple scans-type rules are activated, which make the proposed chart more sensitive in detecting possible shifts of the underlying distribution. The appraisal of the proposed monitoring scheme is accomplished with the aid of the corresponding run length distribution under both in- and out-of-control cases. To this end, exact formulae for the variance of the run length distribution and the average run length (ARL) of the proposed monitoring schemes are derived. A numerical investigation is carried out and shows that the proposed schemes outperform their competitors.
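The base Wilcoxon-type monitoring statistic (without the paper's multiple scans-type signaling rules) can be sketched as a rank sum of each incoming test sample against an in-control reference sample, with Shewhart-style limits from the statistic's null mean and variance (sample sizes below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
reference = rng.normal(0, 1, 100)   # in-control (Phase I) reference sample

def rank_sum(test_sample, reference):
    """Wilcoxon rank sum of the test sample in the pooled ordering."""
    pooled = np.concatenate([reference, test_sample])
    ranks = stats.rankdata(pooled)
    return ranks[len(reference):].sum()

m, n = 5, len(reference)            # test-sample size per monitoring epoch
mean_w = m * (n + m + 1) / 2        # null mean of the rank sum
sd_w = np.sqrt(m * n * (n + m + 1) / 12.0)
lcl, ucl = mean_w - 3 * sd_w, mean_w + 3 * sd_w   # Shewhart 3-sigma limits

in_control = rank_sum(rng.normal(0, 1, m), reference)
shifted = rank_sum(rng.normal(3, 1, m), reference)  # large upward shift
print(in_control, (lcl, ucl), shifted)
```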
Open Access Article
Cumulative Histograms under Uncertainty: An Application to Dose–Volume Histograms in Radiotherapy Treatment Planning
by
Flavia Gesualdi and Niklas Wahl
Stats 2024, 7(1), 284-300; https://doi.org/10.3390/stats7010017 - 06 Mar 2024
Abstract
In radiotherapy treatment planning, the absorbed doses are subject to executional and preparational errors, which propagate to plan quality metrics. Accurately quantifying these uncertainties is imperative for improved treatment outcomes. One approach, analytical probabilistic modeling (APM), presents a highly computationally efficient method. This study evaluates the empirical distribution of dose–volume histogram points (a typical plan metric) derived from Monte Carlo sampling to quantify the accuracy of modeling uncertainties under different distribution assumptions, including Gaussian, log-normal, four-parameter beta, gamma, and Gumbel distributions. Since APM necessitates the bivariate cumulative distribution functions, this investigation also delves into approximations using a Gaussian or an Ali–Mikhail–Haq Copula. The evaluations are performed in a one-dimensional simulated geometry and on patient data for a lung case. Our findings suggest that employing a beta distribution offers improved modeling accuracy compared to a normal distribution. Moreover, the multivariate Gaussian model outperforms the Copula models in patient data. This investigation highlights the significance of appropriate statistical distribution selection in advancing the accuracy of uncertainty modeling in radiotherapy treatment planning, extending an understanding of the analytical probabilistic modeling capacities in this crucial medical domain.
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
Open Access Article
Comments on the Bernoulli Distribution and Hilbe’s Implicit Extra-Dispersion
by
Daniel A. Griffith
Stats 2024, 7(1), 269-283; https://doi.org/10.3390/stats7010016 - 05 Mar 2024
Abstract
For decades, conventional wisdom maintained that binary 0–1 Bernoulli random variables cannot contain extra-binomial variation. Taking an unorthodox stance, Hilbe actively disagreed, especially for correlated observation instances, arguing that the universally adopted diagnostic Pearson or deviance dispersion statistics are insensitive to a variance anomaly in a binary context, and hence simply fail to detect it. However, having the intuition and insight to sense the existence of this departure from standard mathematical statistical theory, but being unable to effectively isolate it, he classified this particular over-/under-dispersion phenomenon as implicit. This paper explicitly exposes his hidden quantity by demonstrating that the variance inflation/deflation it represents occurs in an underlying predicted beta random variable whose real number values are rounded to their nearest integers to convert to a Bernoulli random variable, with this discretization masking any materialized extra-Bernoulli variation. In doing so, asymptotics linking the beta-binomial and Bernoulli distributions show another conventional wisdom misconception, namely a mislabeling substitution involving the quasi-Bernoulli random variable; this undeniably is not a quasi-likelihood situation. A public bell pepper disease dataset exhibiting conspicuous spatial autocorrelation furnishes empirical examples illustrating various features of this advocated proposition.
Open Access Article
Two-Stage Limited-Information Estimation for Structural Equation Models of Round-Robin Variables
by
Terrence D. Jorgensen, Aditi M. Bhangale and Yves Rosseel
Stats 2024, 7(1), 235-268; https://doi.org/10.3390/stats7010015 - 28 Feb 2024
Abstract
We propose and demonstrate a new two-stage maximum likelihood estimator for parameters of a social relations structural equation model (SR-SEM), using estimated summary statistics as data along with their uncertainty to obtain robust inferential statistics. The SR-SEM is a generalization of a traditional SEM for round-robin data, which have a dyadic network structure (i.e., each group member responds to or interacts with each other member). Our two-stage estimator is developed using similar logic as previous two-stage estimators for SEM, developed for application to multilevel data and multiple imputations of missing data. We demonstrate our estimator on a publicly available data set from a 2018 publication about social mimicry. We employ Markov chain Monte Carlo estimation of the summary statistics in Stage 1, implemented using the R package rstan. In Stage 2, the posterior mean estimates of the summary statistics are used as input data to estimate SEM parameters with the R package lavaan. The posterior covariance matrix of the estimated summary statistics is also calculated so that lavaan can use it to calculate robust standard errors and test statistics. Results are compared to full-information maximum likelihood (FIML) estimation of SR-SEM parameters using the R package srm. We discuss how differences between estimators highlight the need for future research to establish best practices under realistic conditions (e.g., how to specify empirical Bayes priors in Stage 1), as well as extensions that would make two-stage estimation particularly advantageous over single-stage FIML.
(This article belongs to the Section Statistical Methods)
Open Access Article
Generation of Scale-Free Assortative Networks via Newman Rewiring for Simulation of Diffusion Phenomena
by
Laura Di Lucchio and Giovanni Modanese
Stats 2024, 7(1), 220-234; https://doi.org/10.3390/stats7010014 - 24 Feb 2024
Abstract
By collecting and expanding several numerical recipes developed in previous work, we implement an object-oriented Python code, based on the networkX library, for the realization of the configuration model and Newman rewiring. The software can be applied to any kind of network and “target” correlations, but it is tested with focus on scale-free networks and assortative correlations. In order to generate the degree sequence we use the method of “random hubs”, which gives networks with minimal fluctuations. For the assortative rewiring we use the simple Vazquez-Weigt matrix as a test in the case of random networks; since it does not appear to be effective in the case of scale-free networks, we subsequently turn to another recipe which generates matrices with decreasing off-diagonal elements. The rewiring procedure is also important at the theoretical level, in order to test which types of statistically acceptable correlations can actually be realized in concrete networks. From the point of view of applications, its main use is in the construction of correlated networks for the solution of dynamical or diffusion processes through an analysis of the evolution of single nodes, i.e., beyond the Heterogeneous Mean Field approximation. As an example, we report on an application to the Bass diffusion model, with calculations of the time of the diffusion peak. The same networks can additionally be exported in environments for agent-based simulations like NetLogo.
Open Access Article
New Vessel Extraction Method by Using Skew Normal Distribution for MRA Images
by
Tohid Bahrami, Hossein Jabbari Khamnei, Mehrdad Lakestani and B. M. Golam Kibria
Stats 2024, 7(1), 203-219; https://doi.org/10.3390/stats7010013 - 23 Feb 2024
Abstract
Vascular-related diseases pose significant public health challenges and are a leading cause of mortality and disability. Understanding the complex structure of the vascular system and its processes is crucial for addressing these issues. Recent advancements in medical imaging technology have enabled the generation of high-resolution 3D images of vascular structures, leading to a diverse array of methods for vascular extraction. While previous research has often assumed a normal distribution of image data, this paper introduces a novel vessel extraction method that utilizes the skew normal distribution for more accurate probability distribution modeling. The proposed method begins with a preprocessing step to enhance vessel structures and reduce noise in Magnetic Resonance Angiography (MRA) images. The skew normal distribution, known for its ability to model skewed data, is then employed to characterize the intensity distribution of vessels. By estimating the parameters of the skew normal distribution using the Expectation-Maximization (EM) algorithm, the method effectively separates vessel pixels from the background and non-vessel regions. To extract vessels, a thresholding technique is applied based on the estimated skew normal distribution parameters. This segmentation process enables accurate vessel extraction, particularly in detecting thin vessels and enhancing the delineation of vascular edges with low contrast. Experimental evaluations on a diverse set of MRA images demonstrate the superior performance of the proposed method compared to previous approaches in terms of accuracy and computational efficiency. The presented vessel extraction method holds promise for improving the diagnosis and treatment of vascular-related diseases. By leveraging the skew normal distribution, it provides accurate and efficient vessel segmentation, contributing to the advancement of vascular imaging in the field of medical image analysis.
Open Access Article
Utility in Time Description in Priority Best–Worst Discrete Choice Models: An Empirical Evaluation Using Flynn’s Data
by
Sasanka Adikari and Norou Diawara
Stats 2024, 7(1), 185-202; https://doi.org/10.3390/stats7010012 - 19 Feb 2024
Abstract
Discrete choice models (DCMs) are applied in many fields and in the statistical modelling of consumer behavior. This paper focuses on a form of choice experiment, best–worst scaling in discrete choice experiments (DCEs), and the transition probability of a consumer’s choice over time. The analysis was conducted using simulated data (choice pairs) based on data from Flynn’s (2007) ‘Quality of Life Experiment’. Most traditional approaches assume the choice alternatives are mutually exclusive over time, which is a questionable assumption. We introduce a new copula-based model (CO-CUB) for the transition probability, which can handle the dependent structure of best–worst choices while applying a very practical constraint. We use a conditional logit model to calculate the utility at consecutive time points and propagate it to future time points under dynamic programming. We suggest that the CO-CUB transition probability algorithm is a novel way to analyze and predict choices at future time points by expressing human choice behavior. The numerical results inform decision making and help formulate strategies and learning algorithms under dynamic utility in time for best–worst DCEs.
(This article belongs to the Topic Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint)
Open Access Article
Importance and Uncertainty of λ-Estimation for Box–Cox Transformations to Compute and Verify Reference Intervals in Laboratory Medicine
by
Frank Klawonn, Neele Riekeberg and Georg Hoffmann
Stats 2024, 7(1), 172-184; https://doi.org/10.3390/stats7010011 - 09 Feb 2024
Cited by 1
Abstract
Reference intervals play an important role in medicine, for instance, for the interpretation of blood test results. They are defined as the central 95% values of a healthy population and are often stratified by sex and age. In recent years, so-called indirect methods for the computation and validation of reference intervals have gained importance. Indirect methods use all values from a laboratory, including the pathological cases, and try to identify the healthy sub-population in the mixture of values. This is only possible under certain model assumptions, i.e., that the majority of the values represent non-pathological values and that the non-pathological values follow a normal distribution after a suitable transformation, commonly a Box–Cox transformation, rendering the parameter λ of the Box–Cox transformation a nuisance parameter for the estimation of the reference interval. Although indirect methods put high effort into the estimation of λ, they come to very different estimates for λ, even though the estimated reference intervals are quite coherent. Our theoretical considerations and Monte-Carlo simulations show that overestimating λ can lead to intolerable deviations of the reference interval estimates, whereas underestimating λ usually produces acceptable estimates. For λ close to 1, its estimate has limited influence on the estimate for the reference interval, and with reasonable sample sizes, the uncertainty of the λ-estimate remains quite high.
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)
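The λ-estimate/reference-interval relationship can be sketched on clean log-normal data, where the true Box–Cox λ is 0 and the true central 95% interval is known in closed form (no pathological admixture, so this is the idealized setting behind the indirect methods, not their full implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated "healthy" lab values: log-normal, so the true Box-Cox lambda is 0
# and the true central 95% interval is exp(mu +/- 1.96 sigma).
values = np.exp(rng.normal(1.0, 0.25, 5000))

transformed, lam = stats.boxcox(values)        # ML estimate of lambda
m, s = transformed.mean(), transformed.std(ddof=1)

def inv_boxcox(y, lam):
    """Invert the Box-Cox transform (x**lam - 1) / lam (log for lam = 0)."""
    return np.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)

lower = inv_boxcox(m - 1.96 * s, lam)          # estimated reference interval
upper = inv_boxcox(m + 1.96 * s, lam)
true_lower, true_upper = np.exp(1.0 - 1.96 * 0.25), np.exp(1.0 + 1.96 * 0.25)
print(lam, (lower, upper), (true_lower, true_upper))
```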
Open Access Article
Sensitivity Analysis of Start Point of Extreme Daily Rainfall Using CRHUDA and Stochastic Models
by
Martin Muñoz-Mandujano, Alfonso Gutierrez-Lopez, Jose Alfredo Acuña-Garcia, Mauricio Arturo Ibarra-Corona, Isaac Carpintero Aguilar and José Alejandro Vargas-Diaz
Stats 2024, 7(1), 160-171; https://doi.org/10.3390/stats7010010 - 08 Feb 2024
Abstract
Forecasting extreme precipitation is one of the basic actions of warning systems in Latin America and the Caribbean (LAC). With thousands of economic losses and severe damage caused by floods in urban areas, hydrometeorological monitoring is a priority in most countries in the LAC region. The monitoring of convective precipitation, cold fronts, and hurricane tracks are the most demanded technological developments for early warning systems in the region. However, predicting and forecasting the onset time of extreme precipitation is a subject of life-saving scientific research. Developed in 2019, the CRHUDA (Crossing HUmidity, Dew point, and Atmospheric pressure) model provides insight into the onset of precipitation from the Clausius–Clapeyron relationship. With access to a historical database of more than 600 storms, the CRHUDA model provides a prediction with a precision of six to eight hours in advance of storm onset. However, the calibration is complex given the addition of ARMA(p,q)-type models for real-time forecasting. This paper presents the calibration of the joint CRHUDA+ARMA(p,q) model. It is concluded that CRHUDA is significantly more suitable and relevant for the forecast of precipitation and a possible future development for an early warning system (EWS).
(This article belongs to the Section Applied Stochastic Models)
Open Access Article
On Estimation of Shannon’s Entropy of Maxwell Distribution Based on Progressively First-Failure Censored Data
by
Kapil Kumar, Indrajeet Kumar and Hon Keung Tony Ng
Stats 2024, 7(1), 138-159; https://doi.org/10.3390/stats7010009 - 08 Feb 2024
Abstract
Shannon’s entropy is a fundamental concept in information theory that quantifies the uncertainty or information in a random variable or data set. This article addresses the estimation of Shannon’s entropy for the Maxwell lifetime model based on progressively first-failure-censored data from both classical and Bayesian points of view. In the classical perspective, the entropy is estimated using maximum likelihood estimation and bootstrap methods. For Bayesian estimation, two approximation techniques, including the Tierney-Kadane (T-K) approximation and the Markov Chain Monte Carlo (MCMC) method, are used to compute the Bayes estimate of Shannon’s entropy under the linear exponential (LINEX) loss function. We also obtained the highest posterior density (HPD) credible interval of Shannon’s entropy using the MCMC technique. A Monte Carlo simulation study is performed to investigate the performance of the estimation procedures and methodologies studied in this manuscript. A numerical example is used to illustrate the methodologies. This paper aims to provide practical values in applied statistics, especially in the areas of reliability and lifetime data analysis.
(This article belongs to the Section Reliability Engineering)
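For a complete (uncensored) sample, the plug-in maximum likelihood estimate of the Maxwell entropy is only a few lines; the sketch below (with an illustrative scale value) shows this baseline, while the paper's progressively first-failure-censored setting and Bayesian estimators are not reproduced:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
a_true = 2.0
x = stats.maxwell.rvs(scale=a_true, size=2000, random_state=rng)

# MLE of the Maxwell scale: setting d/da of the log-likelihood to zero
# gives a_hat = sqrt(E[X^2] / 3).
a_hat = np.sqrt(np.mean(x**2) / 3.0)

# Plug-in (ML) estimate of Shannon's differential entropy
h_hat = stats.maxwell.entropy(scale=a_hat)
h_true = stats.maxwell.entropy(scale=a_true)
print(a_hat, h_hat, h_true)
```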
Open Access Article
Active Learning for Stacking and AdaBoost-Related Models
by
Qun Sui and Sujit K. Ghosh
Stats 2024, 7(1), 110-137; https://doi.org/10.3390/stats7010008 - 24 Jan 2024
Abstract
Ensemble learning (EL) has become an essential technique in machine learning that can significantly enhance the predictive performance of basic models, but it also comes with an increased cost of computation. The primary goal of the proposed approach is to present a general integrative framework for applying active learning (AL), which makes use of only a limited labeling budget by selecting optimal instances, to achieve comparable predictive performance within the context of ensemble learning. The proposed framework is based on two distinct approaches: (i) AL is implemented following a full-scale EL, which we call ensemble learning on top of active learning (ELTAL), and (ii) AL is applied during the EL, which we call active learning during ensemble learning (ALDEL). Algorithms for ELTAL and ALDEL are presented using Stacking and Boosting with various algorithm-specific query strategies. The proposed active learning algorithms are numerically illustrated with the Support Vector Machine (SVM) model using simulated data and two real-world applications, evaluating their accuracy when only a small number of instances are selected, as compared to using the full data. Our findings demonstrate that (i) the accuracy of a boosting or stacking model, using the same uncertainty sampling, is higher than that of the SVM model, highlighting the strength of EL, and (ii) AL can enable the stacking model to achieve accuracy comparable to that of the SVM model trained on the full dataset, with only a small fraction of carefully selected instances, illustrating the strength of active learning.
Full article
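The query strategy at the heart of such frameworks can be illustrated with generic pool-based uncertainty sampling: train an SVM on a small labeled seed, then repeatedly label the pool instance closest to the decision boundary. This is a minimal sketch on synthetic scikit-learn data, not the authors' ELTAL/ALDEL algorithms, and the dataset parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary classification task: first 400 points form the unlabeled
# pool, the last 200 are held out for evaluation.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_test, y_test = X[400:], y[400:]

rng = np.random.default_rng(0)
labeled = list(rng.choice(400, 20, replace=False))   # small labeled seed
pool = np.setdiff1d(np.arange(400), labeled)

for _ in range(30):                                  # query 30 instances
    clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
    margins = np.abs(clf.decision_function(X[pool])) # distance to boundary
    pick = pool[np.argmin(margins)]                  # most uncertain point
    labeled.append(pick)
    pool = pool[pool != pick]

clf = SVC(kernel="linear").fit(X[labeled], y[labeled])
acc = clf.score(X_test, y_test)
print(len(labeled), round(acc, 3))
```

With only 50 labeled instances out of 400, the actively trained SVM typically comes close to the accuracy obtainable from the full pool on this easy synthetic task.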
Open AccessBrief Report
Statistical Framework: Estimating the Cumulative Shares of Nobel Prizes from 1901 to 2022
by
Xu Zhang, Bruce Golden and Edward Wasil
Stats 2024, 7(1), 95-109; https://doi.org/10.3390/stats7010007 - 19 Jan 2024
Abstract
Studying trends in the geographical distribution of the Nobel Prize is an interesting topic that has been examined in the academic literature. To track the trends, we develop a stochastic estimate for the cumulative shares of Nobel Prizes awarded to recipients in four geographical groups: North America, Europe, Asia, and Other. Specifically, we propose two models to estimate how cumulative shares change over time in the four groups. We estimate parameters, develop a prediction interval for each model, and validate our models. Finally, we apply our approach to estimate the distribution of the cumulative shares of Nobel Prizes for the four groups from 1901 to 2022.
Full article
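The quantity being modeled — a cumulative share trajectory per group — can be computed directly from yearly prize counts. The sketch below uses hypothetical counts (illustrative numbers only, not the actual Nobel data) to show how the four share curves are built and why they sum to one in every year; the paper's contribution is the stochastic models and prediction intervals fit to such trajectories.

```python
import numpy as np

# Hypothetical yearly prize counts (rows: years; columns: North America,
# Europe, Asia, Other) -- illustrative numbers only.
counts = np.array([
    [1, 4, 0, 0],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [4, 1, 1, 0],
])

cum = counts.cumsum(axis=0)                       # cumulative counts per group
shares = cum / cum.sum(axis=1, keepdims=True)     # cumulative shares per year
print(shares.round(3))
```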
Open AccessCase Report
Ecosystem Degradation in Romania: Exploring the Core Drivers
by
Alexandra-Nicoleta Ciucu-Durnoi and Camelia Delcea
Stats 2024, 7(1), 79-94; https://doi.org/10.3390/stats7010006 - 18 Jan 2024
Abstract
The concept of sustainable development appeared as a response to the attempt to improve the quality of human life while simultaneously preserving the environment. For this reason, two of the 17 Sustainable Development Goals are dedicated to life below water (SDG14) and life on land (SDG15). This research provides comprehensive information on the extent of degradation in Romania’s primary ecosystems and explores the key factors driving this phenomenon. The investigation covers 42 counties, examining the level of degradation in forest ecosystems, grasslands, lakes, and rivers. The analysis begins with descriptive statistics for each ecosystem examined, followed by the primary causes contributing to its degradation, and concludes with a cluster analysis of the country’s counties. One of these causes is intense industrial activity in certain areas, which makes accelerating the transition to a green economy all the more important for allowing the environment to regenerate.
Full article
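A cluster analysis of counties on degradation indicators, as described above, can be sketched with k-means on standardized features. The data below are synthetic stand-ins (two artificial groups of counties with hypothetical indicator values), not the Romanian data, and the two-cluster choice is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical degradation indicators per county (rows = counties; columns =
# degraded forest, grassland, lake, and river area, in arbitrary units).
rng = np.random.default_rng(1)
low = rng.normal([1, 1, 1, 1], 0.2, size=(20, 4))   # lightly degraded counties
high = rng.normal([5, 4, 3, 3], 0.4, size=(22, 4))  # heavily degraded counties
X = np.vstack([low, high])                          # 42 counties in total

# Standardize so no single indicator dominates the Euclidean distances.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
print(np.bincount(labels))
```

On such well-separated synthetic groups, the two clusters recover the light/heavy degradation split exactly.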
Open AccessArticle
Directional Differences in Thematic Maps of Soil Chemical Attributes with Geometric Anisotropy
by
Dyogo Lesniewski Ribeiro, Tamara Cantú Maltauro, Luciana Pagliosa Carvalho Guedes, Miguel Angel Uribe-Opazo and Gustavo Henrique Dalposso
Stats 2024, 7(1), 65-78; https://doi.org/10.3390/stats7010005 - 16 Jan 2024
Abstract
In the study of the spatial variability of soil chemical attributes, the process is considered anisotropic when the spatial dependence structure differs with direction. Anisotropy influences the accuracy of the thematic maps that represent the spatial variability of the phenomenon. The linear anisotropic Gaussian spatial model is therefore important for spatial data that present anisotropy: incorporating anisotropy as an intrinsic characteristic of the spatial dependence structure improves the accuracy of the spatial estimation of a georeferenced variable at unsampled locations. This work aimed to quantify, through directional spatial autocorrelation, the directional differences that exist in thematic maps of georeferenced variables when anisotropy is or is not incorporated into the spatial dependence structure. For simulated data and soil chemical properties (carbon, calcium, and potassium), the Moran directional index was calculated from the values predicted at unsampled locations under estimated isotropic and anisotropic geostatistical models. The directional spatial autocorrelation was effective in evidencing the directional difference between thematic maps elaborated with estimated isotropic and anisotropic geostatistical models. This measure evidenced an elliptical shape of the subregions in the thematic maps along the anisotropy direction, indicating greater spatial continuity over greater distances between pairs of points.
Full article
(This article belongs to the Section Statistical Methods)
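The directional Moran's I idea can be sketched by restricting the spatial weights to neighbour pairs whose bearing lies within a tolerance of a chosen direction. This is a simple illustrative implementation, not the authors' estimator; the grid, distance cutoff, and angular tolerance are assumptions. On a west-east attribute trend, autocorrelation is perfect across the trend (values constant north-south) and weaker along it, so the two directional indices differ — exactly the kind of directional contrast the abstract describes.

```python
import numpy as np

def directional_morans_i(coords, values, direction_deg, tol_deg=22.5, max_dist=1.5):
    """Moran's I using only neighbour pairs within max_dist of each other whose
    axis lies within +/- tol_deg of direction_deg (treated axially)."""
    n = len(values)
    z = values - values.mean()
    theta = np.radians(direction_deg)
    u = np.array([np.cos(theta), np.sin(theta)])
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = coords[j] - coords[i]
            dist = np.hypot(*d)
            if dist > max_dist:
                continue
            ang = np.degrees(np.arccos(np.clip(abs(d @ u) / dist, 0.0, 1.0)))
            if ang <= tol_deg:
                w[i, j] = 1.0
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Regular 6x6 grid with a smooth west-east trend in the attribute.
xx, yy = np.meshgrid(np.arange(6.0), np.arange(6.0))
coords = np.column_stack([xx.ravel(), yy.ravel()])
vals = coords[:, 0]                                # value increases along x only
i_ew = directional_morans_i(coords, vals, 0.0)     # along the trend
i_ns = directional_morans_i(coords, vals, 90.0)    # across the trend
print(round(i_ew, 3), round(i_ns, 3))
```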
Topics
Topic in
Entropy, Mathematics, Modelling, Stats
Interfacing Statistics, Machine Learning and Data Science from a Probabilistic Modelling Viewpoint
Topic Editors: Jürgen Pilz, Noelle I. Samia, Dirk Husmeier
Deadline: 31 December 2024
Special Issues
Special Issue in
Stats
Modern Time Series Analysis II
Guest Editors: Magda Sofia Valério Monteiro, Marco André da Silva Costa
Deadline: 31 May 2024
Special Issue in
Stats
Machine Learning and Natural Language Processing (ML & NLP)
Guest Editor: Stéphane Mussard
Deadline: 31 August 2024
Special Issue in
Stats
Feature Paper Special Issue: Reinforcement Learning
Guest Editors: Wei Zhu, Sourav Sen, Keli Xiao
Deadline: 30 September 2024