Mathematics and Statistics

Permanent URI for this collection

https://hdl.handle.net/10323/11893

Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects
(2024-01-01) Tran, Nguyen Quynh Huong; Khattree, Ravindra; Drignei, Dorin; Li, Li; Roy, Anuradha; So, Hon Yiu
This dissertation considers three critical aspects of modern statistical analysis and machine learning: (i) the challenges of assessing distributional assumptions for a univariate dataset, (ii) the construction of graphical tests for the multivariate normality assumption, and (iii) new algorithms for group classification problems, especially when distance-based methods are not applicable. First, we focus on the development and evaluation of graphical statistical tests that assess any specified distributional assumption for a univariate dataset, with special emphasis on normality. Because outliers affect the assessment of normality, this study also incorporates outlier detection methods and provides graphical tools for identifying outliers. The T4 plot is introduced as an additional effective tool for this purpose. Assessing multivariate normality is more challenging, since non-normality can manifest or mask itself in many different ways. The second part therefore emphasizes graphical assessment of the multivariate normality assumption, via MT3 and MT4 plots based on derivatives of the cumulant generating function. The final part of the dissertation turns to machine learning algorithms devised specifically for group classification problems. It explores new methodologies for classification and discrimination in complex datasets where standard methods that classify individual observations may not be very effective. For this we rely on the eigenstructures of the data, and in the process we also address the problem of dimensionality reduction. Copula selection can be viewed as a corollary of the group classification problem and is discussed in a separate chapter.
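The T4, MT3 and MT4 plots are built from derivatives of the cumulant generating function (CGF). As a minimal sketch of that idea only, assuming (the abstract does not say this) that such plots trace the third and fourth derivatives of the empirical CGF K(t) = log((1/n) Σ exp(t·x_i)) of standardized data: a normal distribution has K(t) = μt + σ²t²/2, so K'''(t) and K''''(t) vanish for every t, and systematic departures from zero point to non-normality. The function names are hypothetical and no plotting is done.

```python
import math
import random
from typing import List, Sequence, Tuple


def empirical_cgf_derivs(x: Sequence[float], t: float) -> Tuple[float, float]:
    """Return (K'''(t), K''''(t)) for the empirical CGF K(t) = log(mean(exp(t * x))).

    The derivatives of log M(t) at t are the cumulants of the exponentially
    tilted empirical distribution, so we compute tilted raw moments and
    convert them to the third and fourth cumulants.
    """
    w = [math.exp(t * xi) for xi in x]
    total = sum(w)
    # Tilted raw moments: sum(x^k * e^{tx}) / sum(e^{tx}) for k = 1..4
    m1, m2, m3, m4 = (
        sum(wi * xi**k for wi, xi in zip(w, x)) / total for k in range(1, 5)
    )
    k3 = m3 - 3 * m2 * m1 + 2 * m1**3
    k4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4
    return k3, k4


def standardize(x: Sequence[float]) -> List[float]:
    """Center by the sample mean and scale by the sample standard deviation."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return [(xi - mean) / sd for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    z = standardize([rng.gauss(0.0, 1.0) for _ in range(500)])
    for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
        k3, k4 = empirical_cgf_derivs(z, t)
        print(f"t={t:+.1f}  K'''={k3:+.3f}  K''''={k4:+.3f}")
```

For normal data both printed columns should stay near zero. Large |t| lets a few extreme observations dominate the exponential weights, so it is sensible to inspect a bounded range such as [-1, 1].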
Modeling Extreme Insurance Losses Using Transmutation and Copula
(2023-01-01) Addai, Solomon; Ogunyemi, Theophilus; Perla, Subbaiah; Shillor, Meir; Drignei, Dorin; So, Hon Yiu
In this dissertation, we apply transmutation to theoretical work in insurance. Based on our extensive literature search, this appears to be a novel contribution with regard to transmutation. We focus in particular on theoretical applications of the exponential, Pareto, and Weibull distributions. By shedding light on this unexplored area, our findings contribute valuable insights to the broader domain of insurance studies. We also carry out exploratory work toward future research on the combined application of copulas and transmutation to insurance data.
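The abstract does not say which transmutation is used. As a minimal sketch, assuming the widely used quadratic rank transmutation map G(x) = (1 + λ)F(x) - λF(x)² with |λ| ≤ 1 (density g(x) = f(x)[1 + λ - 2λF(x)]), the transmuted versions of the exponential, Pareto and Weibull distributions named above can be written as follows. The parameterizations and function names are illustrative assumptions, not the dissertation's.

```python
import math
from typing import Callable, Tuple

Dist = Tuple[Callable[[float], float], Callable[[float], float]]  # (cdf, pdf)


def transmute(base: Dist, lam: float) -> Dist:
    """Quadratic rank transmutation: G = (1 + lam) F - lam F^2, |lam| <= 1."""
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    cdf, pdf = base

    def g_cdf(x: float) -> float:
        u = cdf(x)
        return (1 + lam) * u - lam * u * u

    def g_pdf(x: float) -> float:
        return pdf(x) * (1 + lam - 2 * lam * cdf(x))

    return g_cdf, g_pdf


def exponential(rate: float) -> Dist:
    def cdf(x: float) -> float:
        return 1 - math.exp(-rate * x) if x > 0 else 0.0

    def pdf(x: float) -> float:
        return rate * math.exp(-rate * x) if x > 0 else 0.0

    return cdf, pdf


def pareto(xm: float, a: float) -> Dist:
    """Pareto (Type I) with scale xm and tail index a."""
    def cdf(x: float) -> float:
        return 1 - (xm / x) ** a if x >= xm else 0.0

    def pdf(x: float) -> float:
        return a * xm**a / x ** (a + 1) if x >= xm else 0.0

    return cdf, pdf


def weibull(k: float, scale: float) -> Dist:
    def cdf(x: float) -> float:
        return 1 - math.exp(-((x / scale) ** k)) if x > 0 else 0.0

    def pdf(x: float) -> float:
        if x <= 0:
            return 0.0
        z = x / scale
        return (k / scale) * z ** (k - 1) * math.exp(-(z**k))

    return cdf, pdf
```

Since 1 - G(x) = (1 - F(x))(1 - λF(x)), for λ < 1 the transmuted survival function is asymptotically (1 - λ) times the base one, so a Pareto base keeps its tail index a while λ reshapes the body of the loss distribution; at λ = 1 the survival function becomes (1 - F(x))², which doubles the Pareto tail index.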
Optimal Cut-Points for Diagnostic Variables in Complex Surveys
(2023-01-01) Madi, Samar Adnan; Drignei, Dorin; Brown, Elise; Ogunyemi, Theophilus; Perla, Subbaiah; So, Hon Yiu
The ability to diagnose an individual is crucial in promoting treatment and improved health. However, finding a simple tool on which to base the diagnosis can be complicated. This research focuses on developing statistical methodology for accurate diagnostic tests in the context of complex survey data. The proposed method is illustrated with data from the National Health and Nutrition Examination Survey (NHANES) to construct a diagnostic test that predicts cardiometabolic disease risk in the younger US population. The research begins by exploring a single diagnostic variable as a diagnostic tool. The first one-dimensional method uses receiver operating characteristic (ROC) curves for survey data to determine an optimal cut-point for the diagnostic variable. This method is shown to be accurate but does not extend readily to multi-variable diagnostic tools for survey data. A second one-dimensional method uses logistic regression for survey data and selects the cut-point that minimizes an information criterion such as AIC. Applied to NHANES data, this method shows that a single diagnostic variable is too simplistic to yield a comprehensive diagnostic tool. The method is therefore extended to the multi-dimensional case, creating a diagnostic tool based on multiple variables using logistic regression for survey data. Although accurate, this approach is shown to be time-consuming and computationally inefficient. A modified method using kriging-based optimization is proposed, in which the search is carried out by efficient global optimization with the expected improvement criterion. This proposed method is more computationally efficient in creating a multi-dimensional diagnostic tool. Applying these methods in a healthcare setting could help promote quick and easy diagnosis.
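One common way to pick an ROC-based cut-point is to maximize Youden's index J = sensitivity + specificity - 1. The sketch below assumes design-based point estimates that weight each respondent by a survey weight, and assumes higher values of the diagnostic variable indicate disease. The abstract does not state which ROC criterion the dissertation uses, and the variance estimation needed for full survey inference (strata, PSUs) is omitted. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def weighted_youden_cutpoint(
    x: Sequence[float], y: Sequence[int], w: Sequence[float]
) -> Tuple[float, float]:
    """Return (cut-point, J) maximizing the survey-weighted Youden index.

    A subject is classified positive when x >= cut. Sensitivity and
    specificity are weighted proportions using the survey weights w.
    """
    w_pos = sum(wi for wi, yi in zip(w, y) if yi == 1)
    w_neg = sum(wi for wi, yi in zip(w, y) if yi == 0)
    if w_pos == 0 or w_neg == 0:
        raise ValueError("need both diseased (y=1) and healthy (y=0) subjects")

    best_cut, best_j = x[0], float("-inf")
    for c in sorted(set(x)):
        # Weighted true positives (diseased, x >= c) and true negatives (healthy, x < c)
        tp = sum(wi for xi, yi, wi in zip(x, y, w) if yi == 1 and xi >= c)
        tn = sum(wi for xi, yi, wi in zip(x, y, w) if yi == 0 and xi < c)
        j = tp / w_pos + tn / w_neg - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j
```

For example, `weighted_youden_cutpoint([1, 2, 3, 4], [0, 0, 1, 1], [1, 1, 1, 1])` returns `(3, 1.0)`. The double loop is O(n²), which is acceptable for a sketch; sorting once and accumulating weights would make it O(n log n).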
Point Estimators and Confidence Intervals Under Sequential Sampling Strategies with Applications
(2024-01-01) Alanazi, Ibtihal Hamoud; Hu, Jun; Drignei, Dorin; Perla, Subbaiah; So, Hon Yiu; Li, Li
Statistical inference is the process of drawing informed conclusions about a larger population by analyzing a smaller set of data collected through some form of sampling. In many statistical inference problems where a prescribed accuracy is desired, the required sample size often depends on unknown population parameters and is therefore itself unknown. It is then necessary to use a sequential sampling procedure, in which the experimenter takes one observation at a time until a predefined stopping rule is satisfied. This thesis develops sequential sampling procedures for three statistical inference problems: (i) bounded variance point estimation (BVPE) of a function of the scale parameter of a gamma distribution with known shape parameter; (ii) fixed-width confidence interval (FWCI) estimation for comparing two independent Bernoulli proportions; and (iii) fixed-accuracy confidence interval (FACI) estimation for the shape parameter of a Weibull distribution based on record data. In the first research problem, given a gamma population with known shape parameter α, we develop a general theory for estimating a function g(·) of the scale parameter β with bounded variance. We begin by proposing a stopping rule for a sequential sampling procedure, with g(·) satisfying certain conditions, and show that the procedure enjoys appealing asymptotic properties. We then take g(·) to be four specific functions as illustrations: the gamma mean, the gamma variance, the gamma rate parameter, and a gamma survival probability. For each illustration, Monte Carlo simulations are carried out to demonstrate the strong performance of the proposed sequential sampling procedure, which is further supported by a real data study on the weights of newborn babies.
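For intuition about the first problem: with known shape α, the estimator β̂_n = X̄_n/α of the scale parameter has variance β²/(nα), so keeping that variance below a bound b requires n ≥ β²/(αb), which depends on the unknown β. A purely sequential rule replaces β by its running estimate. This is a minimal sketch for the special case g(β) = β only, assuming a pilot size m and no correction term; the dissertation's general stopping rule for arbitrary g(·) may differ, and the function name is hypothetical.

```python
import random
from typing import Callable, Tuple


def sequential_bvpe_gamma_scale(
    draw: Callable[[], float], alpha: float, b: float, m: int = 10, max_n: int = 10**6
) -> Tuple[int, float]:
    """Purely sequential bounded-variance estimation of a gamma scale beta.

    Stops at the first n >= m with n >= beta_hat_n**2 / (alpha * b), where
    beta_hat_n = mean(X_1..X_n) / alpha. Returns (N, beta_hat_N).
    """
    total, n = 0.0, 0
    while n < max_n:
        total += draw()  # take one observation at a time
        n += 1
        beta_hat = total / (n * alpha)
        if n >= m and n >= beta_hat**2 / (alpha * b):
            return n, beta_hat
    raise RuntimeError("stopping rule not met within max_n observations")


if __name__ == "__main__":
    rng = random.Random(2024)
    alpha, beta, b = 2.0, 3.0, 0.05
    # random.gammavariate(shape, scale) has mean shape * scale
    n_stop, est = sequential_bvpe_gamma_scale(
        lambda: rng.gammavariate(alpha, beta), alpha, b
    )
    print(n_stop, est, "known-beta n* =", beta**2 / (alpha * b))
```

Here n* = β²/(αb) = 9/(2 × 0.05) = 90 is the sample size one would fix if β were known, so the random stopping time N can be compared against it across repeated runs.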
In the second research problem, we are interested in the proportions, denoted by P1 and P2, of a common characteristic possessed by two independent dichotomous populations. We propose sequential sampling procedures for constructing FWCIs that compare the magnitudes of P1 and P2, based on the log transformation and the logit transformation, respectively, and evaluate them with Monte Carlo simulations. We then apply these sequential sampling procedures to a real-world A/B testing problem for mobile games. In the third research problem, we use record data to estimate the shape parameter of a two-parameter Weibull population, a model widely used in lifetime data analysis. A sequential sampling procedure is developed for constructing a FACI for the Weibull shape parameter β, whether the scale parameter α is known or unknown.
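The abstract gives no formulas for the second problem. One natural construction, assumed here, targets θ = log(P1/P2): by the delta method, with n observations from each population, Var(log p̂1 - log p̂2) ≈ [(1 - P1)/P1 + (1 - P2)/P2]/n, so the interval log(p̂1/p̂2) ± d has the prescribed width 2d and approximate coverage 1 - γ once n ≥ (z/d)²[(1 - P1)/P1 + (1 - P2)/P2], with z the upper γ/2 normal quantile. The sketch samples one pair at a time and plugs in running estimates; the pilot size m, the zero-count handling and the function name are hypothetical choices.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, Tuple


def sequential_fwci_log_ratio(
    draw1: Callable[[], int],
    draw2: Callable[[], int],
    d: float,
    gamma: float = 0.05,
    m: int = 10,
    max_n: int = 10**7,
) -> Tuple[int, float, float]:
    """Purely sequential fixed-width (2d) CI for theta = log(P1 / P2).

    Each step draws one Bernoulli (0/1) observation from each population and
    stops once n >= (z/d)^2 * [(1 - p1)/p1 + (1 - p2)/p2] at the current
    estimates. Returns (N, lower, upper) on the log scale.
    """
    z = NormalDist().inv_cdf(1 - gamma / 2)
    s1 = s2 = 0
    for n in range(1, max_n + 1):
        s1 += draw1()
        s2 += draw2()
        # Assumption: keep sampling while a success count is zero, because
        # the delta-method variance of log(p_hat) is then unbounded.
        if n < m or s1 == 0 or s2 == 0:
            continue
        p1, p2 = s1 / n, s2 / n
        if n >= (z / d) ** 2 * ((1 - p1) / p1 + (1 - p2) / p2):
            center = math.log(p1 / p2)
            return n, center - d, center + d
    raise RuntimeError("stopping rule not met within max_n steps")


if __name__ == "__main__":
    rng = random.Random(1)
    n_stop, lo, hi = sequential_fwci_log_ratio(
        lambda: int(rng.random() < 0.4), lambda: int(rng.random() < 0.3), d=0.3
    )
    print(n_stop, round(lo, 3), round(hi, 3))
```

Because log(P1/P2) > 0 exactly when P1 > P2, an interval lying entirely above or below zero orders the two proportions. A logit-based (log odds ratio) variant would replace the bracketed term with 1/[P1(1 - P1)] + 1/[P2(1 - P2)].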

Browsing Mathematics and Statistics by Author "Drignei, Dorin"