Mathematics and Statistics
Browsing Mathematics and Statistics by Author "Li, Li"
Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects (2024-01-01)
Tran, Nguyen Quynh Huong; Khattree, Ravindra; Drignei, Dorin; Li, Li; Roy, Anuradha; So, Hon Yiu

This dissertation considers three critical aspects of modern statistical analyses and machine learning, namely, (i) addressing the challenges posed by assessing the distributional assumptions of a univariate dataset, (ii) constructing graphical tests for the multivariate normality assumption, and (iii) exploring new algorithms for solving group classification problems, especially when distance-based methods are not applicable. First, we focus on the development and evaluation of graphical statistical tests for univariate datasets, aiming to assess any specific distributional assumption. Examination of normality is given special emphasis. Recognizing the impact of outliers on the normality assumption, this study also incorporates outlier detection methodologies and provides some graphical tools for their identification. The T4 plot is introduced as an additional effective tool for this purpose. Examination of multivariate normality is more challenging because, in this case, nonnormality may manifest or mask itself in many different ways. In the second part, we thus emphasize graphical assessments of the multivariate normality assumption. This is done via MT3 and MT4 plots based on the derivatives of the cumulant generating function. The final segment of the dissertation shifts the discussion towards machine learning algorithms devised specifically for group classification problems. This involves the exploration of new methodologies that address the challenges inherent in classification and discrimination within complex datasets, where standard methods based on the classification of individual observations may not be very effective. For this, we rely on the eigenstructures of the data.
In the process, we also address the problem of dimensionality reduction. The problem of copula selection can be viewed as a corollary of the group classification problem and is discussed in a separate chapter.

Point Estimators and Confidence Intervals Under Sequential Sampling Strategies with Applications (2024-01-01)
Alanazi, Ibtihal Hamoud; Hu, Jun; Drignei, Dorin; Perla, Subbaiah; Yiu So, Hon; Li, Li

Statistical inference is the process of making informed decisions about a larger population by analyzing a smaller group of data collected with some form of sampling. In many statistical inference problems where some prescribed accuracy is desired, the required sample size depends on unknown population parameters and thus remains unknown. It is then necessary to conduct a sequential sampling procedure, in which an experimenter takes one observation at a time until a predefined stopping rule is satisfied. This thesis involves sequential sampling procedures dealing with three statistical inference problems. These are (i) bounded variance point estimation (BVPE) of a function of the scale parameter in a gamma distribution with known shape parameter; (ii) fixed-width confidence interval (FWCI) estimation for comparing two independent Bernoulli proportions; and (iii) fixed-accuracy confidence interval (FACI) estimation for the shape parameter of a Weibull distribution based on record data. In the first research problem, given a gamma population with known shape parameter α, we develop a general theory for estimating a function g(·) of the scale parameter β with bounded variance. We begin by defining a sequential sampling procedure whose stopping rule requires g(·) to satisfy some desired condition, and show that the procedure enjoys appealing asymptotic properties.
Following this general theory, we substitute g(·) with specific functions, namely the gamma mean, the gamma variance, the gamma rate parameter, and a gamma survival probability, as four possible illustrations. For each illustration, Monte Carlo simulations are carried out to justify the remarkable performance of our proposed sequential sampling procedure. This is further substantiated with a real data study on the weights of newborn babies. In the second research problem, we are interested in the proportions of a common characteristic possessed by two independent dichotomous populations, denoted by P1 and P2. We propose sequential sampling procedures for constructing FWCIs to compare the magnitudes of P1 and P2 based on the log transformation and the logit transformation, respectively, and evaluate them with Monte Carlo simulations. We then implement these sequential sampling procedures to solve a real-world A/B testing problem for mobile games. In the third research problem, we focus on utilizing record data to estimate the shape parameter of a two-parameter Weibull population, which is widely used in lifetime data analysis. A sequential sampling procedure is developed for constructing an FACI for the Weibull shape parameter β, whether the scale parameter α is known or unknown.
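
The general idea of sequential sampling described in the abstract above (take one observation at a time until a stopping rule is satisfied) can be illustrated with a minimal sketch. This is not one of the procedures developed in the thesis: it is a classical Chow-Robbins-type rule for a fixed-width confidence interval for a population mean, chosen only because it is short. The function name `sequential_fixed_width_ci` and the parameters `draw`, `half_width`, `pilot_size`, and `max_n` are hypothetical names introduced for this illustration.

```python
import random
from statistics import NormalDist


def sequential_fixed_width_ci(draw, half_width, confidence=0.95,
                              pilot_size=10, max_n=100_000):
    """Illustrative Chow-Robbins-type purely sequential procedure.

    Samples one observation at a time from `draw()` and stops at the first
    n >= pilot_size with n >= z^2 * s_n^2 / d^2, where s_n^2 is the sample
    variance and d is the requested half-width. This is an assumed,
    textbook-style rule, not the stopping rule proposed in the thesis.
    """
    if pilot_size < 2:
        raise ValueError("pilot_size must be at least 2 to estimate variance")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squares
    while n < max_n:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= pilot_size:
            sample_var = m2 / (n - 1)
            # Stopping rule: current n is large enough for the target width.
            if n >= z * z * sample_var / half_width ** 2:
                break
    return n, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    rng = random.Random(0)
    n_final, interval = sequential_fixed_width_ci(
        lambda: rng.gauss(5.0, 2.0), half_width=0.5)
    print(n_final, interval)
```

The final sample size is random because it depends on the variance estimate updated after each observation, which is what distinguishes such a procedure from a fixed-sample-size design.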