Mathematics and Statistics
Browsing Mathematics and Statistics by Author "Roy, Anuradha"
Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects (2024-01-01)
Tran, Nguyen Quynh Huong; Khattree, Ravindra; Drignei, Dorin; Li, Li; Roy, Anuradha; So, Hon Yiu

This dissertation considers three critical aspects of modern statistical analysis and machine learning: (i) the challenges of assessing distributional assumptions for a univariate dataset, (ii) the construction of graphical tests for the multivariate normality assumption, and (iii) new algorithms for group classification problems, especially when distance-based methods are not applicable.

First, we develop and evaluate graphical statistical tests for univariate datasets, aimed at assessing any specified distributional assumption, with special emphasis on normality. Because outliers affect the assessment of normality, this part also incorporates outlier detection methods and provides graphical tools for identifying outliers. The T4 plot is introduced as an additional effective tool for this purpose.

Assessing multivariate normality is more challenging, since nonnormality may manifest or mask itself in many different ways. The second part therefore focuses on graphical assessment of the multivariate normality assumption, using the MT3 and MT4 plots, which are based on derivatives of the cumulant generating function.

The final part of the dissertation turns to machine learning algorithms designed specifically for group classification problems. It explores new methods for classification and discrimination in complex datasets where standard methods based on classifying individual observations may not be very effective. These methods rely on the eigenstructures of the data.
In the process, we also address the problem of dimensionality reduction. The problem of copula selection can be viewed as a corollary of the group classification problem and is discussed in a separate chapter.
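The abstract does not describe how the T4, MT3 and MT4 plots are constructed. The standard fact behind CGF-derivative diagnostics is that a normal variable has cumulant generating function K(t) = μt + σ²t²/2, a quadratic, so its third and higher derivatives are identically zero. A systematic departure of an empirical third derivative from zero therefore points to nonnormality. The sketch below is a minimal illustration under that assumption only. It computes the third derivative of the empirical CGF, K_n(t) = log((1/n) Σ exp(t z_i)), for standardized univariate data over a grid of t. The function name, the univariate setting and the choice of the third derivative are illustrative assumptions; this is not the dissertation's T4 or MT3 plot.

```python
import numpy as np


def empirical_cgf_third_derivative(x, t_grid):
    """Third derivative of the empirical CGF of standardized data (illustrative sketch).

    K_n(t) = log(mean(exp(t * z))); its derivatives are the cumulants of the
    exponentially tilted empirical distribution, so K_n'''(t) is the third
    central moment under tilted weights proportional to exp(t * z_i).
    """
    x = np.asarray(x, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # Standardize with ddof=0 so the curve does not depend on location or scale
    z = (x - x.mean()) / x.std()
    # Row k holds the tilted weights for t_grid[k]; each row sums to 1
    w = np.exp(np.outer(t_grid, z))
    w /= w.sum(axis=1, keepdims=True)
    m1 = w @ z       # tilted mean, equal to K_n'(t)
    m2 = w @ z**2    # tilted second raw moment
    m3 = w @ z**3    # tilted third raw moment
    # Third cumulant from raw moments: m3 - 3*m2*m1 + 2*m1^3
    return m3 - 3 * m2 * m1 + 2 * m1**3


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(-1.0, 1.0, 9)
    # Under normality the curve should stay close to zero across t
    print(np.round(empirical_cgf_third_derivative(rng.normal(size=2000), t), 3))
    # A right-skewed sample gives a curve well above zero near t = 0
    print(np.round(empirical_cgf_third_derivative(rng.exponential(size=2000), t), 3))
```

At t = 0 the weights are uniform, so the value reduces to the sample skewness mean(z³). Plotting the curve against t (rather than inspecting only t = 0) uses information from the whole tilted family, which is the general motivation for CGF-based plots. Any formal acceptance band for such a plot would have to come from the methods described in the dissertation itself.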