Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects

dc.contributor.advisor: Khattree, Ravindra
dc.contributor.author: Tran, Nguyen Quynh Huong
dc.contributor.other: Drignei, Dorin
dc.contributor.other: Li, Li
dc.contributor.other: Roy, Anuradha
dc.contributor.other: So, Hon Yiu
dc.date.accessioned: 2024-11-13T20:16:38Z
dc.date.available: 2024-11-13T20:16:38Z
dc.date.issued: 2024-01-01
dc.description.abstract: This dissertation considers three critical aspects of modern statistical analysis and machine learning: (i) addressing the challenges of assessing the distributional assumptions of a univariate dataset, (ii) constructing graphical tests of the multivariate normality assumption, and (iii) exploring new algorithms for group classification problems, especially when distance-based methods are not applicable. First, we develop and evaluate graphical statistical tests for univariate datasets, designed to assess any specified distributional assumption, with special emphasis on normality. Recognizing that outliers can distort the assessment of normality, this study also incorporates outlier detection methods and provides graphical tools for identifying outliers; the T4 plot is introduced as an additional effective tool for this purpose. Examining multivariate normality is more challenging, since nonnormality may manifest or mask itself in many different ways. The second part therefore emphasizes graphical assessment of the multivariate normality assumption, using MT3 and MT4 plots based on the derivatives of the cumulant generating function. The final part of the dissertation turns to machine learning algorithms designed specifically for group classification problems. It explores new methods for classification and discrimination in complex datasets where standard methods that classify individual observations may not be very effective. For this, we rely on the eigenstructures of the data, and in the process we also address the problem of dimensionality reduction. Copula selection can be viewed as a corollary of the group classification problem and is discussed in a separate chapter.
dc.identifier.uri: https://hdl.handle.net/10323/18440
dc.relation.department: Mathematics and Statistics
dc.subject: Copulas choice
dc.subject: Cumulant generating functions
dc.subject: Distributional modeling
dc.subject: Group discrimination
dc.subject: Groups classification
dc.subject: Multivariate normality test
dc.title: Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects
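The abstract names the derivatives of the cumulant generating function as the basis of the MT3 and MT4 plots but does not define those statistics, so the following is only a generic sketch of that ingredient in the univariate case, not the dissertation's method. It computes the empirical cumulant generating function K_n(t) = log((1/n) * sum(exp(t * x_i))) and its first four derivatives. For normal data the population CGF is mu*t + sigma^2*t^2/2, so its third and fourth derivatives are identically zero; departures of the empirical K''' and K'''' from zero are the kind of signal such plots can build on. The function name `empirical_cgf_derivatives`, the simulated sample, and the grid of t values are illustrative assumptions.

```python
import math
import random


def empirical_cgf_derivatives(x, t):
    """Hypothetical helper: return (K'(t), K''(t), K'''(t), K''''(t)) for the
    empirical CGF K(t) = log((1/n) * sum(exp(t * x_i))) of a univariate sample."""
    # Exponential tilting weights exp(t * x_i); subtracting the largest exponent
    # avoids overflow, and the shift cancels in every ratio r_k below.
    exps = [t * xi for xi in x]
    shift = max(exps)
    w = [math.exp(e - shift) for e in exps]
    m0 = sum(w)
    # r_k: k-th raw moment of the data under the tilted weights.
    r1, r2, r3, r4 = (
        sum(wi * xi**k for wi, xi in zip(w, x)) / m0 for k in (1, 2, 3, 4)
    )
    # Derivatives of K expressed through the tilted raw moments
    # (the usual moment-to-cumulant relations).
    k1 = r1
    k2 = r2 - r1**2
    k3 = r3 - 3 * r2 * r1 + 2 * r1**3
    k4 = r4 - 4 * r3 * r1 - 3 * r2**2 + 12 * r2 * r1**2 - 6 * r1**4
    return k1, k2, k3, k4


# Illustrative usage (assumed data): a simulated standard normal sample,
# for which K''' and K'''' should stay close to zero near t = 0.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
for t in (-0.5, -0.25, 0.0, 0.25, 0.5):
    _, _, k3, k4 = empirical_cgf_derivatives(sample, t)
    print(f"t={t:+.2f}  K'''={k3:+.3f}  K''''={k4:+.3f}")
```

At t = 0 the four values reduce to the sample mean, the (divide-by-n) variance, and the third and fourth sample cumulants, which is a quick sanity check on the formulas.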

Files

Original bundle
Name: Tran_oakland_0446E_10407.pdf
Size: 8.26 MB
Format: Adobe Portable Document Format