Mathematics and Statistics
Recent Submissions
Point Estimators and Confidence Intervals Under Sequential Sampling Strategies with Applications (2024-01-01)
Alanazi, Ibtihal Hamoud; Hu, Jun; Drignei, Dorin; Perla, Subbaiah; So, Hon Yiu; Li, Li
Statistical inference is the process of making informed decisions about a larger population by analyzing a smaller sample of data collected through some form of sampling. In many inference problems where a prescribed accuracy is desired, the required sample size depends on unknown population parameters and is therefore itself unknown. It is then necessary to use a sequential sampling procedure, in which the experimenter takes one observation at a time until a predefined stopping rule is satisfied. This thesis develops sequential sampling procedures for three statistical inference problems: (i) bounded variance point estimation (BVPE) of a function of the scale parameter of a gamma distribution with known shape parameter; (ii) fixed-width confidence interval (FWCI) estimation for comparing two independent Bernoulli proportions; and (iii) fixed-accuracy confidence interval (FACI) estimation for the shape parameter of a Weibull distribution based on record data. In the first research problem, given a gamma population with known shape parameter α, we develop a general theory for estimating a function g(·) of the scale parameter β with bounded variance. We begin by proposing a sequential sampling procedure whose stopping rule requires g(·) to satisfy certain conditions, and show that the procedure enjoys desirable asymptotic properties. We then illustrate the general theory with four specific choices of g(·): the gamma mean, the gamma variance, the gamma rate parameter, and a gamma survival probability. For each illustration, Monte Carlo simulations are carried out to demonstrate the performance of the proposed sequential sampling procedure.
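The following is a minimal sketch of a purely sequential bounded-variance rule. It assumes the simplest case, g(β) = αβ (the gamma mean), estimated by the sample mean X̄_n. Under the shape-scale parameterization, Var(X̄_n) = αβ²/n, so keeping the variance at or below a preset bound W requires n ≥ n* = αβ²/W. Because β is unknown, the rule replaces it with the estimate X̄_n/α after each new observation and stops at the first n ≥ m with n ≥ X̄_n²/(αW). The bound W, the pilot size m and the function name are illustrative assumptions. This is a textbook-style plug-in rule, not the stopping rule derived in the thesis.

```python
import random
from typing import Callable, Tuple


def sequential_gamma_mean(alpha: float, W: float, m: int,
                          draw: Callable[[], float]) -> Tuple[int, float]:
    """Purely sequential bounded-variance estimation of the gamma mean (illustrative).

    Stops at the first n >= m with n >= xbar_n**2 / (alpha * W), the plug-in
    version of n* = alpha * beta**2 / W with beta replaced by xbar_n / alpha.
    Returns the final sample size N and the estimate xbar_N.
    """
    data = [draw() for _ in range(m)]        # pilot sample of size m
    while True:
        n = len(data)
        xbar = sum(data) / n                 # current estimate of the mean alpha * beta
        if n >= xbar ** 2 / (alpha * W):     # stopping rule satisfied
            return n, xbar
        data.append(draw())                  # otherwise take one more observation


# Gamma data with shape 2 and scale 1.5 (random.gammavariate takes shape, scale),
# so the unknown optimal size is n* = 2 * 1.5**2 / 0.05 = 90.
rng = random.Random(2024)
N, estimate = sequential_gamma_mean(alpha=2.0, W=0.05, m=10,
                                    draw=lambda: rng.gammavariate(2.0, 1.5))
print(N, estimate)
```

In a Monte Carlo study, one would repeat the last call many times and compare the average of N with n* and the empirical variance of the estimates with W.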
The procedure is further illustrated with a real data study on the weights of newborn babies. In the second research problem, we are interested in the proportions P1 and P2 of two independent dichotomous populations that possess a common characteristic. We propose sequential sampling procedures for constructing FWCIs that compare P1 and P2, based on the log transformation and the logit transformation respectively, and evaluate them with Monte Carlo simulations. We then apply these procedures to a real-world A/B testing problem for mobile games. In the third research problem, we use record data to estimate the shape parameter of a two-parameter Weibull population, a model widely used in lifetime data analysis. A sequential sampling procedure is developed for constructing a FACI for the Weibull shape parameter β, whether the scale parameter α is known or unknown.

Contributions to Multivariate Data Science: Assessment and Identification of Multivariate Distributions and Supervised Learning for Groups of Objects (2024-01-01)
Tran, Nguyen Quynh Huong; Khattree, Ravindra; Drignei, Dorin; Li, Li; Roy, Anuradha; So, Hon Yiu
This dissertation considers three critical aspects of modern statistical analysis and machine learning: (i) assessing the distributional assumptions of a univariate dataset, (ii) constructing graphical tests for the multivariate normality assumption, and (iii) exploring new algorithms for group classification problems, especially when distance-based methods are not applicable. First, we focus on the development and evaluation of graphical statistical tests for univariate datasets, aimed at assessing any specified distributional assumption. Examination of normality is given special emphasis.
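A common starting point for graphical normality checks is the normal Q-Q plot. It pairs the ordered observations with standard normal quantiles, and the correlation r between the two coordinates summarizes how closely the points follow a straight line. The sketch below computes both. It is a generic, standard check included for orientation only, not the T3 or T4 plots developed in the dissertation. The plotting positions (i - 0.5)/n and the function name are assumptions.

```python
from statistics import NormalDist, fmean
from typing import List, Sequence, Tuple


def normal_qq(data: Sequence[float]) -> Tuple[List[float], List[float], float]:
    """Coordinates of a normal Q-Q plot and their correlation r.

    Requires at least two distinct observations (otherwise r is undefined).
    """
    n = len(data)
    ordered = sorted(data)
    # Plotting positions p_i = (i - 0.5) / n mapped to standard normal quantiles
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Pearson correlation between theoretical quantiles and ordered data
    mt, mo = fmean(theo), fmean(ordered)
    sxy = sum((t - mt) * (o - mo) for t, o in zip(theo, ordered))
    sxx = sum((t - mt) ** 2 for t in theo)
    syy = sum((o - mo) ** 2 for o in ordered)
    return theo, ordered, sxy / (sxx * syy) ** 0.5


# A right-skewed sample: the Q-Q points bend away from a line and r drops.
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 1.0, 2.5, 6.0]
print(round(normal_qq(skewed)[2], 3))
```

Values of r close to 1 are consistent with normality. Outliers or skewness pull r down, which is one reason outlier detection and normality assessment are usually considered together.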
Because outliers affect the assessment of normality, this study also incorporates outlier detection methods and provides graphical tools for identifying outliers; the T4 plot is introduced as an additional tool for this purpose. Examining multivariate normality is more challenging, because nonnormality can manifest or mask itself in many different ways. The second part therefore develops graphical assessments of the multivariate normality assumption via MT3 and MT4 plots, which are based on derivatives of the cumulant generating function. The final part of the dissertation turns to machine learning algorithms designed specifically for group classification problems. It explores new methods for classification and discrimination in complex datasets where standard methods based on classifying individual observations may not be effective; these methods rely on the eigenstructure of the data. In the process, we also address the problem of dimensionality reduction. Copula selection, which can be viewed as a corollary of the group classification problem, is discussed in a separate chapter.

Modeling Extreme Insurance Losses Using Transmutation and Copula (2023-01-01)
Addai, Solomon; Ogunyemi, Theophilus; Perla, Subbaiah; Shillor, Meir; Drignei, Dorin; So, Hon Yiu
In this dissertation, we apply transmutation to theoretical work in insurance. Based on our extensive literature search, this appears to be a novel use of transmutation; we focus in particular on the theoretical application of the exponential, Pareto and Weibull distributions. By addressing this unexplored area, our findings contribute valuable insights to the broader field of insurance studies.
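The abstract does not say which transmutation is used. A common choice in the literature is the quadratic rank transmutation map, F(x) = (1 + λ)G(x) - λG(x)² with |λ| ≤ 1, applied to a baseline CDF G. Assuming that map, the sketch below builds a transmuted exponential CDF and its density f(x) = g(x)[1 + λ - 2λG(x)], and draws samples by inverting F. The exponential baseline, the function names and the parameter values are illustrative.

```python
import math
import random


def transmuted_exp_cdf(x: float, rate: float, lam: float) -> float:
    """F(x) = (1 + lam) G(x) - lam G(x)^2 with G the Exponential(rate) CDF, |lam| <= 1."""
    if x <= 0:
        return 0.0
    g_cdf = 1.0 - math.exp(-rate * x)
    return (1.0 + lam) * g_cdf - lam * g_cdf ** 2


def transmuted_exp_pdf(x: float, rate: float, lam: float) -> float:
    """f(x) = g(x) * (1 + lam - 2 lam G(x)), the derivative of F."""
    if x < 0:
        return 0.0
    g_cdf = 1.0 - math.exp(-rate * x)
    g_pdf = rate * math.exp(-rate * x)
    return g_pdf * (1.0 + lam - 2.0 * lam * g_cdf)


def transmuted_exp_sample(rate: float, lam: float, rng: random.Random) -> float:
    """Inverse-CDF draw: solve (1 + lam) G - lam G^2 = u for G in [0, 1], then invert G."""
    u = rng.random()                      # u in [0, 1)
    if u == 0.0:
        return 0.0
    # Rationalized root of lam G^2 - (1 + lam) G + u = 0; gives G = u when lam = 0
    g_cdf = 2.0 * u / ((1.0 + lam) + math.sqrt((1.0 + lam) ** 2 - 4.0 * lam * u))
    return -math.log1p(-g_cdf) / rate     # Exponential quantile function


rng = random.Random(7)
losses = [transmuted_exp_sample(rate=1.0, lam=0.5, rng=rng) for _ in range(5)]
print(losses)
```

For λ = 0 the map reduces to the baseline exponential. Integrating 1 - F shows that the mean of this transmuted exponential is (1 - λ/2)/rate, which gives a quick check on the sampler.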
We also carry out exploratory work, as a direction for future research, on the combined application of copulas and transmutation to insurance data.

Transportation Related Algorithm Design and Application (2024-01-01)
Kulick, Anthony James; Cheng, Eddie; Kruk, Serge; Shillor, Meir; Liptak, Laszlo
In this thesis, several algorithms are proposed and developed to solve a variety of transportation-related problems. First, we consider a dynamic programming approach to build an exact solver that minimizes distance in a variant of the vehicle routing problem with time windows (VRPTW). Several new tests are developed to reduce the size of the state space and, ultimately, the number of state transitions.

Optimal Cut-Points for Diagnostic Variables in Complex Surveys (2023-01-01)
Madi, Samar Adnan; Drignei, Dorin; Brown, Elise; Ogunyemi, Theophilus; Perla, Subbaiah; So, Hon Yiu
The ability to diagnose an individual is crucial to promoting treatment and improving health. However, finding a simple tool on which to base the diagnosis can be complicated. This research focuses on developing statistical methodology for accurate diagnostic tests in the context of complex survey data. The proposed method is illustrated with data from the National Health and Nutrition Examination Survey (NHANES) to construct a diagnostic test that predicts cardiometabolic disease risk in the younger US population. The research begins by exploring a single variable as a diagnostic tool. The first one-dimensional method uses receiver operating characteristic (ROC) curves for survey data to determine an optimal cut-point for the diagnostic variable. This method is shown to be accurate but does not extend well to multivariable diagnostic tools for survey data. A second one-dimensional method uses logistic regression for survey data to determine an optimal cut-point, selecting the cut-point that minimizes an information criterion such as AIC.
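One common way to choose an "optimal" cut-point from an ROC curve is to maximize Youden's index, J = sensitivity + specificity - 1. The sketch below does this with sampling weights, so that each respondent counts in proportion to its survey weight when sensitivity and specificity are estimated. The abstract does not name the ROC criterion used in the dissertation. Youden's index, the rule "positive if x ≥ c" and the function name are therefore assumptions. Design-based variance estimation for strata and clusters is not attempted here.

```python
from typing import Sequence, Tuple


def weighted_youden_cutpoint(x: Sequence[float], y: Sequence[int],
                             w: Sequence[float]) -> Tuple[float, float]:
    """Return (cut-point, J) maximizing the survey-weighted Youden index.

    A subject is classified positive when x >= c; y is 1 for diseased and 0
    otherwise; w holds the survey weights. O(n^2), which is fine for a sketch.
    """
    pos_total = sum(wi for yi, wi in zip(y, w) if yi == 1)
    neg_total = sum(wi for yi, wi in zip(y, w) if yi == 0)
    best_c, best_j = None, float("-inf")
    for c in sorted(set(x)):                       # candidate cut-points: observed values
        tp = sum(wi for xi, yi, wi in zip(x, y, w) if yi == 1 and xi >= c)
        tn = sum(wi for xi, yi, wi in zip(x, y, w) if yi == 0 and xi < c)
        j = tp / pos_total + tn / neg_total - 1.0  # weighted sensitivity + specificity - 1
        if j > best_j:                             # keep the first maximizer on ties
            best_c, best_j = c, j
    return best_c, best_j


# Toy data: marker values, disease status, and survey weights (all hypothetical).
x = [1.2, 2.0, 2.4, 3.1, 3.3, 4.0]
y = [0, 0, 1, 0, 1, 1]
w = [1.0, 2.5, 1.0, 0.5, 1.5, 1.0]
print(weighted_youden_cutpoint(x, y, w))
```

With equal weights this reduces to the ordinary empirical Youden cut-point. The weights matter when some groups are oversampled, as in NHANES.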
The logistic regression method is applied to NHANES data, but a single diagnostic variable is shown to be too simplistic for a comprehensive diagnostic tool. The method is then extended to the multidimensional case, creating a diagnostic tool based on multiple variables using logistic regression for survey data. Although accurate, this approach is shown to be time-consuming and computationally inefficient. A modified method using kriging-based optimization is therefore proposed, in which efficient global optimization, a more efficient search algorithm, is explored using the expected improvement criterion. The proposed method is more computationally efficient in creating a multidimensional diagnostic tool. Applying these methods in a healthcare setting could support quick and easy diagnosis.

A Hybridized Discontinuous Galerkin Scheme for the Coupled Stokes-Darcy Flow and Transport (2022-03-22)
Pham, Dinh Dong; Cesmelioglu, Aycil; Cheng, Eddie; Horvath, Tamas; Schmidt, Darrell; Shillor, Meir
The main focus of this thesis is on finding highly accurate and robust numerical methods for a complex flow and transport problem governed by the fully coupled time-dependent Stokes-Darcy-transport equations. This problem has many applications, one of which is groundwater contamination by pollutants transported via surface/subsurface flow. It has two main ingredients: the time-dependent Stokes-Darcy equations describing the flow, and the time-dependent advection-diffusion equation for the transport of chemicals by this flow. The first part of this thesis is therefore dedicated to the time-dependent Stokes-Darcy problem, which describes free flow and porous media flow on two different parts of a domain and their interaction at the common interface.
We introduce a hybridized discontinuous Galerkin (HDG) method that provides exact mass conservation and pressure robustness and handles the interface conditions via facet unknowns. We prove well-posedness and a priori error estimates in the energy norm, and provide numerical experiments that show optimal convergence and robustness of the method with respect to the problem parameters. The second part deals with the time-dependent advection-diffusion equation, for which we again use an HDG method for the spatial discretization. We show existence and uniqueness of the solution of the semi-discrete transport problem and prove a priori error estimates in the energy norm. Numerical experiments are presented for different boundary conditions, and we observe optimal rates of convergence in each case. Combining the two parts through a sequential algorithm, we solve the fully coupled time-dependent Stokes-Darcy-transport problem. The flow and transport are coupled through the dependence of the fluid viscosity and the source/sink terms on the concentration, and through the dependence of the dispersion/diffusion tensor in the porous media domain on the advective fluid velocity. Our sequential algorithm employs a linearizing decoupling strategy based on backward Euler time stepping, in which the Stokes-Darcy and transport equations are solved sequentially by time-lagging the concentration. Well-posedness results and a priori error estimates for the velocity and the concentration in the energy norm are presented, and numerical examples demonstrating optimal convergence and mass conservation are provided.

Mathematical Models, Analysis and Simulations of the Handy Model with Middle Class (2021-12-06)
Al-Khawaja, Thanaa Ali Kadhim A; Shillor, Meir; Spagnuolo, Anna Maria; Ogunyemi, Theophilus; Andrews, Kevin
This study presents three different mathematical versions of the HANDY (Human And Nature DYnamics) model for the socioeconomic dynamics of a large stratified society.
The basic model was introduced in the groundbreaking publications of Motesharrei (dissertation, 2014) and Motesharrei et al. (2016). The original model consists of a nonlinear system of four ordinary differential equations (ODEs) describing the time evolution of a ‘very simple’ society made up of two populations: the Elite (rich) and the Commoners (workers). The model also includes the use of natural (renewable and nonrenewable) resources and the accumulation of human wealth, and its solutions depict the dynamics of these variables. Motesharrei’s main motivation was to use the model as a tool for evaluating the conditions that contribute to the flourishing, sustainability, or collapse of complex societies. This dissertation extends the basic HANDY model and studies the mathematical properties of the basic model and its three extensions. It establishes the existence of solutions to the models, as well as their uniqueness, boundedness and positivity. It also investigates the stability of the systems’ steady states, which describe the long-time behavior of the societies, and presents a number of qualitatively different computer simulations that provide insight into potential behaviors of the societies described by these models. The main contributions of this work are the mathematical analysis of the basic HANDY model, the formulation and analysis of its three extensions, and computer simulations. The first extension, the HANDY-SM model, includes social mobility: rich individuals may go bankrupt and become workers, and some workers may become rich. It also allows for two different aspects of inequality, through variations in salaries and in the wealth structure. The second extension, the HANDY-MC-I model, includes a Middle Class population, making the model more applicable to modern societies. It expands the system to five ODEs and allows for social mobility among the three populations.
Finally, in the third extension, HANDY-MC-II, two variables describe the natural resources: renewable resources (wood, solar and wind energy) and nonrenewable resources (coal, oil, gas). This extension makes the model more realistic, but it also adds considerable complexity, since it consists of six coupled nonlinear ODEs. The model simulations depict the consequences of having three populations with different income levels, two natural resources, and unequal contributions to the wealth structure. Analysis of the models’ steady states shows that the steady state in which the populations and wealth die out while nature (the resources) is at its equilibrium is stable. The models also have asymptotically stable nonzero steady states, to which the populations, the resources and the wealth converge in the long-time limit. The simulations also show the existence of periodic solutions in which the populations, the natural resources and the wealth undergo large oscillations, indicating cycles of ‘boom and bust.’ Moreover, the simulations indicate that the models may have chaotic solutions, pointing to a high level of unpredictability. This dissertation describes three increasingly complex HANDY models. It paves the way for their further study and raises mathematically interesting questions. In particular, the uniqueness of the solutions and the existence of periodic solutions, limit cycles and chaos remain unresolved. Furthermore, it suggests the possibility of tailoring such models to existing societies and then using them as tools for evaluating the potential outcomes of various policy decisions.
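The sketch below makes the basic model concrete. It integrates four HANDY equations for Commoners x_C, Elites x_E, Nature y and Wealth w with the classical fourth-order Runge-Kutta method. The equations follow the commonly cited form of the basic model:

- dx_C/dt = (β_C - α_C)x_C
- dx_E/dt = (β_E - α_E)x_E
- dy/dt = γy(λ - y) - δx_C y
- dw/dt = δx_C y - C_C - C_E

Consumption and the famine-driven death rates are defined in the code comments. Check these equations against Motesharrei et al. before relying on them. The parameter values, step size and function names are illustrative assumptions, not taken from the dissertation. None of the three extensions (social mobility, Middle Class, two resource types) is included.

```python
from typing import Dict, List

# Illustrative parameter values (assumed, not taken from the dissertation).
PARAMS: Dict[str, float] = {
    "alpha_m": 0.01,   # normal (minimum) death rate
    "alpha_M": 0.07,   # famine (maximum) death rate
    "beta_c": 0.03,    # Commoner birth rate
    "beta_e": 0.03,    # Elite birth rate
    "s": 5e-4,         # subsistence salary per capita
    "rho": 5e-3,       # threshold wealth per capita
    "gamma": 0.01,     # Nature regeneration rate
    "lam": 100.0,      # Nature carrying capacity
    "kappa": 1.0,      # inequality factor (Elite consumption multiple)
    "delta": 6.67e-6,  # depletion (production) factor
}


def handy_rhs(state: List[float], p: Dict[str, float]) -> List[float]:
    """Right-hand side of the basic four-equation HANDY system (as assumed above)."""
    xc, xe, y, w = state
    w_th = p["rho"] * xc + p["kappa"] * p["rho"] * xe   # wealth threshold
    frac = min(1.0, w / w_th) if w_th > 0 else 1.0       # fraction of full consumption
    cc = frac * p["s"] * xc                              # Commoner consumption C_C
    ce = frac * p["kappa"] * p["s"] * xe                 # Elite consumption C_E
    # Death rates rise from alpha_m toward alpha_M when consumption is below subsistence.
    # Since C_C / (s x_C) = frac and C_E / (s x_E) = kappa * frac, no division by x is needed.
    alpha_c = p["alpha_m"] + max(0.0, 1.0 - frac) * (p["alpha_M"] - p["alpha_m"])
    alpha_e = p["alpha_m"] + max(0.0, 1.0 - p["kappa"] * frac) * (p["alpha_M"] - p["alpha_m"])
    return [
        (p["beta_c"] - alpha_c) * xc,                           # dx_C/dt
        (p["beta_e"] - alpha_e) * xe,                           # dx_E/dt
        p["gamma"] * y * (p["lam"] - y) - p["delta"] * xc * y,  # dy/dt
        p["delta"] * xc * y - cc - ce,                          # dw/dt
    ]


def rk4(state: List[float], p: Dict[str, float], dt: float, steps: int) -> List[float]:
    """Classical fourth-order Runge-Kutta integration; returns the final state."""
    for _ in range(steps):
        k1 = handy_rhs(state, p)
        k2 = handy_rhs([v + 0.5 * dt * k for v, k in zip(state, k1)], p)
        k3 = handy_rhs([v + 0.5 * dt * k for v, k in zip(state, k2)], p)
        k4 = handy_rhs([v + dt * k for v, k in zip(state, k3)], p)
        state = [v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for v, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


# 100 Commoners, 1 Elite, Nature at capacity, no accumulated wealth; 500 time units.
final = rk4([100.0, 1.0, 100.0, 0.0], PARAMS, dt=0.5, steps=1000)
print(final)
```

Changing "kappa" or "delta" and plotting the intermediate states, rather than only the final one, is the natural way to explore the equilibrium, oscillatory and collapse regimes described above.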