Abstract

Inference for Correlation Coefficient in a Bivariate Normal Model Subject to Type II Censoring

Student: Brian Reyes (University of Puerto Rico)
Mentor: Scott Linder (OWU Department of Mathematics & Computer Science)

Researchers commonly wish to estimate the strength and nature of the relationship between two variables. For example, we might believe that a monkey's lifespan (X) is closely related to the amount of plaque in its aorta (Y). Rather than waiting for all of, say, 20 monkeys to die, researchers sometimes shorten an experiment by observing only the first 20 deaths from an original sample of, say, 200 monkeys. This is an example of Type II censoring, which occurs in many settings. Censoring renders the sampling distributions of the statistics behind commonly used inferential methods mathematically intractable, so these distributions typically must be approximated by simulation. In this work, we consider inference for the population correlation coefficient ρ between two variables X and Y in a Bivariate Normal model. The Fisher Z-transformation of the ordinary sample correlation coefficient r is typically used to obtain an approximate sampling distribution from which a confidence interval for ρ can be constructed. This method is known to work well for full samples as small as n = 8 or 10. Here we consider the case in which the data are concomitantly ordered and subject to Type II censoring, so that we observe only the cases associated with the p smallest of the n values of X. We demonstrate that censoring degrades inference based on Fisher's Z-transformation: confidence intervals have actual confidence levels far from nominal when censoring is heavy or the population correlation coefficient is far from zero. We also demonstrate that this degradation is caused by a systematic error in using the Normal distribution to estimate the percentiles of the sampling distribution of the Z-transformed statistic. Using simulation, we document these errors and then construct a regression model that a researcher could use to correct these percentile estimates given the experimental conditions n, p, and r. As a result, the researcher can construct confidence intervals for ρ whose confidence levels are much closer to nominal.
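The Fisher Z interval and the censoring setup described above can be sketched as a small Monte Carlo coverage check. This is a minimal sketch under stated assumptions, not the authors' code. It uses the standard full-sample approximation that z = atanh(r) is roughly Normal with mean atanh(ρ) and standard deviation 1/sqrt(m - 3), where m is the number of observed pairs. It assumes a naive analyst who treats the p observed pairs as a full sample (m = p). The function names fisher_z_interval, sample_corr, censored_sample and coverage are hypothetical. The regression-based percentile correction is not reproduced, because its form is not given in the abstract.

```python
import math
import random
from statistics import NormalDist


def fisher_z_interval(r, m, conf=0.95):
    """Approximate CI for rho from Fisher's Z (full-sample approximation).

    Assumption: z = atanh(r) ~ Normal(atanh(rho), 1/sqrt(m - 3)).
    """
    if m <= 3:
        raise ValueError("need more than 3 observed pairs")
    z = math.atanh(r)
    crit = NormalDist().inv_cdf(0.5 + conf / 2)  # e.g. about 1.96 for 95%
    half = crit / math.sqrt(m - 3)
    # Back-transform the Z-scale interval endpoints to the rho scale.
    return math.tanh(z - half), math.tanh(z + half)


def sample_corr(xs, ys):
    """Ordinary (Pearson) sample correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def censored_sample(n, p, rho, rng):
    """Draw n standard Bivariate Normal pairs with correlation rho and keep
    the p pairs with the smallest X (Type II censoring on X)."""
    if not 3 < p <= n:
        raise ValueError("require 3 < p <= n")
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    pairs.sort(key=lambda pair: pair[0])  # order on X; each Y stays attached
    kept = pairs[:p]                      # Y values are the concomitants
    return [x for x, _ in kept], [y for _, y in kept]


def coverage(n, p, rho, reps=2000, conf=0.95, seed=1):
    """Estimated actual confidence level of the naive Fisher Z interval
    when only the p smallest-X pairs out of n are observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs, ys = censored_sample(n, p, rho, rng)
        lo, hi = fisher_z_interval(sample_corr(xs, ys), p, conf)
        hits += lo <= rho <= hi
    return hits / reps
```

For instance, comparing coverage(200, 200, 0.8) with coverage(200, 20, 0.8) contrasts a full sample with the heavy censoring of the monkey example (20 of 200 observed). Sorting the pairs on X and keeping the Y values attached to the first p pairs is what makes the retained Y values concomitants of the X order statistics.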