AUTHOR=Dhankani Varsha , Gibbs David L. , Knijnenburg Theo , Kramer Roger , Vockley Joseph , Niederhuber John , Shmulevich Ilya , Bernard Brady TITLE=Using Incomplete Trios to Boost Confidence in Family Based Association Studies JOURNAL=Frontiers in Genetics VOLUME=7 YEAR=2016 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2016.00034 DOI=10.3389/fgene.2016.00034 ISSN=1664-8021 ABSTRACT=

Most currently available family based association tests are designed to account only for nuclear families with complete genotypes for parents as well as offspring. Due to the availability of increasingly less expensive generation of whole genome sequencing information, genetic studies are able to collect data for more families and from large family cohorts with the goal of improving statistical power. However, due to missing genotypes, many families are not included in the family based association tests, negating the benefits of large scale sequencing data. Here, we present the CIFBAT method to use incomplete families in Family Based Association Test (FBAT) to evaluate robustness against missing data. CIFBAT uses quantile intervals of the FBAT statistic by randomly choosing valid completions of incomplete family genotypes based on Mendelian inheritance rules. By considering all valid completions equally likely and computing quantile intervals over many randomized iterations, CIFBAT avoids assumption of a homogeneous population structure or any particular missingness pattern in the data. Using simulated data, we show that the quantile intervals computed by CIFBAT are useful in validating robustness of the FBAT statistic against missing data and in identifying genomic markers with higher precision. We also propose a novel set of candidate genomic markers for uterine related abnormalities from analysis of familial whole genome sequences, and provide validation for a previously established set of candidate markers for Type 1 diabetes. We have provided a software package that incorporates TDT, robustTDT, FBAT, and CIFBAT. The data format proposed for the software uses half the memory space that the standard FBAT format (PED) files use, making it efficient for large scale genome wide association studies.