Transmission Disequilibrium Tests Based on Read Counts for Low-Coverage Next-Generation Sequence Data
- Authors
- Kim, Wonkuk
- Issue Date
- Sep-2015
- Publisher
- KARGER
- Keywords
- EM algorithm; Family-based association; Likelihood ratio test; Mixture model; Non-centrality parameter
- Citation
- HUMAN HEREDITY, v.80, no.1, pp 36 - 49
- Pages
- 14
- Journal Title
- HUMAN HEREDITY
- Volume
- 80
- Number
- 1
- Start Page
- 36
- End Page
- 49
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/11425
- DOI
- 10.1159/000434645
- ISSN
- 0001-5652
- 1423-0062
- Abstract
- This paper introduces new statistical methods for case-parent trio association studies based on the read counts obtained from next-generation sequencing (NGS) experiments. The work focuses on including low-coverage data in the case-parent trio design without genotype classification or imputation. Two approaches are considered: (1) a likelihood-based approach implementing a 15-component parametric mixture model and (2) a model-free approach that applies non-parametric statistical methods to the ratios of the read counts to coverage. Simulation studies are conducted to evaluate the performance of the proposed tests. In addition, the non-centrality parameters of the mixture likelihood-based tests are derived to determine sample sizes and coverage for an NGS experimental design. As an example, the sample sizes needed to maintain specified power in a published adolescent idiopathic scoliosis (AIS) study are presented. The simulation results show that tests using genotypes classified by the maximum Bayesian posterior probability have significantly inflated type I error rates for low-coverage data. Tests using the posterior probabilities instead of the classified genotypes show lower power than the proposed tests. In general, the likelihood-based approach has higher power than the non-parametric ratio-based approach. For the AIS example, approximately 654 trios with 4x coverage are necessary to maintain 90% power when detecting an association with an odds ratio of 2 at a locus with a minor allele frequency of 0.35 at the significance level alpha = 5 x 10^(-8). By comparison, approximately 416 trios with 25x coverage are required to maintain the same power under the same settings. The R and C source code for calculating the proposed test statistics, sample sizes, and power can be obtained by contacting the author (wkim@cau.ac.kr). (C) 2015 S. Karger AG, Basel
- Appears in Collections
- College of Business & Economics > Department of Applied Statistics > 1. Journal Articles
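
The abstract refers to a 15-component mixture model and to genotype classification by maximum Bayesian posterior probability. The Python sketch below is an illustration only, not the author's R or C code. It assumes genotypes are coded as minor-allele counts (0, 1, 2), in which case exactly 15 (father, mother, child) configurations are consistent with Mendelian inheritance; that this is the source of the 15 components is an inference from the abstract, not something the record states. The posterior calculation further assumes a binomial read model, a Hardy-Weinberg prior, and a per-read error rate of 0.01. The function names `mendelian_trios` and `genotype_posterior` are hypothetical.

```python
from itertools import product
from math import comb


def mendelian_trios():
    """List (father, mother, child) minor-allele counts allowed by Mendelian inheritance."""
    trios = []
    for father, mother in product(range(3), repeat=2):
        # Alleles a parent can transmit: genotype 0 -> {0}, 1 -> {0, 1}, 2 -> {1}
        f_alleles = {father // 2, (father + 1) // 2}
        m_alleles = {mother // 2, (mother + 1) // 2}
        for child in sorted({a + b for a in f_alleles for b in m_alleles}):
            trios.append((father, mother, child))
    return trios


def genotype_posterior(k, n, maf, err=0.01):
    """Posterior over genotypes 0/1/2 given k minor-allele reads out of n reads.

    Assumptions (not taken from the paper): binomial reads, Hardy-Weinberg
    prior with minor allele frequency `maf`, symmetric per-read error `err`.
    """
    prior = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    # Probability that a single read shows the minor allele, per genotype
    read_prob = [err, 0.5, 1 - err]
    lik = [comb(n, k) * p ** k * (1 - p) ** (n - k) for p in read_prob]
    joint = [pr * li for pr, li in zip(prior, lik)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    print(len(mendelian_trios()))  # 15
    # Zero minor-allele reads at 4x versus 25x coverage, MAF 0.35
    print(genotype_posterior(0, 4, 0.35))
    print(genotype_posterior(0, 25, 0.35))
```

Under these assumptions, a site with no minor-allele reads at 4x coverage still leaves several percent of posterior mass on the heterozygous genotype, while at 25x that mass is negligible. This is one way to see why hard genotype calls become unreliable at low coverage, which is the setting the abstract describes.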