Identification of common genetic variants controlling transcript isoform variation in human whole blood

Abstract

An understanding of the genetic variation underlying transcript splicing is essential to dissect the molecular mechanisms of common disease. The available evidence from splicing quantitative trait locus (sQTL) studies has been limited to small samples. We performed genome-wide screening to identify SNPs that might control mRNA splicing in whole blood collected from 5,257 Framingham Heart Study participants. We identified 572,333 cis sQTLs involving 2,650 unique genes. Many sQTL-associated genes (40%) undergo alternative splicing. Using the National Human Genome Research Institute (NHGRI) genome-wide association study (GWAS) catalog, we determined that 528 unique sQTLs were significantly enriched for 8,845 SNPs associated with traits in previous GWAS. In particular, we found 395 (4.5%) GWAS SNPs with evidence of cis sQTLs but not gene-level cis expression quantitative trait loci (eQTLs), suggesting that sQTL analysis could provide additional insights into the functional mechanism underlying GWAS results. Our findings provide an informative sQTL resource for further characterizing the potential functional roles of SNPs that control transcript isoforms relevant to common diseases.
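
As a rough illustration of what a single cis-sQTL test can look like, the sketch below regresses exon-level expression, taken relative to gene-level expression, on SNP allele dosage (0, 1 or 2 copies). A nonzero slope then points to an exon-specific effect rather than a gene-wide one. The model, the helper name `sqtl_slope` and all numbers are assumptions made for illustration; the abstract does not state which statistical model the study used.

```python
from math import sqrt


def sqtl_slope(dosage, exon_expr, gene_expr):
    """Least-squares slope and t statistic of (exon - gene) expression on dosage.

    Hypothetical helper: expression values are assumed to be on a log2 scale,
    so the difference exon - gene is a log ratio.
    """
    y = [e - g for e, g in zip(exon_expr, gene_expr)]
    n = len(y)
    mean_x = sum(dosage) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(dosage, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(dosage, y))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, t_stat


# Invented toy data for six individuals (not from the study).
dosage = [0, 0, 1, 1, 2, 2]
gene_expr = [8.0, 8.2, 8.1, 7.9, 8.0, 8.1]
exon_expr = [7.0, 7.3, 6.6, 6.3, 6.0, 6.0]

slope, t_stat = sqtl_slope(dosage, exon_expr, gene_expr)
print(f"slope per allele: {slope:.2f}, t statistic: {t_stat:.1f}")
```

In a genome-wide screen, a test of this kind would be repeated for every SNP and exon pair within a cis window, with correction for multiple testing; those details are not given in this preview.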

Figure 1: Overview of sQTL analysis.
Figure 2: An example of the PTGS1 gene with its associated cis sQTL located in the 3′ acceptor splice site.
Figure 3: Cis sQTLs are highly enriched in binding sites for RNA-binding proteins.
Figure 4: Functional annotation of genes and exons with cis-sQTL associations.

Acknowledgements

This research was conducted in part using data and resources from the FHS of the National Heart, Lung, and Blood Institute (NHLBI) of the US National Institutes of Health (NIH) and the Boston University School of Medicine. The analyses reflect intellectual input and resource development from the FHS investigators participating in the SNP Health Association Resource (SHARe) project and in the Systems Approach to Biomarker Research in Cardiovascular Disease (SABRe) project.

We thank J. Zhu and Y. Yang at the DNA Sequencing and Genomics Core of the NHLBI for detailed review of our manuscript and helpful suggestions. We also thank J. Dupuis at the Boston University School of Public Health for her statistical suggestions.

This study used the high-performance computational capabilities of the Biowulf Linux cluster at the US NIH (http://biowulf.nih.gov/).

The FHS is funded by US NIH contract N01-HC-25195; this work was also supported by the NHLBI, Division of Intramural Research.

Author information

Contributions

X.Z. designed the study, developed the method, performed the analyses and wrote the manuscript. C.J.O'D. conceived and coordinated the project and wrote the manuscript. B.H.C. provided key input and revised the manuscript. R.J., S.Y. and P.J.M. provided the normalized expression Exon array data. T.H., A.D.J., P.J.M. and D.L. reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Christopher J O'Donnell.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Schematic for coverage of Affymetrix exon array probe sets across the entire length of the transcript.

Black regions represent exons, whereas gray regions represent introns. The short dashes underneath the exon regions indicate individual probes of 25 nt in length representing the probe set. The highlighted red box indicates an alternative splicing event (exon 2 is spliced out in mRNA transcript isoform 1), which can be detected by analyzing the exon-level probe sets one by one in this genomic locus. The Affymetrix GeneChip Human Exon 1.0 ST array allows for exon-level expression profiling on a single chip and can interrogate over 280,000 core exons in the human genome.
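
The exon-by-exon comparison described above can be sketched as each exon's signal measured relative to the gene-level signal. This is a minimal sketch with made-up log2 intensities; the helper name `relative_exon_signal`, the use of the gene mean as the reference and the cutoff of 1 log2 unit are illustrative assumptions, not the study's procedure.

```python
def relative_exon_signal(exon_log2):
    """Subtract the gene-level mean (log2 scale) from each exon's signal."""
    gene_level = sum(exon_log2.values()) / len(exon_log2)
    return {exon: value - gene_level for exon, value in exon_log2.items()}


# Hypothetical log2 probe-set intensities for one gene in two samples.
sample_a = {"exon1": 9.0, "exon2": 9.1, "exon3": 8.9, "exon4": 9.0}
sample_b = {"exon1": 9.0, "exon2": 6.0, "exon3": 9.0, "exon4": 9.0}

rel_a = relative_exon_signal(sample_a)
rel_b = relative_exon_signal(sample_b)

# Flag exons whose relative signal drops by more than 1 log2 unit in sample_b,
# which would be consistent with that exon being spliced out more often there.
flagged = [exon for exon in rel_a if rel_b[exon] - rel_a[exon] < -1.0]
print(flagged)
```

Working with relative rather than raw signal keeps a change in overall gene expression from being mistaken for a change in exon usage.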

Supplementary Figure 2 The number of exons and SNPs versus the gene length.

Histograms of (a) the number of core probe sets/exons per gene and (b) the number of SNPs with minor allele frequency (MAF) > 0.01 within 50 kb of each gene. Correlation plots of (c) the number of probe sets/exons versus the number of SNPs located within 50 kb of each gene, (d) gene length versus the number of probe sets/exons and (e) gene length versus the number of SNPs located within 50 kb of each gene.

Supplementary Figure 3 Distribution of fold differences in expression levels between the two homozygous genotypes for the 2,650 genes found only in exon-level analysis.

Supplementary Figure 4 Examples of the different types of transcript isoform events observed.

Supplementary Figure 5 Examples of two GWAS SNPs that are in close LD (r² > 0.8) with both the peak signal of a cis eQTL and the peak signal of a cis sQTL (the index SNP is represented as a purple diamond).
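
The LD threshold in the caption above can be illustrated with a small calculation. As a rough sketch, the LD measure between two SNPs is approximated here by the squared Pearson correlation of allele dosages across individuals. The genotype vectors are invented and the helper name `ld_r2` is hypothetical; the source does not say how LD was computed.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two allele-dosage vectors (0/1/2)."""
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("LD is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Invented dosages for eight individuals (not from the study).
gwas_snp = [0, 1, 2, 1, 0, 2, 1, 0]
sqtl_peak = [0, 1, 2, 1, 0, 2, 2, 0]

r2 = ld_r2(gwas_snp, sqtl_peak)
in_close_ld = r2 > 0.8  # threshold taken from the figure caption
print(f"r2 = {r2:.3f}, close LD: {in_close_ld}")
```

Dosage correlation is only an approximation to haplotype-based LD, so values near the threshold would need checking against phased data.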

Supplementary Figure 6 The correlation plot of the –log(P values) of cis-sQTL results with and without adjusting for the cell counts of seven major blood cell types: total white blood cell count (WBC), total platelet count (PLT), and subfractions (percentages) of neutrophils, lymphocytes, monocytes, eosinophils and basophils.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Note. (PDF 1126 kb)

Supplementary Tables

Supplementary Tables 1–8 (XLSX 874 kb)

About this article

Cite this article

Zhang, X., Joehanes, R., Chen, B. et al. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat Genet 47, 345–352 (2015). https://doi.org/10.1038/ng.3220

  • DOI: https://doi.org/10.1038/ng.3220
