Abstract
Motivation
Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer feasible. However, when the full decomposition is not required, substantial computational savings can be made.
Results
We present FlashPCA2, a tool that can perform partial PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory.
Availability and implementation
https://github.com/gabraham/flashpca
Contact
[email protected]
Supplementary information
Supplementary data are available at Bioinformatics online.
Full text links
Read article at publisher's site: https://doi.org/10.1093/bioinformatics/btx299
Read article for free, from open access legal sources, via Unpaywall: https://academic.oup.com/bioinformatics/article-pdf/33/17/2776/25163845/btx299.pdf