Publication Date:
2004-11-13
Description:
We assess the phylogenetic potential of approximately 300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
Notes:
Driskell, Amy C -- Ane, Cecile -- Burleigh, J Gordon -- McMahon, Michelle M -- O'meara, Brian C -- Sanderson, Michael J -- New York, N.Y. -- Science. 2004 Nov 12;306(5699):1172-4.
Author address:
Section of Evolution and Ecology, University of California, One Shields Avenue, Davis, CA 95616, USA. acdriskell@ucdavis.edu
Record origin:
PubMed (http://www.ncbi.nlm.nih.gov/pubmed/15539599)
Keywords:
Animals; Anopheles/classification/genetics; Biodiversity; *Biological Evolution; Classification; Computational Biology; *Databases, Nucleic Acid; *Databases, Protein; Multigene Family; *Phylogeny; Plant Proteins/genetics; Plants/classification/genetics; Spodoptera/classification/genetics
Print ISSN:
0036-8075
Electronic ISSN:
1095-9203
Topics:
Biology, Chemistry and Pharmacology, Computer Science, Medicine, Natural Sciences in General, Physics