ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

Hits per page

hits 1 - 2 | 2 hits

Sorting

Unknown

Fully automated sequence alignment methods are comparable to, and much faster than, traditional methods in large data sets: an example with hepatitis B virus (2019)

Catanach, Therese A. ; Sweet, Andrew D. ; Nguyen, Nam-phuong D. ; [et al.]

PeerJ

In: PeerJ. 2019; 7: e6142. e6142. Published 2019 Jan 03. doi: 10.7717/peerj.6142.

add to mindlist on the mindlist

Details

Publication Date: 2019-01-03

Description: Aligning sequences for phylogenetic analysis (multiple sequence alignment; MSA) is an important, but increasingly computationally expensive step with the recent surge in DNA sequence data. Much of this sequence data is publicly available, but can be extremely fragmentary (i.e., a combination of full genomes and genomic fragments), which can compound the computational issues related to MSA. Traditionally, alignments are produced with automated algorithms and then checked and/or corrected “by eye” prior to phylogenetic inference. However, this manual curation is inefficient at the data scales required of modern phylogenetics and results in alignments that are not reproducible. Recently, methods have been developed for fully automating alignments of large data sets, but it is unclear if these methods produce alignments that result in compatible phylogenies when compared to more traditional alignment approaches that combined automated and manual methods. Here we use approximately 33,000 publicly available sequences from the hepatitis B virus (HBV), a globally distributed and rapidly evolving virus, to compare different alignment approaches. Using one data set comprised exclusively of whole genomes and a second that also included sequence fragments, we compared three MSA methods: (1) a purely automated approach using traditional software, (2) an automated approach including by eye manual editing, and (3) more recent fully automated approaches. To understand how these methods affect phylogenetic results, we compared resulting tree topologies based on these different alignment methods using multiple metrics. We further determined if the monophyly of existing HBV genotypes was supported in phylogenies estimated from each alignment type and under different statistical support thresholds. Traditional and fully automated alignments produced similar HBV phylogenies. Although there was variability between branch support thresholds, allowing lower support thresholds tended to result in more differences among trees. Therefore, differences between the trees could be best explained by phylogenetic uncertainty unrelated to the MSA method used. Nevertheless, automated alignment approaches did not require human intervention and were therefore considerably less time-intensive than traditional approaches. Because of this, we conclude that fully automated algorithms for MSA are fully compatible with older methods even in extremely difficult to align data sets. Additionally, we found that most HBV diagnostic genotypes did not correspond to evolutionarily-sound groups, regardless of alignment type and support threshold. This suggests there may be errors in genotype classification in the database or that HBV genotypes may need a revision.

Electronic ISSN: 2167-8359

Topics: Biology , Medicine

Published by PeerJ

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

Unknown

Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice (2016)

Allen, Julie M. ; Burleigh, J. Gordon ; Light, Jessica E. ; [et al.]

PeerJ

In: PeerJ. 2016; 4: e2187. e2187. Published 2016 Jul 19. doi: 10.7717/peerj.2187.

add to mindlist on the mindlist

Details

Publication Date: 2016-07-19

Description: Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies ofGammaproteobacteriasequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthirapatera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from otherGammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until we reach a threshold of genetic diversity, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain.

Electronic ISSN: 2167-8359

Topics: Biology , Medicine

Published by PeerJ

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

hits 1 - 2 | 2 hits