ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
  • 2
    Publication date: 2015-06-14
    Description: Motivation: CYP2D6 is a highly polymorphic gene that encodes the CYP2D6 enzyme, which is involved in the metabolism of 20–25% of all clinically prescribed drugs and other xenobiotics in the human body. CYP2D6 genotyping is recommended prior to treatment decisions involving one or more of the numerous drugs sensitive to CYP2D6 allelic composition. In this context, high-throughput sequencing (HTS) technologies provide a promising time-efficient and cost-effective alternative to currently used genotyping techniques. To achieve accurate interpretation of HTS data, however, one needs to overcome several obstacles, such as high sequence similarity and genetic recombinations between CYP2D6 and the evolutionarily related pseudogenes CYP2D7 and CYP2D8, high copy number variation among individuals and the short read lengths generated by HTS technologies. Results: In this work, we present the first algorithm to computationally infer CYP2D6 genotype at basepair resolution from HTS data. Our algorithm is able to resolve complex genotypes, including alleles that are the products of duplication, deletion and fusion events involving CYP2D6 and its evolutionarily related cousin CYP2D7. Through extensive experiments using simulated and real datasets, we show that our algorithm accurately solves this important problem with potential clinical implications. Availability and implementation: Cypiripi is available at http://sfu-compbio.github.io/cypiripi. Contact: cenk@sfu.ca
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 3
    Publication date: 2016-09-02
    Description: Motivation: Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. Results: We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. Availability and Implementation: The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
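The shortest-path idea in the CoLoRMap abstract can be illustrated with a minimal sketch. It is not taken from the CoLoRMap source: the `Hit` record, the overlap rule and the use of the per-read edit count as the cost of entering a node are all assumptions made for illustration. Each short-read alignment on the long read is a node, an edge joins two alignments that overlap, and Dijkstra's algorithm finds the chain covering a region with the fewest total edits.

```python
import heapq
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical record: one short read aligned onto the long read."""
    name: str
    start: int   # first covered long-read position (inclusive)
    end: int     # one past the last covered position (exclusive)
    edits: int   # edit distance between the short read and that segment


def cheapest_chain(hits: list[Hit], region_start: int,
                   region_end: int) -> Optional[tuple[int, list[str]]]:
    """Dijkstra over overlapping hits covering [region_start, region_end).

    Returns (total_edits, read names in order), or None if no chain of
    overlapping hits spans the region.
    """
    heap = []
    for i, h in enumerate(hits):
        if h.start <= region_start < h.end:      # hit can open the chain
            heapq.heappush(heap, (h.edits, i, (i,)))
    settled = set()
    while heap:
        cost, i, path = heapq.heappop(heap)
        if i in settled:
            continue
        settled.add(i)
        cur = hits[i]
        if cur.end >= region_end:                # region fully covered
            return cost, [hits[j].name for j in path]
        for j, nxt in enumerate(hits):
            # nxt must overlap cur and extend the covered prefix
            if j not in settled and cur.start < nxt.start < cur.end < nxt.end:
                heapq.heappush(heap, (cost + nxt.edits, j, path + (j,)))
    return None


if __name__ == "__main__":
    demo = [Hit("r1", 0, 40, 2), Hit("r2", 30, 80, 0), Hit("r3", 20, 70, 5),
            Hit("r4", 60, 100, 1), Hit("r5", 35, 100, 4)]
    print(cheapest_chain(demo, 0, 100))
```

Because all costs are non-negative, the first time a hit reaching the region end is popped from the heap its chain is optimal under this simplified cost model.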
  • 4
    Publication date: 2012-04-28
    Description: Motivation: Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf-labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Results: Our algorithm is significantly faster than methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. Availability: A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm. Contact: imanh@sfu.ca; cenk@sfu.ca; as@cwi.nl
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 5
    Publication date: 2012-06-12
    Description: Motivation: Computational identification of genomic structural variants via high-throughput sequencing is an important problem for which a number of highly sophisticated solutions have been recently developed. With the advent of high-throughput transcriptome sequencing (RNA-Seq), the problem of identifying structural alterations in the transcriptome is now attracting significant attention. In this article, we introduce two novel algorithmic formulations for identifying transcriptomic structural variants through aligning transcripts to the reference genome under the consideration of such variation. The first formulation is based on a nucleotide-level alignment model; a second, potentially faster formulation is based on chaining fragments shared between each transcript and the reference genome. Based on these formulations, we introduce a novel transcriptome-to-genome alignment tool, Dissect (DIScovery of Structural Alteration Event Containing Transcripts), which can identify and characterize transcriptomic events such as duplications, inversions, rearrangements and fusions. Dissect is suitable for whole transcriptome structural variation discovery problems involving sufficiently long reads or accurately assembled contigs. Results: We tested Dissect on simulated transcripts altered via structural events, as well as assembled RNA-Seq contigs from human prostate cancer cell line C4-2. Our results indicate that Dissect has high sensitivity and specificity in identifying structural alteration events in simulated transcripts as well as uncovering novel structural alterations in cancer transcriptomes. Availability: Dissect is available for public use at http://dissect-trans.sourceforge.net. Contact: denizy@mit.edu; fhach@cs.sfu.ca; cenk@cs.sfu.ca
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
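The second Dissect formulation chains fragments shared between a transcript and the genome. The sketch below shows only plain colinear chaining by dynamic programming, which is the textbook starting point; it does not model the duplications, inversions or fusions that Dissect handles, and the `Fragment` record and the sum-of-lengths score are assumptions made for illustration.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical exact match shared by transcript and genome."""
    name: str
    t_start: int   # start on the transcript
    g_start: int   # start on the genome
    length: int    # match length, assumed > 0


def best_colinear_chain(frags: list[Fragment]) -> tuple[int, list[str]]:
    """Highest-scoring chain of non-overlapping, colinear fragments.

    Score is the total matched length; O(n^2) dynamic programming.
    """
    if not frags:
        return 0, []
    # Any valid predecessor ends before the successor starts on the
    # transcript, so sorting by t_start puts predecessors first.
    order = sorted(frags, key=lambda f: (f.t_start, f.g_start))
    score = [f.length for f in order]
    prev = [-1] * len(order)
    for j, fj in enumerate(order):
        for i in range(j):
            fi = order[i]
            colinear = (fi.t_start + fi.length <= fj.t_start
                        and fi.g_start + fi.length <= fj.g_start)
            if colinear and score[i] + fj.length > score[j]:
                score[j] = score[i] + fj.length
                prev[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(order)), key=lambda idx: score[idx])
    best = score[j]
    chain = []
    while j != -1:
        chain.append(order[j].name)
        j = prev[j]
    return best, chain[::-1]


if __name__ == "__main__":
    demo = [Fragment("a", 0, 100, 50), Fragment("b", 50, 150, 30),
            Fragment("c", 40, 500, 60), Fragment("d", 100, 560, 40)]
    print(best_colinear_chain(demo))
```

Fragments left out of the best colinear chain (here `c`) are the kind of signal a structural-variation-aware chainer would need to explain rather than discard.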
  • 6
    Publication date: 2016-12-30
    Description: Motivation: Successful development and application of precision oncology approaches require robust elucidation of the genomic landscape of a patient’s cancer and, ideally, the ability to monitor therapy-induced genomic changes in the tumour in an inexpensive and minimally invasive manner. Thanks to recent advances in sequencing technologies, ‘liquid biopsy’, the sampling of a patient’s bodily fluids such as blood and urine, is considered one of the most promising approaches to achieve this goal. In many cancer patients, and especially those with advanced metastatic disease, deep sequencing of circulating cell free DNA (cfDNA) obtained from a patient’s blood yields a mixture of reads originating from the normal DNA and from multiple tumour subclones (called circulating tumour DNA or ctDNA). The ctDNA/cfDNA ratio as well as the proportion of ctDNA originating from specific tumour subclones depend on multiple factors, making comprehensive detection of mutations difficult, especially at early stages of cancer. Furthermore, sensitive and accurate detection of single nucleotide variants (SNVs) and indels from cfDNA is constrained by several factors, such as sequencing errors, PCR artifacts and mapping errors related to repeat regions within the genome. In this article, we introduce SiNVICT, a computational method that increases the sensitivity and specificity of SNV and indel detection at very low variant allele frequencies. SiNVICT can handle multiple sequencing platforms with different error properties; it minimizes false positives resulting from mapping errors and other technology-specific artifacts, including strand bias and low base quality at read ends. SiNVICT can also perform time-series analysis, where samples from a patient sequenced at multiple time points are jointly examined to report locations of interest where certain clones may have been wiped out by some treatment while some subclones gained a selective advantage. Results: We tested SiNVICT on simulated data as well as prostate cancer cell lines and cfDNA obtained from castration-resistant prostate cancer patients. On both simulated and biological data, SiNVICT was able to detect SNVs and indels with variant allele percentages as low as 0.5%. The lowest amounts of total DNA used for the biological data where SNVs and indels could be detected with very high sensitivity were 2.5 ng on the Ion Torrent platform and 10 ng on Illumina. With increased sequencing and mapping accuracy, SiNVICT might be utilized in clinical settings, making it possible to track the progress of point mutations and indels that are associated with resistance to cancer therapies and to provide patients with personalized treatment. We also compared SiNVICT with other popular SNV callers such as MuTect, VarScan2 and Freebayes. Our results show that SiNVICT performs better than these tools in most cases and allows further data exploration such as time-series analysis on cfDNA sequencing data. Availability and Implementation: SiNVICT is available at https://sfu-compbio.github.io/sinvict. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: cenk@sfu.ca
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
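To make the low-frequency calling problem in the SiNVICT abstract concrete, here is a minimal sketch of one way to decide whether a handful of variant reads exceeds what sequencing error alone would produce. The abstract does not describe SiNVICT's statistical model, so the binomial test, the function names and every threshold below (`error_rate`, `min_vaf`, `alpha`, `max_strand_frac`) are assumptions chosen for illustration only.

```python
import math


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are evaluated in log space so large depths do not overflow.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def call_variant(alt_fwd: int, alt_rev: int, depth: int,
                 error_rate: float = 0.001,      # assumed per-base error
                 min_vaf: float = 0.005,         # 0.5% allele frequency
                 alpha: float = 1e-6,            # assumed significance level
                 max_strand_frac: float = 0.9) -> bool:
    """Toy site-level filter: VAF floor, strand balance, binomial test."""
    alt = alt_fwd + alt_rev
    if depth <= 0 or alt == 0:
        return False
    if alt / depth < min_vaf:
        return False
    # Reject calls supported almost entirely by one strand.
    if max(alt_fwd, alt_rev) / alt > max_strand_frac:
        return False
    # Would sequencing error alone plausibly yield this many alt reads?
    return binom_upper_tail(depth, alt, error_rate) < alpha


if __name__ == "__main__":
    print(call_variant(alt_fwd=11, alt_rev=9, depth=2000))
```

At 2000x depth with a 0.1% error rate, about two erroneous reads are expected per site, so 20 balanced variant reads (1% allele frequency) pass this toy filter, while the same 20 reads on a single strand do not.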
  • 7
    Publication date: 2012-11-29
    Description: Motivation: High-throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a ‘boosting’ scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the running time of SCALCE + gzip improves on that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, using the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can improve the compression rate by a factor of up to 3.34 and the running time by a factor of up to 1.26. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
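The 'boosting' effect described in the SCALCE abstract, where reordering reads helps a general-purpose compressor, can be tried with a small sketch. It does not implement Locally Consistent Parsing: as a stand-in, each read is bucketed by its lexicographically smallest k-mer (a minimizer), which is an assumption made purely for illustration. The simulated genome, the read parameters and all function names are likewise hypothetical. Python's `zlib` is used because it implements DEFLATE, the algorithm behind gzip.

```python
import random
import zlib


def minimizer(read: str, k: int = 12) -> str:
    """Lexicographically smallest k-mer of the read (stand-in bucket key)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads: list[str], k: int = 12) -> list[str]:
    """Place reads sharing a minimizer next to each other."""
    return sorted(reads, key=lambda r: (minimizer(r, k), r))


def compressed_size(reads: list[str]) -> int:
    """Bytes used by DEFLATE (level 9) on newline-joined reads."""
    return len(zlib.compress("\n".join(reads).encode("ascii"), 9))


def simulate_reads(genome_len: int = 200_000, n_reads: int = 20_000,
                   read_len: int = 100, seed: int = 0) -> list[str]:
    """Error-free reads sampled uniformly from a random genome."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    starts = [rng.randrange(genome_len - read_len + 1)
              for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


if __name__ == "__main__":
    reads = simulate_reads()
    print("original order :", compressed_size(reads), "bytes")
    print("reordered      :", compressed_size(reorder_reads(reads)), "bytes")
```

Overlapping reads tend to share a minimizer, so after reordering they sit within DEFLATE's 32 KB window of one another and can be encoded as back-references. The read order is lost in the process, which is acceptable only when downstream tools do not depend on it.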
  • 8
    Publication date: 2014-02-26
    Description: Motivation: RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. Results: On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. Availability: ORMAN is available at http://orman.sf.net. Contact: cenk@cs.sfu.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
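The ORMAN abstract speaks of finding a minimum number of transcripts and solving the formulation with well-known approximation algorithms. One classic algorithm of that kind is greedy set cover, sketched below. Treating the problem as plain set cover is an assumption made here; the abstract does not spell out the formulation, and this sketch ignores the coverage-distribution step ORMAN uses when assigning reads. The function name and input layout are hypothetical.

```python
def greedy_min_transcripts(
        read_hits: dict[str, set[str]]) -> tuple[list[str], dict[str, str]]:
    """Greedy set cover: pick few transcripts that explain every read.

    read_hits maps each read id to the transcripts it maps to.
    Returns (chosen transcripts in pick order, read -> assigned transcript).
    """
    # Invert the mapping: transcript -> reads it could explain.
    by_tx: dict[str, set[str]] = {}
    for read, txs in read_hits.items():
        for tx in txs:
            by_tx.setdefault(tx, set()).add(read)

    uncovered = set(read_hits)
    chosen: list[str] = []
    assignment: dict[str, str] = {}
    while uncovered:
        # Transcript explaining the most still-unassigned reads;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(by_tx),
                   key=lambda t: len(by_tx[t] & uncovered), default=None)
        gained = by_tx[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError("some reads map to no transcript")
        chosen.append(best)
        for read in gained:
            assignment[read] = best
        uncovered -= gained
    return chosen, assignment


if __name__ == "__main__":
    hits = {"r1": {"A", "B"}, "r2": {"A"}, "r3": {"A", "C"}, "r4": {"C"}}
    print(greedy_min_transcripts(hits))
```

In the demo, transcript `B` is never selected because every read it could explain is already covered by `A`, so the multimapping read `r1` is resolved to `A`.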
  • 9
    Publication date: 2011-12-22
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 10
    Publication date: 2011-06-16
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press