ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2014-11-07
    Description: Motivation: Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. Results: We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7–25 times faster than a standard iterative algorithm. Availability and implementation: Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html . BitPAl is implemented in C and runs on all major operating systems. Contact: jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
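The bit-parallel idea in the abstract above can be illustrated with the simpler LCS case it mentions. The following is a minimal sketch of a well-known bit-vector recurrence for LCS length, not BitPAl itself (which handles general integer scoring and is written in C). The function name `lcs_length_bitparallel` is hypothetical, and a Python integer stands in for the machine words.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Each column of the dynamic-programming matrix is packed into one
    integer and updated with AND, OR, addition and subtraction only.
    """
    m = len(a)
    mask = (1 << m) - 1  # keeps only the m low bits

    # match[c] has bit i set exactly when a[i] == c
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # u is a subset of v, so v - u never goes negative
        v = ((v + u) | (v - u)) & mask

    # every zero bit left in v marks one unit of LCS length
    return m - bin(v).count("1")
```

For example, `lcs_length_bitparallel("AGCAT", "GAC")` returns 2. One update per character of `b` replaces a whole column of cell-by-cell comparisons, which is the source of the speed-up that bit-parallel methods aim for.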
  • 2
    Publication Date: 2014-11-07
    Description: Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in recent years. However, albeit being highly accurate, these computational models lack computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. Availability and implementation: The source code can be downloaded at http://www.heiderlab.de Contact: d.heider@wz-straubing.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 3
    Publisher: Oxford University Press
    Publication Date: 2014-11-07
    Description: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in the reverse (complement) lexicographical order without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that practically works for reads of averaged kilobases in length. Availability and implementation: https://github.com/lh3/ropebwt2 Contact: hengli@broadinstitute.org
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
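For readers unfamiliar with the FM-index named in the abstract above, the sketch below shows the core idea on a single short string: build the Burrows-Wheeler transform, then count pattern occurrences by backward search. It is an illustration only. It uses a naive sorted-rotations construction rather than the incremental construction the abstract describes, and the names `bwt` and `fm_count` are hypothetical, not part of ropebwt2.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive construction)."""
    text += "$"  # sentinel, assumed absent from text and smaller than its characters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by backward search."""
    # first[c] = number of characters in the text strictly smaller than c
    counts = Counter(bwt_str)
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    lo, hi = 0, len(bwt_str)  # current range of matching sorted suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        # rank queries done by direct counting; a real index precomputes them
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` gives `"annb$aa"`, and `fm_count(bwt("banana"), "ana")` returns 2. A practical index replaces the `count` calls with sampled rank tables so that each step takes constant time.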
  • 4
    Publication Date: 2014-11-07
    Description: AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview Contact: anders.larsson@ebc.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 5
    Publication Date: 2014-11-07
    Description: Motivation: The ability to accurately read the order of nucleotides in DNA and RNA is fundamental for modern biology. Errors in next-generation sequencing can lead to many artifacts, from erroneous genome assemblies to mistaken inferences about RNA editing. Uneven coverage in datasets also contributes to false corrections. Result: We introduce Trowel, a massively parallelized and highly efficient error correction module for Illumina read data. Trowel both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum. With high-quality k-mers and relevant base information, Trowel achieves high accuracy for different short read sequencing applications. The latency in the data path has been significantly reduced because of efficient data access and data structures. In performance evaluations, Trowel was highly competitive with other tools regardless of coverage, genome size, read length and fragment size. Availability and implementation: Trowel is written in C++ and is provided under the General Public License v3.0 (GPLv3). It is available at http://trowel-ec.sourceforge.net . Contact: euncheon.lim@tue.mpg.de or weigel@tue.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 6
    Publication Date: 2014-11-07
    Description: The application of protein–protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. Availability and Implementation: MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock . Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 7
    Publication Date: 2014-11-07
    Description: Motivation: The identification of active transcriptional regulatory elements is crucial to understand regulatory networks driving cellular processes such as cell development and the onset of diseases. It has recently been shown that chromatin structure information, such as DNase I hypersensitivity (DHS) or histone modifications, significantly improves cell-specific predictions of transcription factor binding sites. However, no method has so far successfully combined both DHS and histone modification data to perform active binding site prediction. Results: We propose here a method based on hidden Markov models to integrate DHS and histone modifications occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity versus specificity and superior area under the curve statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can be easily applied on new cell types and allow flexible downstream analysis such as de novo motif finding. Availability and implementation: Our framework is available as part of the Regulatory Genomics Toolbox. The software information and all benchmarking data are available at http://costalab.org/wp/dh-hmm . Contact: ivan.costa@rwth-aachen.de or eduardo.gusmao@rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 8
    Publication Date: 2014-11-07
    Description: Motivation: A proper target or marker is essential in any diagnosis (e.g. an infection or cancer). An ideal diagnostic target should be both conserved in and unique to the pathogen. Currently, these targets can only be identified manually, which is time-consuming and usually error-prone. Because of the increasingly frequent occurrences of emerging epidemics and multidrug-resistant ‘superbugs’, a rapid diagnostic target identification process is needed. Results: A new method that can identify uniquely conserved regions (UCRs) as candidate diagnostic targets for a selected group of organisms solely from their genomic sequences has been developed and successfully tested. Using a sequence-indexing algorithm to identify UCRs and a k-mer integer-mapping model for computational efficiency, this method has successfully identified UCRs within the bacteria domain for 15 test groups, including pathogenic, probiotic, commensal and extremophilic bacterial species or strains. Based on the identified UCRs, new diagnostic primer sets were designed, and their specificity and efficiency were tested by polymerase chain reaction amplifications from both pure isolates and samples containing mixed cultures. Availability and implementation: The UCRs identified for the 15 bacterial species are now freely available at http://ucr.synblex.com . The source code of the programs used in this study is accessible at http://ucr.synblex.com/bacterialIdSourceCode.d.zip Contact: yazhousun@synblex.com Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
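The abstract above mentions a k-mer integer-mapping model without giving details. One common way to map k-mers to integers, shown here purely as an assumed illustration and not as the authors' scheme, is to pack each base into two bits and slide a window along the sequence. The encoding table `BASE_CODE` and the function `kmer_codes` are hypothetical names.

```python
from typing import Iterator

# Assumed 2-bit encoding; any fixed assignment of the four bases works
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> Iterator[int]:
    """Yield the integer code of every k-mer in seq, left to right.

    Windows that contain a character other than A, C, G or T are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keeps the lowest 2*k bits
    code = 0
    valid = 0  # number of consecutive valid bases seen so far
    for base in seq.upper():
        value = BASE_CODE.get(base)
        if value is None:
            code = 0
            valid = 0
            continue
        # shift in the new base and drop the base that left the window
        code = ((code << 2) | value) & mask
        valid += 1
        if valid >= k:
            yield code
```

For example, `list(kmer_codes("ACGT", 2))` gives `[1, 6, 11]` for the k-mers AC, CG and GT. Integer codes of this kind can be compared, hashed and stored more cheaply than strings, which is the usual motivation for such a mapping.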
  • 9
    Publication Date: 2014-11-07
    Description: Motivation: A popular method for classification of protein domain movements apportions them into two main types: those with a ‘hinge’ mechanism and those with a ‘shear’ mechanism. The intuitive assignment of domain movements to these classes has limited the number of domain movements that can be classified in this way. Furthermore, whether intended or not, the term ‘shear’ is often interpreted to mean a relative translation of the domains. Results: Numbers of occurrences of four different types of residue contact changes between domains were optimally combined by logistic regression using the training set of domain movements intuitively classified as hinge and shear to produce a predictor for hinge and shear. This predictor was applied to give a 10-fold increase in the number of examples over the number previously available with a high degree of precision. It is shown that overall a relative translation of domains is rare, and that there is no difference between hinge and shear mechanisms in this respect. However, the shear set contains significantly more examples of domains having a relative twisting movement than the hinge set. The angle of rotation is also shown to be a good discriminator between the two mechanisms. Availability and implementation: Results are free to browse at http://www.cmp.uea.ac.uk/dyndom/interface/ . Contact: sjh@cmp.uea.ac.uk . Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 10
    Publication Date: 2014-11-07
    Description: Motivation: Recent studies on human disease have revealed that aberrant interaction between proteins probably underlies a substantial number of human genetic diseases. This suggests a need to investigate disease inheritance mode using interaction, and based on which to refresh our conceptual understanding of a series of properties regarding inheritance mode of human disease. Results: We observed a strong correlation between the number of protein interactions and the likelihood of a gene causing any dominant diseases or multiple dominant diseases, whereas no correlation was observed between protein interaction and the likelihood of a gene causing recessive diseases. We found that dominant diseases are more likely to be associated with disruption of important interactions. These suggest inheritance mode should be understood using protein interaction. We therefore reviewed the previous studies and refined an interaction model of inheritance mode, and then confirmed that this model is largely reasonable using new evidences. With these findings, we found that the inheritance mode of human genetic diseases can be predicted using protein interaction. By integrating the systems biology perspectives with the classical disease genetics paradigm, our study provides some new insights into genotype–phenotype correlations. Contact: haodapeng@ems.hrbmu.edu.cn or biofomeng@hotmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 11
    Publication Date: 2014-11-07
    Description: Recently, several high profile studies collected cell viability data from panels of cancer cell lines treated with many drugs applied at different concentrations. Such drug sensitivity data for cancer cell lines provide suggestive treatments for different types and subtypes of cancer. Visualization of these datasets can reveal patterns that may not be obvious by examining the data without such efforts. Here we introduce Drug/Cell-line Browser (DCB), an online interactive HTML5 data visualization tool for interacting with three of the recently published datasets of cancer cell lines/drug-viability studies. DCB uses clustering and canvas visualization of the drugs and the cell lines, as well as a bar graph that summarizes drug effectiveness for the tissue of origin or the cancer subtypes for single or multiple drugs. DCB can help in understanding drug response patterns and prioritizing drug/cancer cell line interactions by tissue of origin or cancer subtype. Availability and implementation: DCB is an open source Web-based tool that is freely available at: http://www.maayanlab.net/LINCS/DCB Contact: avi.maayan@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 12
    Publication Date: 2014-12-04
    Description: Motivation: Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent ‘third-generation’ sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. Results: We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly. Availability and implementation: MultiBreak-SV is available at http://compbio.cs.brown.edu/software/ . Contact: annaritz@vt.edu or braphael@cs.brown.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 13
    Publication Date: 2014-12-04
    Description: Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods to detect and assemble long variants. Results: We propose here an original method, called MindTheGap, for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer-based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads. MindTheGap showed high recall and precision on simulated datasets of various genome complexities. When applied to real Caenorhabditis elegans and human NA12878 datasets, MindTheGap detected and correctly assembled insertions >1 kb, using at most 14 GB of memory. Availability and implementation: http://mindthegap.genouest.org Contact: guillaume.rizk@inria.fr or claire.lemaitre@inria.fr
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 14
    Publication Date: 2014-12-04
    Description: Motivation: Most tumor samples are a heterogeneous mixture of cells, including admixture by normal (non-cancerous) cells and subpopulations of cancerous cells with different complements of somatic aberrations. This intra-tumor heterogeneity complicates the analysis of somatic aberrations in DNA sequencing data from tumor samples. Results: We describe an algorithm called THetA2 that infers the composition of a tumor sample, including not only tumor purity but also the number and content of tumor subpopulations, directly from both whole-genome (WGS) and whole-exome (WXS) high-throughput DNA sequencing data. This algorithm builds on our earlier Tumor Heterogeneity Analysis (THetA) algorithm in several important directions. These include improved ability to analyze highly rearranged genomes using a variety of data types: both WGS sequencing (including low ~7× coverage) and WXS sequencing. We apply our improved THetA2 algorithm to WGS (including low-pass) and WXS sequence data from 18 samples from The Cancer Genome Atlas (TCGA). We find that the improved algorithm is substantially faster and identifies numerous tumor samples containing subclonal populations in the TCGA data, including in one highly rearranged sample for which other tumor purity estimation algorithms were unable to estimate tumor purity. Availability and implementation: An implementation of THetA2 is available at http://compbio.cs.brown.edu/software Contact: layla@cs.brown.edu or braphael@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 15
    Publication Date: 2014-01-16
    Description: The matrix method, due to Bibel and Andrews, is a proof procedure designed for automated theorem-proving. We show that underlying this method is a fully structured combinatorial model of conventional classical proof theory.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 16
    Publication Date: 2014-01-16
    Description: Proof search in inference systems such as the sequent calculus is a process of discovery. Once a proof is found, there is often information in the proof which is redundant. In this article we show how to detect and eliminate certain kinds of redundant formulae from a given proof, and in particular in a way which does not require further proof search or any rearrangement of the proof found. Our technique involves adding constraints to the inference rules, which are used once the proof is complete to determine redundant formulae and how they may be eliminated. We show how this technique can be applied to propositional linear logic, and prove its correctness for this logic. We also discuss how our approach can be extended to other logics without much change.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 17
    Publication Date: 2014-01-22
    Description: Good accessibility of publicly funded research data is essential to secure an open scientific system and eventually becomes mandatory [Wellcome Trust will Penalise Scientists Who Don’t Embrace Open Access. The Guardian 2012]. By the use of high-throughput methods in many research areas from physics to systems biology, large data collections are increasingly important as raw material for research. Here, we present strategies worked out by international and national institutions targeting open access to publicly funded research data via incentives or obligations to share data. Funding organizations such as the British Wellcome Trust therefore have developed data sharing policies and request commitment to data management and sharing in grant applications. Increased citation rates are a profound argument for sharing publication data. Pre-publication sharing might be rewarded by a data citation credit system via digital object identifiers (DOIs) which have initially been in use for data objects. Besides policies and incentives, good practice in data management is indispensable. However, appropriate systems for data management of large-scale projects for example in systems biology are hard to find. Here, we give an overview of a selection of open-source data management systems proved to be employed successfully in large-scale projects.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 18
    Publication Date: 2014-01-22
    Description: Genome-scale metabolic network reconstructions are now routinely used in the study of metabolic pathways, their evolution and design. The development of such reconstructions involves the integration of information on reactions and metabolites from the scientific literature as well as public databases and existing genome-scale metabolic models. The reconciliation of discrepancies between data from these sources generally requires significant manual curation, which constitutes a major obstacle in efforts to develop and apply genome-scale metabolic network reconstructions. In this work, we discuss some of the major difficulties encountered in the mapping and reconciliation of metabolic resources and review three recent initiatives that aim to accelerate this process, namely BKM-react, MetRxn and MNXref (presented in this article). Each of these resources provides a pre-compiled reconciliation of many of the most commonly used metabolic resources. By reducing the time required for manual curation of metabolite and reaction discrepancies, these resources aim to accelerate the development and application of high-quality genome-scale metabolic network reconstructions and models.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 19
    Publication Date: 2014-01-22
    Description: microRNAs (miRNAs) are small endogenous non-coding RNAs that function as the universal specificity factors in post-transcriptional gene silencing. Discovering miRNAs, identifying their targets and further inferring miRNA functions have been a critical strategy for understanding normal biological processes of miRNAs and their roles in the development of disease. In this review, we focus on computational methods of inferring miRNA functions, including miRNA functional annotation and inferring miRNA regulatory modules, by integrating heterogeneous data sources. We also briefly introduce the research in miRNA discovery and miRNA-target identification with an emphasis on the challenges to computational biology.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 20
    Publication Date: 2014-01-22
    Description: Supermatrix and supertree analyses are frequently used to more accurately recover vertical evolutionary history but debate still exists over which method provides greater reliability. Traditional methods that resolve relationships among organisms from single genes are often unreliable because of the frequent lack of strong phylogenetic signal and the presence of systematic artifacts. Methods developed to reconstruct organismal history from multiple genes can be divided into supermatrix and supertree approaches. A supermatrix analysis consists of the concatenation of multiple genes into a single, possibly partitioned alignment, from which phylogenies are reconstructed using a variety of approaches. Supertrees build consensus trees from the topological information contained within individual gene trees. Both methods are now widely used and have been demonstrated to solve previously ambiguous or unresolved phylogenies with high statistical support. However, the amount of misleading signal needed to induce erroneous phylogenies for both strategies is still unknown. Using genome simulations, we test the accuracy of supertree and supermatrix approaches in recovering the true organismal phylogeny under increased amounts of horizontally transferred genes and changes in substitution rates. Our results show that overall, supermatrix approaches are preferable when a low amount of gene transfer is suspected to be present in the dataset, while supertrees have greater reliability in the presence of a moderate amount of misleading gene transfers. In the face of very high or very low substitution rates without horizontal gene transfers, supermatrix approaches outperform supertrees as individual gene trees remain unresolved and additional sequences contribute to a congruent phylogenetic signal.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 21
    Publication Date: 2014-01-16
    Description: Motivation: We have recently characterized an instance of alternative splicing that differs from the canonical gene transcript by deletion of a length of sequence not divisible by three, but where translation can be rescued by an alternative start codon. This results in a predicted protein in which the amino terminus differs markedly in sequence from the known protein product(s), as it is translated from an alternative reading frame. Automated pipelines have annotated thousands of splice variants but have overlooked these protein isoforms, leading to them being underrepresented in current databases. Results: Here we describe 1849 human and 733 mouse transcripts that can be transcribed from an alternate ATG. Of these, >80% have not been annotated previously. Those conserved between human and mouse genomes (and hence under likely evolutionary selection) are identified. We provide mass spectroscopy evidence for translation of selected transcripts. Of the described splice variants, only one has previously been studied in detail and converted the encoded protein from an activator of cell-function to a suppressor, demonstrating that these splice variants can result in profound functional change. We investigate the potential functional effects of this splicing using a variety of bioinformatic tools. The 2582 variants we describe are involved in a wide variety of biological processes, and therefore open many new avenues of research. Contact: aude.fahrer@anu.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 22
    Publication Date: 2014-01-16
    Description: Motivation: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. Results: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent–daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. Availability: The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter Contact: sebastian.waszak@epfl.ch or bart.deplancke@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 23
    Publication Date: 2014-01-16
    Description: Motivation : Recently, investigators have proposed state-of-the-art Identity-by-descent (IBD) mapping methods to detect IBD segments between purportedly unrelated individuals. The IBD information can then be used for association testing in genetic association studies. One approach for this IBD association testing strategy is to test for excessive IBD between pairs of cases (‘pairwise method’). However, this approach is inefficient because it requires a large number of permutations. Moreover, a limited number of permutations define a lower bound for P -values, which makes fine-mapping of associated regions difficult because, in practice, a much larger genomic region is implicated than the region that is actually associated. Results: In this article, we introduce a new pairwise method ‘Fast-Pairwise’. Fast-Pairwise uses importance sampling to improve efficiency and enable approximation of extremely small P -values. Fast-Pairwise method takes only days to complete a genome-wide scan. In the application to the WTCCC type 1 diabetes data, Fast-Pairwise successfully fine-maps a known human leukocyte antigen gene that is known to cause the disease. Availability: Fast-Pairwise is publicly available at: http://genetics.cs.ucla.edu/graphibd . Contact: eeskin@cs.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 24
    Publication Date: 2014-01-16
    Description: Motivation: Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this ‘separate sampling’ scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. Results: We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. Availability: All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b . Contact: edward@ece.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 25
    Publication Date: 2014-01-16
    Description: Motivation:  Expression vectors used in different biotechnology applications are designed with domain-specific rules. For instance, promoters, origins of replication or homologous recombination sites are host-specific. Similarly, chromosomal integration or viral delivery of an expression cassette imposes specific structural constraints. As de novo gene synthesis and synthetic biology methods permeate many biotechnology specialties, the design of application-specific expression vectors becomes the new norm. In this context, it is desirable to formalize vector design strategies applicable in different domains. Results:  Using the design of constructs to express genes in the chloroplast of Chlamydomonas reinhardtii as an example, we show that a vector design strategy can be formalized as a domain-specific language. We have developed a graphical editor of context-free grammars usable by biologists without prior exposure to language theory. This environment makes it possible for biologists to iteratively improve their design strategies throughout the course of a project. It is also possible to ensure that vectors designed with early iterations of the language are consistent with the latest iteration of the language. Availability and implementation:  The context-free grammar editor is part of the GenoCAD application. A public instance of GenoCAD is available at http://www.genocad.org . GenoCAD source code is available from SourceForge and licensed under the Apache v2.0 open source license. Contact:   peccoud@vt.edu Supplementary Information:   Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 26
    Publication Date: 2014-01-16
    Description: Motivation: Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, functional similarity. For determining sequence similarity in proteins, most widely used methods use models of sequence evolution and compare amino-acid strings in search for conserved linear stretches. Probabilistic models or sequence profiles capture the position-specific variation in an alignment of homologous sequences and can identify conserved motifs or domains. While profile-based search methods are generally more accurate than simple sequence comparison methods, they tend to be computationally more demanding. In recent years, several methods have emerged that perform protein similarity searches based on domain composition. However, few methods have considered the linear arrangements of domains when conducting similarity searches, despite strong evidence that domain order can harbour considerable functional and evolutionary signal. Results: Here, we introduce an alignment scheme that uses a classical dynamic programming approach to the global alignment of domains. We illustrate that representing proteins as strings of domains (domain arrangements) and comparing these strings globally allows for a both fast and sensitive homology search. Further, we demonstrate that the presented methods complement existing methods by finding similar proteins missed by popular amino-acid–based comparison methods. Availability: An implementation of the presented algorithms, a web-based interface as well as a command-line program for batch searching against the UniProt database can be found at http://rads.uni-muenster.de . Furthermore, we provide a JAVA API for programmatic access to domain-string–based search methods. Contact: terrapon.nicolas@gmail.com or ebb@uni-muenster.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
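    Illustrative note on the record above: the abstract describes representing proteins as strings of domains and aligning those strings globally with classical dynamic programming. The following is a minimal sketch of that general idea (Needleman-Wunsch over domain identifiers), not the authors' implementation. The function name `align_domains` and the match, mismatch and gap weights are hypothetical choices made only for illustration.

```python
from typing import Sequence


def align_domains(a: Sequence[str], b: Sequence[str],
                  match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the global alignment score of two domain arrangements.

    Each element of `a` and `b` is a domain identifier (for example "SH2"),
    so whole domains, not amino acids, are the aligned symbols.
    The scoring weights are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per domain.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # domain in `a` aligned to a gap
            left = score[i][j - 1] + gap    # domain in `b` aligned to a gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    # One unmatched leading domain (gap) followed by two matches.
    print(align_domains(["SH3", "SH2", "Kinase"], ["SH2", "Kinase"]))
```

    Because the alphabet is the set of domain identifiers and arrangements are short compared with amino-acid sequences, the quadratic table stays small, which is consistent with the speed claim in the abstract.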
  • 27
    Publication Date: 2014-01-16
    Description: Motivation: DNA enrichment followed by sequencing is a versatile tool in molecular biology, with a wide variety of applications including genome-wide analysis of epigenetic marks and mechanisms. A common requirement of these diverse applications is a comparison of read coverage between experimental conditions. The amount of samples generated for such comparisons ranges from few replicates to hundreds of samples per condition for epigenome-wide association studies. Consequently, there is an urgent need for software that allows for fast and simple processing and comparison of sequencing data derived from enriched DNA. Results: Here, we present a major update of the R/Bioconductor package MEDIPS, which allows for an arbitrary number of replicates per group and integrates sophisticated statistical methods for the detection of differential coverage between experimental conditions. Our approach can be applied to a diversity of quantitative sequencing data. In addition, our update adds novel functionality to MEDIPS, including correlation analysis between samples, and takes advantage of Bioconductor’s annotation databases to facilitate annotation of specific genomic regions. Availability and implementation: The latest version of MEDIPS is available as version 1.12.0 and part of Bioconductor 2.13. The package comes with a manual containing detailed description of its functionality and is available at http://www.bioconductor.org . Contact: lienhard@molgen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 28
    Publication Date: 2014-01-16
    Description: Motivation:  Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, it can sometimes be too large to allow useful comparisons between treatment groups. Results:  In this article we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large with only a small reduction in biological resolution. Availability and implementation:  We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available on https://github.com/eturro/mmseq . 
    Contact: et341@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 29
    Publication Date: 2014-01-16
    Description: Motivation:  Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps. Results:  Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster. Availability and implementation:  Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/ ; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing. Contact:  mering@imls.uzh.ch Supplementary Information:  Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 30
    Publication Date: 2014-01-16
    Description: High-throughput technologies have led to an explosion of genomic data available for automated analysis. The consequent possibility to simultaneously sample multiple layers of variation along the gene expression flow requires computational methods integrating raw information from different ‘-omics’. It has been recently demonstrated that translational control is a widespread phenomenon, with profound and still underestimated regulation capabilities. Although detecting changes in the levels of total messenger RNAs (mRNAs; the transcriptome), of polysomally loaded mRNAs (the translatome) and of proteins (the proteome) is experimentally feasible in a high-throughput way, the integration of these levels is still far from being robustly approached. Here we introduce tRanslatome, a new R/Bioconductor package, which is a complete platform for the simultaneous pairwise analysis of transcriptome, translatome and proteome data. The package includes most of the available statistical methods developed for the analysis of high-throughput data, allowing the parallel comparison of differentially expressed genes and the corresponding differentially enriched biological themes. Notably, it also enables the prediction of translational regulatory elements on mRNA sequences. The utility of this tool is demonstrated with two case studies. Availability and implementation: tRanslatome is available in Bioconductor. Contact: t.tebaldi@unitn.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 31
    Publication Date: 2014-01-16
    Description: DoMosaics is an application that unifies protein domain annotation, domain arrangement analysis and visualization in a single tool. It simplifies the analysis of protein families by consolidating disjunct procedures based on often inconvenient command-line applications and complex analysis tools. It provides a simple user interface with access to domain annotation services such as InterProScan or a local HMMER installation, and can be used to compare, analyze and visualize the evolution of domain architectures. Availability and implementation: DoMosaics is licensed under the Apache License, Version 2.0, and binaries can be freely obtained from www.domosaics.net . Contact: radmoore@uni-muenster.de or e.bornberg@uni-muenster.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 32
    Publication Date: 2014-01-16
    Description: Motivation: A common problem in understanding a biochemical system is to infer its correct structure or topology. This topology consists of all relevant state variables—usually molecules and their interactions. Here we present a method called topological augmentation to infer this structure in a statistically rigorous and systematic way from prior knowledge and experimental data. Results: Topological augmentation starts from a simple model that is unable to explain the experimental data and augments its topology by adding new terms that capture the experimental behavior. This process is guided by representing the uncertainty in the model topology through stochastic differential equations whose trajectories contain information about missing model parts. We first apply this semiautomatic procedure to a pharmacokinetic model. This example illustrates that a global sampling of the parameter space is critical for inferring a correct model structure. We also use our method to improve our understanding of glutamine transport in yeast. This analysis shows that transport dynamics is determined by glutamine permeases with two different kinds of kinetics. Topological augmentation can not only be applied to biochemical systems, but also to any system that can be described by ordinary differential equations. Availability and implementation: Matlab code and examples are available at: http://www.csb.ethz.ch/tools/index . Contact: mikael.sunnaker@bsse.ethz.ch ; andreas.wagner@ieu.uzh.ch Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 33
    Publication Date: 2014-01-16
    Description: Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot, a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the most simple and straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Availability: Freely available on the web at http://seqdepot.net/ . REST access via http://seqdepot.net/api/v1 . Database files and scripts may be downloaded from http://seqdepot.net/download . Contact: ulrich.luke+sci@gmail.com
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 34
    Oxford University Press
    Publication Date: 2014-01-16
    Description: Motivation: Microarray data analysis is often applied to characterize disease populations by identifying individual genes linked to the disease. In recent years, efforts have shifted to focus on sets of genes known to perform related biological functions (i.e. in the same pathways). Evaluating gene sets reduces the need to correct for false positives in multiple hypothesis testing. However, pathways are often large, and genes in the same pathway that do not contribute to the disease can cause a method to miss the pathway. In addition, large pathways may not give much insight to the cause of the disease. Moreover, when such a method is applied independently to two datasets of the same disease phenotypes, the two resulting lists of significant pathways often have low agreement. Results: We present a powerful method, PFSNet, that identifies smaller parts of pathways (which we call subnetworks), and show that significant subnetworks (and the genes therein) discovered by PFSNet are up to 51% (64%) more consistent across independent datasets of the same disease phenotypes, even for datasets based on different platforms, than previously published methods. We further show that those methods which initially declared some large pathways to be insignificant would declare subnetworks detected by PFSNet in those large pathways to be significant, if they were given those subnetworks as input instead of the entire large pathways. Availability: http://compbio.ddns.comp.nus.edu.sg:8080/pfsnet/ Contact: kevinl@comp.nus.edu.sg Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 35
    Publication Date: 2014-01-16
    Description: Pathway Commons is a resource permitting simultaneous queries of multiple pathway databases. However, there is no standard mechanism for using these data (stored in BioPAX format) to annotate and build quantitative mathematical models. Therefore, we developed a new module within the Virtual Cell modeling and simulation software. It provides pathway data retrieval and visualization and enables automatic creation of executable network models directly from qualitative connections between pathway nodes. Availability and implementation: Available at Virtual Cell ( http://vcell.org/ ). The application runs on all major platforms and does not require registration for use on the user’s computer. Tutorials and video are available on the user guide page. Contact: vcell_support@uchc.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 36
    Publication Date: 2014-01-16
    Description: myChEMBL is a completely open platform, which combines public domain bioactivity data with open source database and cheminformatics technologies. myChEMBL consists of a Linux (Ubuntu) Virtual Machine featuring a PostgreSQL schema with the latest version of the ChEMBL database, as well as the latest RDKit cheminformatics libraries. In addition, a self-contained web interface is available, which can be modified and improved according to user specifications. Availability and implementation: The VM is available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/VM/myChEMBL/current . The web interface and web services code is available at: https://github.com/rochoa85/myChEMBL . Contact: jpo@ebi.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 37
    Publication Date: 2014-01-16
    Description: Motivation: The identification of cell cycle-regulated genes through the cyclicity of messenger RNAs in genome-wide studies is a difficult task due to the presence of internal and external noise in microarray data. Moreover, the analysis is also complicated by the loss of synchrony occurring in cell cycle experiments, which often results in additional background noise. Results: To overcome these problems, here we propose the LEON (LEarning and OptimizatioN) algorithm, able to characterize the ‘cyclicity degree’ of a gene expression time profile using a two-step cascade procedure. The first step identifies a potentially cyclic behavior by means of a Support Vector Machine trained with a reliable set of positive and negative examples. The second step selects those genes having peak timing consistency along two cell cycles by means of a non-linear optimization technique using radial basis functions. To prove the effectiveness of our combined approach, we use recently published human fibroblasts cell cycle data and, performing in vivo experiments, we demonstrate that our computational strategy is able not only to confirm well-known cell cycle-regulated genes, but also to predict not yet identified ones. Availability and implementation: All scripts for implementation can be obtained on request. Contact: lorenzo.farina@uniroma1.it or gurtner@ifo.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 38
    Publication Date: 2014-01-16
    Description: Motivation: RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. Results: In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. Availability and implementation: An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org . Contact: sy@swufe.edu.cn ; pliu@iastate.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 39
    Publication Date: 2014-01-16
    Description: Motivation:  Modern biomedical and epidemiological studies often measure hundreds or thousands of biomarkers, such as gene expression or metabolite levels. Although there is an extensive statistical literature on adjusting for ‘multiple comparisons’ when testing whether these biomarkers are directly associated with a disease, testing whether they are biological mediators between a known risk factor and a disease requires a more complex null hypothesis, thus offering additional methodological challenges. Results:  We propose a permutation approach that tests multiple putative mediators and controls the family wise error rate. We demonstrate that, unlike when testing direct associations, replacing the Bonferroni correction with a permutation approach that focuses on the maximum of the test statistics can significantly improve the power to detect mediators even when all biomarkers are independent. Through simulations, we show the power of our method is 2–5 x larger than the power achieved by Bonferroni correction. Finally, we apply our permutation test to a case-control study of dietary risk factors and colorectal adenoma to show that, of 149 test metabolites, docosahexaenoate is a possible mediator between fish consumption and decreased colorectal adenoma risk. Availability and implementation:  R-package included in online Supplementary Material. Contact:   joshua.sampson@nih.gov Supplementary information:   Supplementary materials are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 40
    Publication Date: 2014-01-16
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 41
    Publication Date: 2014-01-16
    Description: Segerberg's Dynamic Deontic Logic is a dynamic logic where among the set of all possible histories those fulfilling the norms are distinguished. An extension of this logic to obligations (respectively permissions and prohibitions) to do an action before a given deadline or during a given time interval is defined. These temporal constraints are defined by events which may have several occurrences (like the obligation to update a given file before midnight). Violations of these kinds of norms are defined in this logical framework.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 42
    Publication Date: 2014-01-16
    Description: In this article we show how to model a range of notions in the context of delegation and revocation applied to security scenarios. We demonstrate how a range of delegation–revocation models and policies may be represented in pictorial form and formally represented in terms of reactive Kripke models and a first-order policy specification language. We translate first-order representations of our reactive Kripke models into an equivalent Answer Set Programming form that enables users to apply flexibly well-defined definitions of predicates to represent their requirements in terms of delegation–revocation policy specification.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 43
    Publication Date: 2014-01-16
    Description: Error-driven ranking algorithms (EDRAs) perform a sequence of slight re-rankings of the constraint set triggered by mistakes on the incoming stream of data. The sequence of rankings entertained by the algorithm (and in particular the final ranking entertained at convergence) depends not only on the grammar the algorithm is trained on, but also on the specific way data are sampled from that grammar and fed to the algorithm. The robust analysis of EDRAs pinpoints at properties of the predicted sequence of rankings that are robust, namely only depend on the target grammar, not on the way the data are sampled from it. This article reviews in detail Tesar and Smolensky's (1998, Linguist Inq. , 29, 229–268.) robust analysis of EDRAs that perform constraint demotion only, but no constraint promotion. This article then develops a new tool for the robust analysis of EDRAs that perform both constraint demotion and promotion. The latter tool is applied to the robust analysis of the EDRA model of the child's early acquisition of phonotactics, through a detailed discussion of restrictiveness on three case studies from Prince and Tesar (2004, Constraints in Phonological Acquisition , 245–291), that crucially require EDRAs that perform both demotion and promotion.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 44
    Publication Date: 2014-01-16
    Description: Modern software systems usually deal with several sorts (types) of data elements simultaneously. Some of these sorts, like integers, booleans, and so on, can be seen as having an immediate, direct nature and therefore are called visible , and they are contrasted with the others, like types of objects (in object-oriented (OO) sense), which are called hidden sorts. A language used to specify such software system has to be heterogeneous. In addition, to reason about such computations, we have to consider k -tuples of formulas (for instance, pairs in equational reasoning). Consequently, a consequence relation used to specify and verify the properties of those systems must relate sorted sets of k -formulas with individual k -formulas. Logics usually employed in this process are called hidden k-logics and are very general in nature: they comprise several classes of logical systems, including the 2-dimensional hidden and standard equational logics, and Boolean logic. In this article, we propose a generalization of the notion of deduction-detachment system for hidden k -logics. We introduce a syntactic notion of translation, which will be used to define an equivalence relation between hidden k -logics. We show that this notion of equivalence preserves some logical properties, namely the deduction-detachment theorem (DDT) and the Craig interpolation property. We also show that if a specifiable hidden k -logic admits the DDT then it admits a presentation whose only inference rules are the generalized modus ponens rules with respect to the deduction-detachment system.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 45
    Publication Date: 2014-01-16
    Description: Motivation: For samples of unrelated individuals, we propose a general analysis framework in which hundreds of thousands of genetic loci can be tested simultaneously for association with complex phenotypes. The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multilocus analysis, which has focused on the dimension reduction of the data, our multilocus association-clustering test profits from the availability of large numbers of genetic loci by detecting clusters of loci that are associated with the phenotype. Results: The approach is computationally fast and powerful, enabling the simultaneous association testing of large genomic regions. Even the entire genome or certain chromosomes can be tested simultaneously. Using simulation studies, the properties of the approach are evaluated. In an application to a genome-wide association study for chronic obstructive pulmonary disease, we illustrate the practical relevance of the proposed method by simultaneously testing all genotyped loci of the genome-wide association study and by testing each chromosome individually. Our findings suggest that statistical methodology that incorporates spatial-clustering information will be especially useful in whole-genome sequencing studies in which millions or billions of base pairs are recorded and grouped by genomic regions or genes, and are tested jointly for association. Availability and implementation: Implementation of the approach is available upon request. Contact: daq412@mail.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 46
    Oxford University Press
    Publication Date: 2014-01-16
    Description: ECTL is an extension of the computation tree logic (CTL) with two operators GF and FG where GFφ and FGφ represent ‘there is a path along which φ holds infinitely often’ and ‘along any path, there exists a state after which φ always holds’, respectively. A Hilbert-style axiomatization of ECTL is defined by adding the schemata G(φ → ψ) → (GFφ → GFψ), GFφ ↔ F(φ ∧ XGFφ), G(φ → XFφ) → (φ → GFφ) and FGφ ↔ ¬GF¬φ to the axioms of CTL. We prove its soundness and completeness with respect to arbitrary and finite models, i.e. equivalence of the following three conditions: (i) φ is provable in this axiomatization of ECTL; (ii) φ is valid in any model; (iii) φ is valid in any finite model.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 47
    Publication Date: 2014-01-16
    Description: Gurevich and Neeman introduced Distributed Knowledge Authorization Language (DKAL). The world of DKAL consists of communicating principals computing their own knowledge in their own states. DKAL is based on a new logic of information, the so-called infon logic, and its efficient subsystem called primal logic. In this article, we simplify the Kripkean semantics of primal logic and study various extensions of it in an effort to balance expressivity and efficiency. On the proof-theoretic side we develop cut-free Gentzen-style sequent calculi for the original primal logic and its extensions.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 48
    Publication Date: 2014-01-22
    Description: The iteratively reweighted least squares (IRLS) method is nearly identical to the maximum likelihood (ML) method in terms of parameter estimation and power of quantitative trait locus (QTL) detection. However, IRLS is greatly superior to ML in terms of computing speed and the robustness of parameter estimation. In conjunction with priors on the parameters, ML can analyze a multiple-QTL model based on Bayesian theory, whereas under a single-QTL model, IRLS has very limited statistical power to detect multiple QTLs. In this study, we propose the iteratively reweighted least absolute shrinkage and selection operator (IRLASSO) for extending IRLS to simultaneously map multiple QTLs. The LASSO with a coordinate descent step is employed to efficiently estimate the non-zero genetic effect of each locus scanned over the entire genome. Simulations demonstrate that IRLASSO has higher precision of parameter estimation and higher power to detect QTLs than IRLS, and is able to estimate residual variance more accurately than the unweighted LASSO based on LS. In particular, IRLASSO is very fast, usually taking fewer than five iterations to converge. The barley dataset from the North American Barley Genome Mapping Project is reanalyzed by our proposed method.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
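    The coordinate descent step mentioned in the abstract above can be illustrated with a minimal sketch of the plain (unweighted) LASSO. This is not the authors' IRLASSO: the iterative reweighting is omitted, and the function names, the objective scaling (1/(2n) times the squared error plus lam times the L1 norm) and the fixed iteration count are assumptions made for illustration only.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each; y is a list of n values.
    Hypothetical helper for illustration, not part of any published package.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [residual[i] - col[i] * delta for i in range(n)]
                beta[j] = new_beta
    return beta
```

    Effects whose correlation with the residual stays below lam are set exactly to zero, which is how a LASSO scan leaves most loci with no estimated effect.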
  • 49
    Publication Date: 2014-01-22
    Description: The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a complex process in which interactions between genes and developmental stimuli take place at each level of biological organization from cells to organisms. Traditional studies emphasize the impact of genes by directly linking DNA-based markers with static phenotypic values. Functional mapping, derived to detect genes that control developmental processes using growth equations, has proven powerful for addressing questions about the roles of genes in development. By treating phenotypic formation as a cohesive system using differential equations, a different approach, systems mapping, dissects the system into interconnected elements and then maps genes that determine a web of interactions among these elements, facilitating our understanding of the genetic machineries for phenotypic development. Here, we argue that genetic mapping can play a more important role in studying the genotype–phenotype relationship by filling the gaps in the biochemical and regulatory process from DNA to end-point phenotype. We describe a new framework, named network mapping, to study the genetic architecture of complex traits by integrating the regulatory networks that cause a high-order phenotype. Network mapping makes use of a system of differential equations to quantify the rule by which transcriptional, proteomic and metabolomic components interact with each other to organize into a functional whole. The synthesis of functional mapping, systems mapping and network mapping provides a novel avenue to decipher a comprehensive picture of the genetic landscape of complex phenotypes that underlie economically and biomedically important traits.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 50
    Publication Date: 2014-01-22
    Description: Traditional approaches for genetic mapping are to simply associate the genotypes of a quantitative trait locus (QTL) with the phenotypic variation of a complex trait. A more mechanistic strategy has emerged to dissect the trait phenotype into its structural components and map specific QTLs that control the mechanistic and structural formation of a complex trait. We describe and assess such a strategy, called structural mapping, by integrating the internal structural basis of trait formation into a QTL mapping framework. Electrical impedance spectroscopy (EIS) has been instrumental for describing the structural components of a phenotypic trait and their interactions. By building robust mathematical models on circuit EIS data and embedding these models within a mixture model-based likelihood for QTL mapping, structural mapping implements the EM algorithm to obtain maximum likelihood estimates of QTL genotype-specific EIS parameters. The uniqueness of structural mapping is that it makes it possible to test a number of hypotheses about the pattern of the genetic control of structural components. We validated structural mapping by analyzing EIS data collected for QTL mapping of frost hardiness in a controlled cross of jujube trees. The statistical properties of parameter estimates were examined by simulation studies. Structural mapping can be a powerful alternative for genetic mapping of complex traits by taking into account the biological and physical mechanisms underlying their formation.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 51
    Oxford University Press
    Publication Date: 2014-05-01
    Description: Merged processes (MPs) are a recently proposed condensed representation of a Petri net's behaviour similar to branching processes (unfoldings), which copes well not only with concurrency but also with other sources of state space explosion like sequences of choices. They are by orders of magnitude more compact than traditional unfoldings, and yet can be used for efficient model checking. However, constructing complete MPs is difficult, and the only known algorithm is based on building a (potentially much larger) complete unfolding prefix of a Petri net, whose nodes are then merged. Obviously, this significantly reduces their appeal as a representation that can be used for practical model checking. In this paper, we develop an algorithm that avoids constructing the intermediate unfolding prefix and builds a complete merged process directly from a safe Petri net. In particular, a challenging problem of truncating a merged process is solved.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 52
    Publication Date: 2014-05-01
    Description: We present an algorithm for the correction of an XML document with respect to schema constraints expressed as a document type definition. Given a well-formed XML document t seen as a tree, a schema S and a non-negative threshold th , the algorithm finds every tree t ' valid with respect to S such that the edit distance between t and t ' is no higher than th . The algorithm is based on a recursive exploration of the finite-state automata representing structural constraints imposed by the schema, as well as on the construction of an edit distance matrix storing edit sequences leading to correction trees. We prove the termination, correctness and completeness of the algorithm, as well as its exponential time complexity. We also perform experimental tests on real-life XML data showing the influence of various input parameters on the execution time and on the number of solutions found. The algorithm's implementation demonstrates polynomial rather than exponential behavior. It has been made public under the GNU LGPL v3 license. As we show in our in-depth discussion of the related work, this is the first full-fledged study of the document-to-schema correction problem.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 53
    Publication Date: 2014-05-01
    Description: Software architecture slicing extracts the right software architecture to provide reference or design guidance for developing software architecture. It will reduce the complexity of the requirement specifications based on a selected slicing criterion of either the component or the connector, but little effort has been made regarding the relationship between forward slicing and backward slicing analysis at the architectural level. This paper combines architecture description language -architecture description language semantics to build a behavior graph (BG) to represent the software architecture, and proposes methods for coarse-grained software architecture slicing, which can reduce the number of components, connectors and constraints of the BG. This method is based on the relationships between the port of the component and the role of the connector, and makes use of both forward and backward coarse-grained architecture slicing of the BG. In order to understand the similarities and differences between the forward and backward architecture slicing techniques, some experiments are done. Two results are obtained. First, the average percentage reduction of the backward coarse-grained architecture slice is equal to the average percentage reduction of the forward coarse-grained architecture slice. Second, the percentage reduction of the forward coarse-grained architecture slice cluster changes on average, while the percentage reduction of the backward coarse-grained architecture slice cluster changes more quickly and shows more extreme cases.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 54
    Publication Date: 2014-05-01
    Description: There is an increasing demand to efficiently process emerging types of queries, such as progressive queries (PQs), from contemporary database applications including telematics, e-commerce and social media. Unlike conventional queries, a PQ consists of a set of step-queries (SQs). A user formulates a new SQ on the fly based on the result(s) from the previous SQ(s). Existing database management systems were not designed to efficiently process such queries. In this paper, we present a novel technique to efficiently process a special type of PQ, called monotonic linear PQs, based on dynamically materialized views. The key idea is to create a superior relationship graph for SQs from historical PQs that can be used to estimate the benefit of keeping the current SQ result as a materialized view. The materialized views are used to improve the performance of future SQs. A new storage structure for the materialized views set is designed to facilitate efficient search for a usable view to answer a given SQ. Algorithms/strategies to efficiently construct a superior relationship graph, dynamically select materialized views, effectively manage the materialized views set and efficiently search for usable views are discussed. Experimental results demonstrate that our proposed technique is quite promising.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 55
    Publication Date: 2014-05-01
    Description: This article presents AccessedBefore (AccB), an algorithm and its associated minimal hardware support to detect data races, and compares it with two widely known and used commercial tools: Helgrind, the data race detection tool included in the general purpose memory checking suite Valgrind, and Intel Thread Checker, now shipped as part of Intel Thread Inspector. It provides a performance overhead evaluation using current workloads, along with an analysis of AccB's scalability with the number of threads and workload input set size. It demonstrates that AccB is in the range of 2× to 11× faster than these two tools. Finally, it presents a full proof that AccB is complete in that, for every static data race present in a program, there exists an instruction interleaving that would expose this data race such that AccB can detect it.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 56
    Publication Date: 2014-05-01
    Description: The problems of query containment, equivalence and minimization are fundamental problems in the context of query processing and optimization. In their classic work published in 1977 [Chandra, A. and Merlin, P. (1977) Optimal Implementation of Conjunctive Queries in Relational Data Bases. Proc. ACM STOC, Boulder, CO, USA, May 4–6, pp. 77–90, ACM, USA], Chandra and Merlin solved the three problems for the language of conjunctive queries (CQ queries) on relational data, under the ‘set-semantics’ assumption for query evaluation. While the results of Chandra and Merlin [(1977) Optimal Implementation of Conjunctive Queries in Relational Data Bases. Proc. ACM STOC, Boulder, CO, USA, May 4–6, pp. 77–90, ACM, USA] have been very influential in database research, it was recognized long ago that the set semantics does not correspond to the semantics of the standard commercial query language, Structured Query Language (SQL). Alternative semantics, called bag and bag-set semantics, have been studied since 1993; Chaudhuri and Vardi [(1993) Optimization of Real Conjunctive Queries (Extended Abstract). Proc. PODS, Washington, DC, USA, May 25–28, pp. 59–70. ACM Press, USA] outlined necessary and sufficient conditions for the equivalence of CQ queries under these semantics. (The problems of containment of CQ bag and bag-set queries remain open to this day.) More recently, Cohen [(2006) Equivalence of Queries Combining Set and Bag-Set Semantics. Proc. PODS, Chicago, IL, USA, 26–28 June, pp. 70–79. ACM, USA; (2009) Equivalence of queries that are sensitive to multiplicities. VLDB J., 18, 765–785] introduced a formalism for treating (generalizations of) CQ queries evaluated under each of set, bag and bag-set semantics uniformly as special cases of the more general combined semantics. This formalism provides tools for studying broader classes of practical SQL queries, specifically important types of queries that arise in on-line analytical processing. Cohen [(2009) Equivalence of queries that are sensitive to multiplicities. VLDB J., 18, 765–785] provides a sufficient condition for the equivalence of (generalizations of) combined-semantics CQ queries, as well as sufficient and necessary equivalence conditions for several proper sublanguages of the query language of Cohen [(2009) Equivalence of queries that are sensitive to multiplicities. VLDB J., 18, 765–785]. To the best of our knowledge, no results on minimization of CQ queries beyond set-semantics queries have been reported in the literature. Our goal in this paper is to continue the study of equivalence and minimization of CQ queries. We focus on the practically important problem of finding minimized versions of combined-semantics CQ queries. The main contribution of this paper is the extension of the minimization result of Chandra and Merlin [(1977) Optimal Implementation of Conjunctive Queries in Relational Data Bases. Proc. ACM STOC, Boulder, CO, USA, May 4–6, pp. 77–90, ACM, USA] to all combined-semantics CQ queries; we develop this result using our sufficient condition for containment of combined-semantics CQ queries [Chirkova, R. (2012) Combined-semantics equivalence is decidable for a practical class of conjunctive queries. Submitted for publication]. We also present an extension to all combined-semantics CQ queries of the well-known equivalence condition of Chandra and Merlin [(1977) Optimal Implementation of Conjunctive Queries in Relational Data Bases. Proc. ACM STOC, Boulder, CO, USA, May 4–6, pp. 77–90. ACM, USA] for CQ set-semantics queries. Similarly to the condition of Chandra and Merlin [(1977) Optimal Implementation of Conjunctive Queries in Relational Data Bases. Proc. ACM STOC, Boulder, CO, USA, May 4–6, pp. 77–90. ACM, USA], our extension is given in terms of the relationship between the minimized versions of the queries.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 57
    Publication Date: 2014-02-26
    Description: Motivation: The reliable identification of genes is a major challenge in genome research, as further analysis depends on the correctness of this initial step. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available for gene finding. However, practical application in automated gene identification is still not the standard case. A particular challenge in including RNA-Seq data is the difficult handling of ambiguously mapped reads. Results: We present GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads), a novel prokaryotic and eukaryotic gene finder that is exclusively based on an RNA-Seq mapping and inherently includes ambiguously mapped reads. GIIRA extracts candidate regions supported by a sufficient number of mappings and reassigns ambiguous reads to their most likely origin using a maximum-flow approach. This avoids the exclusion of genes that are predominantly supported by ambiguous mappings. Evaluation on simulated and real data and comparison with existing methods incorporating RNA-Seq information highlight the accuracy of GIIRA in identifying the expressed genes. Availability and implementation: GIIRA is implemented in Java and is available from https://sourceforge.net/projects/giira/ . Contact: renardB@rki.de Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 58
    Publication Date: 2014-02-26
    Description: Motivation: Statistical validation of protein identifications is an important issue in shotgun proteomics. The false discovery rate (FDR) is a powerful statistical tool for evaluating the protein identification result. Several research efforts have been made for FDR estimation at the protein level. However, there are still certain drawbacks in the existing FDR estimation methods based on the target-decoy strategy. Results: In this article, we propose a decoy-free protein-level FDR estimation method. Under the null hypothesis that each candidate protein matches an identified peptide totally at random, we assign statistical significance to protein identifications in terms of the permutation P-value and use these P-values to calculate the FDR. Our method consists of three key steps: (i) generating random bipartite graphs with the same structure; (ii) calculating the protein scores on these random graphs; and (iii) calculating the permutation P-value and final FDR. As it is time-consuming or prohibitive to execute the protein inference algorithms thousands of times in step ii, we first train a linear regression model using the original bipartite graph and identification scores provided by the target inference algorithm. Then we use the learned regression model as a substitute for the original protein inference method to predict protein scores on shuffled graphs. We test our method on six publicly available datasets. The results show that our method is comparable with state-of-the-art algorithms in terms of estimation accuracy. Availability: The source code of our algorithm is available at: https://sourceforge.net/projects/plfdr/ Contact: zyhe@dlut.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
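    Step (iii) of the abstract above, turning scores from randomized graphs into permutation P-values and then into an FDR, might look roughly like the following sketch. The abstract does not say how the FDR is computed from the P-values, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names and the "+1" correction in the P-value.

```python
def permutation_p_value(observed, null_scores):
    """Permutation P-value with the common +1 correction.

    null_scores are scores for the same protein on randomized graphs
    (hypothetical input; the published tool may organize this differently).
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

    Proteins whose adjusted value falls below a chosen threshold (for example 0.01) would then be reported at that estimated FDR.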
  • 59
    Publication Date: 2014-02-26
    Description: Motivation: Atomistic or coarse-grained (CG) potentials derived from statistical distributions of internal variables have recently become popular due to the need for simplified interactions to reach larger scales in simulations or more efficient conformational space sampling. However, the parameterization of accurate and predictive statistics-based force fields requires a huge amount of work and is prone to the introduction of bias and errors. Results: This article introduces SecStAnT, software for the creation and analysis of protein structural datasets with user-defined primary/secondary structure composition, with a particular focus on the CG representation. In addition, the ability to manage different resolutions and the primary/secondary structure selectivity make it possible to address the mapping and backmapping between atomistic and CG representations and to study the relations between secondary and primary structure. Sample datasets and distributions are reported, including interpretation of structural features. Availability and implementation: SecStAnT is available free of charge at secstant.sourceforge.net/. Source code is freely available on request, implemented in Java and supported on Linux, MS Windows and OSX. Contact: giuseppe.maccari@iit.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 60
    Publication Date: 2014-02-28
    Description: Most previous studies need to learn a complex object model for parsing a specific object instance. This paper directly learns the general parsing patterns from the set of parsed objects and formalizes the parsing patterns as a series of parsing templates instead of learning the complex object model. Moreover, a novel hierarchical structure is presented to represent an object by using the parsing templates, which implicitly contains the multi-scale object parts and their relationships. For a single object, the parsing process is equivalent to establishing its hierarchical representation and determining the parsing template for each node. We combine the top-down decomposing scheme and the bottom-up composing scheme to infer the parsing process and formalize the inference as an energy minimization problem. The effect of our method is demonstrated by parsing the human body with aggressive pose variations. Compared with the state-of-the-art methods, the parsing results are more satisfying.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 61
    Publication Date: 2014-02-26
    Description: Motivation: To reliably assess the effects of unknown chemicals on the development of fluorescently labeled sensory-, moto- and interneuron populations in the spinal cord of zebrafish, automated data analysis is essential. Results: For the evaluation of a high-throughput screen of a large chemical library, we developed a new method for the automated extraction of quantitative information from green fluorescent protein (eGFP) and red fluorescent protein (RFP) labeled spinal cord neurons in double-transgenic zebrafish embryos. The methodology comprises region of interest detection, intensity profiling with reference comparison and neuron distribution histograms. All methods were validated on a manually evaluated pilot study using a Notch inhibitor dose-response experiment. The automated evaluation showed superior performance to manual investigation regarding time consumption, information detail and reproducibility. Availability and implementation: Being part of GNU General Public Licence (GNU-GPL) licensed open-source MATLAB toolbox Gait-CAD, an implementation of the presented methods is publicly available for download at http://sourceforge.net/projects/zebrafishimage/ . Contact: johannes.stegmaier@kit.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 62
    Publication Date: 2014-02-26
    Description: Motivation: The comparison of genes and gene products across species depends on high-quality tools to determine the relationships between gene or protein sequences from various species. Although some excellent applications are available and widely used, their performance leaves room for improvement. Results: We developed orthAgogue: a multithreaded C application for high-speed estimation of homology relations in massive datasets, operated via a flexible and easy command-line interface. Availability: The orthAgogue software is distributed under the GNU license. The source code and binaries compiled for Linux are available at https://code.google.com/p/orthagogue/ . Contact: orthagogue-issue-tracker@googlegroups.com
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 63
    Publication Date: 2014-02-26
    Description: A challenge in biodata analysis is to understand the underlying phenomena among many interactions in signaling pathways. Such a study is formulated as pathway enrichment analysis, which identifies relevant pathways that are functionally enriched in high-throughput data. The question faced here is how to analyze different data types in a unified and integrative way by characterizing pathways that these data simultaneously reveal. To this end, we developed the integrative Pathway Enrichment Analysis Platform, iPEAP, which handles transcriptomics, proteomics, metabolomics and GWAS data under a unified aggregation schema. iPEAP emphasizes the ability to aggregate various pathway enrichment results generated in different high-throughput experiments, as well as the quantitative measurements of different ranking results, thus providing the first benchmark platform for integration, comparison and evaluation of multiple types of data and enrichment methods. Availability and implementation: iPEAP is freely available at http://www.tongji.edu.cn/~qiliu/ipeap.html . Contact: qiliu@tongji.edu.cn or zwcao@tongji.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 64
    Publication Date: 2014-02-26
    Description: The semantic measures library and toolkit are robust, open-source and easy-to-use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures, providing a Java library as well as a command-line tool that can be used on personal computers or computer clusters. Availability and implementation: Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org . Contact: harispe.sebastien@gmail.com
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 65
    Publication Date: 2014-02-28
    Description: We propose a transductive Gaussian process (TGP) regression method with regularized Laplacian kernels. Transductive learning exploits not only the labeled data but also the unlabeled test instances for learning. GPs are Bayesian probabilistic regressors which use only labeled data. To use unlabeled data in GPs, regularized Laplacian kernels are used. Similar to the case of a supervised GP regression, the proposed method provides not only the predicted target values but also their error bars. It also provides a hyperparameter selection method based on a Bayesian model selection scheme. We applied the proposed TGP method to object pose estimation data sets as well as artificial data sets and compared it with the existing methods. Experimental results show that the proposed method has some advantages over the existing methods.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 66
    Publication Date: 2014-02-28
    Description: Owing to the sheer volume of text generated by a microblog site like Twitter, it is often difficult to fully understand what is being said about various topics. This paper presents algorithms for summarizing microblog documents. Initially, we present algorithms that produce single-document summaries but later extend them to produce summaries containing multiple documents. We evaluate the generated summaries by comparing them to both manually produced summaries and, for the multiple-post summaries, to the summarization results of some of the leading traditional summarization systems.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 67
    Publication Date: 2014-02-28
    Description: This paper addresses the development of a new framework to control traffic signal lights for a road network with a recently introduced bus rapid transit (BRT) system. By applying automated goal-directed learning and decision-making called reinforcement learning, the best possible traffic signal actions can be sought upon changes of network states as modelled by the signalized cell transmission model (CTM). An extension to a network of cascading interactions with a BRT system has been proposed with simple uni-directional flows without turning movements. Motivated by the BRT system in Thailand, the conventional signalized CTM has been generalized to cope with the preplanned space-usage priority of a BRT over other non-priority vehicles. A BRT physical lane separator as well as the location of BRT stations have been explicitly modelled. The delay function of both carried passengers on BRT and on other non-priority vehicles as well as waiting passengers at stations has been introduced. The deployment of BRT system with one lane deducted by the lane separator cannot reduce the total passenger delay in comparison with the same road and traffic condition before the installation of the BRT system. Moreover, our proposed method outperforms preemptive and differential priority control methods because of the improved awareness of the signal switching cost.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 68
    Publication Date: 2014-02-28
    Description: A square matrix of distinct numbers in which every row, column and both diagonals have the same total is referred to as a magic square. Constructing a magic square of a given order is considered a difficult computational problem, particularly when additional constraints are imposed. Hyper-heuristics are emerging high-level search methodologies that explore the space of heuristics for solving a given problem. In this study, we present a range of effective selection hyper-heuristics mixing perturbative low-level heuristics for constructing the constrained version of magic squares. The results show that selection hyper-heuristics, even the non-learning ones, deliver an outstanding performance, beating the best-known heuristic solution on average.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
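    The magic-square definition in the abstract above can be made concrete with a small validity check. This is only an illustrative sketch of the definition (distinct entries; equal row, column and diagonal totals); it is not the hyper-heuristic search from the paper, and the function name is_magic_square and the sample grid are assumptions introduced here.

```python
def is_magic_square(grid):
    """Return True if grid is an n x n matrix of distinct numbers whose
    rows, columns and both diagonals all have the same total."""
    n = len(grid)
    # Reject empty or non-square input.
    if n == 0 or any(len(row) != n for row in grid):
        return False
    # All entries must be distinct.
    flat = [value for row in grid for value in row]
    if len(set(flat)) != len(flat):
        return False
    target = sum(grid[0])
    rows_ok = all(sum(row) == target for row in grid)
    cols_ok = all(sum(grid[r][c] for r in range(n)) == target for c in range(n))
    main_diag = sum(grid[i][i] for i in range(n))
    anti_diag = sum(grid[i][n - 1 - i] for i in range(n))
    return rows_ok and cols_ok and main_diag == target and anti_diag == target


# The classic order-3 (Lo Shu) square, used here purely as sample data.
LO_SHU = [[2, 7, 6],
          [9, 5, 1],
          [4, 3, 8]]
```

    A search method such as the one described would call a check like this (or an incremental cost derived from it) on candidate squares; the constrained variant studied in the paper adds requirements not modelled here.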
  • 69
    Oxford University Press
    Publication Date: 2014-02-28
    Description: Probabilistic Logic Programming (PLP) allows one to represent domains containing many entities connected by uncertain relations and has many applications in particular in Machine Learning. PITA is a PLP algorithm for computing the probability of queries, which exploits tabling, answer subsumption and Binary Decision Diagrams (BDDs). PITA does not impose any restriction on the programs. Other algorithms, such as PRISM, reduce computation time by imposing restrictions on the program, namely that subgoals are independent and that clause bodies are mutually exclusive. Another assumption that simplifies inference is that clause bodies are independent. In this paper, we present the algorithms PITA(IND,IND) and PITA(OPT). PITA(IND,IND) assumes that subgoals and clause bodies are independent. PITA(OPT) instead first checks whether these assumptions hold for subprograms and subgoals: if they do, PITA(OPT) uses a simplified calculation, otherwise it resorts to BDDs. Experiments on a number of benchmark datasets show that PITA(IND,IND) is the fastest on datasets respecting the assumptions, while PITA(OPT) is a good option when nothing is known about a dataset.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 70
    Publication Date: 2014-02-28
    Description: This paper discusses a system that extracts and displays temporal and geospatial entities in text. The first task involves identification of all events in a document followed by identification of important events using a classifier. The second task involves identifying named entities associated with the document. In particular, we extract geospatial named entities. We disambiguate the set of geospatial named entities and geocode them to determine the correct coordinates for each place name, often called grounding. We resolve ambiguity based on sentence and article context. Finally, we present a user with the key events and their associated people, places and organizations within a document in terms of a timeline and a map. For purposes of testing, we use Wikipedia articles about historical events, such as those describing wars, battles and invasions. We focus on extracting major events from the articles, although our ideas and tools can be easily used with articles from other sources such as news articles. We use several existing tools such as Evita, Google Maps, publicly available implementations of Support Vector Machines, Hidden Markov Model and Conditional Random Field, and the MIT SIMILE Timeline.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 71
    Publication Date: 2014-03-27
    Description: Server-aided verification (SAV) has potential applicability in lightweight devices for improving signature verification, where the verifier possesses computationally weak hardware. We observe that lightweight devices run all algorithms through hardware implementation with logic circuits. Existing SAV protocols indeed improve computational efficiency for lightweight devices; however, few of them take the hardware cost into consideration. The hardware implementation of SAV protocols could still be costly for lightweight devices. Currently, the most secure SAV protocols in the literature for pairing-based (G1 × G2 → GT) signatures can securely delegate pairing computations to the server; however, verifiers are still required to perform group operations over two completely different groups G1 and GT, which heavily contribute to the cost of hardware implementation. In this work, we propose several collusion-resistant SAV protocols for pairing-based signatures to improve their applicability for lightweight devices. In our SAV protocols, verifiers are only required to perform group operations in G1. In comparison with existing SAV protocols, our protocols save the unnecessary hardware cost for implementing group operations in GT and therefore are more applicable to lightweight applications.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 72
    Publication Date: 2014-03-27
    Description: Secrecy of decryption keys is an important pre-requisite for security of any encryption scheme and compromised private keys must be immediately replaced. Forward Security (FS), introduced to Public Key Encryption (PKE) by Canetti et al. (Eurocrypt 2003), reduces damage from compromised keys by guaranteeing confidentiality of messages that were encrypted prior to the compromise event. The FS property was also shown to be achievable in (Hierarchical) Identity-Based Encryption (HIBE) by Yao et al. (ACM CCS 2004). Yet, for emerging encryption techniques that offer flexible access control to encrypted data by means of functional relationships between ciphertexts and decryption keys, FS protection was not known to exist. In this paper, we introduce FS to the powerful setting of Hierarchical Predicate Encryption (HPE), proposed by Okamoto and Takashima (Asiacrypt 2009). Anticipated applications of FS-HPE schemes can be found in searchable encryption and in fully private communication. Considering the dependencies among the concepts, our FS-HPE scheme implies forward-secure flavors of Predicate Encryption and (Hierarchical) Attribute-Based Encryption. Our FS-HPE scheme guarantees FS for plaintexts and for attributes that are hidden in HPE ciphertexts. It further allows delegation of decrypting abilities at any point in time, independent of FS time evolution. It realizes zero-inner-product predicates and is proved adaptively secure under standard assumptions. As the ‘cross-product’ approach taken in FS-HIBE is not directly applicable to the HPE setting, our construction resorts to techniques that are specific to existing HPE schemes and extends them with what can be seen as reminiscent of binary tree encryption from FS-PKE.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 73
    Publication Date: 2014-03-27
    Description: Distributed denial of service (DDoS) attack is a coordinated attack, generally performed on a massive scale on the availability of services of a target system or network resources. Owing to the continuous evolution of new attacks and ever-increasing number of vulnerable hosts on the Internet, many DDoS attack detection or prevention mechanisms have been proposed. In this paper, we present a comprehensive survey of DDoS attacks, detection methods and tools used in wired networks. The paper also highlights open issues, research challenges and possible solutions in this area.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 74
    Publication Date: 2014-03-27
    Description: A novel adaptive steganographic scheme for spatial images is proposed. A noisy function is used to measure the texture complexity of 2 × 2 pixel blocks, which remains monotonically increasing after ±1 modifications. Therefore, the message is embedded into the noisiest areas and the recipient can identify the embedding region. The ‘double-layered embedding’ is exploited to reduce the number of ±1 modifications, in which the fast matrix embedding and wet paper codes are applied to the least significant bit (LSB) plane and the second LSB plane, respectively. The experiments on resisting three steganalyzers show that the proposed method performs better than four typical steganographic schemes. Moreover, compared with the extended highly undetectable steGO having parameter T = 255, the novel method achieves competitive resistance to detection and faster embedding speed.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
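    To illustrate what a ±1 modification of the LSB plane means in the abstract above, here is a minimal sketch of plain LSB matching on 8-bit pixel values. It is a generic textbook technique shown under stated assumptions, not the adaptive double-layered scheme of the paper: there is no texture-based region selection, no matrix embedding and no wet paper codes. The function names and the fixed seed are hypothetical choices made for this example.

```python
import random


def embed_lsb_matching(pixels, bits, rng=None):
    """Embed one message bit per pixel by changing a pixel by +1 or -1
    whenever its least significant bit differs from the message bit."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch repeatable
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        if stego[i] & 1 != bit:
            if stego[i] == 0:
                stego[i] = 1        # cannot go below 0
            elif stego[i] == 255:
                stego[i] = 254      # cannot go above 255
            else:
                stego[i] += rng.choice((-1, 1))
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the LSB plane of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]
```

    Because every change is at most one grey level, the recipient recovers the message from the LSB plane alone; the paper's contribution lies in choosing where to embed and in reducing how many such changes are needed.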
  • 75
    Publication Date: 2014-03-27
    Description: In this study, we propose an efficient aggregate signcryption scheme to maximize the security of data in a kind of wireless medical network named the disconnected or unattended wireless sensor network (applied in medical systems). These networks address patients who need to be monitored for a long time. The main challenge of these networks that are usually implanted on the patient's clothing and established in sensitive conditions is that the server (station) visits sensors continuously. Moreover, the sensors must retain data for long enough time to off-load to the station as they have limited capacity and batteries. This disconnected nature gives adversaries the power to read and modify target data without being detected or disclose private medical data related to a patient. In this paper, we address these security problems and improve the first study of identity-based aggregate signcryption in UWSNs to achieve both key privacy and invisibility. Our improved approach is at the same time efficient in terms of space and communication overload. Moreover, the proposed scheme allows servers to efficiently verify and unsigncrypt all the related data accumulated by sensors. We further show that the proposed scheme has resistance against reading and modifying attacks. We compare our scheme with the best alternative works in the literature.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 76
    Publication Date: 2014-03-27
    Description: We introduce ZIDS, a client-server solution for private detection of intrusions that is suitable for private detection of zero-day attacks in input data. The system includes an intrusion detection system (IDS) server that has a set of sensitive signatures for zero-day attacks and IDS clients that possess some sensitive data (e.g. files, logs). Using ZIDS, each IDS client learns whether its input data matches any of the zero-day signatures, but neither party learns any additional information. In other words, the IDS client learns nothing about the zero-day signatures and the IDS server learns nothing about the input data and the analysis results. To solve this problem, we reduce privacy-preserving intrusion detection to an instance of secure two-party oblivious deterministic finite automata (ODFA) evaluation. Then, motivated by the fact that the DFAs associated with attack signatures are often sparse, we propose a new and efficient ODFA protocol that takes advantage of this sparsity. Our new construction is considerably more efficient than the existing solutions and, at the same time, does not leak any sensitive information about the nature of the sparsity in the private DFA. We provide a full implementation of our privacy-preserving system that includes optimizations that lead to better memory usage and evaluate its performance on rule sets from the Snort IDS.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
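    The notion of a sparse signature DFA mentioned in the abstract above can be sketched as follows. This is a plain, non-private evaluation shown only to illustrate sparsity: it is not the ODFA protocol, it offers no privacy for either party, and the function name run_sparse_dfa, the fallback-to-default convention and the sample signature are assumptions made for this example.

```python
def run_sparse_dfa(transitions, start, accepting, data, default=None):
    """Evaluate a DFA whose transition table stores only a few entries.

    transitions maps (state, symbol) -> next state. Any pair that is not
    listed falls back to the default state (the start state if none is
    given), which is what keeps the table sparse.
    Returns True as soon as an accepting state is reached.
    """
    fallback = start if default is None else default
    state = start
    for symbol in data:
        state = transitions.get((state, symbol), fallback)
        if state in accepting:
            return True
    return False


# Hypothetical signature "abc": only 5 of the possible (state, symbol)
# pairs are stored; every other pair returns to state 0.
ABC_SIGNATURE = {
    (0, "a"): 1,
    (1, "a"): 1,
    (1, "b"): 2,
    (2, "a"): 1,
    (2, "c"): 3,
}
```

    For example, run_sparse_dfa(ABC_SIGNATURE, 0, {3}, "xxabcxx") scans the input once and reports a match. The protocol described in the abstract performs an evaluation of this kind without revealing the table to the client or the input to the server.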
  • 77
    Publication Date: 2014-03-27
    Description: Anonymous multi-receiver identity-based encryption can protect the receiver identity privacy and message confidentiality. Thus, it can be used in many fields, such as Voice over Internet Protocol and pay-TV systems. In 2012, Chien improved an anonymous multi-receiver identity-based encryption scheme. This paper points out that Chien's scheme does not satisfy the indistinguishability of encryptions under selective multi-identity, chosen ciphertext attacks. The analysis is important for understanding the security risks.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 78
    Publication Date: 2014-03-27
    Description: Conditionally anonymous ring signatures are a variant of ring signatures such that the anonymity is conditional: if a user is the true signer, then he can claim this through a confirmation protocol; if he is not the signer, he can prove this through a disavowal protocol. Hence, this can preserve the anonymity of a signer while reserving the right to trace it when necessary. The security of such a signature also requires that an innocent non-signer will not be framed as a signer. In this paper, we propose a new framework for this type of signature without random oracles. Our construction can be realized under general complexity assumptions and has a simple structure. In contrast, previous works are based on non-standard assumptions or proved secure in the random oracle model.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 79
    Publication Date: 2014-03-27
    Description: With the growth of networked computers and associated applications, intrusion detection has become essential to keeping networks secure. A number of intrusion detection methods have been developed for protecting computers and networks using conventional statistical methods as well as data mining methods. Data mining methods for misuse and anomaly-based intrusion detection usually encompass supervised, unsupervised and outlier methods. It is necessary that the capabilities of intrusion detection methods be updated with the creation of new attacks. This paper proposes a multi-level hybrid intrusion detection method that uses a combination of supervised, unsupervised and outlier-based methods for improving the efficiency of detection of new and old attacks. The method is evaluated with a captured real-time flow and packet dataset called the Tezpur University intrusion detection system (TUIDS) dataset, a distributed denial of service dataset, the benchmark intrusion dataset called the knowledge discovery and data mining Cup 1999 dataset, and the new version of KDD (NSL-KDD) dataset. Experimental results are compared with existing multi-level intrusion detection methods and other classifiers. The performance of our method is very good.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 80
    Publication Date: 2014-03-27
    Description: Data transfer is a transmission of data over a point-to-point or point-to-multipoint communication channel. To protect the confidentiality of the transferred data, public-key cryptography has been introduced in data transfer schemes (DTSs). Unfortunately, there exist some drawbacks in the current DTSs. First, the sender must know who the real receivers are. This is undesirable in a system where the number of the users is very large, such as cloud computing. In practice, the sender only knows some descriptive attributes of the receivers. Secondly, the receiver cannot be guaranteed to only receive messages from the legal senders. Therefore, it remains an elusive and challenging research problem on how to design a DTS scheme where the sender can send messages to the unknown receivers and the receiver can filter out false messages according to the described attributes. In this paper, we propose an attribute-based data transfer with filtering (ABDTF) scheme to address these problems. In our proposed scheme, the receiver can publish an access structure so that only the users whose attributes satisfy this access structure can send messages to him. Furthermore, the sender can encrypt a message under a set of attributes such that only the users who hold these attributes can obtain the message. In particular, we provide an efficient filtering algorithm for the receiver to resist the denial-of-service attacks. Notably, we propose the formal definition and security models for ABDTF schemes. To the best of our knowledge, it is the first time that a provable ABDTF scheme is proposed. Hence, this work provides a new research approach to ABDTF schemes.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 81
    Publication Date: 2014-03-27
    Description: Certificateless encryption (CLE) effectively solves the inherent key escrow problem in identity-based encryption while retaining its certificate-free property. Although a number of CLE schemes have been available in the literature, little attention has been paid to the problem of user revocation in the certificateless setting. In this work, we study CLE systems with user revocation capabilities. At first, we establish reasonable security models for revocable CLE (RCLE) schemes. Then we put forward the first efficient and CCA2-secure RCLE scheme in the standard model. A rigorous security proof of our RCLE scheme is presented based on the decisional truncated q-ABDHE assumption and decisional bilinear Diffie–Hellman (DBDH) assumption.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 82
    Publication Date: 2014-03-28
    Description: Motivation: Microtubules are dynamic polymers of tubulin dimers that undergo continuous assembly and disassembly. A mounting number of microtubule-associated proteins (MAPs) regulate the dynamic behavior of microtubules and hence the assembly and disassembly of disparate microtubule structures within the cell. Despite recent advances in identification and functional characterization of MAPs, a substantial number of microtubule accessory factors have not been functionally annotated. Here, using profile-to-profile comparisons and structure modeling, we show that the yeast outer kinetochore components NDC80 and NUF2 share evolutionary ancestry with a novel protein family in mammals comprising, besides NDC80/HEC1 and NUF2, three Intraflagellar Transport (IFT) complex B subunits (IFT81, IFT57, CLUAP1) as well as six proteins with poorly defined function (FAM98A-C, CCDC22, CCDC93 and C14orf166). We show that these proteins consist of a divergent N-terminal calponin homology (CH)-like domain adjoined to an array of C-terminal heptad repeats predicted to form a coiled-coil arrangement. We have named the divergent CH-like domain NN–CH after the founding members NDC80 and NUF2. Contact : kbschou@bio.ku.dk or lbpedersen@bio.ku.dk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 83
    Publication Date: 2014-03-28
    Description: Motivation: The creation and exchange of biologically relevant models is of great interest to many researchers. When multiple standards are in use, models are more readily used and re-used if there exist robust translators between the various accepted formats. Summary: Antimony 2.4 and JSim 2.10 provide translation capabilities from their own formats to SBML and CellML. All provided unique challenges, stemming from differences in each format’s inherent design, in addition to differences in functionality. Availability and implementation: Both programs are available under BSD licenses; Antimony from http://antimony.sourceforge.net/ and JSim from http://physiome.org/jsim/ . Contact: lpsmith@u.washington.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 84
    Publication Date: 2014-03-28
    Description: Motivation: One common task in structural biology is to assess the similarities and differences among protein structures. A variety of structure alignment algorithms and programs has been designed and implemented for this purpose. A major drawback with existing structure alignment programs is that they require a large amount of computational time, rendering them infeasible for pairwise alignments on large collections of structures. To overcome this drawback, a fragment alphabet learned from known structures has been introduced. The method, however, considers local similarity only, and therefore occasionally assigns high scores to structures that are similar only in local fragments. Method: We propose a novel approach that eliminates false positives, through the comparison of both local and remote similarity, with little compromise in speed. Two kinds of contact libraries (ContactLib) are introduced to fingerprint protein structures effectively and efficiently. Each contact group of the contact library consists of one local or two remote fragments and is represented by a concise vector. These vectors are then indexed and used to calculate a new combined hit-rate score to identify similar protein structures effectively and efficiently. Results: We tested our method on the high-quality protein structure subset of SCOP30 containing 3297 protein structures. For each protein structure of the subset, we retrieved its neighbor protein structures from the rest of the subset. The best area under the Receiver-Operating Characteristic curve, achieved by ContactLib, is as high as 0.960. This is a significant improvement compared with 0.747, the best result achieved by FragBag. We also demonstrated that incorporating remote contact information is critical to consistently retrieve accurate neighbor protein structures for all- query protein structures. Availability and implementation: https://cs.uwaterloo.ca/~xfcui/contactlib/ . Contact: shuaicli@cityu.edu.hk or mli@uwaterloo.ca
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 85
    Publication Date: 2014-03-28
    Description: Motivation: Gene expression data are currently collected on a wide range of platforms. Differences between platforms make it challenging to combine and compare data collected on different platforms. We propose a new method of cross-platform normalization that uses topic models to summarize the expression patterns in each dataset before normalizing the topics learned from each dataset using per-gene multiplicative weights. Results: This method allows for cross-platform normalization even when samples profiled on different platforms have systematic differences, allows the simultaneous normalization of data from an arbitrary number of platforms and, after suitable training, allows for online normalization of expression data collected individually or in small batches. In addition, our method outperforms existing state-of-the-art platform normalization tools. Availability and implementation: MATLAB code is available at http://morrislab.med.utoronto.ca/plida/ . Contact: Amit.Deshwar@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 86
    Publication Date: 2014-03-28
    Description: Motivation: Epigenetic landscapes in the regulatory regions reflect binding condition of transcription factors and their co-factors. Identifying epigenetic condition and its variation is important in understanding condition-specific gene regulation. Computational approaches to explore complex multi-dimensional landscapes are needed. Results: To study epigenomic condition for gene regulation, we developed a method, AWNFR, to classify epigenomic landscapes based on the detected epigenomic landscapes. Assuming mixture of Gaussians for a nucleosome, the proposed method captures the shape of histone modification and identifies potential regulatory regions in the wavelet domain. For accuracy estimation as well as enhanced computational speed, we developed a novel algorithm based on down-sampling operation and footprint in wavelet. We showed the algorithmic advantages of AWNFR using the simulated data. AWNFR identified regulatory regions more effectively and accurately than the previous approaches with the epigenome data in mouse embryonic stem cells and human lung fibroblast cells (IMR90). Based on the detected epigenomic landscapes, AWNFR classified epigenomic status and studied epigenomic codes. We studied co-occurring histone marks and showed that AWNFR captures the epigenomic variation across time. Availability and implementation: The source code and supplemental document of AWNFR are available at http://wonk.med.upenn.edu/AWNFR . Contact: wonk@mail.med.upenn.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 87
    Publication Date: 2014-03-28
    Description: Motivation: Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. Results: In this work, we present a computational method called ‘TSSer’ that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. Availability: TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php Contact: mihaela.zavolan@unibas.ch Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 88
    Publication Date: 2014-03-28
    Description: Motivation: Although constraint-based flux analysis of knockout strains has facilitated the production of desirable metabolites in microbes, current screening methods have placed a limitation on the number of knockouts that can be simultaneously analyzed. Results: Here, we propose a novel screening method named FastPros. In this method, the potential of a given reaction knockout for production of a specific metabolite is evaluated by shadow pricing of the constraint in the flux balance analysis, which generates a screening score to obtain candidate knockout sets. To evaluate the performance of FastPros, we screened knockout sets to produce each metabolite in the entire Escherichia coli metabolic network. We found that 75% of these metabolites could be produced under biomass maximization conditions by adding up to 25 reaction knockouts. Furthermore, we demonstrated that using FastPros in tandem with another screening method, OptKnock, could further improve target metabolite productivity. Availability and implementation: Source code is freely available at http://www-shimizu.ist.osaka-u.ac.jp/shimizu_lab/FastPros/ , implemented in MATLAB and COBRA toolbox. Contact: chikara.furusawa@riken.jp or shimizu@ist.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 89
    Publication Date: 2014-03-28
    Description: Motivation: Comprehensive 2D gas chromatography-mass spectrometry is an established method for the analysis of complex mixtures in analytical chemistry and metabolomics. It produces large amounts of data that require semiautomatic, but preferably automatic handling. This involves the location of significant signals (peaks) and their matching and alignment across different measurements. To date, there exist only a few openly available algorithms for the retention time alignment of peaks originating from such experiments that scale well with increasing sample and peak numbers, while providing reliable alignment results. Results: We describe BiPACE 2D, an automated algorithm for retention time alignment of peaks from 2D gas chromatography-mass spectrometry experiments and evaluate it on three previously published datasets against the mSPA, SWPA and Guineu algorithms. We also provide a fourth dataset from an experiment studying the H2 production of two different strains of Chlamydomonas reinhardtii that is available from the MetaboLights database together with the experimental protocol, peak-detection results and manually curated multiple peak alignment for future comparability with newly developed algorithms. Availability and implementation: BiPACE 2D is contained in the freely available Maltcms framework, version 1.3, hosted at http://maltcms.sf.net , under the terms of the L-GPL v3 or Eclipse Open Source licenses. The software used for the evaluation along with the underlying datasets is available at the same location. The C. reinhardtii dataset is freely available at http://www.ebi.ac.uk/metabolights/MTBLS37 . Contact: nils.hoffmann@cebitec.uni-bielefeld.de or jens.stoye@uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 90
    Publication Date: 2014-03-28
    Description: Motivation: The capacity to systematically search through large image collections and ensembles and detect regions exhibiting similar morphological characteristics is central to pathology diagnosis. Unfortunately, the primary methods used to search digitized, whole-slide histopathology specimens are slow and prone to inter- and intra-observer variability. The central objective of this research was to design, develop, and evaluate a content-based image retrieval system to assist doctors in quick and reliable content-based comparative search of similar prostate image patches. Method: Given a representative image patch (sub-image), the algorithm returns a ranked ensemble of image patches from the entire whole-slide histology section that exhibit the most similar morphologic characteristics. This is accomplished by first performing hierarchical searching based on a newly developed hierarchical annular histogram (HAH). The set of candidates is then further refined in the second stage of processing by computing a color histogram from eight equally divided segments within each square annular bin defined in the original HAH. A demand-driven master-worker parallelization approach is employed to speed up the searching procedure. Using this strategy, the query patch is broadcast to all worker processes. Each worker process is dynamically assigned an image by the master process to search for and return a ranked list of similar patches in the image. Results: The algorithm was tested using digitized hematoxylin and eosin (H&E) stained prostate cancer specimens. We have achieved an excellent image retrieval performance. The recall rate within the first 40 ranked retrieved image patches is ~90%. Availability and implementation: Both the testing data and source code can be downloaded from http://pleiad.umdnj.edu/CBII/Bioinformatics/ . Contact: lin.yang@uky.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 91
    Publication Date: 2014-03-28
    Description: Motivation: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. Results: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. Availability and implementation: featureCounts is available under GNU General Public License as part of the Subread ( http://subread.sourceforge.net ) or Rsubread ( http://www.bioconductor.org ) software packages. Contact: shi@wehi.edu.au
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
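The read summarization task described in this record (counting how many aligned reads fall on each genomic feature) can be illustrated with a minimal sketch. This is not the featureCounts implementation: the function names, the tuple layouts, the 1-based inclusive coordinates and the rule of skipping reads that overlap more than one feature are all assumptions made for the example. Grouping features by chromosome in a dictionary is only a loose analogue of the chromosome hashing the abstract mentions.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(features):
    """Group features by chromosome and sort each group by start position.

    features: iterable of (name, chrom, start, end), 1-based inclusive
    (an assumed layout for this sketch).
    """
    index = defaultdict(list)
    for name, chrom, start, end in features:
        index[chrom].append((start, end, name))
    for chrom in index:
        index[chrom].sort()
    return index


def count_reads(index, reads):
    """Count reads per feature; reads: iterable of (chrom, start, end).

    A read is assigned to a feature if the two intervals overlap by at
    least one base. Reads overlapping zero or several features are not
    counted (an assumed policy for this sketch).
    """
    counts = defaultdict(int)
    for chrom, r_start, r_end in reads:
        feats = index.get(chrom, [])
        # Only features starting at or before the read end can overlap it.
        hi = bisect_right(feats, (r_end, float("inf"), ""))
        hits = {name for start, end, name in feats[:hi] if end >= r_start}
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return dict(counts)
```

A production tool would avoid the linear scan over candidate features, for example by binning features into blocks, and would handle strand, paired-end reads and gene-level aggregation.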
  • 92
    Publication Date: 2014-03-28
    Description: Track data hubs provide an efficient mechanism for visualizing remotely hosted Internet-accessible collections of genome annotations. Hub datasets can be organized, configured and fully integrated into the University of California Santa Cruz (UCSC) Genome Browser and accessed through the familiar browser interface. For the first time, individuals can use the complete browser feature set to view custom datasets without the overhead of setting up and maintaining a mirror. Availability and implementation: Source code for the BigWig, BigBed and Genome Browser software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip , implemented in C and supported on Linux. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/ . Binary Alignment/Map (BAM) and Variant Call Format (VCF)/tabix utilities are available from http://samtools.sourceforge.net/ and http://vcftools.sourceforge.net/ . The UCSC Genome Browser is publicly accessible at http://genome.ucsc.edu . Contact: donnak@soe.ucsc.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 93
    Publication Date: 2014-03-28
    Description: Motivation: Reference genome assemblies are subject to change and refinement from time to time. Generally, researchers need to convert the results that have been analyzed according to old assemblies to newer versions, or vice versa, to facilitate meta-analysis, direct comparison, data integration and visualization. Several useful conversion tools can convert genome interval files in browser extensible data or general feature format, but none have the functionality to convert files in sequence alignment map or BigWig format. This is a significant gap in computational genomics tools, as these formats are the ones most widely used for representing high-throughput sequencing data, such as RNA-seq, chromatin immunoprecipitation sequencing, DNA-seq, etc. Results: Here we developed CrossMap, a versatile and efficient tool for converting genome coordinates between assemblies. CrossMap supports most of the commonly used file formats, including BAM, sequence alignment map, Wiggle, BigWig, browser extensible data, general feature format, gene transfer format and variant call format. Availability and implementation: CrossMap is written in Python and C. Source code and a comprehensive user’s manual are freely available at: http://crossmap.sourceforge.net/ . Contact: Kocher.JeanPierre@mayo.edu or wang.liguo@mayo.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
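Converting coordinates between assemblies, as this record describes, amounts to looking up which aligned block of the old assembly contains a position and applying that block's offset. The sketch below illustrates the idea only; it is not CrossMap's code or interface. The block tuples, the function names `build_map` and `convert`, the 0-based half-open coordinates and the restriction to forward-strand blocks are assumptions made for the example.

```python
from bisect import bisect_right


def build_map(blocks):
    """Index aligned blocks by old-assembly chromosome.

    blocks: iterable of (old_chrom, old_start, old_end, new_chrom, new_start),
    0-based half-open, forward strand only (assumed layout for this sketch).
    """
    by_chrom = {}
    for old_chrom, old_start, old_end, new_chrom, new_start in blocks:
        by_chrom.setdefault(old_chrom, []).append(
            (old_start, old_end, new_chrom, new_start)
        )
    for chrom_blocks in by_chrom.values():
        chrom_blocks.sort()
    return by_chrom


def convert(by_chrom, chrom, pos):
    """Map one position to the new assembly, or return None if unmapped."""
    blocks = by_chrom.get(chrom, [])
    starts = [b[0] for b in blocks]
    # Find the last block whose start is at or before pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    old_start, old_end, new_chrom, new_start = blocks[i]
    if pos >= old_end:
        return None  # pos falls in a gap between aligned blocks
    return new_chrom, new_start + (pos - old_start)
```

Positions that fall in gaps between blocks have no counterpart in the new assembly, which is why the function returns `None` rather than guessing; interval formats additionally need a policy for regions that are split across blocks.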
  • 94
    Publication Date: 2014-03-28
    Description: Motivation: Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amounts of data. To overcome this limitation, tools have been developed that compromise accuracy with speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. Results: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. Availability and implementation: DiMO is available at http://stormo.wustl.edu/DiMO Contact: rpatel@genetics.wustl.edu , ronakypatel@gmail.com
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
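The AUC used in this record as a measure of a motif's discriminating power has a simple interpretation: it is the probability that a randomly chosen positive sequence scores higher than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that definition follows. The function name and the example scores are hypothetical, and this shows only the evaluation measure, not DiMO's perceptron-based optimization.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from two lists of motif scores.

    Computed directly as the fraction of (positive, negative) pairs in
    which the positive sequence scores higher, counting ties as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 0.5 means the motif separates bound from unbound sequences no better than chance, and 1.0 means every positive outscores every negative. The pairwise loop is quadratic; for ChIP-seq-sized sets a rank-based computation would be the usual choice.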
  • 95
    Publication Date: 2014-03-28
    Description: Motivation: The evolution of multicellular organisms is associated with increasing variability of molecules governing behavioral and physiological states. This is often achieved by neuropeptides (NPs) that are produced in neurons from a longer protein, named neuropeptide precursor (NPP). The maturation of NPs occurs through a sequence of proteolytic cleavages. The difficulty in identifying NPPs is a consequence of their diversity and the lack of applicable sequence similarity among the short functionally related NPs. Results: Herein, we describe Neuropeptide Precursor Identifier (NeuroPID), a machine learning scheme that predicts metazoan NPPs. NeuroPID was trained on hundreds of identified NPPs from the UniProtKB database. Some 600 features were extracted from the primary sequences and processed using support vector machines (SVM) and ensemble decision tree classifiers. These features combined biophysical, chemical and informational–statistical properties of NPs and NPPs. Other features were guided by the defining characteristics of the dibasic cleavage sites motif. NeuroPID reached 89–94% accuracy and 90–93% precision in cross-validation blind tests against known NPPs (with an emphasis on Chordata and Arthropoda). NeuroPID also identified NPP-like proteins from extensively studied model organisms as well as from poorly annotated proteomes. We then focused on the most significant sets of features that contribute to the success of the classifiers. We propose that NPPs are attractive targets for investigating and modulating behavior, metabolism and homeostasis and that a rich repertoire of NPs remains to be identified. Availability: NeuroPID source code is freely available at http://www.protonet.cs.huji.ac.il/neuropid Contact: michall@cc.huji.ac.il Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 96
    Publication Date: 2014-03-28
    Description: Motivation: Most existing identity-by-descent (IBD) detection methods only consider haplotype pairs; less attention has been paid to considering multiple haplotypes simultaneously, even though IBD is an equivalence relation on haplotypes that partitions a set of haplotypes into IBD clusters. Multiple-haplotype IBD clusters may have advantages over pairwise IBD in some applications, such as IBD mapping. Existing methods for detecting multiple-haplotype IBD clusters are often computationally expensive and unable to handle large samples with thousands of haplotypes. Results: We present a clustering method, efficient multiple-IBD, which uses pairwise IBD segments to infer multiple-haplotype IBD clusters. It expands clusters from seed haplotypes by adding qualified neighbors and extends clusters across sliding windows in the genome. Our method is an order of magnitude faster than existing methods and has comparable performance with respect to the quality of clusters it uncovers. We further investigate the potential application of multiple-haplotype IBD clusters in association studies by testing for association between multiple-haplotype IBD clusters and low-density lipoprotein cholesterol in the Northern Finland Birth Cohort. Using our multiple-haplotype IBD cluster approach, we found an association with a genomic interval covering the PCSK9 gene in these data that is missed by standard single-marker association tests. Previously published studies confirm association of PCSK9 with low-density lipoprotein. Availability and implementation: Source code is available under the GNU Public License http://cs.au.dk/~qianyuxx/EMI/ . Contact: qianyuxx@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 97
    Publication Date: 2014-03-28
    Description: Motivation:  The plethora of information that emerges from large-scale genome characterization studies has triggered the development of computational frameworks and tools for efficient analysis, interpretation and visualization of genomic data. Functional annotation of genomic variations and the ability to visualize the data in the context of whole genome and/or multiple genomes has remained a challenging task. We have developed an interactive web-based tool, AVIA (Annotation, Visualization and Impact Analysis), to explore and interpret large sets of genomic variations (single nucleotide variations and insertion/deletions) and to help guide and summarize genomic experiments. The annotation, summary plots and tables are packaged and can be downloaded by the user from the email link provided. Availability and implementation:  http://avia.abcc.ncifcrf.gov . Contact:  vuonghm@mail.nih.gov Supplementary information:  Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 98
    Publication Date: 2014-03-28
    Description: We developed PSAR-Align, a multiple sequence realignment tool that can refine a given multiple sequence alignment based on suboptimal alignments generated by probabilistic sampling. Our evaluation demonstrated that PSAR-Align is able to improve the results from various multiple sequence alignment tools. Availability and implementation: The PSAR-Align source code (implemented mainly in C++) is freely available for download at http://bioen-compbio.bioen.illinois.edu/PSAR-Align . Contact: jbkim@konkuk.ac.kr or jianma@illinois.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 99
    Publication Date: 2014-03-28
    Description: Motivation:  Using high-throughput sequencing, researchers are now generating hundreds of whole-genome assays to measure various features such as transcription factor binding, histone marks, DNA methylation or RNA transcription. Displaying so much data generally leads to a confusing accumulation of plots. We describe here a multithreaded library that computes statistics on large numbers of datasets (Wiggle, BigWig, Bed, BigBed and BAM), generating statistical summaries within minutes with limited memory requirements, whether on the whole genome or on selected regions. Availability and Implementation: The code is freely available under Apache 2.0 license at www.github.com/Ensembl/Wiggletools Contact: zerbino@ebi.ac.uk or flicek@ebi.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 100
    Publication Date: 2014-03-28
    Description: Motivation: Pathway analysis tools are a powerful strategy to analyze ‘omics’ data in the field of systems biology. From a metabolic perspective, several pathway definitions can be found in the literature, each one appropriate for a particular study. Recently, a novel pathway concept termed carbon flux paths (CFPs) was introduced and benchmarked against existing approaches, showing a clear advantage for finding linear pathways from a given source to target metabolite. CFPs are simple paths in a metabolite–metabolite graph that satisfy typical constraints in stoichiometric models: mass balancing and thermodynamics (irreversibility). In addition, CFPs guarantee carbon exchange in each of their intermediate steps, but not between the source and the target metabolites, and consequently false positive solutions may arise. These pathways often lack biological interest, particularly when studying biosynthetic or degradation routes of a metabolite. To overcome this issue, we amend the formulation in CFP, so as to account for atomic fate information. This approach is termed atomic CFP (aCFP). Results: By means of a side-by-side comparison in a medium-scale metabolic network in Escherichia coli, we show that aCFP provides more biologically relevant pathways than CFP, because canonical pathways are more easily recovered, which reflects the benefits of removing false positives. In addition, we demonstrate that aCFP can be successfully applied to genome-scale metabolic networks. As the quality of genome-scale atomic reconstruction is improved, methods such as the one presented here will undoubtedly be of value to interpret ‘omics’ data. Contact: fplanes@ceit.es or John.Beasley@brunel.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine