ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Collection: Articles  (6,139)
  • Years: 2015-2019  (5,761), 1995-1999  (378), 1945-1949
  • Journals: BMC Bioinformatics  (1,120), IEEE/ACM Transactions on Computational Biology and Bioinformatics  (504), Algorithmica  (424)
  • Topic: Computer Science  (6,139)
  • 1
    Publication Date: 2015-08-08
    Description: Background: Recently, Bayesian methods have become more popular for analyzing high-dimensional gene expression data, as they allow us to borrow information across different genes and provide powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. Results: In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387–404), we propose two new gene selection algorithms for general Bayesian models and name these new methods the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783–802) is established under the normal-normal model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via simulation studies. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. 
A real dataset is used to further demonstrate the proposed methodology. In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. Conclusion: The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
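The decision rule sketched in the abstract above — declare a gene DE when the posterior probability of a large standardized expression difference is high — can be illustrated on synthetic posterior draws. This is a toy sketch of the means-based criterion only; the function name, thresholds, and Gaussian posteriors are my assumptions, not the authors' implementation.

```python
import random
import statistics

def confident_difference(mu1_draws, mu2_draws, c=1.0, gamma=0.95):
    """Means-based criterion (sketch): estimate the posterior probability
    that the standardized difference of the two mean-expression parameters
    exceeds c, and declare the gene DE if that probability is >= gamma."""
    diffs = [a - b for a, b in zip(mu1_draws, mu2_draws)]
    sd = statistics.pstdev(diffs) or 1e-12     # guard against zero spread
    p = sum(abs(d) / sd > c for d in diffs) / len(diffs)
    return p >= gamma, p

random.seed(0)
# synthetic posterior draws: gene A clearly shifted between conditions, gene B not
a1 = [random.gauss(5.0, 0.2) for _ in range(2000)]
a2 = [random.gauss(3.0, 0.2) for _ in range(2000)]
b1 = [random.gauss(4.0, 0.2) for _ in range(2000)]
b2 = [random.gauss(4.0, 0.2) for _ in range(2000)]
print(confident_difference(a1, a2)[0])  # True: flagged DE
print(confident_difference(b1, b2)[0])  # False: no credible difference
```

The variance-based variant in the abstract would add a second term comparing posterior draws of the two variances in the same fashion.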
  • 2
    Publication Date: 2015-08-09
    Description: Background: Plant organ segmentation from 3D point clouds is a relevant task for plant phenotyping and plant growth observation. Automated solutions are required to increase the efficiency of recent high-throughput plant phenotyping pipelines. However, plant geometrical properties vary with time, among observation scales and different plant types. The main objective of the present research is to develop a fully automated, fast and reliable data-driven approach for plant organ segmentation. Results: The automated segmentation of plant organs using unsupervised clustering methods is crucial in cases where the goal is to get fast insights into the data, or where labeled data is unavailable or costly to obtain. For this, we propose and compare data-driven approaches that are easy to realize and make the use of standard algorithms possible. Since normalized histograms, acquired from 3D point clouds, can be seen as samples from a probability simplex, we propose to map the data from the simplex space into Euclidean space using Aitchison's log-ratio transformation, or into the positive quadrant of the unit sphere using the square root transformation. This, in turn, paves the way to a wide range of commonly used analysis techniques that are based on measuring the similarities between data points using Euclidean distance. We investigate the performance of the resulting approaches in the practical context of grouping 3D point clouds and demonstrate empirically that they lead to clustering results with high accuracy for monocotyledonous and dicotyledonous plant species with diverse shoot architecture. Conclusion: An automated segmentation of 3D point clouds is demonstrated in the present work. Within seconds, first insights into plant data can be derived, even from non-labelled data. This approach is applicable to different plant species with high accuracy. 
The analysis cascade can be implemented in future high-throughput phenotyping scenarios and will support the evaluation of the performance of different plant genotypes exposed to stress or in different environmental scenarios.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
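The two simplex mappings named in the abstract above are short enough to show directly. A minimal sketch (variable names mine; zero-count histogram bins would need a pseudocount before the log-ratio transform):

```python
import math

def clr(hist):
    """Aitchison's centered log-ratio transform: maps a normalized histogram
    (a point on the probability simplex) into Euclidean space."""
    logs = [math.log(h) for h in hist]       # requires strictly positive bins
    g = sum(logs) / len(logs)                # log of the geometric mean
    return [x - g for x in logs]

def sqrt_map(hist):
    """Square-root transform: sends the simplex onto the positive quadrant
    of the unit sphere, where Euclidean distance is meaningful."""
    return [math.sqrt(h) for h in hist]

h = [0.5, 0.3, 0.2]                          # toy normalized histogram
print(clr(h))                                # components sum to zero
print(sum(v * v for v in sqrt_map(h)))       # squared norm is 1.0
```

After either transform, standard Euclidean tools (k-means, hierarchical clustering) can be applied to the transformed histograms, which is the point the abstract makes.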
  • 3
    Publication Date: 2015-08-13
    Description: In the classic k-center problem, we are given a metric graph, and the objective is to select k nodes as centers such that the maximum distance from any vertex to its closest center is minimized. In this paper, we consider two important generalizations of k-center, the matroid center problem and the knapsack center problem. Both problems are motivated by recent content distribution network applications. Our contributions can be summarized as follows: (1) We consider the matroid center problem, in which the centers are required to form an independent set of a given matroid. We show this problem is NP-hard even on a line. We present a 3-approximation algorithm for the problem on general metrics. We also consider the outlier version of the problem, where a given number of vertices can be excluded as outliers from the solution. We present a 7-approximation for the outlier version. (2) We consider the (multi-)knapsack center problem, in which the centers are required to satisfy one (or more) knapsack constraint(s). It is known that the knapsack center problem with a single knapsack constraint admits a 3-approximation. However, when there are at least two knapsack constraints, we show this problem is not approximable at all. To complement the hardness result, we present a polynomial-time algorithm that gives a 3-approximate solution such that one knapsack constraint is satisfied and the others may be violated by at most a factor of \(1+\epsilon\). We also obtain a 3-approximation for the outlier version that may violate the knapsack constraint by \(1+\epsilon\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 4
    Publication Date: 2015-08-13
    Description: We design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an n-element poset, the data structure occupies \(n^2/4 + o(n^2)\) bits in the worst case. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using no more than a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time. We also consider the operation of listing all the successors or predecessors of a given element, and show how to do this in constant time per element reported using a slightly modified version of our succinct data structure.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
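The naive baseline the abstract above improves on is storing the full transitive closure as an n × n boolean matrix: constant-time queries, but n² bits rather than n²/4 + o(n²). A sketch of that baseline (not the paper's succinct structure):

```python
def transitive_closure(n, edges):
    """Dense reachability oracle: materialize the transitive closure
    as an n x n boolean matrix, giving O(1) precedence queries."""
    reach = [[False] * n for _ in range(n)]
    for u, v in edges:
        reach[u][v] = True
    for i in range(n):
        reach[i][i] = True           # reflexive: every element reaches itself
    for k in range(n):               # Floyd-Warshall-style boolean closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

# poset given by its cover relations (a DAG): 0 < 1 < 3 and 0 < 2 < 3
reach = transitive_closure(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(reach[0][3], reach[1][2])  # True False
```

The paper's contribution is compressing exactly this matrix for posets (exploiting antisymmetry and structure) down to roughly a quarter of the naive space while keeping O(1) queries.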
  • 5
    Publication Date: 2015-08-13
    Description: Background: Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this process of evolution due to its relevance to identifying drivers of mutation and predicting future prognosis and drug response. Efforts are challenged by high tumor heterogeneity, though, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set. Results: We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. We extend our prior methods, which use an interpretation of the mixture problem as that of reconstructing simple geometric objects called simplices, to instead search for structured unions of simplices called simplicial complexes that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves on our ability to accurately resolve mixtures on simulated data sets and demonstrate its practical applicability on a large RNASeq tumor data set. Conclusions: Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. 
In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 6
    Publication Date: 2015-08-13
    Description: Background: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their tertiary and quaternary structures. While structural superposition of short RNAs is achievable in a reasonable time, large structures represent a much bigger challenge. Therefore, we have developed a fast and accurate algorithm for RNA pairwise structure superposition called SETTER and implemented it in the SETTER web server. However, though biological relationships can be inferred by a pairwise structure alignment, key features preserved by evolution can be identified only from a multiple structure alignment. Thus, we extended the SETTER algorithm to the alignment of multiple RNA structures and developed the MultiSETTER algorithm. Results: In this paper, we present the updated version of the SETTER web server that implements a user-friendly interface to the MultiSETTER algorithm. The server accepts RNA structures either as a list of PDB IDs or as user-defined PDB files. After the superposition is computed, structures are visualized in 3D and several reports and statistics are generated. Conclusion: To the best of our knowledge, the MultiSETTER web server is the first publicly available tool for multiple RNA structure alignment. The MultiSETTER server offers the visual inspection of an alignment in 3D space, which may reveal structural and functional relationships not captured by other multiple alignment methods based either on sequence or on secondary structure motifs.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 7
    Publication Date: 2015-08-13
    Description: Background: Modern research on B and T cell antigen receptors (the immunoglobulins (IG) or antibodies and T cell receptors (TR)) forms the basis for detailed analyses of the human adaptive immune system. For instance, insights into the state of the adaptive immune system provide information that is essential for monitoring transplantation processes and regulating immunosuppressive therapy. In this context, algorithms and tools are necessary for analyzing the IG and TR diversity on the nucleotide as well as on the amino acid sequence level, identifying highly proliferated clonotypes, determining the diversity of the cell repertoire found in a sample, comparing different states of the human immune system, and visualizing all relevant information. Results: We here present IMEX, a software framework for the detailed characterization and visualization of the state of human IG and TR repertoires. IMEX offers a broad range of algorithms for statistical analysis of IG and TR data, CDR and V-(D)-J analysis, diversity analysis by calculating the distribution of IG and TR, calculating primer efficiency, and comparing multiple data sets. We use a mathematical model that is able to describe the number of unique clonotypes in a sample taking into account the true number of unique sequences and read errors; we heuristically optimize the parameters of this model. IMEX uses IMGT/HighV-QUEST analysis outputs and includes methods for splitting and merging to enable the submission to this portal and to combine the output results, respectively. All calculation results can be visualized and exported. Conclusion: IMEX is a user-friendly and flexible framework for performing clonality experiments based on CDR and V-(D)-J rearranged regions, diversity analysis, primer efficiency, and various visualization experiments. Using IMEX, various immunological reactions and alterations can be investigated in detail. 
IMEX is freely available for Windows and Unix platforms at http://bioinformatics.fh-hagenberg.at/immunexplorer/.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 8
    Publication Date: 2015-08-14
    Description: We consider a graph observability problem: how many edge colors are needed for an unlabeled graph so that an agent, walking from node to node, can uniquely determine its location from just the observed color sequence of the walk? Specifically, let G(n, d) be an edge-colored subgraph of the d-dimensional (directed or undirected) lattice of size \(n^d = n \times n \times \cdots \times n\). We say that G(n, d) is t-observable if an agent can uniquely determine its current position in the graph from the color sequence of any t-dimensional walk, where the dimension is the number of different directions spanned by the edges of the walk. A walk in an undirected lattice G(n, d) has dimension between 1 and d, but a directed walk can have dimension between 1 and 2d because of the two different orientations for each axis. We derive bounds on the number of colors needed for t-observability. Our main result is that \(\varTheta(n^{d/t})\) colors are both necessary and sufficient for t-observability of G(n, d), where d is considered a constant. This shows an interesting dependence of graph observability on the ratio between the dimension of the lattice and that of the walk. In particular, the number of colors for full-dimensional walks is \(\varTheta(n^{1/2})\) in the directed case, and \(\varTheta(n)\) in the undirected case, independent of the lattice dimension. All of our results extend easily to non-square lattices: given a lattice graph of size \(N = n_1 \times n_2 \times \cdots \times n_d\), the number of colors for t-observability is \(\varTheta(\sqrt[t]{N})\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
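The observability definition in the abstract above can be made concrete in one dimension: on a directed path whose edges are colored, an agent walking t steps can always infer its start node exactly when every length-t color word occurs at a unique position. A toy check of that condition (the lattice generality of the paper is not captured here; naming is mine):

```python
def t_observable_path(colors, t):
    """Check observability on a directed path whose i-th edge has color
    colors[i]: a t-step forward walk must determine its start node from
    the observed color word alone (1-D toy version of the lattice setting)."""
    words = [tuple(colors[i:i + t]) for i in range(len(colors) - t + 1)]
    return len(words) == len(set(words))   # all possible observations distinct

print(t_observable_path(list("abab"), 3))  # True: 'aba' and 'bab' are unique
print(t_observable_path(list("abab"), 2))  # False: 'ab' occurs at two positions
```

The trade-off the paper quantifies is visible even here: longer walks (larger t) distinguish positions with fewer colors.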
  • 9
    Publication Date: 2015-08-04
    Description: We revisit the matrix problems sparse null space and matrix sparsification, and show that they are equivalent. We then proceed to seek algorithms for these problems: we prove the hardness of approximation of these problems, and also give a powerful tool to extend algorithms and heuristics for sparse approximation theory to these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 10
    Publication Date: 2015-08-04
    Description: We introduce Planar Disjoint Paths Completion, a completion counterpart of the Disjoint Paths problem, and study its parameterized complexity. The problem can be stated as follows: given a (not necessarily connected) plane graph G, k pairs of terminals, and a face F of G, find a minimum-size set of edges, if one exists, to be added inside F so that the embedding remains planar and the pairs become connected by k disjoint paths in the augmented network. Our results are twofold: first, we give an upper bound on the number of necessary additional edges when a solution exists. This bound is a function of k, independent of the size of G. Second, we show that the problem is fixed-parameter tractable; in particular, it can be solved in time \(f(k)\cdot n^{2}\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 11
    Publication Date: 2015-08-07
    Description: This study proposes a quantitative measurement of the split of the second heart sound (S2) based on nonstationary signal decomposition, to deal with overlaps and energy modeling of the subcomponents of S2. The second heart sound includes the aortic (A2) and pulmonic (P2) closure sounds. However, split detection is obscured by A2-P2 overlap and the low energy of P2. To identify such a split, the HVD method is used to decompose S2 into a number of components while preserving the phase information. Further, A2s and P2s are localized using the smoothed pseudo Wigner-Ville distribution followed by the reassignment method. Finally, the split is calculated by taking the differences between the means of the time indices of A2s and P2s. Experiments on a total of 33 clips of S2 signals are performed to evaluate the method. The mean ± standard deviation of the split is 34.7 ± 4.6 ms. The method measures the split efficiently, even when the A2-P2 overlap is ≤ 20 ms and the normalized peak temporal ratio of P2 to A2 is low (≥ 0.22). The proposed method thus demonstrates its robustness through split detectability (SDT), the aptness of split detection through detecting P2s, measured at up to 96 percent. These findings reveal the effectiveness of the method as competitive against other baselines, especially for A2-P2 overlaps and low-energy P2.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 12
    Publication Date: 2015-08-07
    Description: Post-acquisition denoising of magnetic resonance (MR) images is an important step to improve any quantitative measurement of the acquired data. In this paper, assuming a Rician noise model, a new filtering method based on linear minimum mean square error (LMMSE) estimation is introduced, which employs the self-similarity property of the MR data to restore the noiseless signal. This method takes into account the structural characteristics of images and the Bayesian mean square error (Bmse) of the estimator to address the denoising problem. In general, a twofold data processing approach is developed; first, the noisy MR data is processed using a patch-based L2-norm similarity measure to provide the primary set of samples required for the estimation process. Afterwards, the Bmse of the estimator is derived as the optimization function to analyze the pre-selected samples and minimize the error between the estimated and the underlying signal. Compared to the LMMSE method and also its recently proposed SNR-adapted realization (SNLMMSE), the optimized way of choosing the samples together with the automatic adjustment of the filtering parameters leads to more robust estimation performance with our approach. Experimental results show the competitive performance of the proposed method in comparison with related state-of-the-art methods.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
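The LMMSE principle underlying the filter in the abstract above is easiest to see in one dimension: shrink the noisy observation toward the prior mean in proportion to the signal-to-(signal+noise) variance ratio. This scalar Gaussian sketch is an illustration of the principle only — the paper's method is patch-based and adapted to Rician noise.

```python
import random

def lmmse(y, mu_x, var_x, var_n):
    """Scalar LMMSE estimate of x from y = x + noise:
    x_hat = mu_x + var_x / (var_x + var_n) * (y - mu_x)."""
    gain = var_x / (var_x + var_n)
    return mu_x + gain * (y - mu_x)

random.seed(1)
mu_x, var_x, var_n = 10.0, 4.0, 1.0
xs = [random.gauss(mu_x, var_x ** 0.5) for _ in range(5000)]
ys = [x + random.gauss(0, var_n ** 0.5) for x in xs]
mse_raw = sum((y - x) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_est = sum((lmmse(y, mu_x, var_x, var_n) - x) ** 2
              for x, y in zip(xs, ys)) / len(xs)
print(mse_est < mse_raw)   # True: the estimator reduces mean squared error
```

In theory the estimator's MSE here is var_x·var_n/(var_x+var_n) = 0.8 versus the raw noise variance of 1.0, which the simulation reproduces approximately.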
  • 13
    Publication Date: 2015-08-07
    Description: Large-scale ad hoc analytics of genomic data is popular using the R-programming language supported by over 700 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While biologists and bioinformaticists can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. How analytical jobs can not only with minimum effort be executed on the cloud, but also how both the resources and data required by the job can be managed is explored in this paper. An open-source light-weight framework for executing R-scripts using Bioconductor packages, referred to as ‘RBioCloud’, is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is available from http://www.rbiocloud.com .
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 14
    Publication Date: 2015-08-07
    Description: Of major interest to translational genomics is the intervention in gene regulatory networks (GRNs) to affect cell behavior; in particular, to alter pathological phenotypes. Owing to the complexity of GRNs, accurate network inference is practically challenging and GRN models often contain considerable amounts of uncertainty. Considering the cost and time required for conducting biological experiments, it is desirable to have a systematic method for prioritizing potential experiments so that an experiment can be chosen to optimally reduce network uncertainty. Moreover, from a translational perspective it is crucial that GRN uncertainty be quantified and reduced in a manner that pertains to the operational cost that it induces, such as the cost of network intervention. In this work, we utilize the concept of mean objective cost of uncertainty (MOCU) to propose a novel framework for optimal experimental design. In the proposed framework, potential experiments are prioritized based on the MOCU expected to remain after conducting the experiment. Based on this prioritization, one can select an optimal experiment with the largest potential to reduce the pertinent uncertainty present in the current network model. We demonstrate the effectiveness of the proposed method via extensive simulations based on synthetic and real gene regulatory networks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
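The MOCU quantity used in the abstract above has a compact form for a finite uncertainty class: the expected excess cost of applying the robust intervention instead of the (unknown) model-optimal one. A discrete toy version (structure and names are mine; the paper applies this to gene regulatory network interventions):

```python
def mocu(cost, prior):
    """Mean objective cost of uncertainty for a finite uncertainty class:
    cost[psi][theta] is the intervention cost under model theta, and
    prior[theta] the probability of that model."""
    n_psi, n_theta = len(cost), len(cost[0])
    expected = [sum(cost[p][t] * prior[t] for t in range(n_theta))
                for p in range(n_psi)]
    robust = min(range(n_psi), key=lambda p: expected[p])  # robust intervention
    # expected regret of the robust choice relative to the per-model optimum
    return sum(prior[t] * (cost[robust][t] - min(cost[p][t] for p in range(n_psi)))
               for t in range(n_theta))

# two candidate interventions, two possible network models
cost = [[0.0, 1.0],    # psi0 is optimal under theta0
        [1.0, 0.0]]    # psi1 is optimal under theta1
print(mocu(cost, [0.5, 0.5]))  # 0.5: maximal model uncertainty is costly
print(mocu(cost, [1.0, 0.0]))  # 0.0: no uncertainty, no excess cost
```

Experiment prioritization in the paper then amounts to choosing the experiment whose outcome-averaged remaining MOCU is smallest.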
  • 15
    Publication Date: 2015-08-07
    Description: A novel approach to the Contact Map Overlap (CMO) problem is proposed, using the two-dimensional clusters present in the contact maps. Each protein is represented as a set of the non-trivial clusters of contacts extracted from its contact map. The approach involves finding matching regions between the two contact maps using an approximate 2D-pattern matching algorithm and a dynamic programming technique. These matched pairs of small contact maps are submitted in parallel to a fast heuristic CMO algorithm. The approach facilitates parallelization at this level, since all the pairs of contact maps can be submitted to the algorithm in parallel. Then, a merge algorithm is used to obtain the overall alignment. As a proof of concept, MSVNS, a heuristic CMO algorithm, is used for global as well as local alignment. The divide-and-conquer approach is evaluated on two benchmark data sets, those of Skolnick and Ding et al. It is interesting to note that along with saving time, better overlap is also obtained for certain protein folds.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 16
    Publication Date: 2015-08-07
    Description: Canalizing genes possess broad regulatory power over a wide swath of regulatory processes. On the other hand, it has been hypothesized that the phenomenon of intrinsically multivariate prediction (IMP) is associated with canalization. However, applications have relied on user-selectable thresholds on the IMP score to decide on the presence of IMP. A methodology is developed here that avoids arbitrary thresholds, by providing a statistical test for the IMP score. In addition, the proposed procedure allows the incorporation of prior knowledge if available, which can alleviate the problem of loss of power due to small sample sizes. The issue of multiplicity of tests is addressed by family-wise error rate (FWER) and false discovery rate (FDR) controlling approaches. The proposed methodology is demonstrated by experiments using synthetic and real gene-expression data from studies on melanoma and ionizing radiation (IR) responsive genes. The results with the real data identified DUSP1 and p53, two well-known canalizing genes associated with melanoma and IR response, respectively, as the genes with a clear majority of IMP predictor pairs. This validates the potential of the proposed methodology as a tool for discovery of canalizing genes from binary gene-expression data. The procedure is made available through an R package.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 17
    Publication Date: 2015-08-21
    Description: Background: Membrane proteins represent over 25 % of human protein genes and account for more than 60 % of drug targets due to their accessibility from the extracellular environment. The increasing number of available crystal structures of these proteins in the Protein Data Bank permits an initial estimation of their structural properties. Description: We have developed two web servers—TMalphaDB for α-helix bundles and TMbetaDB for β-barrels—to analyse the growing repertoire of available crystal structures of membrane proteins. TMalphaDB and TMbetaDB permit searching for specific sequence motifs in a non-redundant structure database of transmembrane segments and quantifying structural parameters such as the ϕ and ψ backbone dihedral angles, the χ1 side chain torsion angle, unit bend and unit twist. Conclusions: The structural information offered by TMalphaDB and TMbetaDB permits quantifying the structural distortions induced by specific sequence motifs and elucidating their role in the 3D structure. This specific structural information has direct implications for homology modeling of the growing number of membrane protein sequences lacking an experimental structure. TMalphaDB and TMbetaDB are freely available at http://lmc.uab.cat/TMalphaDB and http://lmc.uab.cat/TMbetaDB.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 18
    Publication Date: 2015-08-21
    Description: Background: Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method to identify putative transcription factor binding sites. While common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value associated approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for the modeling of transcription factor binding dynamics and enhancer activities. Results: Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor, which we applied to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches. Conclusion: These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. Their results are consistent with each other and with previous reports in most cases.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
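The basic PWM scoring and score-to-energy scaling discussed above can be sketched in a few lines; the matrix values, the λ value, and the sign convention E = −score/λ are illustrative assumptions, not the paper's fitted parameters.

```python
# Toy log-odds position weight matrix for a length-4 motif
# (illustrative values, not a real transcription factor).
PWM = {
    'A': [1.2, -0.5, 0.3, -1.0],
    'C': [-0.8, 1.0, -0.2, 0.4],
    'G': [-0.3, -0.7, 1.1, 0.2],
    'T': [0.1, 0.2, -0.9, 0.5],
}

def pwm_score(site):
    """Sum of per-position log-odds contributions for a candidate site."""
    return sum(PWM[base][i] for i, base in enumerate(site))

def score_to_energy(score, lam):
    """Scale a PWM score into a relative binding energy via lambda
    (assumed convention: E = -score / lambda)."""
    return -score / lam

s = pwm_score("ACGT")  # the consensus site under this toy matrix
print(s, score_to_energy(s, lam=1.5))
```

With a calibrated λ, scores from different PWMs become comparable on a common energy scale, which is the point of the scaling procedure.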
  • 19
    Publication Date: 2015-08-22
    Description: We prove that in a graph with n vertices, induced chordal and interval subgraphs with the maximum number of vertices can be found in time \(\mathcal {O}(2^{\lambda n})\) for some \(\lambda < 1\). These are the first algorithms breaking the trivial \(2^n n^{\mathcal {O}(1)}\) bound of the brute-force search for these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 20
    Publication Date: 2015-08-24
    Description: Background: Biological pathways are descriptive diagrams of biological processes widely used for functional analysis of differentially expressed genes or proteins. Primary data analysis, such as quality control, normalisation, and statistical analysis, is often performed in scripting languages like R, Perl, and Python. Subsequent pathway analysis is usually performed using dedicated external applications. Workflows involving manual use of multiple environments are time consuming and error prone. Therefore, tools are needed that enable pathway analysis directly within the same scripting languages used for primary data analyses. Existing tools have limited capability in terms of available pathway content, pathway editing and visualisation options, and export file formats. Consequently, making the full-fledged pathway analysis tool PathVisio available from various scripting languages will benefit researchers. Results: We developed PathVisioRPC, an XML-RPC interface for the pathway analysis software PathVisio. PathVisioRPC enables creating and editing biological pathways, visualising data on pathways, performing pathway statistics, and exporting results in several image formats in multiple programming environments. We demonstrate PathVisioRPC functionalities using examples in Python. Subsequently, we analyse a publicly available NCBI GEO gene expression dataset, from a study of tumour-bearing mice treated with cyclophosphamide, in R. The R scripts demonstrate how calls to existing R packages for data processing and calls to PathVisioRPC can work together directly. To further support R users, we have created RPathVisio, which simplifies the use of PathVisioRPC in this environment. We have also created a pathway module for the microarray data analysis portal ArrayAnalysis.org that calls the PathVisioRPC interface to perform pathway analysis. This module allows users to use PathVisio functionality online without having to download and install the software, and exemplifies how the PathVisioRPC interface can be used by data analysis pipelines for functional analysis of processed genomics data. Conclusions: PathVisioRPC enables data visualisation and pathway analysis directly from within various analytical environments used for preliminary analyses. It supports the use of existing pathways from WikiPathways or pathways created using the RPC itself. It also enables automation of tasks performed using PathVisio, making it useful to PathVisio users performing repeated visualisation and analysis tasks. PathVisioRPC is freely available for academic and commercial use at http://projects.bigcat.unimaas.nl/pathvisiorpc.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 21
    Publication Date: 2015-08-25
    Description: We investigate the effect of limiting the number of reserve prices on the revenue in a probabilistic single-item auction. In the model considered, bidders compete for an impression drawn from a known distribution of possible types. The auction mechanism sets up to \(\ell\) reserve prices, and each impression type is assigned the highest reserve price lower than the valuation of some bidder for it. The bidder proposing the highest bid for an arriving impression gets it, provided his bid is at least the corresponding reserve price, and pays the maximum of the reserve price and the second-highest bid. Since the number of impression types may be huge, we consider the revenue \(R_{\ell }\) that can be ensured using only \(\ell\) reserve prices. Our main results are tight lower bounds on \(R_{\ell }\) for the cases where the impressions are drawn from the uniform or a general probability distribution.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
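The payment rule in this model (the winner pays the larger of the reserve and the second-highest bid, and only wins if the reserve is met) can be illustrated with a minimal single-impression, single-reserve sketch; bids and reserve are toy values.

```python
def run_auction(bids, reserve):
    """Second-price auction with a reserve price: the highest bidder wins
    if their bid clears the reserve, and pays the larger of the reserve
    and the second-highest bid. Returns (winner_index, payment), or
    (None, 0.0) if the reserve is not met."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    top, second = order[0], order[1] if len(order) > 1 else None
    if bids[top] < reserve:
        return None, 0.0
    second_bid = bids[second] if second is not None else 0.0
    return top, max(reserve, second_bid)

print(run_auction([5.0, 3.0, 8.0], reserve=4.0))  # bidder 2 wins, pays 5.0
print(run_auction([2.0, 1.0], reserve=4.0))       # reserve not met
```

The paper's question is then how much revenue is lost when many impression types must share only \(\ell\) such reserve values.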
  • 22
    Publication Date: 2015-08-20
    Description: Background: Detecting and quantifying isoforms from RNA-seq data is an important but challenging task. The problem is often ill-posed, particularly at low coverage. One promising direction is to exploit several samples simultaneously. Results: We propose a new method for solving the isoform deconvolution problem jointly across several samples. We formulate a convex optimization problem that shares information between samples and that we solve efficiently. We demonstrate the benefits of combining several samples on simulated and real data, and show that our approach outperforms pooling strategies and methods based on integer programming. Conclusion: Our convex formulation for jointly detecting and quantifying isoforms from RNA-seq data of multiple related samples is a computationally efficient approach that leverages the hypothesis that some isoforms are likely to be present in several samples. The software and source code are available at http://cbio.ensmp.fr/flipflop.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
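As a much-simplified, single-sample stand-in for the joint convex formulation above, isoform abundances can be estimated from exon coverage by projected gradient descent on a nonnegative least-squares objective; the exon-isoform matrix and coverage values are toy data, not the paper's model.

```python
# Toy isoform quantification: A[i][j] = 1 if exon i belongs to isoform j;
# y[i] is the observed coverage of exon i. Estimate nonnegative isoform
# abundances x minimizing ||Ax - y||^2 (illustrative data).
A = [[1, 1],   # exon 1 is in both isoforms
     [1, 0],   # exon 2 only in isoform 1
     [0, 1]]   # exon 3 only in isoform 2
y = [30.0, 10.0, 20.0]

def estimate_abundances(A, y, steps=2000, lr=0.05):
    n = len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # residual r = Ax - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i]
             for i in range(len(A))]
        # gradient g = 2 A^T r
        g = [2 * sum(A[i][j] * r[i] for i in range(len(A)))
             for j in range(n)]
        # gradient step, projected onto the nonnegative orthant
        x = [max(0.0, x[j] - lr * g[j]) for j in range(n)]
    return x

print([round(v, 2) for v in estimate_abundances(A, y)])
```

The paper's formulation additionally couples such problems across samples with a shared sparsity-inducing penalty, which this sketch omits.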
  • 23
    Publication Date: 2015-08-20
    Description: Background: The cascade computer model (CCM) was designed as a machine-learning feature platform for prediction of drug diffusivity from mucoadhesive formulations. Three basic models (the statistical regression model, the K nearest neighbor model and a modified version of the back propagation neural network) in CCM operate sequentially in close collaboration with each other, employing the estimated value obtained from the preceding base model as an input value to the next base model in the cascade. The effects of various parameters on the pharmacological efficacy of a female controlled drug delivery system (FcDDS) intended to protect women from HIV-1 infection were evaluated using an in vitro apparatus, the “Simulant Vaginal System” (SVS). We used computer simulations to explicitly examine the changes in drug diffusivity from FcDDS and determine the prognostic potency of each variable for in vivo prediction of formulation efficacy. The results obtained using the CCM approach were compared with those from individual multiple regression models. Results: CCM significantly lowered the percentage mean error (PME) and enhanced r² values compared with the multiple regression models. It was noted that CCM produced a PME value of 21.82 % at 48,169 epoch iterations, a significant improvement over the PME value of 29.91 % at 118,344 epochs obtained by the back propagation network model. The results of this study indicated that the sequential ensemble of classifiers allowed for accurate prediction of the domain with significantly lowered variance and considerably reduced the time required for the training phase. Conclusion: CCM is accurate, easy to operate, and time- and cost-effective, and thus can serve as a valuable tool for prediction of drug diffusivity from mucoadhesive formulations. CCM may yield new insights into understanding how drugs diffuse from carrier systems and exert their efficacy under various clinical conditions.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 24
    Publication Date: 2015-08-21
    Description: Background: In many domains, scientists build complex simulators of natural phenomena that encode their hypotheses about the underlying processes. These simulators can be deterministic or stochastic, fast or slow, constrained or unconstrained, and so on. Optimizing the simulators with respect to a set of parameter values is common practice, resulting in a single parameter setting that minimizes an objective subject to constraints. Results: We propose algorithms for post-optimization posterior evaluation (POPE) of simulators. The algorithms compute and visualize all simulations that can generate results of the same or better quality than the optimum, subject to constraints. These optimization posteriors are desirable for a number of reasons, among which are easy interpretability, automatic parameter sensitivity and correlation analysis, and posterior predictive analysis. Our algorithms are simple extensions to an existing simulation-based inference framework called approximate Bayesian computation. POPE is applied to two biological simulators: a fast and stochastic simulator of stem-cell cycling and a slow and deterministic simulator of tumor growth patterns. Conclusions: POPE allows the scientist to explore and understand the role that constraints, both on the input and the output, have on the optimization posterior. As a Bayesian inference procedure, POPE provides a rigorous framework for the analysis of the uncertainty of an optimal simulation parameter setting.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
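The approximate Bayesian computation machinery that POPE builds on can be illustrated with a minimal rejection sampler; the simulator, prior, and tolerance below are illustrative, not the paper's setup.

```python
import random

random.seed(0)

# ABC rejection sampling in miniature: the "simulator" draws 20 Gaussian
# observations with unknown mean theta and returns their average; we
# accept parameter draws whose simulated summary lies within a tolerance
# of the observed summary (all choices here are toy assumptions).
def simulator(theta):
    return sum(random.gauss(theta, 1.0) for _ in range(20)) / 20.0

observed = 2.0
accepted = []
for _ in range(5000):
    theta = random.uniform(-5, 5)               # flat prior over theta
    if abs(simulator(theta) - observed) < 0.2:  # tolerance epsilon
        accepted.append(theta)

posterior_mean = sum(accepted) / len(accepted)
print(round(posterior_mean, 2), len(accepted))
```

POPE extends this idea by conditioning not on observed data but on achieving objective values at least as good as the optimum, subject to constraints.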
  • 25
    Publication Date: 2015-08-13
    Description: Background: Estimating the phylogenetic position of bacterial and archaeal organisms by genetic sequence comparisons is considered the gold standard in taxonomy. It is also a way to identify the species of origin of a sequence. The quality of the reference database used in such analyses is crucial: the database must reflect the up-to-date bacterial nomenclature and accurately indicate the species of origin of its sequences. Description: leBIBI QBPP is a web tool taking as input a series of nucleotide sequences belonging to one of a set of reference markers (e.g., SSU rRNA, rpoB, groEL2) and automatically retrieving closely related sequences, aligning them, and performing phylogenetic reconstruction using an approximate maximum likelihood approach. The system returns a set of quality parameters and, if possible, a suggested taxonomic assignment for the input sequences. The reference databases are extracted from GenBank and present four degrees of stringency, from the “superstringent” degree (one type strain per species) to the loosely parsed degree (the “lax” database). A set of one hundred to more than a thousand sequences may be analyzed at a time. The speed of the process has been optimized through careful hardware selection and database design. Conclusion: leBIBI QBPP is a powerful tool helping biologists to position sequences of commonly used bacterial or archaeal markers in a phylogeny. It is a diagnostic tool for clinical, industrial and environmental microbiology laboratories, as well as an exploratory tool for more specialized laboratories. Its main advantages, relative to comparable systems, are: i) the use of a broad set of databases covering diverse markers with various degrees of stringency; ii) the use of an approximate maximum likelihood approach for phylogenetic reconstruction; iii) a speed compatible with on-line usage; and iv) fully documented results to help the user in decision making.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 26
    Publication Date: 2015-07-30
    Description: Background: Despite the tremendous drop in the cost of nucleotide sequencing in recent years, many research projects still utilize sequencing of pools containing multiple samples for the detection of sequence variants as a cost saving measure. Various software tools exist to analyze these pooled sequence data, yet little has been reported on the relative accuracy and ease of use of these different programs. Results: In this manuscript we evaluate five different variant detection programs—The Genome Analysis Toolkit (GATK), CRISP, LoFreq, VarScan, and SNVer—with regard to their ability to detect variants in synthetically pooled Illumina sequencing data, by creating simulated pooled binary alignment/map (BAM) files using single-sample sequencing data from varying numbers of previously characterized samples at varying depths of coverage per sample. We report the overall runtimes and memory usage of each program, as well as each program’s sensitivity and specificity to detect known true variants. Conclusions: GATK, CRISP, and LoFreq all gave balanced accuracy of 80 % or greater for datasets with varying per-sample depth of coverage and numbers of samples per pool. VarScan and SNVer generally had balanced accuracy lower than 80 %. CRISP and LoFreq required up to four times less computational time and up to ten times less physical memory than GATK did, and without filtering, gave results with the highest sensitivity. VarScan and SNVer had generally lower false positive rates, but also significantly lower sensitivity than the other three programs.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
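Balanced accuracy, the headline metric of this comparison, is the mean of sensitivity and specificity; a minimal sketch over toy variant-call sets (positions and the candidate universe are illustrative):

```python
def confusion(called, truth, universe):
    """Counts over a universe of candidate variant positions."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(universe - called - truth)
    return tp, fp, fn, tn

def balanced_accuracy(called, truth, universe):
    tp, fp, fn, tn = confusion(called, truth, universe)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

universe = set(range(10))
truth = {1, 3, 5, 7}
called = {1, 3, 5, 8}   # misses position 7, false positive at 8
print(balanced_accuracy(called, truth, universe))
```

Averaging the two rates keeps a caller from looking good merely by calling very few (or very many) variants, which matters when true variants are rare among candidate sites.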
  • 27
    Publication Date: 2015-08-07
    Description: The Regression Network plugin for Cytoscape (RegNetC) implements the RegNet algorithm for the inference of transcriptional association networks from gene expression profiles. This algorithm is a model tree-based method that detects the relationship between each gene and the remaining genes simultaneously, instead of analyzing each pair of genes individually as correlation-based methods do. Model trees are a very useful technique for estimating gene expression values by regression models; they favour localized similarities over global similarity alone, reliance on which is one of the major drawbacks of correlation-based methods. Here, we present an integrated software suite, named RegNetC, as a Cytoscape plugin that can also operate on its own. According to user-defined parameters, RegNetC exports the resulting transcriptional gene association network in .sif format for visualization and analysis, interoperates with other Cytoscape plugins, and can export publication figures. In addition to the network, the RegNetC plugin also provides the quantitative relationships between the expression values of the genes involved in the inferred network, i.e., those defined by the regression models.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 28
    Publication Date: 2015-08-07
    Description: We introduce a new method for normalization of data acquired by liquid chromatography coupled with mass spectrometry (LC-MS) in label-free differential expression analysis. Normalization of LC-MS data is desired prior to subsequent statistical analysis to adjust variabilities in ion intensities that are not caused by biological differences but experimental bias. There are different sources of bias including variabilities during sample collection and sample storage, poor experimental design, noise, etc. In addition, instrument variability in experiments involving a large number of LC-MS runs leads to a significant drift in intensity measurements. Although various methods have been proposed for normalization of LC-MS data, there is no universally applicable approach. In this paper, we propose a Bayesian normalization model (BNM) that utilizes scan-level information from LC-MS data. Specifically, the proposed method uses peak shapes to model the scan-level data acquired from extracted ion chromatograms (EIC) with parameters considered as a linear mixed effects model. We extended the model into BNM with drift (BNMD) to compensate for the variability in intensity measurements due to long LC-MS runs. We evaluated the performance of our method using synthetic and experimental data. In comparison with several existing methods, the proposed BNM and BNMD yielded significant improvement.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 29
    Publication Date: 2015-08-07
    Description: Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, and is crucial in facilitating the successful diagnosis and treatment of cancer. While there are quite a number of research works which perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and further improve its performance by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
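A common first step in cluster-ensemble methods of this kind is the co-association (consensus) matrix built from several base clusterings; a minimal sketch on toy labellings (not the RDCCE/RDCFCE implementation):

```python
# Each inner list assigns the same 4 samples to clusters in one base
# clustering run (illustrative toy labellings).
clusterings = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

def consensus_matrix(clusterings):
    """M[i][j] = fraction of base clusterings placing samples i and j
    in the same cluster."""
    n = len(clusterings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(clusterings)
    return m

M = consensus_matrix(clusterings)
print(M[0][1], M[1][2])
```

The consensus function (normalized cut, in the paper) then partitions this matrix to obtain the final clustering; the fuzzy extension replaces the hard 0/1 co-assignments with membership degrees.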
  • 30
    Publication Date: 2015-08-07
    Description: Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, existing NER tools produce multifarious named entities, which may include both curatable and non-curatable markers. Classifying curatable named entities is therefore a straightforward way to accelerate the biocuration workflow. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify genes, chemicals, diseases, and action term mentions in the Comparative Toxicogenomic Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize the mentions in the BioCreative IV CTD Track. CoINNER is developed based on a prototype system that annotated gene, chemical, and disease mentions in PubMed abstracts at BioCreative 2012 Track I (literature triage); we extended this previous system in developing CoINNER. The pre-tagging results of CoINNER were developed based on the state-of-the-art named-entity recognition tools in BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions were collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug and disease NER were 54 percent, while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 31
    Publication Date: 2015-08-07
    Description: Next-generation short-read sequencing is widely utilized in genomic studies. Biological applications require an alignment step to map sequencing reads to the reference genome before acquiring the expected genomic information. This requirement makes alignment accuracy a key factor for effective biological interpretation. When accounting for measurement errors and single nucleotide polymorphisms, short-read mappings with a few mismatches are generally considered acceptable. However, to further improve the efficiency of short-read sequencing alignment, we propose a method to retrieve additional reliably aligned reads (reads with more than a pre-defined number of mismatches) using a Bayesian-based approach. In this method, we first retrieve the sequence context around the mismatched nucleotides within the already aligned reads; these loci contain the genomic features where sequencing errors occur. Then, using the derived pattern, we evaluate the remaining (typically discarded) reads with more than the allowed number of mismatches, and calculate a score that represents the probability that a specific alignment is correct. This strategy allows the extraction of more reliably aligned reads, thereby improving alignment sensitivity. Implementation: The source code of our tool, ResSeq, can be downloaded from: https://github.com/hrbeubiocenter/Resseq.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
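The idea of scoring whether a high-mismatch alignment is still likely correct can be illustrated with a simple two-hypothesis Bayes calculation; the error rates and prior below are illustrative assumptions, not ResSeq's learned sequence-context model.

```python
import math

def alignment_posterior(k, L, p_err=0.02, p_bg=0.75, prior_correct=0.5):
    """Posterior probability that an alignment with k mismatches over
    length L is correct, comparing 'correct' (mismatches arise at the
    sequencing-error/SNP rate p_err) against 'spurious' (random
    placement with background mismatch rate p_bg). All rates are toy
    values."""
    def loglik(p):
        return k * math.log(p) + (L - k) * math.log(1 - p)
    log_c = math.log(prior_correct) + loglik(p_err)
    log_s = math.log(1 - prior_correct) + loglik(p_bg)
    m = max(log_c, log_s)  # stabilize the exponentials
    return math.exp(log_c - m) / (math.exp(log_c - m) + math.exp(log_s - m))

print(alignment_posterior(k=3, L=100))   # few mismatches: near 1
print(alignment_posterior(k=60, L=100))  # many mismatches: near 0
```

A caller can then rescue reads whose posterior exceeds a chosen threshold instead of discarding every read above the mismatch cutoff.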
  • 32
    Publication Date: 2015-08-07
    Description: In genome assembly graphs, motifs such as tips, bubbles, and cross links are studied in order to find sequencing errors and to understand the nature of the genome. The superbubble, a complex generalization of the bubble, was recently proposed as an important subgraph class for analyzing assembly graphs. At present, a quadratic-time algorithm is known. This paper gives an $O(m \log m)$-time algorithm to solve this problem for a graph with $m$ edges.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 33
    Publication Date: 2015-08-07
    Description: The papers in this special section focus on software and databases that are central to bioinformatics and computational biology. These programs are playing increasingly important roles in biology and medical research. The papers cover a broad range of topics, including computational genomics and transcriptomics, analysis of biological networks and interactions, drug design, biomedical signal/image analysis, biomedical text mining and ontologies, biological data mining, visualization and integration, and high performance computing applications in bioinformatics.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 34
    Publication Date: 2015-08-07
    Description: In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. Additionally, they may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some such prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite also includes SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 35
    Publication Date: 2015-08-07
    Description: Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both deep intrinsic statistical properties of each input modality and complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. 
These results indicate that our multimodal DBN-based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
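The contrastive divergence (CD) update mentioned above can be sketched for a single binary restricted Boltzmann machine, the building block of a DBN; the layer sizes, learning rate, and data below are illustrative, and this is a generic CD-1 sketch rather than the paper's multimodal architecture (bias terms are omitted for brevity).

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy binary RBM: n_vis visible units, n_hid hidden units, weights W.
n_vis, n_hid, lr = 6, 3, 0.1
W = [[random.gauss(0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]

def hidden_probs(v):
    return [sigmoid(sum(v[i] * W[i][j] for i in range(n_vis)))
            for j in range(n_hid)]

def visible_probs(h):
    return [sigmoid(sum(h[j] * W[i][j] for j in range(n_hid)))
            for i in range(n_vis)]

def sample(probs):
    return [1 if random.random() < p else 0 for p in probs]

def cd1_step(v0):
    """One CD-1 update: a single Gibbs step v0 -> h -> v1, then move the
    weights toward the data statistics and away from the model's."""
    h0 = hidden_probs(v0)
    v1 = visible_probs(sample(h0))
    h1 = hidden_probs(v1)
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return sum((a - b) ** 2 for a, b in zip(v0, v1))  # reconstruction error

data = [1, 1, 1, 0, 0, 0]
errors = [cd1_step(data) for _ in range(200)]
print(round(errors[0], 3), round(errors[-1], 3))
```

A DBN stacks such layers, training each RBM on the hidden activations of the one below; the multimodal variant in the paper additionally fuses several input modalities in a joint top layer.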
  • 36
    Publication Date: 2015-08-08
    Description: The new dual-pivot Quicksort by Vladimir Yaroslavskiy—used in Oracle’s Java runtime library since version 7—features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy’s algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy’s algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy’s algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    Publication Date: 2015-06-05
    Description: In this paper, we consider the Unsplittable (hard) Capacitated Facility Location Problem (UCFLP) with uniform capacities and present new approximation algorithms for it. This problem is a generalization of the classical facility location problem where each facility can serve at most u units of demand and each client must be served by exactly one facility. This problem is motivated by its applications in many practical problems including supply chain problems of indivisible goods (Verter in Foundations of location analysis, chapter 2. International series in operations research and management science. Springer, Berlin, 2011 ) and the assignment problem in the content distribution networks (Bateni and Hajiaghayi in Proceedings of the nineteenth annual ACM-SIAM symposium on discrete algorithms, pp 805–814, 2009 ). While there are several approximation algorithms for the soft capacitated version of this problem (in which one can open multiple copies of each facility) or the splittable version (in which the demand of each client can be divided to be served by multiple open facilities), there are very few results for the UCFLP. It is known that it is NP-hard to approximate this problem within any factor without violating the capacities. So we consider bicriteria \((\alpha ,\beta )\) -approximations where the algorithm returns a solution whose cost is within factor \(\alpha \) of the optimum and violates the capacity constraints within factor \(\beta \) . Shmoys et al. (Proceedings of the twenty-ninth annual ACM symposium on theory of computing, pp 265–274, 1997 ) were the first to consider this problem and gave a (9, 4)-approximation. Later results imply ( O (1), 2)-approximations, however, no constant factor approximation is known with capacity violation of less than 2. We present a framework for designing bicriteria approximation algorithms for this problem and show two new approximation algorithms with factors (9, 3 / 2) and (29.315, 4 / 3). 
These are the first algorithms with constant approximation in which the violation of capacities is below 2. The heart of our algorithm is a reduction from the UCFLP to a restricted version of the problem. One feature of this reduction is that any \((O(1),1+{\epsilon })\) -approximation for the restricted version implies an \((O(1),1+{\epsilon })\) -approximation for the UCFLP and we believe our techniques might be useful towards finding such approximations or perhaps \((f({\epsilon }),1+{\epsilon })\) -approximation for the UCFLP for some function f . In addition, we present a quasi-polynomial time \((1+\epsilon ,1+\epsilon )\) -approximation for the (uniform) UCFLP in Euclidean metrics, for any constant \({\epsilon }>0\) .
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    Publication Date: 2015-06-06
    Description: The papers in this special issue contain extended versions of works that were originally presented at the Brazilian Symposium on Bioinformatics 2013 (BSB 2013), held in Recife, Brazil, November 3-6, 2013.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    Publication Date: 2015-06-06
    Description: Computational methods for predicting protein-protein interactions are important tools that can complement high-throughput technologies and guide biologists in designing new laboratory experiments. The proteins and the interactions between them can be described by a network which is characterized by several topological properties. Information about proteins and interactions between them, in combination with knowledge about topological properties of the network, can be used for developing computational methods that can accurately predict unknown protein-protein interactions. This paper presents a supervised learning framework based on Bayesian inference for combining two types of information: i) network topology information, and ii) information related to proteins and the interactions between them. The motivation of our model is that by combining these two types of information one can achieve a better accuracy in predicting protein-protein interactions, than by using models constructed from these two types of information independently.
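The core idea of combining the two types of information under Bayesian inference can be illustrated on the odds scale (a minimal sketch assuming the two evidence sources are conditionally independent given the interaction status; the likelihood-ratio values below are hypothetical, not from the paper):

```python
def posterior_interaction(prior, lr_topology, lr_annotation):
    """Combine a prior interaction probability with two likelihood ratios
    (one per evidence source) on the odds scale, assuming the sources are
    conditionally independent given the interaction status."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * lr_topology * lr_annotation
    return post_odds / (1.0 + post_odds)

# A rare prior lifted by two supportive (hypothetical) likelihood ratios.
p = posterior_interaction(prior=0.01, lr_topology=8.0, lr_annotation=5.0)
```

With neutral evidence (both ratios equal to 1) the posterior reduces to the prior; supportive evidence from both sources compounds multiplicatively, which is why combining the sources can beat either model alone.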
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    Publication Date: 2015-06-06
    Description: We develop a theory of algebraic operations over linear and context-free grammars that makes it possible to combine simple “atomic” grammars operating on single sequences into complex, multi-dimensional grammars. We demonstrate the utility of this framework by constructing the search spaces of complex alignment problems on multiple input sequences explicitly as algebraic expressions of very simple one-dimensional grammars. In particular, we provide a fully worked frameshift-aware, semiglobal DNA-protein alignment algorithm whose grammar is composed of products of small, atomic grammars. The compiler accompanying our theory makes it easy to experiment with the combination of multiple grammars and different operations. Composite grammars can be written out in LaTeX for documentation and as a guide to implementation of dynamic programming algorithms. An embedding in Haskell as a domain-specific language makes the theory directly accessible to writing and using grammar products without the detour of an external compiler. Software and supplemental files available here: http://www.bioinf.uni-leipzig.de/Software/gramprod/
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Publication Date: 2015-06-06
    Description: Recent advancements in genomics and proteomics provide a solid foundation for understanding the pathogenesis of diabetes. Proteomics of diabetes-associated pathways helps to identify the most potent targets for the management of diabetes. The relevant datasets are scattered across various prominent sources, which makes selecting a therapeutic target for the clinical management of diabetes time-consuming. However, additional information about target proteins is needed for validation. This lacuna may be resolved by linking diabetes-associated genes, pathways and proteins, providing a strong base for treatment and for planning management strategies for diabetes. Thus, a web resource, the Diabetes Associated Proteins Database (DAPD), has been developed to link diabetes-associated genes, pathways and proteins using PHP and MySQL. The current version of DAPD has been built with proteins associated with different types of diabetes. In addition, DAPD has been linked to external sources to give access to more participatory proteins and their pathway network. DAPD reduces this search time and is expected to pave the way for the discovery of novel anti-diabetic leads using computational drug design for diabetes management. DAPD is openly accessible at www.mkarthikeyan.bioinfoau.org/dapd.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    Publication Date: 2015-06-06
    Description: Determining glycan topology automatically from mass spectra is a great challenge. Existing methods fall into approximate and exact ones. The former, including greedy and heuristic methods, reduce the computational complexity but suffer from information loss during glycan interpretation. The latter, including dynamic programming and exhaustive enumeration, are much slower. In recent years, nearly all emerging methods have adopted a tree structure to represent a glycan. They share problems such as repetitive peak counting when reconstructing a candidate structure. Besides, tree-based glycan representations often require different computational formulas for binary and ternary glycans. We propose a new directed acyclic graph structure for glycan representation. Based on it, this work develops a de novo algorithm that accurately reconstructs the tree structure iteratively from mass spectra, using logical constraints and known biosynthesis rules, with a single computational formula. Experiments on multiple complex glycans extracted from human serum show that the proposed algorithm determines a glycan topology more accurately than prior methods without increasing the computational burden.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2015-06-06
    Description: A crucial step in understanding the architecture of cells and tissues from microscopy images, and consequently explaining important biological events such as wound healing and cancer metastasis, is the complete extraction and enumeration of individual filaments from the cellular cytoskeletal network. Current efforts at quantitative estimation of filament length distribution, architecture and orientation from microscopy images are predominantly limited to visual estimation and indirect experimental inference. Here we demonstrate the application of a new algorithm that reliably estimates centerlines of biological filament bundles and extracts individual filaments from the centerlines by systematically disambiguating filament intersections. We utilize a filament enhancement step followed by reverse-diffusion-based filament localization and an integer-programming-based set combination to systematically and automatically extract accurate filaments from microscopy images. Experiments on simulated and real confocal microscope images of flat cells (2D images) show the efficacy of the new method.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    Publication Date: 2015-06-06
    Description: The Local/Global Alignment (Zemla, 2003), or LGA, is a popular method for the comparison of protein structures. One of the two components of LGA requires us to compute the longest common contiguous segments between two protein structures. That is, given two structures $A=(a_1,\ldots,a_n)$ and $B=(b_1,\ldots,b_n)$ where $a_k, b_k \in \mathbb{R}^3$, we are to find, among all the segments $f=(a_i,\ldots,a_j)$ and $g=(b_i,\ldots,b_j)$ that fulfill a certain criterion regarding their similarity, those of the maximum length. We consider the following criteria: (1) the root mean squared deviation (RMSD) between $f$ and $g$ is to be within a given $t \in \mathbb{R}$; (2) $f$ and $g$ can be superposed such that for each $k$, $i \le k \le j$, $\Vert a_k - b_k \Vert \le t$ for a given $t \in \mathbb{R}$. We give an algorithm of $O(n \log n + n\ell)$ time complexity when the first requirement applies, where $\ell$ is the maximum length of the segments fulfilling the criterion. We show an FPTAS which, for any
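Criterion (2) admits a simple linear-time scan once a superposition is fixed. The sketch below checks per-position distances under the identity superposition only; the full method described above additionally optimizes the superposition, which this illustration does not attempt:

```python
import math

def longest_close_segment(A, B, t):
    """Return (start, length) of the longest contiguous index range with
    ||a_k - b_k|| <= t for every k, in a single O(n) scan over run lengths."""
    best_len, best_start, run = 0, -1, 0
    for k, (a, b) in enumerate(zip(A, B)):
        if math.dist(a, b) <= t:          # per-position distance test
            run += 1
            if run > best_len:
                best_len, best_start = run, k - run + 1
        else:
            run = 0                        # a far pair breaks the segment
    return best_start, best_len
```

For criterion (1), by contrast, the RMSD of a candidate segment depends on all its positions jointly, which is what makes the $O(n \log n + n\ell)$ algorithm above non-trivial.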
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Publication Date: 2015-06-06
    Description: Noise can induce various dynamical behaviors in nonlinear systems. White-noise-perturbed systems have been extensively investigated during the last decades. In gene networks, however, experimentally observed extrinsic noise is colored. Here, we investigate genetic toggle-switch systems perturbed by colored extrinsic noise and with kinetic parameters. Compared with white-noise-perturbed systems, we show there also exists an optimal colored-noise strength that induces the best stochastic switch behavior in the single toggle switch, and the best synchronized switching in the networked systems, demonstrating that noise-induced optimal switch behaviors are widespread. Moreover, over a wide range of system parameter regions, we find wider ranges of white-noise strengths inducing good switch behavior and wider ranges of colored-noise strengths inducing good synchronization; therefore, white noise is beneficial for switching and colored noise is beneficial for population synchronization. Our observations are very robust to extrinsic stimulus strength, cell density, and diffusion rate. Finally, based on Waddington's epigenetic landscape and the Wiener-Khintchine theorem, the physical mechanisms underlying the observations are interpreted. Our investigations can provide guidelines for experimental design, and have potential clinical implications in gene therapy and synthetic biology.
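Colored extrinsic noise of the kind discussed here is commonly modeled as an Ornstein-Uhlenbeck process, whose spectrum is non-flat with correlation time tau. A minimal generator (an illustration of the noise model only, not the paper's toggle-switch equations):

```python
import math, random

def ou_colored_noise(n, dt, tau, sigma, seed=0):
    """Sample an Ornstein-Uhlenbeck process with correlation time tau and
    stationary standard deviation sigma, via its exact discretization."""
    rng = random.Random(seed)
    rho = math.exp(-dt / tau)                  # one-step autocorrelation
    s = sigma * math.sqrt(1.0 - rho * rho)     # conditional step deviation
    eta = [rng.gauss(0.0, sigma)]              # start in the stationary law
    for _ in range(n - 1):
        eta.append(eta[-1] * rho + s * rng.gauss(0.0, 1.0))
    return eta

eta = ou_colored_noise(n=20000, dt=0.1, tau=1.0, sigma=0.5)
```

As tau goes to zero (with suitably rescaled amplitude) the process approaches white noise, which is how white- and colored-noise perturbations can be compared on an equal footing.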
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    Publication Date: 2015-06-06
    Description: Revealing the underlying evolutionary mechanism plays an important role in understanding protein interaction networks in the cell. While many evolutionary models have been proposed, applying these models to real network data, and especially determining which model better describes the evolutionary process behind an observed network, remains a challenge. The traditional approach is to use a model with presumed parameters to generate a network and then evaluate the fit by summary statistics, which, however, cannot capture the complete network-structure information or estimate the parameter distribution. In this work, we developed a novel method based on Approximate Bayesian Computation and a modified Differential Evolution algorithm (ABC-DEP) that is capable of conducting model selection and parameter estimation simultaneously and detecting the underlying evolutionary mechanisms of PPI networks more accurately. We tested our method's power to differentiate models and estimate parameters on simulated data and found significant improvement in performance benchmarks, as compared with a previous method. We further applied our method to real protein interaction networks in human and yeast. Our results show the duplication-attachment model as the predominant evolutionary mechanism for the human PPI network and the scale-free model as the predominant mechanism for the yeast PPI network.
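Plain rejection ABC, the starting point that ABC-DEP refines with differential evolution, can be sketched as follows (a toy Gaussian example standing in for the paper's network-growth simulator):

```python
import random

def abc_rejection(observed_stat, simulate, prior_sample, n_draws, eps, seed=0):
    """Rejection ABC: draw parameters from the prior, simulate a summary
    statistic, and keep draws whose statistic lands within eps of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed_stat) <= eps:
            accepted.append(theta)
    return accepted

# Toy problem: recover the mean of a Gaussian from an observed sample mean.
def sample_mean(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(30)) / 30

post = abc_rejection(observed_stat=2.0, simulate=sample_mean,
                     prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
                     n_draws=5000, eps=0.2)
```

The accepted draws approximate the posterior without ever evaluating a likelihood; model selection works the same way by also drawing a model indicator from its prior and counting how often each model is accepted.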
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    Publication Date: 2015-06-06
    Description: Single nucleotide polymorphisms (SNPs), a dominant type of genetic variant, have been used successfully to identify defective genes causing human single-gene diseases. However, most common human diseases are complex and caused by gene-gene and gene-environment interactions. Many SNP-SNP interaction analysis methods have been introduced, but they are not powerful enough to discover interactions involving more than three SNPs. This paper proposes a novel method that analyzes all SNPs simultaneously. Different from existing methods, it regards an individual's genotype data on a list of SNPs as a point with a unit of energy in a multi-dimensional space, and tries to find a new coordinate system in which the energy-distribution difference between cases and controls reaches its maximum. The method then finds multi-SNP combinatorial patterns that differ between cases and controls based on the new coordinate system. Experiments on simulated data show that the method is efficient. Tests on real data for age-related macular degeneration (AMD) show that it can find more significant multi-SNP combinatorial patterns than existing methods.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Publication Date: 2015-06-06
    Description: Disulfide connectivity is an important protein structural characteristic. Accurately predicting disulfide connectivity solely from protein sequence helps to improve the intrinsic understanding of protein structure and function, especially in the post-genome era, where a large volume of sequenced but functionally unannotated proteins is quickly accumulating. In this study, a new feature extracted from predicted protein 3D structural information is proposed and integrated with traditional features to form discriminative features. Based on the extracted features, a random forest regression model is used to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe this superiority benefits from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available at: http://csbio.njust.edu.cn/bioinf/TargetDisulfide for academic use.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    Publication Date: 2015-06-06
    Description: Proteins are the molecules that form the mass of living beings. They exist in dissociated forms such as amino acids and carry out various biological functions; in fact, almost all body reactions occur with the participation of proteins. This is one of the reasons why the analysis of proteins has become a major issue in biology. More concretely, the identification of conserved patterns in a set of related protein sequences can provide relevant biological information about protein function. In this paper, we present a novel algorithm based on teaching-learning-based optimization (TLBO) combined with a local search function specialized to predict common patterns in sets of protein sequences. This population-based evolutionary algorithm defines a group of individuals (solutions) that enhance their knowledge (quality) through different learning stages. Thus, correctly adapted to the biological context of the problem, it can produce an acceptable set of quality solutions. To evaluate the performance of the proposed technique, we used six instances composed of different related protein sequences obtained from the PROSITE database. As we will see, the designed approach makes good predictions and improves the quality of the solutions found by other well-known biological tools.
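The teacher and learner phases that make up TLBO can be sketched as follows (generic continuous minimization on a toy function; the paper adapts the representation and adds a specialized local search, neither of which is shown here):

```python
import random

def tlbo_minimize(f, dim, bounds, pop=20, iters=100, seed=0):
    """Teaching-learning-based optimization (minimization sketch).

    Teacher phase: pull each learner toward the current best solution and
    away from the scaled population mean. Learner phase: move toward a
    better random peer, or away from a worse one. Greedy acceptance."""
    rng = random.Random(seed)
    lo, hi = bounds
    clip = lambda v: max(lo, min(hi, v))
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    F = [f(x) for x in X]
    for _ in range(iters):
        teacher = X[F.index(min(F))]
        mean = [sum(x[j] for x in X) / pop for j in range(dim)]
        for i in range(pop):
            tf = rng.choice((1, 2))  # teaching factor
            cand = [clip(X[i][j] + rng.random() * (teacher[j] - tf * mean[j]))
                    for j in range(dim)]
            fc = f(cand)
            if fc < F[i]:            # teacher phase, greedy acceptance
                X[i], F[i] = cand, fc
            k = rng.randrange(pop)   # learner phase with a random peer
            if k == i:
                continue
            sign = 1.0 if F[k] < F[i] else -1.0
            cand = [clip(X[i][j] + sign * rng.random() * (X[k][j] - X[i][j]))
                    for j in range(dim)]
            fc = f(cand)
            if fc < F[i]:
                X[i], F[i] = cand, fc
    best_f = min(F)
    return best_f, X[F.index(best_f)]

sphere = lambda x: sum(v * v for v in x)
best_f, best_x = tlbo_minimize(sphere, dim=5, bounds=(-10.0, 10.0))
```

Notably, TLBO has no algorithm-specific tuning parameters beyond population size and iteration count, which is part of its appeal for the motif-prediction setting described above.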
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2015-06-06
    Description: We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors that define distance-based similarity in pattern classification, inspired by relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns with linear transformations using the metric tensor yields a dataset that can be classified more efficiently. We test our method on publicly available datasets with several standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without further preprocessing, our results improve on their performance.
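A diagonal special case of such metric learning can be illustrated by weighting each feature by its between-class to within-class variance ratio (a Fisher-style sketch of our own, not Suvrel's analytic closed-form tensor):

```python
def fisher_relevance_weights(X, y, eps=1e-9):
    """Diagonal relevance weights: between-class variance over within-class
    variance, per feature (a Fisher-ratio stand-in for a full metric tensor)."""
    classes = sorted(set(y))
    d = len(X[0])
    overall = [sum(x[j] for x in X) / len(X) for j in range(d)]
    between = [0.0] * d
    within = [0.0] * d
    for c in classes:
        Xc = [x for x, yi in zip(X, y) if yi == c]
        mu = [sum(x[j] for x in Xc) / len(Xc) for j in range(d)]
        for j in range(d):
            between[j] += len(Xc) * (mu[j] - overall[j]) ** 2
            within[j] += sum((x[j] - mu[j]) ** 2 for x in Xc)
    return [between[j] / (within[j] + eps) for j in range(d)]
```

Rescaling each feature by the square root of its weight before a nearest-neighbour or centroid classifier is the diagonal analogue of the linear preprocessing described above: discriminative features dominate the distance, noisy ones are suppressed.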
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Publication Date: 2015-07-30
    Description: Background: Spirulina (Arthrospira) platensis is the only cyanobacterium that, in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. Description: A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed and integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. 
SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th. Conclusions: SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    Publication Date: 2015-07-30
    Description: Background: Knowledge of the spatial organisation of the chromatin fibre in cell nuclei helps researchers to understand the nuclear machinery that regulates DNA activity. Recent experimental techniques of the Chromosome Conformation Capture type (3C, or similar) provide high-resolution, high-throughput data consisting of the number of times any possible pair of DNA fragments is found in contact in a certain population of cells. As these data carry information on the structure of the chromatin fibre, several attempts have been made to use them to obtain high-resolution 3D reconstructions of entire chromosomes, or even an entire genome. The proposed techniques treat the data in different ways, possibly exploiting physical-geometric chromatin models. One popular strategy is to transform contact data into Euclidean distances between pairs of fragments and then solve a classical distance-to-geometry problem. Results: We developed and tested a reconstruction technique that does not require translating contacts into distances, thus avoiding a number of related drawbacks. Also, we introduce a geometrical chromatin chain model that allows us to include sound biochemical and biological constraints in the problem. This model can be scaled at different genomic resolutions, where the structures of the coarser models are influenced by the reconstructions at finer resolutions. The search in the solution space is then performed by classical simulated annealing, where the model is evolved efficiently through quaternion operators. The presence of appropriate constraints permits the less reliable data to be overlooked, so the result is a set of plausible chromatin configurations compatible with both the data and the prior knowledge. 
Conclusions: To test our method, we obtained a number of 3D chromatin configurations from Hi-C data available in the literature for the long arm of human chromosome 1, and validated their features against known properties of gene density and transcriptional activity. Our results are compatible with biological features not introduced a priori in the problem: structurally different regions in our reconstructions correlate highly with functionally different regions as known from the literature and genomic repositories.
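The simulated-annealing search described above can be sketched in its generic form (a toy one-dimensional energy stands in for the chromatin chain model, which the actual method evolves through quaternion operators and biological constraints):

```python
import math, random

def simulated_annealing(energy, propose, x0, t0=1.0, cooling=0.999, steps=6000, seed=0):
    """Metropolis simulated annealing: always accept downhill moves, accept
    uphill moves with probability exp(-dE/T), and cool T geometrically."""
    rng = random.Random(seed)
    x, e, t = x0, energy(x0), t0
    best_x, best_e = x, e
    for _ in range(steps):
        xn = propose(x, rng)
        en = energy(xn)
        if en <= e or rng.random() < math.exp((e - en) / t):
            x, e = xn, en
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e

# Toy double-well energy; global minimum near x = 1, local minimum near x = -1.
f = lambda x: (x * x - 1.0) ** 2 + 0.2 * (x - 1.0) ** 2
best_x, best_e = simulated_annealing(f, lambda x, rng: x + rng.gauss(0.0, 0.3), x0=0.0)
```

The early high-temperature phase lets the chain hop between energy basins before the cooling schedule freezes it into a low-energy configuration, which is what makes the approach suitable for the rugged solution spaces of chromatin reconstruction.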
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    Publication Date: 2015-08-07
    Description: Genes can participate in multiple biological processes at a time, and thus their expression can be seen as a composition of the contributions from the active processes. Biclustering under a plaid assumption allows the modeling of interactions between transcriptional modules or biclusters (subsets of genes with coherence across subsets of conditions) by assuming an additive composition of contributions in their overlapping areas. Despite the biological interest of plaid models, few biclustering algorithms consider plaid effects and, when they do, they place restrictions on the allowed types and structures of biclusters and suffer from robustness problems by seizing exact additive matchings. We propose BiP (Biclustering using Plaid models), a biclustering algorithm with relaxations that allow expression levels to change in overlapping areas according to biologically meaningful assumptions (weighted and noise-tolerant composition of contributions). BiP can be used over existing biclustering solutions (seizing their benefits), as it is able to recover areas excluded due to unaccounted plaid effects and to detect noisy areas unexplained by a plaid assumption, thus producing an explanatory model of overlapping transcriptional activity. Experiments on synthetic data support BiP's efficiency and effectiveness. The models learned from expression data unravel meaningful and non-trivial functional interactions between biological processes associated with putative regulatory modules.
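The plaid assumption, additive composition of bicluster contributions in overlapping areas, can be shown on a toy matrix (constant per-bicluster contributions, the simplest case; BiP's relaxations allow weighted, noise-tolerant compositions instead):

```python
# Toy plaid expression matrix: two biclusters whose contributions add up
# where they overlap (constant contributions, the simplest plaid case).
n_genes, n_conds = 6, 5
X = [[0.0] * n_conds for _ in range(n_genes)]

def add_bicluster(X, rows, cols, theta):
    """Add a constant contribution theta over a genes-by-conditions block."""
    for i in rows:
        for j in cols:
            X[i][j] += theta

add_bicluster(X, range(0, 4), range(0, 3), 2.0)  # bicluster 1: genes 0-3, conditions 0-2
add_bicluster(X, range(2, 6), range(1, 5), 3.0)  # bicluster 2: genes 2-5, conditions 1-4
# Genes 2-3 under conditions 1-2 lie in both biclusters, so X[2][1] is 2.0 + 3.0.
```

A biclustering method that ignores plaid effects would see the overlap cells as outliers of either bicluster; modeling the composition explicitly is what lets overlapping areas be recovered.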
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 54
    Publication Date: 2015-08-07
    Description: We propose a classifier system called iPFPi that predicts the functions of un-annotated proteins. iPFPi assigns an un-annotated protein $P$ the functions of GO annotation terms that are semantically similar to $P$. An un-annotated protein $P$ and a GO annotation term $T$ are represented by their characteristics. The characteristics of $P$ are GO terms found within the abstracts of biomedical literature associated with $P$. The characteristics of $T$ are GO terms found within the abstracts of biomedical literature associated with the proteins annotated with the function of $T$. Let $F$ and $F'$ be the important (dominant) sets of characteristic terms representing $T$ and $P$, respectively. iPFPi would annotate $P$ with the function of $T$ if $F$ and $F'$ are semantically similar. We constructed a novel semantic similarity measure that takes into consideration several factors, such as the dominance degree of each characteristic term $t$ in set $F$
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 55
    Publication Date: 2015-08-07
    Description: Adverse drug reaction (ADR) is a common clinical problem, sometimes accompanied by a high risk of mortality and morbidity. It is also one of the major factors that lead to failure in new drug development. Unfortunately, most current experimental and computational methods are unable to evaluate the clinical safety of drug candidates in the early drug discovery stage, owing to the very limited knowledge of the molecular mechanisms underlying ADRs. Therefore, in this study, we proposed a novel naïve Bayesian model for rapid assessment of clinical ADRs with frequency estimation. This model was constructed on a gene-ADR association network covering 611 US FDA approved drugs, 14,251 genes, and 1,254 distinct ADR terms. Average detection rates of 99.86 and 99.73 percent were achieved in the identification of known ADRs in the internal test data set and in external case analyses, respectively. Moreover, a comparative analysis between the estimated frequencies of ADRs and their observed frequencies was undertaken; the two frequencies follow a similar distribution trend. These results suggest that the naïve Bayesian model based on a gene-ADR association network can serve as an efficient and economic tool for rapid ADR assessment.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 56
    Publication Date: 2015-08-07
    Description: The papers in this special section were presented at the 13th International Workshop on Data Mining in Bioinformatics (BIOKDD'14), organized in conjunction with the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining and held on August 24, 2014 in New York, NY. The workshop brought together international researchers in the interacting disciplines of data mining, systems biology, and bioinformatics at the Bloomberg Headquarters venue. Its goal was to encourage Knowledge Discovery and Data Mining (KDD) researchers to take on the numerous challenges that bioinformatics offers.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 57
    Publication Date: 2015-08-07
    Description: The introduction of next-generation sequencing technologies has radically changed the way we view structural genetic events. Microhomology-mediated break-induced replication (MMBIR) is just one of the many mechanisms that can cause genomic destabilization that may lead to cancer. Although the mechanism for MMBIR remains unclear, it has been shown that MMBIR is typically associated with template-switching events. Currently, to our knowledge, there is no existing bioinformatics tool to detect these template-switching events. We have developed MMBIRFinder, a method that detects template-switching events associated with MMBIR from whole-genome sequenced data. MMBIRFinder uses a half-read alignment approach to identify potential regions of interest. Clustering of these potential regions helps narrow the search space to regions with strong evidence. Subsequent local alignments identify the template-switching events with single-nucleotide accuracy. Using simulated data, MMBIRFinder identified 83 percent of the MMBIR regions within a five nucleotide tolerance. Using real data, MMBIRFinder identified 16 MMBIR regions on a normal breast tissue data sample and 51 MMBIR regions on a triple-negative breast cancer tumor sample resulting in detection of 37 novel template-switching events. Finally, we identified template-switching events residing in the promoter region of seven genes that have been implicated in breast cancer. The program is freely available for download at https://github.com/msegar/MMBIRFinder.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 58
    Publication Date: 2015-08-07
    Description: Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships across disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists that handles large heterogeneous tree collections or enables their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data; for example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable for the efficient archival of tree collections and will enable scientists to develop novel methods for relating heterogeneous collections of trees.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 59
    Publication Date: 2015-08-07
    Description: The papers in this special section were presented at the 2014 International Conference on Genome Informatics (GIW).
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 60
    Publication Date: 2015-08-07
    Description: Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncovering the structure of biological networks. In this paper, ClusterViz, an app for Cytoscape 3 for cluster analysis and visualization, has been developed. To reduce complexity and enable extendibility, we designed the architecture of ClusterViz on the framework of the Open Services Gateway Initiative. Under this architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module adopts an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples, with detailed steps, from important scientific articles, which show that our tool has helped several research teams investigate the mechanisms of biological networks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 61
    Publication Date: 2015-08-07
    Description: Structural domains are evolutionary and functional units of proteins and play a critical role in comparative and functional genomics. Computational assignment of domain function with high reliability is essential for understanding whole-protein functions. However, functional annotations are conventionally assigned to full-length proteins rather than associating specific functions with the individual structural domains. In this article, we present Structural Domain Annotation (SDA), a novel computational approach to predict functions for SCOP structural domains. The SDA method integrates heterogeneous information sources, including structure-alignment-based protein-SCOP mapping features, InterPro2GO mapping information, PSSM profiles, and sequence neighborhood features, with a Bayesian network. By annotating SCOP domains with Gene Ontology terms at large scale using SDA, we obtained a database of SCOP domain to Gene Ontology mappings, which contains ~162,000 of the approximately 166,900 domains in SCOPe 2.03 (>97 percent) and their predicted Gene Ontology functions. We have benchmarked SDA using a single-domain protein dataset and an independent dataset from different species. Comparative studies show that SDA significantly outperforms the existing function prediction methods for structural domains in terms of coverage and maximum F-measure.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 62
    Publication Date: 2015-08-07
    Description: A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly, without requiring a vectorial embedding of the data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density-based and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, this visualization can serve as a first pre-processing step when analyzing real-world data sets, giving an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets, which demonstrates the high quality and usefulness of the inferred structure.
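    The core ingredient described above, similarity derived from the mutual overlap of ranked nearest neighbours computed directly on distance data, can be sketched as follows. This is a minimal illustration of the shared-neighbour idea, not the published algorithm; the function name and the choice of k are invented for the example:

    ```python
    def snn_similarity(D, k=3):
        """Shared-nearest-neighbour similarity matrix.

        D is an n x n symmetric distance matrix; no vectorial embedding
        of the objects is needed.  Each object is ranked by distance,
        its k nearest neighbours (excluding itself) are collected, and
        the similarity of two objects is the overlap of those sets.
        """
        n = len(D)
        knn = [set(sorted(range(n), key=lambda j: D[i][j])[1:k + 1])
               for i in range(n)]
        return [[len(knn[i] & knn[j]) for j in range(n)] for i in range(n)]
    ```

    On a toy distance matrix with two tight pairs of objects, within-pair similarity exceeds across-pair similarity, which is the signal a community-detection step would then exploit.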
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 63
    Publication Date: 2015-08-07
    Description: Proline residues are a common source of kinetic complications during folding. The X-Pro peptide bond is the only peptide bond for which the stabilities of the cis and trans conformations are comparable. The cis-trans isomerization (CTI) of X-Pro peptide bonds is a widely recognized rate-limiting factor, which can not only induce additional slow phases in protein folding but also modify the millisecond and sub-millisecond dynamics of the protein. Accurate computational prediction of proline CTI is of great importance for understanding protein folding, splicing, cell signaling, and transmembrane active transport in both humans and animals. In our earlier work, we developed a biophysically motivated proline CTI predictor utilizing a novel tree-based consensus model with a powerful metalearning technique, achieving 86.58 percent Q2 accuracy and an MCC of 0.74, better than the results (70-73 percent Q2 accuracy) reported in the literature on the well-referenced benchmark dataset. In this paper, we describe experiments with novel randomized subspace learning and bootstrap seeding techniques as an extension to our earlier work, together with the consensus models and entropy-based learning methods, to obtain better accuracy through a precise and robust learning scheme for proline CTI prediction.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 64
    Publication Date: 2015-08-07
    Description: Efficient search algorithms for finding genomic-range overlaps are essential for various bioinformatics applications. A majority of fast algorithms for searching the overlaps between a query range (e.g., a genomic variant) and a set of N reference ranges (e.g., exons) has time complexity of O ( k + log N ), where k denotes a term related to the length and location of the reference ranges. Here, we present a simple but efficient algorithm that reduces k, based on the maximum reference range length. Specifically, for a given query range and the maximum reference range length, the proposed method divides the reference range set into three subsets: always , potentially , and never overlapping . Therefore, search effort can be reduced by excluding never overlapping subset. We demonstrate that the running time of the proposed algorithm is proportional to potentially overlapping subset size, that is proportional to the maximum reference range length if all the other conditions are the same. Moreover, an implementation of our algorithm was 13.8 to 30.0 percent faster than one of the fastest range search methods available when tested on various genomic-range data sets. The proposed algorithm has been incorporated into a disease-linked variant prioritization pipeline for WGS (http://gnome.tchlab.org) and its implementation is available at http://ml.ssu.ac.kr/gSearch.
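    The exclusion idea described in the abstract can be sketched as follows; this is a hypothetical illustration of pruning by maximum reference range length (closed intervals, sorted start positions), not the authors' gSearch implementation:

    ```python
    from bisect import bisect_left, bisect_right

    def find_overlaps(query, ref_starts, ref_ends):
        """Indices of references overlapping a closed query range [qs, qe].

        ref_starts must be sorted.  Because no reference is longer than
        max_len, any reference starting before qs - max_len can never
        reach the query (the 'never overlapping' subset) and is skipped
        without inspection; only the remaining candidates are checked.
        """
        qs, qe = query
        max_len = max(e - s for s, e in zip(ref_starts, ref_ends))
        lo = bisect_left(ref_starts, qs - max_len)  # prune 'never overlapping'
        hi = bisect_right(ref_starts, qe)           # starts beyond qe cannot overlap
        return [i for i in range(lo, hi)
                if ref_ends[i] >= qs and ref_starts[i] <= qe]
    ```

    The candidate window (lo, hi) shrinks as the maximum reference length shrinks, which mirrors the paper's claim that the running time is proportional to the potentially overlapping subset size.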
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 65
    Publication Date: 2015-08-07
    Description: The identification of protein complexes in protein-protein interaction (PPI) networks is fundamental to understanding biological processes and cellular molecular mechanisms. Many graph computational algorithms have been proposed to identify protein complexes in PPI networks by detecting densely connected groups of proteins. These algorithms assess the density of subgraphs by evaluating sums over individual edges or nodes; such incomplete and inaccurate measures may miss meaningful biological protein complexes with functional significance. In this study, we propose a novel method for assessing the compactness of local subnetworks by measuring the number of three-node cliques. The method detects each optimal cluster by growing a seed and maximizing the compactness function. To demonstrate the efficacy of the proposed method, we evaluate its performance on five PPI networks against three reference sets of yeast protein complexes using five different measurements, and compare it with four state-of-the-art methods. The results show that the protein complexes generated by the proposed method are of better quality than those generated by the four classic methods. Therefore, the new method is effective and useful for detecting protein complexes in PPI networks.
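    The quantity underlying the proposed compactness measure, the number of three-node cliques (triangles) in a candidate cluster, can be counted naively as below. This is only an illustration of triangle counting on an induced subgraph; the actual compactness function and the seed-growing procedure are not reproduced here:

    ```python
    from itertools import combinations

    def triangle_count(nodes, adj):
        """Number of three-node cliques (triangles) in the subgraph
        induced by `nodes`; `adj` maps each node to its neighbour set."""
        s = sorted(set(nodes))
        return sum(1 for a, b, c in combinations(s, 3)
                   if b in adj[a] and c in adj[a] and c in adj[b])
    ```

    A complete graph on four proteins contains four triangles, whereas a chain of interactions contains none, so triangle counts reward mutually interconnected groups more sharply than edge sums do.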
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 66
    Publication Date: 2015-08-08
    Description: Background: The traditional method used to estimate tree biomass is allometry. In this method, models are tested and equations fitted by regression, usually applying ordinary least squares, though other analogous methods are also used for this purpose. Due to the nature of tree biomass data, the assumptions of regression are not always met, bringing uncertainty to the inferences. This article demonstrates that the Data Mining (DM) technique can be used as an alternative to the traditional regression approach to estimate tree biomass in the Atlantic Forest, providing better results than allometry and offering simplicity, versatility and flexibility across a wide range of conditions. Results: Various DM approaches were examined with regard to distance, number of neighbors and weighting, using 180 trees from environmental restoration plantations in the Atlantic Forest biome. The best results were attained using the Chebyshev distance, 1/d weighting and 5 neighbors; increasing the number of neighbors did not improve the estimates. We also analyze the effect of data set size and number of variables on the results; the complete data set and the maximum number of predictor variables provided the best fit. Compared with the Schumacher-Hall model, DM reduced the standard error of estimate by up to 16.5 %. Conclusion: Data Mining can provide accurate estimates of tree biomass and can be successfully used for this purpose in environmental restoration plantations in the Atlantic Forest. This technique yields a lower standard error of estimate than the Schumacher-Hall model and has the advantage of not requiring some of the statistical assumptions that regression models do. Flexibility, versatility and simplicity are attributes of DM that corroborate its great potential for similar applications.
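    The best-performing configuration reported above (Chebyshev distance, 1/d weighting, 5 neighbors) corresponds to a distance-weighted k-nearest-neighbour estimator. A minimal sketch of that estimator follows; it is an illustration of the technique, not the authors' DM pipeline, and the function names are invented:

    ```python
    def chebyshev(a, b):
        """L-infinity (Chebyshev) distance between two numeric vectors."""
        return max(abs(x - y) for x, y in zip(a, b))

    def knn_estimate(query, X, y, k=5):
        """Inverse-distance (1/d) weighted k-nearest-neighbour estimate.

        X holds predictor vectors, y the observed responses (e.g. tree
        biomass).  The k closest training points vote with weight 1/d.
        """
        nearest = sorted((chebyshev(query, xi), yi) for xi, yi in zip(X, y))[:k]
        if nearest[0][0] == 0:      # exact match: 1/d is undefined, return it
            return nearest[0][1]
        weights = [1.0 / d for d, _ in nearest]
        return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)
    ```

    Unlike a fitted regression equation, this estimator makes no distributional assumptions; predictions come directly from the stored sample, which matches the flexibility argument made in the conclusion.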
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 67
    Publication Date: 2015-08-08
    Description: Background: Motivated by the general need to identify and classify species based on molecular evidence, genome comparisons have been proposed that are based on measuring mostly Euclidean distances between Chaos Game Representation (CGR) patterns of genomic DNA sequences. Results: We provide, on an extensive dataset and using several different distances, confirmation of the hypothesis that CGR patterns are preserved along a genomic DNA sequence, and are different for DNA sequences originating from genomes of different species. This finding lends support to the theory that CGRs of genomic sequences can act as graphic genomic signatures. In particular, we compare the CGR patterns of over five hundred different 150,000 bp genomic sequences spanning one complete chromosome from each of six organisms, representing all kingdoms of life: H. sapiens (Animalia; chromosome 21), S. cerevisiae (Fungi; chromosome 4), A. thaliana (Plantae; chromosome 1), P. falciparum (Protista; chromosome 14), E. coli (Bacteria - full genome), and P. furiosus (Archaea - full genome). To maximize the diversity within each species, we also analyze the interrelationships within a set of over five hundred 150,000 bp genomic sequences sampled from the entire aforementioned genomes. Lastly, we provide some preliminary evidence of this method’s ability to classify genomic DNA sequences at lower taxonomic levels by comparing sequences sampled from the entire genome of H. sapiens (class Mammalia, order Primates) and of M. musculus (class Mammalia, order Rodentia), for a total length of approximately 174 million basepairs analyzed. We compute pairwise distances between CGRs of these genomic sequences using six different distances, and construct Molecular Distance Maps, which visualize all sequences as points in a two-dimensional or three-dimensional space, to simultaneously display their interrelationships. 
Conclusion: Our analysis confirms, for this dataset, that CGR patterns of DNA sequences from the same genome are in general quantitatively similar, while being different for DNA sequences from genomes of different species. Our assessment of the performance of the six distances analyzed uses three different quality measures and suggests that several distances outperform the Euclidean distance, which has so far been almost exclusively used for such studies.
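The CGR construction underlying these graphic genomic signatures is well known and can be sketched as follows; the corner assignment and the k-mer frequency binning (FCGR) below are one common convention, not necessarily the exact variant used in the study:

```python
import numpy as np

CORNERS = {'A': (0, 0), 'C': (0, 1), 'G': (1, 1), 'T': (1, 0)}

def cgr_points(seq):
    """Iterate the CGR map: each point moves halfway from the current
    position toward the corner assigned to the next nucleotide."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts.append((x, y))
    return pts

def fcgr(seq, k=3):
    """k-th order frequency CGR: bin the points into a 2^k x 2^k grid
    and normalize, giving a pattern two sequences can be compared on
    (e.g. with Euclidean or other distances)."""
    n = 2 ** k
    grid = np.zeros((n, n))
    for x, y in cgr_points(seq)[k - 1:]:   # skip the warm-up points
        grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return grid / grid.sum()
```

Pairwise distances between such normalized grids are what Molecular Distance Maps visualize as points in two or three dimensions.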
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 68
    Publication Date: 2015-08-08
    Description: Background: Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis but has also raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and the extremely short reads generated by sequencers such as Illumina. To date, how to generate a high-quality draft assembly for metagenomic sequencing projects has not been fully addressed. Results: We conducted a comprehensive assessment of state-of-the-art de novo assemblers and revealed that the performance of each assembler depends critically on the sequencing depth. To address this problem, we developed a pipeline named InteMAP to integrate three assemblers, ABySS, IDBA-UD and CABOG, which were found to complement each other in assembling metagenomic sequences. By deciding which assembly approach to apply to each short read according to a sequencing coverage estimation algorithm, the pipeline provides an automatic platform suited to assembling real metagenomic NGS data with an uneven distribution of sequencing depth. By comparing the performance of InteMAP with current assemblers on both synthetic and real NGS metagenomic data, we demonstrated that InteMAP achieves better performance, with a longer total contig length, higher contiguity, and more genes recovered than the others. Conclusions: We developed a de novo pipeline, named InteMAP, that integrates existing tools for metagenomics assembly. The pipeline outperforms previous assembly methods on metagenomic assembly by providing a longer total contig length, higher contiguity and coverage of more genes. InteMAP could therefore be a useful tool for the metagenomics research community.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 69
    Publication Date: 2015-08-12
    Description: Background: Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. Results: We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, being comparable or faster than the current state-of-the-art methods. Conclusions: We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. 
Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 70
    Publication Date: 2015-08-15
    Description: Background: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites withstanding change. These profiles remain detectable even when protein sequences become extensively diverged. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a database using an algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer and improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database EvoDB. Results: The evolutionary rate based approach was coupled with a conventional BLOSUM substitution matrix. The two approaches are combined in a dynamic scoring function, which uses the selective pressure to score aligned residues. The dynamic scoring function is based on a coupled additive approach that scores aligned sites according to the level of conservation inferred from the ω values. Evaluation of the accuracy of this new implementation, BLOSUM-FIRE, using MAFFT alignments as references has shown that it is more accurate than its predecessor FIRE. Comparison of the alignment quality with widely used algorithms (MUSCLE, T-COFFEE, and CLUSTAL Omega) revealed that the BLOSUM-FIRE algorithm performs as well as conventional algorithms. Its main strength lies in its greater potential for aligning divergent sequences, and it addresses the problem of low specificity inherent in the original FIRE algorithm.
The utility of this algorithm is demonstrated using the Hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case. Conclusion: This study describes the utility of an evolutionary rate based approach coupled to the BLOSUM62 amino acid substitution matrix in inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
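The coupled additive scoring idea described in the abstract above can be sketched in a few lines. This is a hypothetical illustration, not the published BLOSUM-FIRE parameterisation: the toy matrix entries, the ω-to-conservation binning, and the blending weight are all invented for the example.

```python
# Hypothetical sketch of a dynamic scoring function in the spirit of
# BLOSUM-FIRE: the substitution-matrix score for an aligned residue pair
# is combined additively with a term rewarding similar selective pressure
# (dN/dS, i.e. omega) at the two sites. All parameters are illustrative.

# Toy BLOSUM-like scores for a few residue pairs (symmetric).
BLOSUM_TOY = {("A", "A"): 4, ("A", "S"): 1, ("S", "S"): 4,
              ("A", "L"): -1, ("L", "L"): 4, ("S", "L"): -2}

def sub_score(a, b):
    return BLOSUM_TOY.get((a, b), BLOSUM_TOY.get((b, a), 0))

def conservation(omega):
    """Map a dN/dS value to a conservation level: purifying selection
    (omega well below 1) scores high, neutral/positive selection low."""
    return 1.0 if omega < 0.5 else (0.5 if omega <= 1.0 else 0.0)

def dynamic_score(a, omega_a, b, omega_b, weight=2.0):
    """Matrix score plus a bonus when both sites are under similar selective pressure."""
    agreement = 1.0 - abs(conservation(omega_a) - conservation(omega_b))
    return sub_score(a, b) + weight * agreement

# Two conserved alanines under purifying selection outscore the same
# residue pair when one of the two sites evolves neutrally or faster.
print(dynamic_score("A", 0.1, "A", 0.2) > dynamic_score("A", 0.1, "A", 1.5))
```

The effect is that alignments are steered toward pairing sites with matching evolutionary-rate profiles, which is what lets the method operate below the usual sequence-similarity threshold.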
  • 71
    Publication Date: 2015-08-11
    Description: A commonly studied means of parameterizing graph problems is the deletion distance from triviality (Guo et al., Parameterized and exact computation, Springer, Berlin, pp. 162–173, 2004 ), which counts vertices that need to be deleted from a graph to place it in some class for which efficient algorithms are known. In the context of graph isomorphism, we define triviality to mean a graph with maximum degree bounded by a constant, as such graph classes admit polynomial-time isomorphism tests. We generalise deletion distance to a measure we call elimination distance to triviality, based on elimination trees or tree-depth decompositions. We establish that graph canonisation, and thus graph isomorphism, is \(\mathsf {FPT}\) when parameterized by elimination distance to bounded degree, extending results of Bouland et al. (Parameterized and exact computation, Springer, Berlin, pp. 218–230, 2012 ).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
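Elimination distance to bounded degree, as described above, can be made concrete with a brute-force sketch assuming the standard recursive definition: 0 if the maximum degree is at most d; the maximum over connected components if the graph is disconnected; otherwise 1 plus the minimum over single-vertex deletions. This is exponential time, so only for tiny graphs; the paper's point is that the parameter nonetheless yields an FPT isomorphism test.

```python
def components(vertices, adj):
    """Connected components of the subgraph induced on `vertices`."""
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend((adj[u] & vertices) - comp)
        seen |= comp
        comps.append(comp)
    return comps

def elim_dist(vertices, adj, d):
    """Elimination distance of the induced subgraph to max degree <= d."""
    if not vertices:
        return 0
    if max(len(adj[v] & vertices) for v in vertices) <= d:
        return 0  # already trivial: maximum degree bounded by d
    comps = components(vertices, adj)
    if len(comps) > 1:
        return max(elim_dist(c, adj, d) for c in comps)
    return 1 + min(elim_dist(vertices - {v}, adj, d) for v in vertices)

# A star K_{1,3} has maximum degree 3; one deletion suffices to reach
# maximum degree <= 2, so its elimination distance to that class is 1.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(elim_dist(set(star), star, 2))  # prints 1
```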
  • 72
    Publication Date: 2015-08-11
    Description: Recent advances in drift analysis have given us better and better tools for understanding random processes, including the run time of randomized search heuristics. In the setting of multiplicative drift we have not only excellent bounds on the expected run time, but also more general results showing the strong concentration of the run time. In this paper we investigate the setting of additive drift under the assumption of strong concentration of the “step size” of the process. Under sufficiently strong drift towards the goal we show a strong concentration of the hitting time. In contrast to this, we show that in the presence of small drift a Gambler’s-Ruin-like behavior of the process overrides the influence of the drift, leading to a maximal movement of about \(\sqrt{t}\) steps within t iterations. Finally, in the presence of sufficiently strong negative drift the hitting time is superpolynomial with high probability; this corresponds to the well-known Negative Drift Theorem.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
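The contrast described above, Gambler's-Ruin behaviour of order \(\sqrt{t}\) without drift versus movement of order \(\delta t\) under additive drift \(\delta\), shows up already in a small simulation. The ±1 step size, seed, and parameter values below are illustrative choices, not taken from the paper.

```python
import random

def walk(t, delta, rng):
    """Random walk taking step +1 with probability (1+delta)/2, else -1,
    so the additive drift per step is exactly delta."""
    x = 0
    for _ in range(t):
        x += 1 if rng.random() < (1 + delta) / 2 else -1
    return x

rng = random.Random(0)
t, runs = 4000, 150
no_drift = sum(abs(walk(t, 0.0, rng)) for _ in range(runs)) / runs
with_drift = sum(walk(t, 0.2, rng) for _ in range(runs)) / runs
# Driftless: typical |movement| is on the order of sqrt(t) (about 63 here),
# far below t. With drift delta = 0.2 the walk moves about delta * t = 800.
print(no_drift, with_drift)
```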
  • 73
    Publication Date: 2015-08-15
    Description: Background: In structural bioinformatics, there is an increasing interest in identifying and understanding the evolution of local protein structures regarded as key structural or functional protein building blocks. A central need is then to compare these, possibly short, fragments by measuring their (dis)similarity efficiently and accurately. Progress towards this goal has given rise to scores that assess the strong similarity of fragments. Yet, there is still a lack of more progressive scores, with meaningful intermediate values, for the comparison, retrieval or clustering of distantly related fragments. Results: We introduce here the Amplitude Spectrum Distance (ASD), a novel way of comparing protein fragments based on the discrete Fourier transform of their Cα distance matrix. Defined as the distance between their amplitude spectra, ASD can be computed efficiently and provides a parameter-free measure of the global shape dissimilarity of two fragments. ASD enjoys nice theoretical properties, making it tolerant to shifts, insertions, deletions, circular permutations or sequence reversals while satisfying the triangle inequality. The practical interest of ASD with respect to RMSD, RMSDd, BC and TM scores is illustrated through zinc finger retrieval experiments and concrete structure examples. The benefits of ASD are also illustrated by two additional clustering experiments: domain linker fragments and complementarity-determining regions of antibodies. Conclusions: Taking advantage of the Fourier transform to compare fragments at a global shape level, ASD is an objective and progressive measure taking into account the whole fragments.
Its practical computation time and its properties make ASD particularly relevant for applications requiring meaningful measures on distantly related protein fragments, such as retrieval of similar fragments requiring high recall, as shown in the experiments, or for any application that also takes advantage of the triangle inequality, such as fragment clustering. The ASD program and source code are freely available at: http://www.irisa.fr/dyliss/public/ASD/.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
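The core idea above, comparing amplitude spectra of Cα distance matrices, can be sketched with a pure-Python 2D DFT. This is a toy reimplementation under stated assumptions (plain Euclidean distance between the full 2D amplitude spectra), not the released ASD program, but it does exhibit the advertised tolerance to circular permutations: circularly permuting the residues circularly shifts the distance matrix, which changes DFT phases but not magnitudes.

```python
import cmath, math

def dist_matrix(points):
    """Pairwise Euclidean distance matrix of (toy) C-alpha coordinates."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def amplitude_spectrum(M):
    """Magnitudes of the 2D discrete Fourier transform of a square matrix."""
    n = len(M)
    return [[abs(sum(M[i][j] * cmath.exp(-2j * cmath.pi * (u * i + v * j) / n)
                     for i in range(n) for j in range(n)))
             for v in range(n)] for u in range(n)]

def asd(P, Q):
    """Euclidean distance between the amplitude spectra of two fragments."""
    A, B = amplitude_spectrum(dist_matrix(P)), amplitude_spectrum(dist_matrix(Q))
    n = len(A)
    return math.sqrt(sum((A[u][v] - B[u][v]) ** 2 for u in range(n) for v in range(n)))

frag = [(0, 0, 0), (1.5, 0, 0), (2.0, 1.2, 0), (1.0, 2.0, 0.8), (0, 1.0, 1.6)]
rotated = frag[2:] + frag[:2]     # circular permutation of the residues
print(asd(frag, rotated) < 1e-9)  # amplitude spectra coincide
```

Because the measure is a Euclidean distance between spectra, the triangle inequality mentioned in the abstract comes for free.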
  • 74
    Publication Date: 2015-08-17
    Description: Background: Identifying periodically expressed genes across different processes (e.g. the cell and metabolic cycles, circadian rhythms, etc) is a central problem in computational biology. Biological time series may contain (multiple) unknown signal shapes of systemic relevance, imperfections like noise, damping, and trending, or limited sampling density. While there exist methods for detecting periodicity, their design biases (e.g. toward a specific signal shape) can limit their applicability in one or more of these situations. Methods: We present in this paper a novel method, SW1PerS, for quantifying periodicity in time series in a shape-agnostic manner and with resistance to damping. The measurement is performed directly, without presupposing a particular pattern, by evaluating the circularity of a high-dimensional representation of the signal. SW1PerS is compared to other algorithms using synthetic data and performance is quantified under varying noise models, noise levels, sampling densities, and signal shapes. Results on biological data are also analyzed and compared. Results: On the task of periodic/not-periodic classification, using synthetic data, SW1PerS outperforms all other algorithms in the low-noise regime. SW1PerS is shown to be the most shape-agnostic of the evaluated methods, and the only one to consistently classify damped signals as highly periodic. On biological data, and for several experiments, the lists of top 10% genes ranked with SW1PerS recover up to 67% of those generated with other popular algorithms. Moreover, the list of genes from data on the Yeast metabolic cycle which are highly-ranked only by SW1PerS, contains evidently non-cosine patterns (e.g. ECM33, CDC9, SAM1,2 and MSH6) with highly periodic expression profiles. 
In data from the Yeast cell cycle SW1PerS identifies genes not preferred by other algorithms, hence not previously reported as periodic, but found in other experiments such as the universal growth rate response of Slavov. These genes are BOP3, CDC10, YIL108W, YER034W, MLP1, PAC2 and RTT101. Conclusions: In biological systems with low noise, i.e. where periodic signals with interesting shapes are more likely to occur, SW1PerS can be used as a powerful tool in exploratory analyses. Indeed, by having an initial set of periodic genes with a rich variety of signal types, pattern/shape information can be included in the study of systems and the generation of hypotheses regarding the structure of gene regulatory networks.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
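The sliding-window idea behind SW1PerS can be sketched directly: a periodic signal embeds to a loop in a higher-dimensional space, so its point cloud looks "round", while a trending signal does not. The roundness proxy below (coefficient of variation of distances to the centroid) is a crude stand-in for the persistent-homology circularity score the method actually evaluates, and the window parameters are illustrative.

```python
import math

def sliding_window(signal, dim, tau):
    """Sliding-window (Takens) embedding: x_t = (s[t], s[t+tau], ..., s[t+(dim-1)tau])."""
    n = len(signal) - (dim - 1) * tau
    return [tuple(signal[t + k * tau] for k in range(dim)) for t in range(n)]

def roundness(cloud):
    """Crude circularity proxy: coefficient of variation of distances to the
    centroid -- near 0 when the points trace a circle (periodic signal),
    large for a degenerate cloud such as a line segment (trending signal)."""
    dim = len(cloud[0])
    c = [sum(p[k] for p in cloud) / len(cloud) for k in range(dim)]
    dists = [math.dist(p, c) for p in cloud]
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return math.sqrt(var) / mean

periodic = [math.cos(2 * math.pi * t / 20) for t in range(100)]
trend = [t / 100 for t in range(100)]
rp = roundness(sliding_window(periodic, 2, 5))  # window lag = quarter period
rt = roundness(sliding_window(trend, 2, 5))
print(rp < rt)
```

Note that the embedding of the cosine is a near-perfect circle regardless of the signal's shape within one period, which is the source of the method's shape-agnosticism.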
  • 75
    Publication Date: 2015-09-15
    Description: Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set \(\mathbb {T}\) of binary binets or trinets over a taxon set X, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an \(O(3^{|X|}\,\mathrm{poly}(|X|))\) time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 76
    Publication Date: 2015-09-17
    Description: We consider stochastic versions of OneMax and LeadingOnes and analyze the performance of evolutionary algorithms with and without populations on these problems. It is known that the ( \(1+1\) ) EA on OneMax performs well in the presence of very small noise, but poorly for higher noise levels. We extend these results to LeadingOnes and to many different noise models, showing how the application of drift theory can significantly simplify and generalize previous analyses. Most surprisingly, even small populations (of size \(\varTheta (\log n)\) ) can make evolutionary algorithms perform well for high noise levels, well outside the abilities of the ( \(1+1\) ) EA. Larger population sizes are even more beneficial; we consider both parent and offspring populations. In this sense, populations are robust in these stochastic settings.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 77
    Publication Date: 2015-09-19
    Description: Since Tinhofer proposed the MinGreedy algorithm for maximum cardinality matching in 1984, several experimental studies found the randomized algorithm to perform excellently for various classes of random graphs and benchmark instances. In contrast, only a few analytical results are known. We show that MinGreedy cannot improve on the trivial approximation ratio of \(\frac{1}{2}\) whp., even for bipartite graphs. Our hard inputs seem to require a small number of high-degree nodes. This motivates an investigation of greedy algorithms on graphs with maximum degree \(\varDelta\): we show that MinGreedy achieves a \(\frac{\varDelta -1}{2\varDelta -3}\)-approximation for graphs with \(\varDelta =3\) and for \(\varDelta\)-regular graphs, and a guarantee of \(\frac{\varDelta -1/2}{2\varDelta -2}\) for graphs with maximum degree \(\varDelta\). Interestingly, our bounds even hold for the deterministic MinGreedy that breaks all ties arbitrarily. Moreover, we investigate the limitations of the greedy paradigm, using the model of priority algorithms introduced by Borodin, Nielsen, and Rackoff. We study deterministic priority algorithms and prove a \(\frac{\varDelta -1}{2\varDelta -3}\)-inapproximability result for graphs with maximum degree \(\varDelta\); thus, these greedy algorithms do not achieve a \(\frac{1}{2}+\varepsilon\)-approximation and in particular the \(\frac{2}{3}\)-approximation obtained by the deterministic MinGreedy for \(\varDelta =3\) is optimal in this class. For k-uniform hypergraphs we show a tight \(\frac{1}{k}\)-inapproximability bound. We also study fully randomized priority algorithms and give a \(\frac{5}{6}\)-inapproximability bound. Thus, they cannot compete with matching algorithms of other paradigms.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
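For reference, MinGreedy itself is only a few lines: repeatedly pick a vertex of minimum degree in the remaining graph, match it to a neighbour, and delete both endpoints. The neighbour choice below (seeded random) and the adjacency-dict representation are implementation choices of this sketch, not prescribed by the paper.

```python
import random

def min_greedy_matching(adj, seed=1):
    """MinGreedy: repeatedly match a minimum-degree vertex to a random
    neighbour and remove both endpoints from the graph."""
    rng = random.Random(seed)
    adj = {v: set(ns) for v, ns in adj.items()}  # local mutable copy
    matching = []
    while True:
        live = [v for v in sorted(adj) if adj[v]]
        if not live:
            return matching
        u = min(live, key=lambda v: len(adj[v]))   # minimum remaining degree
        w = rng.choice(sorted(adj[u]))             # random neighbour
        matching.append((u, w))
        for x in (u, w):                           # delete both endpoints
            for y in adj.pop(x):
                adj[y].discard(x)

# On the path a-b-c-d, MinGreedy starts at a degree-1 endpoint and
# therefore always finds the perfect matching {a-b, c-d}.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(min_greedy_matching(path))  # → [('a', 'b'), ('c', 'd')]
```

The greedy output is always a maximal matching, hence at least half the size of a maximum one; the paper's results concern how much better than that guarantee MinGreedy can (and cannot) do.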
  • 78
    Publication Date: 2015-09-19
    Description: We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. As base-case sorter for LCP-aware string sorting we describe sequential LCP-insertion sort which calculates the LCP array and accelerates its insertions using it. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
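The LCP array that LCP-aware string sorters maintain can be made concrete with a naive reference implementation. Note this version recomputes every comparison from the first character; the point of the paper's LCP-insertion sort is precisely to reuse stored LCP values to skip known-equal prefixes, which this sketch deliberately does not attempt.

```python
def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

def lcp_insertion_sort(strings):
    """Insertion sort returning the sorted strings and their LCP array,
    where lcps[i] is the LCP of sorted[i-1] and sorted[i] (lcps[0] = 0)."""
    out = []
    for s in strings:
        i = 0
        while i < len(out) and out[i] < s:
            i += 1
        out.insert(i, s)
    lcps = [0] + [lcp(out[i - 1], out[i]) for i in range(1, len(out))]
    return out, lcps

print(lcp_insertion_sort(["banana", "band", "ban", "apple"]))
# → (['apple', 'ban', 'banana', 'band'], [0, 0, 3, 3])
```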
  • 79
    Publication Date: 2015-09-19
    Description: Background: Two component systems (TCS) are signalling complexes manifested by a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, often requiring costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information. Results: We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross validation, to act as a gold standard for TCS partnership predictions. Combining individual predictions using MetaPred2CS improved performance when compared to the individual methods and in comparison with a current state-of-the-art meta-predictor. Conclusion: We have developed MetaPred2CS, a support vector machine-based metapredictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is a strategy of integrating individual predictors that improves the overall prediction accuracy, with the in-silico two-hybrid method contributing most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 80
    Publication Date: 2015-09-19
    Description: Background: Technological advances have enabled the analysis of very small amounts of DNA in forensic cases. However, the DNA profiles from such evidence are frequently incomplete and can contain contributions from multiple individuals. The complexity of such samples confounds the assessment of the statistical weight of such evidence. One approach to account for this uncertainty is to use a likelihood ratio framework to compare the probability of the evidence profile under different scenarios. While researchers favor the likelihood ratio framework, few open-source software solutions with a graphical user interface implementing these calculations are available for practicing forensic scientists. Results: To address this need, we developed Lab Retriever, an open-source, freely available program that forensic scientists can use to calculate likelihood ratios for complex DNA profiles. Lab Retriever adds a graphical user interface, written primarily in JavaScript, on top of a C++ implementation of the previously published R code of Balding. We redesigned parts of the original Balding algorithm to improve computational speed. In addition to incorporating a probability of allelic drop-out and other critical parameters, Lab Retriever computes likelihood ratios for hypotheses that can include up to four unknown contributors to a mixed sample. These computations are completed nearly instantaneously on a modern PC or Mac computer. Conclusions: Lab Retriever provides a practical software solution to forensic scientists who wish to assess the statistical weight of evidence for complex DNA profiles. Executable versions of the program are freely available for Mac OSX and Windows operating systems.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
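The likelihood-ratio calculation with allelic drop-out can be illustrated with a deliberately simplified model: one locus, a single contributor, no drop-in, Hardy-Weinberg genotype frequencies for the unknown. This is not Lab Retriever's algorithm, which handles mixtures with up to four unknown contributors, just the shape of the calculation; all allele names and frequencies are invented.

```python
from itertools import product

def prob_evidence(genotype, observed, dropout):
    """P(observed allele set | contributor genotype), with each genotype
    allele dropping out independently with probability `dropout`; no drop-in."""
    p = 0.0
    for drops in product([True, False], repeat=len(genotype)):
        if {a for a, dropped in zip(genotype, drops) if not dropped} == set(observed):
            term = 1.0
            for dropped in drops:
                term *= dropout if dropped else 1 - dropout
            p += term
    return p

def likelihood_ratio(observed, suspect, freqs, dropout):
    """LR = P(evidence | suspect is the source) / P(evidence | an unknown,
    unrelated person is the source), Hardy-Weinberg for the unknown."""
    num = prob_evidence(suspect, observed, dropout)
    alleles = sorted(freqs)
    den = 0.0
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            g = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
            den += g * prob_evidence((a, b), observed, dropout)
    return num / den

# Evidence shows only allele "12"; the suspect is 12/14, so allele "14"
# must have dropped out under the prosecution hypothesis.
lr = likelihood_ratio({"12"}, ("12", "14"), {"12": 0.1, "14": 0.3, "16": 0.6}, 0.1)
print(round(lr, 3))  # → 3.448
```

An LR above 1 favours the prosecution hypothesis; real casework models extend the numerator and denominator sums over all genotype combinations of the hypothesised contributors.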
  • 81
    Publication Date: 2015-09-24
    Description: Background: One of the most important applications of transcriptomic data is cancer phenotype classification. Many characteristics of transcriptomic data, such as redundant features and technical artifacts, make over-fitting commonplace. Promising classification results often fail to generalize across datasets with different sources, platforms, or preprocessing. Recently, a novel differential network rank conservation (DIRAC) algorithm was proposed to characterize cancer phenotypes using transcriptomic data. DIRAC is a member of a family of algorithms that have proven useful for disease classification based on the relative expression of genes. Combining the robustness of this family’s simple decision rules with known biological relationships, this systems approach identifies interpretable, yet highly discriminative networks. While DIRAC was briefly employed for several classification problems in the original paper, the potential of DIRAC in cancer phenotype classification, and especially its robustness against artifacts in transcriptomic data, has not been fully characterized yet. Results: In this study we thoroughly investigate the potential of DIRAC by applying it to multiple datasets, and examine the variations in classification performance when datasets are (i) treated and untreated for batch effect; (ii) preprocessed with different techniques. We also propose the first DIRAC-based classifier to integrate multiple networks. We show that the DIRAC-based classifier is very robust in the examined scenarios. To our surprise, the trained DIRAC-based classifier even translated well to a dataset with different biological characteristics in the presence of substantial batch effects that, as shown here, plagued the standard expression value based classifier.
In addition, the DIRAC-based classifier, because of the integrated biological information, also suggests pathways to target in specific subtypes, which may enhance the establishment of personalized therapy in diseases such as pediatric AML. In order to better comprehend the prediction power of the DIRAC-based classifier in general, we also performed classifications using publicly available datasets from breast and lung cancer. Furthermore, multiple well-known classification algorithms were utilized to create an ideal test bed for comparing the DIRAC-based classifier with the standard gene expression value based classifier. We observed that the DIRAC-based classifier greatly outperforms its rival. Conclusions: Based on our experiments with multiple datasets, we propose that DIRAC is a promising solution to the lack of generalizability in classification efforts that use transcriptomic data. We believe that the superior performance presented in this study may motivate others to initiate a new line of research to explore the untapped power of DIRAC in a broad range of cancer types.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
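The relative-expression idea underlying DIRAC can be sketched directly: a phenotype's "rank template" records the majority ordering of each gene pair within a network, and a sample is scored by the fraction of pairwise orderings it matches. The toy data and the two-template classification rule below are illustrative; the full method adds network selection and rank-conservation indices on top of this.

```python
from itertools import combinations

def rank_vector(expr, genes):
    """Pairwise-order vector: True where the first gene is expressed below the second."""
    return [expr[a] < expr[b] for a, b in combinations(genes, 2)]

def rank_template(samples, genes):
    """Majority ordering of every gene pair across a phenotype's samples."""
    vecs = [rank_vector(s, genes) for s in samples]
    return [2 * sum(col) > len(vecs) for col in zip(*vecs)]

def matching_score(sample, template, genes):
    """Fraction of gene-pair orderings in the sample agreeing with the template."""
    v = rank_vector(sample, genes)
    return sum(a == b for a, b in zip(v, template)) / len(template)

genes = ["g1", "g2", "g3"]
pheno_a = [{"g1": 1, "g2": 2, "g3": 3}, {"g1": 2, "g2": 3, "g3": 5}]
pheno_b = [{"g1": 3, "g2": 2, "g3": 1}, {"g1": 5, "g2": 3, "g3": 2}]
ta, tb = rank_template(pheno_a, genes), rank_template(pheno_b, genes)

new = {"g1": 1.2, "g2": 2.0, "g3": 2.5}  # resembles phenotype A
print(matching_score(new, ta, genes), matching_score(new, tb, genes))  # → 1.0 0.0
```

Because only within-sample orderings are used, the score is invariant to monotone per-sample transformations, which is why such rank-based rules tolerate batch effects and preprocessing differences better than raw expression values.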
  • 82
    Publication Date: 2015-09-25
    Description: Background: Searching for two-dimensional (2D) structural similarities is a useful tool to identify new active compounds in drug-discovery programs. However, as 2D similarity measures neglect important structural and functional features, 2D comparisons may underestimate the actual similarity between compounds. In the present study, we used combined 2D and three-dimensional (3D) similarity comparisons to reveal possible new functions and/or side-effects of known bioactive compounds. Results: We utilised more than 10,000 compounds from the SuperTarget database with known inhibition values for twelve different anti-cancer targets. We performed all-against-all comparisons resulting in 2D similarity landscapes. Among the regions with low 2D similarity scores are inhibitors of vascular endothelial growth factor receptor (VEGFR) and inhibitors of poly ADP-ribose polymerase (PARP). To demonstrate that 3D landscape comparison can identify similarities that are untraceable in 2D similarity comparisons, we analysed this region in more detail. This 3D analysis showed the unexpected structural similarity between inhibitors of VEGFR and inhibitors of PARP. Among the VEGFR inhibitors that show similarities to PARP inhibitors was Vatalanib, an oral “multi-targeted” small molecule protein kinase inhibitor being studied in phase-III clinical trials in cancer therapy. An in silico docking simulation and an in vitro HT universal colorimetric PARP assay confirmed that the VEGFR inhibitor Vatalanib exhibits off-target activity as a PARP inhibitor, broadening its mode of action. Conclusion: In contrast to the 2D-similarity search, the 3D-similarity landscape comparison identifies new functions and side effects of the known VEGFR inhibitor Vatalanib.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
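A 2D similarity search of the kind described above typically compares substructure fingerprints with the Tanimoto (Jaccard) coefficient; the abstract does not name the exact measure used, so take this as a generic sketch over hypothetical integer bit positions. Two molecules sharing few 2D substructures score low even when their 3D shapes align well, which is exactly the gap the 3D landscape comparison targets.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits:
    |intersection| / |union|."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical on-bit sets for two scaffolds with little 2D overlap:
# 2 shared bits out of 9 total gives a low similarity of about 0.222.
print(tanimoto({1, 4, 9, 17, 23}, {4, 8, 15, 23, 42, 57}))
```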
  • 83
    Publication Date: 2015-09-25
    Description: Background: Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. Results: We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for ‘enrichment’ or conditional differences using one of a number of commonly available packages. Conclusion: The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 84
    Publication Date: 2015-09-25
    Description: Background: In the past decade, the identification of gene co-expression has become a routine part of the analysis of high-dimensional microarray data. Gene co-expression, which is mostly detected via the Pearson correlation coefficient, has played an important role in the discovery of molecular pathways and networks. Unfortunately, the presence of systematic noise in high-dimensional microarray datasets corrupts estimates of gene co-expression. Removing systematic noise from microarray data is therefore crucial. Many cleaning approaches for microarray data exist; however, these methods are aimed at improving differential expression analysis, and their performance has been primarily tested for this application. To our knowledge, the performances of these approaches have never been systematically compared in the context of gene co-expression estimation. Results: Using simulations we demonstrate that standard cleaning procedures, such as background correction and quantile normalization, fail to adequately remove systematic noise that affects gene co-expression and at times further degrade true gene co-expression. Instead we show that a global version of removal of unwanted variation (RUV), a data-driven approach, removes systematic noise but also allows the estimation of the true underlying gene-gene correlations. We compare the performance of all noise removal methods when applied to five large published datasets on gene expression in the human brain. RUV retrieves the highest gene co-expression values for sets of genes known to interact, but also provides the greatest consistency across all five datasets. We apply the method to prioritize epileptic encephalopathy candidate genes. Conclusions: Our work raises serious concerns about the quality of many published gene co-expression analyses.
RUV provides an efficient and flexible way to remove systematic noise from high-dimensional microarray datasets when the objective is gene co-expression analysis. The RUV method as applicable to gene-gene correlation estimation is available as a Bioconductor package: RUVcorr.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 85
    Publication Date: 2015-10-01
    Description: Background: The detection of the glomeruli is a key step in the histopathological evaluation of microscopic images of the kidneys. However, the task of automatic detection of the glomeruli poses challenges owing to the differences in their sizes and shapes in renal sections as well as the extensive variations in their intensities due to heterogeneity in immunohistochemistry staining. Although the rectangular histogram of oriented gradients (Rectangular HOG) is a widely recognized powerful descriptor for general object detection, it shows many false positives owing to the aforementioned difficulties in the context of glomeruli detection. Results: A new descriptor referred to as Segmental HOG was developed to perform a comprehensive detection of hundreds of glomeruli in images of whole kidney sections. The new descriptor possesses flexible blocks that can be adaptively fitted to input images in order to acquire robustness for the detection of the glomeruli. Moreover, the novel segmentation technique employed herewith generates high-quality segmentation outputs, and the algorithm is assured to converge to an optimal solution. Consequently, experiments using real-world image data revealed that Segmental HOG achieved significant improvements in detection performance compared to Rectangular HOG. Conclusion: The proposed descriptor for glomeruli detection presents promising results, and it is expected to be useful in pathological evaluation.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 86
    Publication Date: 2015-09-30
    Description: Background: Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factors (TFs) binding and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TFs binding and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets. Results: To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence context. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability of integrating targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks. 
Conclusions: methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists with minimal R skills a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 87
    Publication Date: 2015-09-30
    Description: Background: The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the assignment of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as a label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors, which are cell membrane proteins of relevance both to biology in general and pharmacology in particular. Their supervised classification into different known subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the different representations of the protein primary sequences. Results: In this study, we describe a systematic approach, using Support Vector Machine classifiers, to the analysis of G protein-coupled receptor misclassifications. As a proof of concept, this approach is used to assist the discovery of labeling quality problems in a curated, publicly accessible database of this type of proteins. We also investigate the extent to which physico-chemical transformations of the protein sequences reflect G protein-coupled receptor subtype labeling. The candidate mislabeled cases detected with this approach are externally validated with phylogenetic trees and against further trusted sources such as the National Center for Biotechnology Information, Universal Protein Resource, European Bioinformatics Institute and Ensembl Genome Browser information repositories. Conclusions: In quantitative classification problems, class labels are often by default assumed to be correct.
Label noise, though, is bound to be a pervasive problem in bioinformatics, where labels may be obtained indirectly through complex, many-step similarity modelling processes. In the case of G protein-coupled receptors, methods capable of singling out and characterizing those sequences with consistent misclassification behaviour are required to minimize this problem. A systematic, Support Vector Machine-based method has been proposed in this study for this purpose. The proposed method enables a filtering approach to the label noise problem and might become a support tool for database curators in proteomics.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
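The consistent-misclassification idea described above can be sketched in a few lines. This is an illustrative stand-in only: it uses a nearest-centroid classifier instead of a Support Vector Machine, and invented 2-D feature vectors instead of transformed GPCR sequences, but the filtering logic (flag samples that a leave-one-out classifier keeps mislabelling) is the same in spirit.

```python
# Illustrative label-noise filter: a sample whose label a leave-one-out
# classifier contradicts is flagged as a mislabeling candidate.
# (Nearest-centroid stand-in for the SVM; toy data, not real GPCR features.)

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flag_label_noise(X, y):
    """Return indices of samples a leave-one-out classifier mislabels."""
    flagged = []
    for i in range(len(X)):
        # Build per-class centroids from all samples except i.
        classes = {}
        for j, (x, lab) in enumerate(zip(X, y)):
            if j != i:
                classes.setdefault(lab, []).append(x)
        cents = {lab: centroid(pts) for lab, pts in classes.items()}
        pred = min(cents, key=lambda lab: dist2(X[i], cents[lab]))
        if pred != y[i]:
            flagged.append(i)
    return flagged

# Two well-separated subtype clusters; the last sample carries a wrong label.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.05, 0.05)]
y = ["A", "A", "A", "B", "B", "B"]   # last sample actually sits in cluster A
noisy = flag_label_noise(X, y)
```

A curator would then inspect the flagged sequences rather than trusting the filter blindly, which matches the human-in-the-loop use the abstract describes.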
  • 88
    Publication Date: 2015-09-30
    Description: Background: Copy number variations are important markers for the detection of significant tumors and diseases and for tracking their progression. Recently, whole exome sequencing (WES) has been gaining popularity for copy number variation detection due to its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension of a variational segmentation algorithm, VEGA: Variational estimator for genomic aberrations, which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. Results: We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed against segmentations obtained from single-nucleotide polymorphism data, which served as ground truth. We compared our results with those of two other segmentation algorithms and assessed performance based on accuracy and running time. Conclusions: In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
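The segmentation step at the heart of such copy-number callers can be illustrated with a toy piecewise-constant segmenter. This is not the variational VEGA algorithm; it is a greedy recursive splitter over an invented coverage signal, meant only to show what segmenting a signal into constant-copy-number regions means.

```python
# Toy piecewise-constant segmentation of a coverage signal: recursively
# split wherever the split most reduces within-segment squared error.
# (Greedy illustration only, not the variational method of VEGA/VEGAWES.)

def sse(xs):
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def segment(xs, min_gain=1.0):
    """Recursively split xs where a split most reduces squared error."""
    best_gain, best_k = 0.0, None
    total = sse(xs)
    for k in range(1, len(xs)):
        gain = total - sse(xs[:k]) - sse(xs[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain < min_gain:
        return [xs]                      # one flat segment, stop splitting
    return segment(xs[:best_k], min_gain) + segment(xs[best_k:], min_gain)

# Diploid baseline (~2.0) followed by an amplified region (~4.0).
signal = [2.0, 2.1, 1.9, 2.0, 4.0, 4.1, 3.9, 4.0]
segs = segment(signal)
means = [round(sum(s) / len(s), 2) for s in segs]
```

The recovered segment means (one near 2, one near 4) are what a caller would then interpret as normal versus amplified copy number.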
  • 89
    Publication Date: 2015-09-30
    Description: Background: In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow these methods to be evaluated accurately and reproducibly. Hence, we propose a new tool able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Results: Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. Conclusions: The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that have broad overall performance.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
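Scoring an inferred network against a gold standard, the basic operation inside any such benchmarking framework, reduces to edge-set precision and recall. The gene names and edges below are invented; the actual Bioconductor package computes far richer statistics across simulators, topologies and noise levels.

```python
# Edge-set precision/recall of an inferred gene regulatory network
# against a gold-standard network (edges as directed (regulator, target)
# pairs; toy example, invented gene names).

def precision_recall(inferred, truth):
    tp = len(inferred & truth)                       # correctly inferred edges
    precision = tp / len(inferred) if inferred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth    = {("g1", "g2"), ("g1", "g3"), ("g2", "g4")}
inferred = {("g1", "g2"), ("g2", "g4"), ("g3", "g4")}
p, r = precision_recall(inferred, truth)
```

Sweeping a confidence threshold over ranked edge predictions and repeating this computation yields the precision-recall curves such benchmarks typically report.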
  • 90
    Publication Date: 2015-09-30
    Description: Background: We are creating software for agent-based simulation and visualization of bio-molecular processes in bacterial and eukaryotic cells. As a first example, we have built a 3-dimensional, interactive computer model of an Escherichia coli bacterium and its associated biomolecular processes. Our illustrative model focuses on the gene regulatory processes that control the expression of genes involved in the lactose operon. Prokaryo, our agent-based cell simulator, incorporates cellular structures, such as plasma membranes and cytoplasm, as well as elements of the molecular machinery, including RNA polymerase, messenger RNA, lactose permease, and ribosomes. Results: The dynamics of cellular 'agents' are defined by their rules of interaction, implemented as finite state machines. The agents are embedded within a 3-dimensional virtual environment with simulated physical and electrochemical properties. The hybrid model is driven by a combination of (1) differential equations (DEQs) to capture higher-scale phenomena and (2) agent-based rules to implement localized interactions among a small number of molecular elements. Consequently, our model is able to capture phenomena across multiple spatial scales, from changing concentration gradients to one-on-one molecular interactions. We use the classic gene regulatory mechanism of the lactose operon to demonstrate our model's resolution, visual presentation, and real-time interactivity. Our agent-based model expands on a sophisticated mathematical E. coli metabolism model, through which we highlight our model's scientific validity. Conclusion: We believe that through illustration and interactive exploratory learning a model system like Prokaryo can enhance the general understanding and perception of biomolecular processes. Our agent-DEQ hybrid modeling approach can also be of value to conceptualize, illustrate, and, eventually, validate cell experiments in the wet lab.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
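The finite-state-machine agents mentioned above can be illustrated with a minimal, invented example: a lac repressor agent whose transition rules unbind it from the operator when the inducer appears. The states and rules here are textbook simplifications for illustration, not taken from the actual Prokaryo model.

```python
# Minimal finite-state-machine agent in the style of agent-based cell
# simulators: a lac repressor that leaves the operator when allolactose
# (the inducer) is present, allowing transcription of the operon.
# States and transition rules are invented simplifications.

class RepressorAgent:
    def __init__(self):
        self.state = "BOUND_TO_OPERATOR"

    def step(self, allolactose_present):
        # Transition rules of the finite state machine.
        if self.state == "BOUND_TO_OPERATOR" and allolactose_present:
            self.state = "INDUCER_BOUND"      # inducer pulls it off the DNA
        elif self.state == "INDUCER_BOUND" and not allolactose_present:
            self.state = "BOUND_TO_OPERATOR"  # re-represses the operon
        return self.state

def operon_transcribed(repressor):
    return repressor.state != "BOUND_TO_OPERATOR"

rep = RepressorAgent()
before = operon_transcribed(rep)       # operator blocked, no inducer yet
rep.step(allolactose_present=True)
after = operon_transcribed(rep)        # induced: transcription can proceed
```

In a full agent-based model, many such agents step concurrently inside a simulated spatial environment, with the DEQ layer supplying concentrations such as the inducer level used here as an input.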
  • 91
    Publication Date: 2015-11-24
    Description: Background: The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of considerable research and efforts that have recently been deployed for improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem. Results: We propose a novel and efficient algorithm, called MSAIndelFR, for multiple sequence alignment using the information on the predicted locations of indel flanking regions (IndelFRs) and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that the introduction of a new variable gap penalty function based on the predicted locations of the IndelFRs and the computed average log-loss values into the proposed algorithm substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of the four popular benchmarks, BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). Conclusions: We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
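The key ingredient, a variable (position-dependent) gap penalty, can be sketched with a small Needleman-Wunsch variant in which gaps are cheaper at positions predicted to lie in an indel-friendly region. The "predicted" position and all scores below are invented for illustration; MSAIndelFR derives them from its IndelFR predictors.

```python
# Global alignment score with a position-dependent gap penalty: gaps
# opposite positions of `a` listed in cheap_gap_cols cost less.
# (Toy Needleman-Wunsch variant; the cheap position and scores are invented.)

def align_score(a, b, cheap_gap_cols, match=2, mismatch=-1, gap=-4, cheap=-1):
    n, m = len(a), len(b)
    def gap_cost(i):          # cost of a gap in `a` after its position i
        return cheap if i in cheap_gap_cols else gap
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost(0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,        # (mis)match
                          D[i - 1][j] + gap,          # gap in b
                          D[i][j - 1] + gap_cost(i))  # gap in a
    return D[n][m]

# The gap opposite the inserted X is cheaper when position 3 of `a`
# is predicted to be indel-friendly.
flexible = align_score("ACDEFG", "ACDXEFG", cheap_gap_cols={3})
rigid    = align_score("ACDEFG", "ACDXEFG", cheap_gap_cols=set())
```

The higher score under the flexible penalty shows how predicted indel regions steer gaps toward biologically plausible positions instead of penalizing them uniformly.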
  • 92
    Publication Date: 2015-11-21
    Description: Stochastic Boolean function evaluation (SBFE) is the problem of determining the value of a given Boolean function f on an unknown input x, when each bit \(x_i\) of x can only be determined by paying a given associated cost \(c_i\). Further, x is drawn from a given product distribution: for each \(x_i\), \(\mathbf{Pr}[x_i=1] = p_i\), and the bits are independent. The goal is to minimize the expected cost of evaluation. In this paper, we study the complexity of the SBFE problem for classes of DNF formulas. We consider both exact and approximate versions of the problem for subclasses of DNF, for arbitrary costs and product distributions, and for unit costs and/or the uniform distribution.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
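For the special case of a single conjunction (a one-term DNF), the optimal strategy is classical: probe bits in nondecreasing order of c_i / Pr[x_i = 0], since a single 0 falsifies the term and ends the evaluation. The sketch below computes the expected evaluation cost of a probe order and checks the greedy order against brute force on an invented instance.

```python
# Expected cost of evaluating an AND (one-term DNF) under a probe order:
# bit i is probed only if every earlier bit came up 1. For a conjunction,
# probing in nondecreasing c_i / Pr[x_i = 0] is the classical optimum
# (exchange argument). Costs/probabilities below are invented.
import itertools

def expected_cost(order, costs, probs):
    total, p_no_zero_yet = 0.0, 1.0
    for i in order:
        total += p_no_zero_yet * costs[i]   # probed only if no 0 seen yet
        p_no_zero_yet *= probs[i]
    return total

costs = {0: 1.0, 1: 10.0, 2: 2.0}
probs = {0: 0.9, 1: 0.5, 2: 0.5}            # Pr[x_i = 1]

greedy = sorted(costs, key=lambda i: costs[i] / (1.0 - probs[i]))
greedy_cost = expected_cost(greedy, costs, probs)
brute_cost = min(expected_cost(list(o), costs, probs)
                 for o in itertools.permutations(costs))
```

The paper's subject is exactly how far such simple strategies can be pushed once the formula has many terms and the exact problem becomes hard.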
  • 93
    Publication Date: 2015-11-20
    Description: Background: Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. Description: We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single base pair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
Conclusions: This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
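The position weight matrix summaries mentioned in item (ii) can be sketched directly: column-wise base frequencies over aligned binding sites, used to score candidate sequences. The sites below are invented, and real PWMs typically add pseudocounts and use log-odds scoring against a background model rather than raw frequencies.

```python
# Position weight matrix (PWM) from aligned binding sites, as plain
# per-column base frequencies, plus a frequency-product scorer.
# (Invented sites; real PWMs add pseudocounts and log-odds scoring.)

def build_pwm(sites):
    length = len(sites[0])
    pwm = []
    for j in range(length):
        col = [s[j] for s in sites]
        pwm.append({b: col.count(b) / len(col) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    """Product of per-position base frequencies for seq under the PWM."""
    p = 1.0
    for j, b in enumerate(seq):
        p *= pwm[j][b]
    return p

sites = ["TATA", "TATA", "TACA", "TATA"]
pwm = build_pwm(sites)
```

Mapping such per-column affinities onto the crystal structure then ties each PWM column to the protein residues contacting that base pair, which is the correspondence the database establishes.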
  • 94
    Publication Date: 2015-11-24
    Description: Background: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. Results: We have developed a web-based platform that gives user-friendly access to two phylogenetic-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. Conclusions: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web, which is freely accessible at http://coev.vital-it.ch/ and was tested in most modern web browsers.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
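A minimal sketch of simulating a coevolving pair: draw joint states for two columns from a distribution that favours a small set of compatible pairs, so the two columns are statistically dependent rather than independent. The favoured pairs and probabilities are invented, and the Coev model simulates along a phylogenetic tree rather than i.i.d. as here.

```python
# Toy simulation of two dependent (coevolving) nucleotide columns:
# most draws come from a favoured set of joint states, the rest from
# all 16 pairs. (Invented probabilities; the Coev model works along
# a phylogeny, not i.i.d.)
import random

def simulate_pair(n, seed=0):
    rng = random.Random(seed)
    compatible = [("A", "T"), ("G", "C")]   # favoured joint states
    all_pairs = [(x, y) for x in "ACGT" for y in "ACGT"]
    cols = []
    for _ in range(n):
        if rng.random() < 0.9:              # coevolution: mostly compatible
            cols.append(rng.choice(compatible))
        else:
            cols.append(rng.choice(all_pairs))
    return cols

pairs = simulate_pair(1000)
frac_compatible = sum(p in [("A", "T"), ("G", "C")] for p in pairs) / len(pairs)
```

Data simulated this way gives coevolution detectors a known positive to recover, which is exactly the benchmarking use case the abstract motivates.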
  • 95
    Publication Date: 2015-11-24
    Description: Background: Identification of one or several disease-causing variant(s) from the large collection of variants present in an individual is often achieved by the sequential use of heuristic filters. The recent development of whole exome sequencing enrichment designs for several non-model species created the need for a species-independent, fast and versatile analysis tool, capable of tackling a wide variety of standard and more complex inheritance models. With this aim, we developed “Mendelian”, an R-package that can be used for heuristic variant filtering. Results: The R-package Mendelian offers fast and convenient filters to analyze putative variants for both recessive and dominant models of inheritance, with variable degrees of penetrance and detectance. Analysis of trios is supported. Filtering against variant databases and annotation of variants are also included. This package is not species-specific and supports parallel computation. We validated this package by reanalyzing data from a whole exome sequencing experiment on intellectual disability in humans. In a second example, we identified the mutations responsible for coat color in the dog. This is the first example of whole exome sequencing without prior mapping in the dog. Conclusion: We developed an R-package that enables the identification of disease-causing variants from the long list of variants called in sequencing experiments. The software and a detailed manual are available at https://github.com/BartBroeckx/Mendelian.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
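One of the heuristic filters such a package chains together, a fully penetrant recessive trio filter, can be sketched as follows. The genotype encoding and variant records are invented for illustration and do not reflect Mendelian's actual API.

```python
# Trio filter for a fully penetrant recessive model: keep variants where
# the affected child is homozygous for the alternate allele and both
# unaffected parents are heterozygous carriers. Genotypes are encoded as
# allele tuples, 0 = reference, 1 = alternate. (Invented records/encoding.)

def recessive_trio(variants):
    return [v["id"] for v in variants
            if v["child"] == (1, 1)          # hom-alt in the proband
            and v["father"] == (0, 1)        # carrier parents
            and v["mother"] == (0, 1)]

variants = [
    {"id": "chr1:1000A>G", "child": (1, 1), "father": (0, 1), "mother": (0, 1)},
    {"id": "chr2:2000C>T", "child": (0, 1), "father": (0, 1), "mother": (0, 0)},
    {"id": "chr3:3000G>A", "child": (1, 1), "father": (1, 1), "mother": (0, 1)},
]
candidates = recessive_trio(variants)
```

Real pipelines then stack further filters (population frequency, annotation impact, inheritance variants with reduced penetrance) on the surviving candidates, which is the sequential filtering the abstract describes.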
  • 96
    Publication Date: 2015-11-26
    Description: Background: Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. We have therefore developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Results: Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available complete bacterial genomes. Using a dataset of 1003 whole genome sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5 % accuracy and within minutes. Conclusions: In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of the user running the web application. This also prevents data privacy issues from arising. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
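The marker-probe idea can be sketched with k-mer matching: count how many species-specific probes the reads hit and report the best-supported species. The probe sequences and reads below are invented; Reads2Type derives its probes from complete genomes.

```python
# Toy marker-probe taxonomy assignment: each species has a set of
# species-specific k-mer "probes"; reads vote for the species whose
# probes they contain. (Invented probe sets and reads.)

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(reads, marker_probes, k=4):
    hits = {sp: 0 for sp in marker_probes}
    for read in reads:
        rk = kmers(read, k)
        for sp, probes in marker_probes.items():
            hits[sp] += len(rk & probes)       # probes present in this read
    return max(hits, key=hits.get), hits

marker_probes = {
    "E. coli":   {"ACGT", "GGCC", "TTAA"},
    "S. aureus": {"CCCC", "GGGG", "ATAT"},
}
reads = ["AACGTGGCC", "TTAACGTA"]
species, hits = classify(reads, marker_probes)
```

Because only probe hits are counted, reads can be processed as they stream in, which is what makes minute-scale, in-browser identification without uploading the raw files plausible.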
  • 97
    Publication Date: 2015-11-26
    Description: Background: In recent years, hyperspectral microscopy techniques such as infrared or Raman microscopy have been applied successfully for diagnostic purposes. In many of the corresponding studies, it is common practice to measure one and the same sample under different types of microscopes. Any joint analysis of the two image modalities requires overlaying the images, so that identical positions in the sample are located at the same coordinate in both images. This step, commonly referred to as image registration, has typically been performed manually in the absence of established automated computational registration tools. Results: We propose a registration algorithm that addresses this problem, and demonstrate the robustness of our approach in different constellations of microscopes. First, we deal with subregion registration of Fourier Transform Infrared (FTIR) microscopic images in whole-slide histopathological staining images. Second, we register FTIR imaged cores of tissue microarrays in their histopathologically stained counterparts, and finally perform registration of Coherent anti-Stokes Raman spectroscopic (CARS) images within histopathological staining images. Conclusions: Our validation involves a large variety of samples obtained from colon, bladder, and lung tissue on three different types of microscopes, and demonstrates that our proposed method works fully automatically and is highly robust in different constellations of microscopes involving diverse types of tissue samples.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
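Stripped to its simplest form, intensity-based registration is a search for the transformation that maximizes agreement between the two images. The sketch below brute-forces only integer translations on tiny invented grids; the actual method must additionally cope with rotation, scale, and the very different contrast of FTIR/CARS versus stained images.

```python
# Brute-force translational registration: find the integer (dx, dy)
# shift of `moving` that maximizes pointwise intensity agreement with
# `fixed`. (Toy grids; real registration handles rotation, scale, and
# multimodal contrast.)

def best_shift(fixed, moving, max_shift=2):
    rows, cols = len(fixed), len(fixed[0])
    best, best_score = (0, 0), None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        score += fixed[r][c] * moving[rr][cc]
            if best_score is None or score > best_score:
                best, best_score = (dx, dy), score
    return best

fixed = [                      # bright block at rows 1-2, cols 1-2
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
moving = [                     # same block, shifted up-left by one pixel
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shift = best_shift(fixed, moving)
```

For multimodal pairs, the correlation of raw intensities used here is typically replaced by a similarity measure such as mutual information, since the two modalities render the same tissue with unrelated brightness.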
  • 98
    Publication Date: 2015-11-22
    Description: Background: The number of γH2AX foci per nucleus is an accepted measure of the number of DNA double-strand breaks in single cells. One of the experimental techniques for γH2AX detection in cultured cells is immunofluorescent labelling of γH2AX and nuclei followed by microscopy imaging and analysis. Results: In this study, we present the algorithm FoCo for reliable and robust automatic nuclear foci counting in single cell images. FoCo has the following advantages with respect to other software packages: i) the ability to reliably quantify even densely distributed foci, e.g., on images of cells subjected to radiation doses of up to 10 Gy, ii) robustness of foci quantification in the sense of suppressing out-of-focus background signal, and iii) its simplicity. FoCo requires only five parameters that have to be adjusted by the user. Conclusions: FoCo is an open-source, user-friendly software with a GUI for individual foci counting, which is able to produce reliable and robust foci quantifications even for low signal-to-noise ratios and densely distributed foci.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
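Foci counting can be illustrated as thresholded local-maxima detection on a 2-D intensity grid. The image below is invented, and FoCo itself adds background suppression and handles densely packed foci, which this sketch does not attempt.

```python
# Count foci as strict local maxima above an intensity threshold in a
# 2-D grid (8-connected neighborhood). Toy image; real foci counters
# additionally suppress out-of-focus background and split merged foci.

def count_foci(img, threshold):
    rows, cols = len(img), len(img[0])
    def neighbors(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        yield img[rr][cc]
    count = 0
    for r in range(rows):
        for c in range(cols):
            v = img[r][c]
            if v >= threshold and all(v > n for n in neighbors(r, c)):
                count += 1
    return count

img = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 7, 1],
    [0, 0, 1, 1, 0],
]
n_foci = count_foci(img, threshold=5)
```

The threshold here plays the role of one of the few user-adjustable parameters the abstract mentions: raising it trades sensitivity to dim foci for robustness against background noise.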
  • 99
    Publication Date: 2015-11-26
    Description: Background: Analysis of single cells in their native environment is a powerful method to address key questions in developmental systems biology. Confocal microscopy imaging of intact tissues, followed by automatic image segmentation, provides a means to conduct cytometric studies while at the same time preserving crucial information about the spatial organization of the tissue and morphological features of the cells. This technique is rapidly evolving but is still not in widespread use among research groups that do not specialize in technique development, perhaps in part due to a lack of tools that automate repetitive tasks while allowing experts to spend their time injecting their domain-specific knowledge. Results: Here we focus on a well-established stem cell model system, the C. elegans gonad, as well as on two other model systems widely used to study cell fate specification and morphogenesis: the pre-implantation mouse embryo and the developing mouse olfactory epithelium. We report a pipeline that integrates machine-learning-based cell detection, fast human-in-the-loop curation of these detections, and active contours seeded from the detections to segment cells. The procedure can be bootstrapped by a small number of manual detections, and outperforms alternative software we benchmarked on C. elegans gonad datasets. Using cell segmentations to quantify fluorescence contents, we report previously uncharacterized cell behaviors in the model systems we used. We further show how cell morphological features can be used to identify cell cycle phase; this provides a basis for future tools that will streamline cell cycle experiments by minimizing the need for exogenous cell cycle phase labels.
Conclusions: High-throughput 3D segmentation makes it possible to extract rich information from images that are routinely acquired by biologists, and provides insights, in particular with respect to the cell cycle, that would be difficult to derive otherwise.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
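The role of the seeded active contours in the pipeline, growing a cell segment outward from a detection, can be approximated by simple flood-fill region growing over above-threshold pixels. This is only an analogue of active contours, run on an invented 2-D image rather than real 3-D confocal data.

```python
# Seeded region growing: starting from a detected cell center, flood-fill
# over connected pixels above an intensity threshold. (A crude stand-in
# for active contours; invented image, 2-D instead of 3-D.)

def grow(img, seed, threshold):
    rows, cols = len(img), len(img[0])
    stack, region = [seed], set()
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if img[r][c] < threshold:
            continue                      # outside the bright cell body
        region.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region

img = [
    [0, 0, 0, 0, 0],
    [0, 8, 9, 0, 0],
    [0, 7, 8, 0, 6],
    [0, 0, 0, 0, 7],
]
cell = grow(img, seed=(1, 1), threshold=5)
```

Summing fluorescence intensity over the returned pixel set is the kind of per-cell quantification the abstract uses to characterize cell behaviors.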
  • 100
    Publication Date: 2015-11-26
    Description: Background: Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of the GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. In addition, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes not be observed genes. Results: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data, with some assumptions on the conditional distribution, but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), since long time series are difficult to obtain. Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experiment results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
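The time-delayed dependencies an HO-DBN captures can be illustrated by scanning candidate lags between two expression series and keeping the lag with the strongest correlation. The series below are invented, and this sketch only detects a lagged pairwise link; the actual algorithm learns discrete network structure and can additionally posit hidden common causes.

```python
# Detect a time-delayed regulatory link by scanning candidate lags and
# keeping the one with maximal absolute lagged Pearson correlation.
# (Toy illustration of multi-step delays; invented expression series.)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def best_lag(regulator, target, max_lag=3):
    scored = {lag: corr(regulator[:-lag], target[lag:])
              for lag in range(1, max_lag + 1)}
    lag = max(scored, key=lambda l: abs(scored[l]))
    return lag, scored[lag]

# The target copies the regulator two steps later (plus a constant).
regulator = [0, 1, 0, 2, 1, 3, 0, 2, 1, 0]
target    = [5, 5, 5, 6, 5, 7, 6, 8, 5, 7]
lag, r = best_lag(regulator, target)
```

A hidden common cause would show up as two observed genes that correlate with each other at some lag without either one explaining the other, which is the harder structural question the paper addresses.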