ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (17,631)
  • PLoS Computational Biology  (3,993)
  • IEEE Transactions on Knowledge and Data Engineering  (1,300)
  • Algorithmica  (744)
  • 1274
  • 56466
  • 825
  • Computer Science  (17,631)
  • 1
    Publication Date: 2020-07-02
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 2
    Publication Date: 2020-07-15
    Description: It is generally accepted that populations are useful for the global exploration of multi-modal optimisation problems. Indeed, several theoretical results are available showing such advantages over single-trajectory search heuristics. In this paper we provide evidence that evolving populations via crossover and mutation may also benefit the optimisation time for hillclimbing unimodal functions. In particular, we prove bounds on the expected runtime of the standard \((\mu + 1)\) GA for OneMax that are lower than its unary black box complexity and decrease in the leading constant with the population size up to \(\mu = o\left(\sqrt{\log n}\right)\). Our analysis suggests that the optimal mutation strategy is to flip two bits most of the time. To achieve the results we provide two interesting contributions to the theory of randomised search heuristics: (1) A novel application of drift analysis which compares absorption times of different Markov chains without defining an explicit potential function. (2) The inversion of fundamental matrices to calculate the absorption times of the Markov chains. The latter strategy was previously proposed in the literature, but to the best of our knowledge this is the first time it has been used to show non-trivial bounds on expected runtimes.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
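The abstract above analyses the (μ+1) GA on OneMax. As a point of reference, here is a minimal Python sketch of such a GA, assuming uniform crossover, a mutation step that flips a fixed small number of distinct bits (two by default, echoing the abstract's observation), and worst-replacement selection; the parameter names and defaults are illustrative and not taken from the paper.

```python
import random

def onemax(x):
    """Fitness of a bit string: number of ones."""
    return sum(x)

def mu_plus_one_ga(n=100, mu=5, flips=2, max_iters=200000, seed=0):
    """Minimal (mu+1) GA sketch for OneMax: pick two parents, apply uniform
    crossover, flip a few bits, and replace a worst individual if the
    offspring is at least as fit."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    for t in range(max_iters):
        p1, p2 = rng.sample(pop, 2) if mu > 1 else (pop[0], pop[0])
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        for i in rng.sample(range(n), flips):      # flip `flips` random bits
            child[i] ^= 1
        worst = min(range(mu), key=lambda j: onemax(pop[j]))
        if onemax(child) >= onemax(pop[worst]):
            pop[worst] = child
        if any(onemax(ind) == n for ind in pop):
            return t + 1                           # iterations until optimum found
    return max_iters

if __name__ == "__main__":
    print(mu_plus_one_ga())
```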
  • 3
    Publication Date: 2020-07-15
    Description: Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an \(\mathcal{O}(n)\)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in \(\tilde{\mathcal{O}}(n^{2/3})\) time, after \(\tilde{\mathcal{O}}(n)\)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an \(\tilde{\mathcal{O}}(n)\)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in \(\tilde{\mathcal{O}}(1)\) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
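For orientation, the following is a simple quadratic-time dynamic-programming baseline for the static longest common substring problem defined in the abstract above. It is neither the linear-time suffix-tree solution nor the sublinear dynamic algorithm the paper presents; it only illustrates what an LCS query computes.

```python
def longest_common_substring(s, t):
    """O(|s|*|t|) DP baseline: cur[j] is the length of the longest common
    suffix of s[:i] and t[:j]; the answer is the maximum over all i, j."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]

assert longest_common_substring("dynamic", "academic") == "mic"
```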
  • 4
    Publication Date: 2020-07-10
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 5
    Publication Date: 2015-08-13
    Description: In the classic k-center problem, we are given a metric graph, and the objective is to select k nodes as centers such that the maximum distance from any vertex to its closest center is minimized. In this paper, we consider two important generalizations of k-center, the matroid center problem and the knapsack center problem. Both problems are motivated by recent content distribution network applications. Our contributions can be summarized as follows: (1) We consider the matroid center problem in which the centers are required to form an independent set of a given matroid. We show this problem is NP-hard even on a line. We present a 3-approximation algorithm for the problem on general metrics. We also consider the outlier version of the problem where a given number of vertices can be excluded as outliers from the solution. We present a 7-approximation for the outlier version. (2) We consider the (multi-)knapsack center problem in which the centers are required to satisfy one (or more) knapsack constraint(s). It is known that the knapsack center problem with a single knapsack constraint admits a 3-approximation. However, when there are at least two knapsack constraints, we show this problem is not approximable at all. To complement the hardness result, we present a polynomial time algorithm that gives a 3-approximate solution such that one knapsack constraint is satisfied and the others may be violated by at most a factor of \(1+\epsilon\). We also obtain a 3-approximation for the outlier version that may violate the knapsack constraint by \(1+\epsilon\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
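For context, the sketch below implements the classical greedy farthest-first traversal for the plain k-center problem (a well-known 2-approximation). It does not implement the matroid or knapsack variants, nor the 3- and 7-approximation algorithms, that the abstract above is about.

```python
def greedy_k_center(dist, k):
    """Farthest-first traversal: classical 2-approximation for plain k-center
    on a metric given as a full distance matrix `dist`. Returns the chosen
    center indices and the resulting covering radius."""
    n = len(dist)
    centers = [0]                        # arbitrary first center
    d_to_center = list(dist[0])
    for _ in range(1, k):
        nxt = max(range(n), key=lambda v: d_to_center[v])
        centers.append(nxt)
        d_to_center = [min(d_to_center[v], dist[nxt][v]) for v in range(n)]
    return centers, max(d_to_center)

# toy metric: points on a line, distance = absolute difference
pts = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in pts] for a in pts]
print(greedy_k_center(dist, k=3))
```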
  • 6
    Publication Date: 2015-08-13
    Description: We design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an n element poset, the data structure occupies \(n^2/4 + o(n^2)\) bits in the worst case. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using no more than a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time. We also consider the operation of listing all the successors or predecessors of a given element, and show how to do this in constant time per element reported using a slightly modified version of our succinct data structure.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
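To illustrate the query interface discussed above, here is a plain (non-succinct) Python sketch: it precomputes the transitive closure of a DAG as bitsets so that precedes(u, v) is a constant-time bit test. The paper's actual contribution, a representation using only \(n^2/4 + o(n^2)\) bits, is not reproduced here.

```python
def build_reachability(n, edges):
    """Precompute reachability bitsets for a DAG given as an edge list.
    reach[u] has bit v set iff v is reachable from u (plain transitive
    closure, not the succinct representation from the paper)."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    order, stack = [], [u for u in range(n) if indeg[u] == 0]
    while stack:                          # Kahn's algorithm: topological order
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    reach = [1 << u for u in range(n)]    # every node reaches itself
    for u in reversed(order):             # children are processed before parents
        for v in adj[u]:
            reach[u] |= reach[v]
    return reach

def precedes(reach, u, v):
    return bool(reach[u] >> v & 1)        # constant-time bit test

reach = build_reachability(4, [(0, 1), (1, 2), (0, 3)])
assert precedes(reach, 0, 2) and not precedes(reach, 3, 2)
```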
  • 7
    Publication Date: 2015-08-14
    Description: We consider a graph observability problem: how many edge colors are needed for an unlabeled graph so that an agent, walking from node to node, can uniquely determine its location from just the observed color sequence of the walk? Specifically, let G(n, d) be an edge-colored subgraph of a d-dimensional (directed or undirected) lattice of size \(n^d = n \times n \times \cdots \times n\). We say that G(n, d) is t-observable if an agent can uniquely determine its current position in the graph from the color sequence of any t-dimensional walk, where the dimension is the number of different directions spanned by the edges of the walk. A walk in an undirected lattice G(n, d) has dimension between 1 and d, but a directed walk can have dimension between 1 and 2d because of two different orientations for each axis. We derive bounds on the number of colors needed for t-observability. Our main result is that \(\varTheta(n^{d/t})\) colors are both necessary and sufficient for t-observability of G(n, d), where d is considered a constant. This shows an interesting dependence of graph observability on the ratio between the dimension of the lattice and that of the walk. In particular, the number of colors for full-dimensional walks is \(\varTheta(n^{1/2})\) in the directed case, and \(\varTheta(n)\) in the undirected case, independent of the lattice dimension. All of our results extend easily to non-square lattices: given a lattice graph of size \(N = n_1 \times n_2 \times \cdots \times n_d\), the number of colors for t-observability is \(\varTheta(\sqrt[t]{N})\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 8
    Publication Date: 2015-08-04
    Description: We revisit the matrix problems sparse null space and matrix sparsification , and show that they are equivalent. We then proceed to seek algorithms for these problems: we prove the hardness of approximation of these problems, and also give a powerful tool to extend algorithms and heuristics for sparse approximation theory to these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 9
    Publication Date: 2015-08-04
    Description: We introduce Planar Disjoint Paths Completion, a completion counterpart of the Disjoint Paths problem, and study its parameterized complexity. The problem can be stated as follows: given a (not necessarily connected) plane graph G, k pairs of terminals, and a face F of G, find a minimum-size set of edges, if one exists, to be added inside F so that the embedding remains planar and the pairs become connected by k disjoint paths in the augmented network. Our results are twofold: first, we give an upper bound on the number of necessary additional edges when a solution exists. This bound is a function of k, independent of the size of G. Second, we show that the problem is fixed-parameter tractable, in particular, it can be solved in time \(f(k)\cdot n^{2}\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 10
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Given a database table with records that can be ranked, an interesting problem is to identify selection conditions for the table, which are qualified by an input record and render its ranking as high as possible among the qualifying tuples. In this paper, we study this standing maximization problem, which finds application in object promotion and characterization. After showing the hardness of the problem, we propose greedy methods, which are experimentally shown to achieve high accuracy compared to exhaustive enumeration, while scaling very well to the problem input size. Our contributions include a linear-time algorithm for determining the optimal selection range for an ordinal attribute and techniques for choosing and prioritizing the most promising selection predicates to apply. Experiments on real datasets confirm the effectiveness and efficiency of our techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 11
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Some fairly recent research has focused on providing XACML-based solutions for dynamic privacy policy management. In this regard, a number of works have provided enhancements to the performance of XACML policy enforcement point (PEP) component, but very few have focused on enhancing the accuracy of that component. This paper improves the accuracy of an XACML PEP by filling some gaps in the existing works. In particular, dynamically incorporating user access context into the privacy policy decision, and its enforcement. We provide an XACML-based implementation of a dynamic privacy policy management framework and an evaluation of the applicability of our system in comparison to some of the existing approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 12
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: This paper first introduces pattern aided regression (PXR) models, a new type of regression models designed to represent accurate and interpretable prediction models. This was motivated by two observations: (1) Regression modeling applications often involve complex diverse predictor-response relationships , which occur when the optimal regression models (of given regression model type) fitting two or more distinct logical groups of data are highly different. (2) State-of-the-art regression methods are often unable to adequately model such relationships. This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships. The paper also introduces a contrast pattern aided regression (CPXR) method, to build accurate PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by big margins. Usually using (a) around seven simple patterns and (b) linear local regression models, those PXR models are easy to interpret; in fact, their complexity is just a bit higher than that of (piecewise) linear regression models and is significantly lower than that of traditional ensemble based regression models. CPXR is especially effective for high-dimensional data. The paper also discusses how to use CPXR methodology for analyzing prediction models and correcting their prediction errors.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 13
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We analyze models for predicting the probability of a strikeout for a batter/pitcher matchup in baseball using player descriptors that can be estimated accurately from small samples. We start with the log5 model which has been used extensively for describing matchups in sports. Log5 is a special case of a logit model and we use constrained logistic regression over nearly one million matchup observations to assess the use of the log5 explanatory variables for this application. We also show that a batter/pitcher ground ball rate interaction variable is significant for the prediction of strikeout probability and we provide physical justification for the inclusion of this variable in the model. We quantify the differences among the models and show that batters control the majority of the variance in predicted strikeout rate.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
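For reference, the log5 model mentioned in the abstract above is commonly written as below; whether the paper uses exactly this parameterization (batter rate, pitcher rate, league-average rate) is an assumption here, and the ground-ball interaction term it adds is not shown.

```python
def log5(p_batter, p_pitcher, p_league):
    """Standard log5 estimate of an event probability for a batter/pitcher
    matchup, given each player's rate and the league-average rate."""
    num = (p_batter * p_pitcher) / p_league
    den = num + ((1 - p_batter) * (1 - p_pitcher)) / (1 - p_league)
    return num / den

# batter strikes out 25% of the time, pitcher strikes out 30% of opposing
# batters, league-average strikeout rate 20%  ->  about 0.364
print(round(log5(0.25, 0.30, 0.20), 3))
```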
  • 14
    Published by Public Library of Science (PLoS)
    Publication Date: 2015-08-07
    Description: by Eliseo Ferrante, Ali Emre Turgut, Edgar Duéñez-Guzmán, Marco Dorigo, Tom Wenseleers Division of labor is ubiquitous in biological systems, as evidenced by various forms of complex task specialization observed in both animal societies and multicellular organisms. Although clearly adaptive, the way in which division of labor first evolved remains enigmatic, as it requires the simultaneous co-occurrence of several complex traits to achieve the required degree of coordination. Recently, evolutionary swarm robotics has emerged as an excellent test bed to study the evolution of coordinated group-level behavior. Here we use this framework for the first time to study the evolutionary origin of behavioral task specialization among groups of identical robots. The scenario we study involves an advanced form of division of labor, common in insect societies and known as “task partitioning”, whereby two sets of tasks have to be carried out in sequence by different individuals. Our results show that task partitioning is favored whenever the environment has features that, when exploited, reduce switching costs and increase the net efficiency of the group, and that an optimal mix of task specialists is achieved most readily when the behavioral repertoires aimed at carrying out the different subtasks are available as pre-adapted building blocks. Nevertheless, we also show for the first time that self-organized task specialization could be evolved entirely from scratch, starting only from basic, low-level behavioral primitives, using a nature-inspired evolutionary method known as Grammatical Evolution. Remarkably, division of labor was achieved merely by selecting on overall group performance, and without providing any prior information on how the global object retrieval task was best divided into smaller subtasks. We discuss the potential of our method for engineering adaptively behaving robot swarms and interpret our results in relation to the likely path that nature took to evolve complex sociality and task specialization.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 15
    Publication Date: 2015-08-07
    Description: by Patrícia Santos-Oliveira, António Correia, Tiago Rodrigues, Teresa M Ribeiro-Rodrigues, Paulo Matafome, Juan Carlos Rodríguez-Manzaneque, Raquel Seiça, Henrique Girão, Rui D. M. Travasso Sprouting angiogenesis, where new blood vessels grow from pre-existing ones, is a complex process where biochemical and mechanical signals regulate endothelial cell proliferation and movement. Therefore, a mathematical description of sprouting angiogenesis has to take into consideration biological signals as well as relevant physical processes, in particular the mechanical interplay between adjacent endothelial cells and the extracellular microenvironment. In this work, we introduce the first phase-field continuous model of sprouting angiogenesis capable of predicting sprout morphology as a function of the elastic properties of the tissues and the traction forces exerted by the cells. The model is very compact, only consisting of three coupled partial differential equations, and has the clear advantage of a reduced number of parameters. This model allows us to describe sprout growth as a function of the cell-cell adhesion forces and the traction force exerted by the sprout tip cell. In the absence of proliferation, we observe that the sprout either achieves a maximum length or, when the traction and adhesion are very large, it breaks. Endothelial cell proliferation alters significantly sprout morphology, and we explore how different types of endothelial cell proliferation regulation are able to determine the shape of the growing sprout. The largest region in parameter space with well formed long and straight sprouts is obtained always when the proliferation is triggered by endothelial cell strain and its rate grows with angiogenic factor concentration. We conclude that in this scenario the tip cell has the role of creating a tension in the cells that follow its lead. On those first stalk cells, this tension produces strain and/or empty spaces, inevitably triggering cell proliferation. The new cells occupy the space behind the tip, the tension decreases, and the process restarts. Our results highlight the ability of mathematical models to suggest relevant hypotheses with respect to the role of forces in sprouting, hence underlining the necessary collaboration between modelling and molecular biology techniques to improve the current state-of-the-art.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 16
    Publication Date: 2015-08-08
    Description: by Sayed-Rzgar Hosseini, Aditya Barve, Andreas Wagner All biological evolution takes place in a space of possible genotypes and their phenotypes. The structure of this space defines the evolutionary potential and limitations of an evolving system. Metabolism is one of the most ancient and fundamental evolving systems, sustaining life by extracting energy from extracellular nutrients. Here we study metabolism’s potential for innovation by analyzing an exhaustive genotype-phenotype map for a space of 10^15 metabolisms that encodes all possible subsets of 51 reactions in central carbon metabolism. Using flux balance analysis, we predict the viability of these metabolisms on 10 different carbon sources, which give rise to 1024 potential metabolic phenotypes. Although viable metabolisms with any one phenotype comprise a tiny fraction of genotype space, their absolute numbers exceed 10^9 for some phenotypes. Metabolisms with any one phenotype typically form a single network of genotypes that extends far or all the way through metabolic genotype space, where any two genotypes can be reached from each other through a series of single reaction changes. The minimal distance of genotype networks associated with different phenotypes is small, such that one can reach metabolisms with novel phenotypes – viable on new carbon sources – through one or few genotypic changes. Exceptions to these principles exist for those metabolisms whose complexity (number of reactions) is close to the minimum needed for viability. Increasing metabolic complexity enhances the potential for both evolutionary conservation and evolutionary innovation.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
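A quick arithmetic check of the figures quoted in the abstract above: 51 reactions yield 2^51 ≈ 2.3 × 10^15 possible reaction subsets (the "space of 10^15 metabolisms"), and 10 carbon sources yield 2^10 = 1024 possible viability patterns (the "1024 potential metabolic phenotypes").

```python
# Combinatorics behind the numbers quoted in the abstract
n_reactions = 51
n_carbon_sources = 10

genotype_space = 2 ** n_reactions        # all subsets of 51 reactions
phenotype_space = 2 ** n_carbon_sources  # viable / not viable on each carbon source

print(f"{genotype_space:.2e} genotypes")        # ~2.25e+15, i.e. the 10^15 figure
print(phenotype_space, "potential phenotypes")  # 1024
```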
  • 17
    Publication Date: 2015-08-19
    Description: by Pengxing Cao, Ada W. C. Yan, Jane M. Heffernan, Stephen Petrie, Robert G. Moss, Louise A. Carolan, Teagan A. Guarnaccia, Anne Kelso, Ian G. Barr, Jodie McVernon, Karen L. Laurie, James M. McCaw Influenza is an infectious disease that primarily attacks the respiratory system. Innate immunity provides both a very early defense to influenza virus invasion and an effective control of viral growth. Previous modelling studies of virus–innate immune response interactions have focused on infection with a single virus and, while improving our understanding of viral and immune dynamics, have been unable to effectively evaluate the relative feasibility of different hypothesised mechanisms of antiviral immunity. In recent experiments, we have applied consecutive exposures to different virus strains in a ferret model, and demonstrated that viruses differed in their ability to induce a state of temporary immunity or viral interference capable of modifying the infection kinetics of the subsequent exposure. These results imply that virus-induced early immune responses may be responsible for the observed viral hierarchy. Here we introduce and analyse a family of within-host models of re-infection viral kinetics which allow for different viruses to stimulate the innate immune response to different degrees. The proposed models differ in their hypothesised mechanisms of action of the non-specific innate immune response. We compare these alternative models in terms of their abilities to reproduce the re-exposure data. Our results show that 1) a model with viral control mediated solely by a virus-resistant state, as commonly considered in the literature, is not able to reproduce the observed viral hierarchy; 2) the synchronised and desynchronised behaviour of consecutive virus infections is highly dependent upon the interval between primary virus and challenge virus exposures and is consistent with virus-dependent stimulation of the innate immune response. Our study provides the first mechanistic explanation for the recently observed influenza viral hierarchies and demonstrates the importance of understanding the host response to multi-strain viral infections. Re-exposure experiments provide a new paradigm in which to study the immune response to influenza and its role in viral control.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 18
    Publication Date: 2015-08-21
    Description: by Paul M. Harrison, Laurent Badel, Mark J. Wall, Magnus J. E. Richardson Models of neocortical networks are increasingly including the diversity of excitatory and inhibitory neuronal classes. Significant variability in cellular properties is also seen within a nominal neuronal class and this heterogeneity can be expected to influence the population response and information processing in networks. Recent studies have examined the population and network effects of variability in a particular neuronal parameter with some plausibly chosen distribution. However, the empirical variability and covariance seen across multiple parameters are rarely included, partly due to the lack of data on parameter correlations in forms convenient for model construction. To address this, we quantify the heterogeneity within and between the neocortical pyramidal-cell classes in layers 2/3, 4, and the slender-tufted and thick-tufted pyramidal cells of layer 5 using a combination of intracellular recordings, single-neuron modelling and statistical analyses. From the response to both square-pulse and naturalistic fluctuating stimuli, we examined the class-dependent variance and covariance of electrophysiological parameters and identify the role of the h current in generating parameter correlations. A byproduct of the dynamic I-V method we employed is the straightforward extraction of reduced neuron models from experiment. Empirically these models took the refractory exponential integrate-and-fire form and provide an accurate fit to the perisomatic voltage responses of the diverse pyramidal-cell populations when the class-dependent statistics of the model parameters were respected. By quantifying the parameter statistics we obtained an algorithm which generates populations of model neurons, for each of the four pyramidal-cell classes, that adhere to experimentally observed marginal distributions and parameter correlations. As well as providing this tool, which we hope will be of use for exploring the effects of heterogeneity in neocortical networks, we also provide the code for the dynamic I-V method and make the full electrophysiological data set available.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 19
    Publication Date: 2015-08-22
    Description: We prove that in a graph with n vertices, induced chordal and interval subgraphs with the maximum number of vertices can be found in time \(\mathcal{O}(2^{\lambda n})\) for some \(\lambda < 1\). These are the first algorithms breaking the trivial \(2^n n^{\mathcal{O}(1)}\) bound of the brute-force search for these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 20
    Publication Date: 2015-08-25
    Description: We investigate the effect of limiting the number of reserve prices on the revenue in a probabilistic single item auction. In the model considered, bidders compete for an impression drawn from a known distribution of possible types. The auction mechanism sets up to \(\ell \) reserve prices, and each impression type is assigned the highest reserve price lower than the valuation of some bidder for it. The bidder proposing the highest bid for an arriving impression gets it provided his bid is at least the corresponding reserve price, and pays the maximum between the reserve price and the second highest bid. Since the number of impression types may be huge, we consider the revenue \(R_{\ell }\) that can be ensured using only \(\ell \) reserve prices. Our main results are tight lower bounds on \(R_{\ell }\) for the cases where the impressions are drawn from the uniform or a general probability distribution.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
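A minimal sketch of the allocation and payment rule described in the abstract above: the highest bidder wins an impression only if the bid meets the impression's reserve price, and then pays the maximum of that reserve and the second-highest bid. Variable names are illustrative, and the optimization of the reserve prices themselves is not shown.

```python
def run_auction(bids, reserve):
    """Second-price auction with a reserve, as described in the abstract:
    the highest bidder wins only if the bid meets the reserve, and pays
    the maximum of the reserve and the second-highest bid.
    `bids` is a list of (bidder_id, bid) pairs for one impression."""
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    winner, top_bid = ranked[0]
    if top_bid < reserve:
        return None, 0.0                      # impression goes unsold
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, max(reserve, second)

print(run_auction([("a", 5.0), ("b", 3.5)], reserve=4.0))  # ('a', 4.0)
print(run_auction([("a", 5.0), ("b", 4.5)], reserve=4.0))  # ('a', 4.5)
```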
  • 21
    Publication Date: 2015-08-19
    Description: by Ariel Afek, Hila Cohen, Shiran Barber-Zucker, Raluca Gordân, David B. Lukatsky Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C . elegans and D . melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding. In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 22
    Publication Date: 2015-08-20
    Description: by The PLOS Computational Biology Staff
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 23
    Publication Date: 2015-08-21
    Description: by James Tamerius, Cécile Viboud, Jeffrey Shaman, Gerardo Chowell While a relationship between environmental forcing and influenza transmission has been established in inter-pandemic seasons, the drivers of pandemic influenza remain debated. In particular, school effects may predominate in pandemic seasons marked by an atypical concentration of cases among children. For the 2009 A/H1N1 pandemic, Mexico is a particularly interesting case study due to its broad geographic extent encompassing temperate and tropical regions, well-documented regional variation in the occurrence of pandemic outbreaks, and coincidence of several school breaks during the pandemic period. Here we fit a series of transmission models to daily laboratory-confirmed influenza data in 32 Mexican states using MCMC approaches, considering a meta-population framework or the absence of spatial coupling between states. We use these models to explore the effect of environmental, school–related and travel factors on the generation of spatially-heterogeneous pandemic waves. We find that the spatial structure of the pandemic is best understood by the interplay between regional differences in specific humidity (explaining the occurrence of pandemic activity towards the end of the school term in late May-June 2009 in more humid southeastern states), school vacations (preventing influenza transmission during July-August in all states), and regional differences in residual susceptibility (resulting in large outbreaks in early fall 2009 in central and northern Mexico that had yet to experience fully-developed outbreaks). Our results are in line with the concept that very high levels of specific humidity, as present during summer in southeastern Mexico, favor influenza transmission, and that school cycles are a strong determinant of pandemic wave timing.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 24
    Publication Date: 2015-08-21
    Description: by Alireza Alemi, Carlo Baldassi, Nicolas Brunel, Riccardo Zecchina Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model simplicity and the locality of the synaptic update rules come at the cost of a poor storage capacity, compared with the capacity achieved with perceptron learning algorithms. Here, by transforming the perceptron learning rule, we present an online learning rule for a recurrent neural network that achieves near-maximal storage capacity without an explicit supervisory error signal, relying only upon locally accessible information. The fully-connected network consists of excitatory binary neurons with plastic recurrent connections and non-plastic inhibitory feedback stabilizing the network dynamics; the memory patterns to be memorized are presented online as strong afferent currents, producing a bimodal distribution for the neuron synaptic inputs. Synapses corresponding to active inputs are modified as a function of the value of the local fields with respect to three thresholds. Above the highest threshold, and below the lowest threshold, no plasticity occurs. In between these two thresholds, potentiation/depression occurs when the local field is above/below an intermediate threshold. We simulated and analyzed a network of binary neurons implementing this rule and measured its storage capacity for different sizes of the basins of attraction. The storage capacity obtained through numerical simulations is shown to be close to the value predicted by analytical calculations. We also measured the dependence of capacity on the strength of external inputs. Finally, we quantified the statistics of the resulting synaptic connectivity matrix, and found that both the fraction of zero weight synapses and the degree of symmetry of the weight matrix increase with the number of stored patterns.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
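A rough sketch of the three-threshold plasticity rule described in the abstract above: synapses with active inputs are left unchanged when the local field is above the highest or below the lowest threshold, and are potentiated or depressed depending on whether the field lies above or below the intermediate threshold. The numerical thresholds and the learning-rate parameter are illustrative, not values from the paper.

```python
import numpy as np

def update_synapses(w, x, local_field, theta_low, theta_mid, theta_high, lr=0.01):
    """Three-threshold rule sketched from the abstract: only synapses with
    active inputs (x == 1) change; nothing happens if the local field is
    above theta_high or below theta_low; otherwise potentiate when the
    field is above theta_mid and depress when it is below."""
    if local_field >= theta_high or local_field <= theta_low:
        return w                               # no plasticity
    sign = 1.0 if local_field > theta_mid else -1.0
    return w + lr * sign * x                   # modify active synapses only

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=8).astype(float)   # binary afferent pattern
w = rng.normal(size=8)
w_new = update_synapses(w, x, local_field=float(w @ x),
                        theta_low=-1.0, theta_mid=0.5, theta_high=3.0)
print(w_new - w)
```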
  • 25
    Publication Date: 2015-08-12
    Description: by Sander Land, Steven A. Niederer Biophysical models of cardiac tension development provide a succinct representation of our understanding of force generation in the heart. The link between protein kinetics and interactions that gives rise to high cooperativity is not yet fully explained from experiments or previous biophysical models. We propose a biophysical ODE-based representation of cross-bridge (XB), tropomyosin and troponin within a contractile regulatory unit (RU) to investigate the mechanisms behind cooperative activation, as well as the role of cooperativity in dynamic tension generation across different species. The model includes cooperative interactions between regulatory units (RU-RU), between crossbridges (XB-XB), as well as more complex interactions between crossbridges and regulatory units (XB-RU interactions). For the steady-state force-calcium relationship, our framework predicts that: (1) XB-RU effects are key in shifting the half-activation value of the force-calcium relationship towards lower [Ca^2+], but have only small effects on cooperativity. (2) XB-XB effects approximately double the duty ratio of myosin, but do not significantly affect cooperativity. (3) RU-RU effects derived from the long-range action of tropomyosin are a major factor in cooperative activation, with each additional unblocked RU increasing the rate of additional RUs unblocking. (4) Myosin affinity for short (1–4 RU) unblocked stretches of actin is very low, and the resulting suppression of force at low [Ca^2+] is a major contributor to the biphasic force-calcium relationship. We also reproduce isometric tension development across mouse, rat and human at physiological temperature and pacing rate, and conclude that species differences require only changes in myosin affinity and troponin I/troponin C affinity. Furthermore, we show that the calcium dependence of the rate of tension redevelopment k_tr is explained by transient blocking of RUs by a temporary decrease in XB-RU effects.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 26
    Published by Public Library of Science (PLoS)
    Publication Date: 2015-08-12
    Description: by Jonas Paulsen, Odin Gramstad, Philippe Collas The three-dimensional (3D) structure of the genome is important for orchestration of gene expression and cell differentiation. While mapping genomes in 3D has for a long time been elusive, recent adaptations of high-throughput sequencing to chromosome conformation capture (3C) techniques, allows for genome-wide structural characterization for the first time. However, reconstruction of "consensus" 3D genomes from 3C-based data is a challenging problem, since the data are aggregated over millions of cells. Recent single-cell adaptations to the 3C-technique, however, allow for non-aggregated structural assessment of genome structure, but data suffer from sparse and noisy interaction sampling. We present a manifold based optimization (MBO) approach for the reconstruction of 3D genome structure from chromosomal contact data. We show that MBO is able to reconstruct 3D structures based on the chromosomal contacts, imposing fewer structural violations than comparable methods. Additionally, MBO is suitable for efficient high-throughput reconstruction of large systems, such as entire genomes, allowing for comparative studies of genomic structure across cell-lines and different species.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 27
    Publication Date: 2015-08-12
    Description: by Hiroo Kenzaki, Shoji Takada Nucleosomes, basic units of chromatin, are known to show spontaneous DNA unwrapping dynamics that are crucial for transcriptional activation, but its structural details are yet to be elucidated. Here, employing a coarse-grained molecular model that captures residue-level structural details up to histone tails, we simulated equilibrium fluctuations and forced unwrapping of single nucleosomes at various conditions. The equilibrium simulations showed spontaneous unwrapping from outer DNA and subsequent rewrapping dynamics, which are in good agreement with experiments. We found several distinct partially unwrapped states of nucleosomes, as well as reversible transitions among these states. At a low salt concentration, histone tails tend to sit in the concave cleft between the histone octamer and DNA, tightening the nucleosome. At a higher salt concentration, the tails tend to bound to the outer side of DNA or be expanded outwards, which led to higher degree of unwrapping. Of the four types of histone tails, H3 and H2B tail dynamics are markedly correlated with partial unwrapping of DNA, and, moreover, their contributions were distinct. Acetylation in histone tails was simply mimicked by changing their charges, which enhanced the unwrapping, especially markedly for H3 and H2B tails.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 28
    Published by Public Library of Science (PLoS)
    Publication Date: 2015-08-13
    Description: by Sebastian Bitzer, Jelle Bruineberg, Stefan J. Kiebel Even for simple perceptual decisions, the mechanisms that the brain employs are still under debate. Although current consensus states that the brain accumulates evidence extracted from noisy sensory information, open questions remain about how this simple model relates to other perceptual phenomena such as flexibility in decisions, decision-dependent modulation of sensory gain, or confidence about a decision. We propose a novel approach of how perceptual decisions are made by combining two influential formalisms into a new model. Specifically, we embed an attractor model of decision making into a probabilistic framework that models decision making as Bayesian inference. We show that the new model can explain decision making behaviour by fitting it to experimental data. In addition, the new model combines for the first time three important features: First, the model can update decisions in response to switches in the underlying stimulus. Second, the probabilistic formulation accounts for top-down effects that may explain recent experimental findings of decision-related gain modulation of sensory neurons. Finally, the model computes an explicit measure of confidence which we relate to recent experimental evidence for confidence computations in perceptual decision tasks.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 29
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Over the past decade or so, several research groups have addressed the problem of multi-label classification where each example can belong to more than one class at the same time. A common approach, called Binary Relevance (BR), addresses this problem by inducing a separate classifier for each class. Research has shown that this framework can be improved if mutual class dependence is exploited: an example that belongs to class $X$ is likely to belong also to class $Y$; conversely, belonging to $X$ can make an example less likely to belong to $Z$. Several works sought to model this information by using the vector of class labels as additional example attributes. To fill the unknown values of these attributes during prediction, existing methods resort to using outputs of other classifiers, and this makes them prone to errors. This is where our paper wants to contribute. We identified two potential ways to prune unnecessary dependencies and to reduce error-propagation in our new classifier-stacking technique, which is named PruDent. Experimental results indicate that the classification performance of PruDent compares favorably with that of other state-of-the-art approaches over a broad range of testbeds. Moreover, its computational costs grow only linearly in the number of classes.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
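For contrast with PruDent, here is a minimal sketch of the Binary Relevance baseline mentioned in the abstract above (one independent binary classifier per label, ignoring label dependencies). It assumes scikit-learn is available; PruDent's classifier stacking and dependency pruning are not implemented.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Binary Relevance baseline: fit one independent binary classifier per
    label, ignoring label dependencies (the point PruDent improves on)."""
    def __init__(self, base=LogisticRegression(max_iter=1000)):
        self.base = base

    def fit(self, X, Y):
        self.models_ = [clone(self.base).fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        return np.column_stack([m.predict(X) for m in self.models_])

# toy multi-label data: 2 features, 3 labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]])
print(BinaryRelevance().fit(X, Y).predict(X))
```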
  • 30
    Publication Date: 2015-08-07
    Description: This work deals with the problem of producing a fast and accurate data classification, learning it from a possibly small set of records that are already classified. The proposed approach is based on the framework of the so-called Logical Analysis of Data (LAD), but enriched with information obtained from statistical considerations on the data. A number of discrete optimization problems are solved in the different steps of the procedure, but their computational demand can be controlled. The accuracy of the proposed approach is compared to that of the standard LAD algorithm, of support vector machines and of label propagation algorithm on publicly available datasets of the UCI repository. Encouraging results are obtained and discussed.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 31
    Publication Date: 2015-08-07
    Description: A new graph based constrained semi-supervised learning (G-CSSL) framework is proposed. Pairwise constraints (PC) are used to specify the types (intra- or inter-class) of points with labels. Since the number of labeled data is typically small in SSL setting, the core idea of this framework is to create and enrich the PC sets using the propagated soft labels from both labeled and unlabeled data by special label propagation (SLP), and hence obtaining more supervised information for delivering enhanced performance. We also propose a Two-stage Sparse Coding, termed TSC, for achieving adaptive neighborhood for SLP. The first stage aims at correcting the possible corruptions in data and training an informative dictionary, and the second stage focuses on sparse coding. To deliver enhanced inter-class separation and intra-class compactness, we also present a mixed soft-similarity measure to evaluate the similarity/dissimilarity of constrained pairs using the sparse codes and outputted probabilistic values by SLP. Simulations on the synthetic and real datasets demonstrated the validity of our algorithms for data representation and image recognition, compared with other related state-of-the-art graph based semi-supervised techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 32
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In large databases, the amount and the complexity of the data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is—starting with an empty bucket configuration. We show that this is not the case. Self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both accuracy and robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 33
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures. For the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 34
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In real-world graphs such as social networks, Semantic Web and biological networks, each vertex usually contains rich information, which can be modeled by a set of tokens or elements. In this paper, we study a subgraph matching with set similarity (SMS$^2$) query over a large graph database, which retrieves subgraphs that are structurally isomorphic to the query graph, and meanwhile satisfy the condition of vertex pair matching with the (dynamic) weighted set similarity. To efficiently process the SMS$^2$ query, this paper designs a novel lattice-based index for data graph, and lightweight signatures for both query vertices and data vertices. Based on the index and signatures, we propose an efficient two-phase pruning strategy including set similarity pruning and structure-based pruning, which exploits the unique features of both (dynamic) weighted set similarity and graph topology. We also propose an efficient dominating-set-based subgraph matching algorithm guided by a dominating set selection algorithm to achieve better query performance. Extensive experiments on both real and synthetic datasets demonstrate that our method outperforms state-of-the-art methods by an order of magnitude.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 35
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web, which tend to reach a much higher imputation recall, but inevitably bring a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an inTeractive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in missing attribute values in a data set. To ensure the high recall at the minimum cost, TRIP faces a challenge of selecting the least number of missing values for retrieving to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also analyze with an example that the optimal scheme is not feasible to be achieved in $\tau$-constrained stochastic data imputation ($\tau$-SDI), but still, our proposed solution identifies an expected-optimal scheme in $\tau$-SDI. Extensive experiments on four data collections show that TRIP retrieves on average 20 percent of missing values and achieves the same high recall that was reached by the retrieving-based approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 36
    Published by Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Visual classification has attracted considerable research interest in the past decades. In this paper, a novel $\ell_1$-hypergraph model for visual classification is proposed. Hypergraph learning, as a natural extension of the graph model, has been widely used in many machine learning tasks. In previous work, a hypergraph is usually constructed by attribute-based or neighborhood-based methods. That is, a hyperedge is generated by connecting a set of samples sharing the same feature attribute or lying in a neighborhood. However, these methods are unable to explore the feature space globally or are sensitive to noise. To address these problems, we propose a novel hypergraph construction approach that leverages sparse representation to generate hyperedges and learns the relationship among hyperedges and their vertices. First, for each sample, a hyperedge is generated by regarding it as the centroid and linking it as well as its nearest neighbors. Then, the sparse representation method is applied to represent the centroid vertex by other vertices within the same hyperedge. The vertices with zero coefficients are removed from the hyperedge. Finally, the representation coefficients are used to define the incidence relation between the hyperedge and the vertices. In our approach, we also optimize the hyperedge weights to modulate the effects of different hyperedges. We leverage the prior knowledge on the hyperedges so that the hyperedges sharing more vertices can have closer weights, where a graph Laplacian is used to regularize the optimization of the weights. Our approach is named $\ell_1$-hypergraph since the $\ell_1$ sparse representation is employed in the hypergraph construction process. The method is evaluated on various visual classification tasks, and it demonstrates promising performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    Public Library of Science (PLoS)
    Publication Date: 2015-08-07
    Description: by Malachi Griffith, Jason R. Walker, Nicholas C. Spies, Benjamin J. Ainscough, Obi L. Griffith Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    Public Library of Science (PLoS)
    Publication Date: 2015-08-07
    Description: by Arjun Bharioke, Dmitri B. Chklovskii Neurons must faithfully encode signals that can vary over many orders of magnitude despite having only limited dynamic ranges. For a correlated signal, this dynamic range constraint can be relieved by subtracting away components of the signal that can be predicted from the past, a strategy known as predictive coding, which relies on learning the input statistics. However, the statistics of natural input signals can also vary over very short time scales, e.g., following saccades across a visual scene. To maintain a reduced transmission cost for signals with rapidly varying statistics, neuronal circuits implementing predictive coding must also rapidly adapt their properties. Experimentally, in different sensory modalities, sensory neurons have shown such adaptations within 100 ms of an input change. Here, we show first that linear neurons connected in a feedback inhibitory circuit can implement predictive coding. We then show that adding a rectification nonlinearity to such a feedback inhibitory circuit allows it to automatically adapt and approximate the performance of an optimal linear predictive coding network, over a wide range of inputs, while keeping its underlying temporal and synaptic properties unchanged. We demonstrate that the resulting changes to the linearized temporal filters of this nonlinear network match the fast adaptations observed experimentally in different sensory modalities, in different vertebrate species. Therefore, the nonlinear feedback inhibitory network can provide automatic adaptation to fast-varying signals, maintaining the dynamic range necessary for accurate neuronal transmission of natural inputs.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    Publication Date: 2015-08-08
    Description: The new dual-pivot Quicksort by Vladimir Yaroslavskiy—used in Oracle’s Java runtime library since version 7—features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy’s algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions, and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy’s algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy’s algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
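    For readers unfamiliar with the scheme, a compact dual-pivot Quicksort in the spirit of Yaroslavskiy’s partitioning is sketched below (the production Java version additionally uses insertion-sort cutoffs and pivot sampling, which are omitted here):

        def dual_pivot_quicksort(a, lo=0, hi=None):
            if hi is None:
                hi = len(a) - 1
            if lo >= hi:
                return
            if a[lo] > a[hi]:
                a[lo], a[hi] = a[hi], a[lo]
            p, q = a[lo], a[hi]              # p <= q are the two pivots
            lt, gt, i = lo + 1, hi - 1, lo + 1
            while i <= gt:
                if a[i] < p:                 # element belongs to the left part
                    a[i], a[lt] = a[lt], a[i]
                    lt += 1
                elif a[i] > q:               # element belongs to the right part
                    while a[gt] > q and i < gt:
                        gt -= 1
                    a[i], a[gt] = a[gt], a[i]
                    gt -= 1
                    if a[i] < p:
                        a[i], a[lt] = a[lt], a[i]
                        lt += 1
                i += 1
            lt -= 1
            gt += 1
            a[lo], a[lt] = a[lt], a[lo]      # move the pivots to their final positions
            a[hi], a[gt] = a[gt], a[hi]
            dual_pivot_quicksort(a, lo, lt - 1)
            dual_pivot_quicksort(a, lt + 1, gt - 1)
            dual_pivot_quicksort(a, gt + 1, hi)

        xs = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
        dual_pivot_quicksort(xs)
        print(xs)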
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    Publication Date: 2015-08-08
    Description: by Murat Alp, Vipan K. Parihar, Charles L. Limoli, Francis A. Cucinotta In this work, a stochastic computational model of microscopic energy deposition events is used to study for the first time damage to irradiated neuronal cells of the mouse hippocampus. An extensive library of radiation tracks for different particle types is created to score energy deposition in small voxels and volume segments describing a neuron’s morphology that later are sampled for a given particle fluence or dose. Methods included the construction of in silico mouse hippocampal granule cells from neuromorpho.org with spine and filopodia segments stochastically distributed along the dendritic branches. The model is tested with high-energy $^{56}$Fe, $^{12}$C, and $^{1}$H particles and electrons. Results indicate that the tree-like structure of the neuronal morphology and the microscopic dose deposition of distinct particles may lead to different outcomes when cellular injury is assessed, leading to differences in structural damage for the same absorbed dose. The significance of the microscopic dose in neuron components is to introduce specific local and global modes of cellular injury that likely contribute to spine, filopodia, and dendrite pruning, impacting cognition and possibly the collapse of the neuron. Results show that the heterogeneity of heavy particle tracks at low doses, compared to the more uniform dose distribution of electrons, juxtaposed with neuron morphology, makes it necessary to model the spatial dose painting for specific neuronal components. Going forward, this work can directly support the development of biophysical models of the modifications of spine and dendritic morphology observed after low dose charged particle irradiation by providing accurate descriptions of the underlying physical insults to complex neuron structures at the nanometer scale.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Public Library of Science (PLoS)
    Publication Date: 2015-08-08
    Description: by Brinda Vallat, Carlos Madrid-Aliste, Andras Fiser Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the remaining protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable only to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when the sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    Publication Date: 2015-06-05
    Description: In this paper, we consider the Unsplittable (hard) Capacitated Facility Location Problem (UCFLP) with uniform capacities and present new approximation algorithms for it. This problem is a generalization of the classical facility location problem where each facility can serve at most u units of demand and each client must be served by exactly one facility. This problem is motivated by its applications in many practical problems including supply chain problems of indivisible goods (Verter in Foundations of location analysis, chapter 2. International series in operations research and management science. Springer, Berlin, 2011 ) and the assignment problem in the content distribution networks (Bateni and Hajiaghayi in Proceedings of the nineteenth annual ACM-SIAM symposium on discrete algorithms, pp 805–814, 2009 ). While there are several approximation algorithms for the soft capacitated version of this problem (in which one can open multiple copies of each facility) or the splittable version (in which the demand of each client can be divided to be served by multiple open facilities), there are very few results for the UCFLP. It is known that it is NP-hard to approximate this problem within any factor without violating the capacities. So we consider bicriteria \((\alpha ,\beta )\) -approximations where the algorithm returns a solution whose cost is within factor \(\alpha \) of the optimum and violates the capacity constraints within factor \(\beta \) . Shmoys et al. (Proceedings of the twenty-ninth annual ACM symposium on theory of computing, pp 265–274, 1997 ) were the first to consider this problem and gave a (9, 4)-approximation. Later results imply (O(1), 2)-approximations, however, no constant factor approximation is known with capacity violation of less than 2. We present a framework for designing bicriteria approximation algorithms for this problem and show two new approximation algorithms with factors (9, 3 / 2) and (29.315, 4 / 3). These are the first algorithms with constant approximation in which the violation of capacities is below 2. The heart of our algorithm is a reduction from the UCFLP to a restricted version of the problem. One feature of this reduction is that any \((O(1),1+{\epsilon })\) -approximation for the restricted version implies an \((O(1),1+{\epsilon })\) -approximation for the UCFLP and we believe our techniques might be useful towards finding such approximations or perhaps \((f({\epsilon }),1+{\epsilon })\) -approximation for the UCFLP for some function f. In addition, we present a quasi-polynomial time \((1+\epsilon ,1+\epsilon )\) -approximation for the (uniform) UCFLP in Euclidean metrics, for any constant \({\epsilon }>0\) .
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2015-08-05
    Description: by Po-Wei Chen, Luis L. Fonseca, Yusuf A. Hannun, Eberhard O. Voit The article demonstrates that computational modeling has the capacity to convert metabolic snapshots, taken sequentially over time, into a description of cellular, dynamic strategies. The specific application is a detailed analysis of a set of actions with which Saccharomyces cerevisiae responds to heat stress. Using time dependent metabolic concentration data, we use a combination of mathematical modeling, reverse engineering, and optimization to infer dynamic changes in enzyme activities within the sphingolipid pathway. The details of the sphingolipid responses to heat stress are important, because they guide some of the longer-term alterations in gene expression, with which the cells adapt to the increased temperature. The analysis indicates that all enzyme activities in the system are affected and that the shapes of the time trends in activities depend on the fatty-acyl CoA chain lengths of the different ceramide species in the system.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian process dynamic congestion model that can effectively characterize both the dynamics and the uncertainty of congestion conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop efficient algorithms for non-myopic adaptive routing to minimize the collective travel time of all vehicles in the system. A key property of our approach is the ability to efficiently reason about the long-term value of exploration, which enables collectively balancing the exploration/exploitation trade-off for entire fleets of vehicles. Our approach is validated by traffic data from two large Asian cities. Our congestion model is shown to be effective in modeling dynamic congestion conditions. Our routing algorithms also generate significantly faster routes compared to standard baselines, and achieve near-optimal performance compared to an omniscient routing algorithm. We also present the results from a preliminary field study, which showcases the efficacy of our approach.
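    As a rough illustration of the modelling ingredient, the sketch below fits a Gaussian process to travel times observed for a single road segment over the day and returns both a prediction and its uncertainty, the quantity that drives the exploration/exploitation trade-off (the data, kernel, and hyperparameters are invented for illustration and this is not the authors' fleet-wide dynamic model):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        hours = np.array([[7.0], [8.0], [9.0], [12.0], [17.0], [18.0], [20.0]])   # time of day
        travel_min = np.array([12.0, 19.0, 15.0, 9.0, 21.0, 23.0, 10.0])          # observed travel times

        kernel = 1.0 * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(hours, travel_min)

        mean, std = gp.predict(np.array([[8.5], [13.0], [19.0]]), return_std=True)
        print(mean, std)   # the predictive uncertainty can be used to value exploration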
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in $\mathcal{O}(nm)$ time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper, we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than previous studies. Our method is able to keep the betweenness centrality measures up-to-date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates.
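    For reference, the static baseline that such a streaming method keeps up to date is Brandes' algorithm; a plain-Python version for unweighted, undirected graphs is sketched below (illustrative only, not the paper's incremental algorithm):

        from collections import deque

        def betweenness(adj):
            # adj maps each vertex to an iterable of neighbours (undirected graph).
            bc = {v: 0.0 for v in adj}
            for s in adj:
                stack, pred = [], {v: [] for v in adj}
                sigma = {v: 0 for v in adj}; sigma[s] = 1      # number of shortest s-v paths
                dist = {v: -1 for v in adj}; dist[s] = 0
                queue = deque([s])
                while queue:                                   # BFS from s
                    v = queue.popleft(); stack.append(v)
                    for w in adj[v]:
                        if dist[w] < 0:
                            dist[w] = dist[v] + 1; queue.append(w)
                        if dist[w] == dist[v] + 1:
                            sigma[w] += sigma[v]; pred[w].append(v)
                delta = {v: 0.0 for v in adj}
                while stack:                                   # accumulate dependencies
                    w = stack.pop()
                    for v in pred[w]:
                        delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
                    if w != s:
                        bc[w] += delta[w]
            return {v: b / 2.0 for v, b in bc.items()}         # each undirected path counted twice

        g = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3, 5], 5: [4]}
        print(betweenness(g))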
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Phase change memory (PCM) is non-volatile memory that is byte-addressable. It is two to four times denser than DRAM, orders of magnitude better than NAND Flash memory in read latency, and 10 times better than NAND Flash memory in write endurance. However, it still limits the number of write operations to at most $10^6$ times per PCM cell. To extend its lifetime, it is necessary to evenly distribute write operations over all the memory cells. Up to now, the $\mathrm{B^{+}}$-Tree index structure has been used to quickly locate a search key in a relational database management system (RDBMS). All the record keys in each node are sorted and packed upon insertion into, and deletion from, the $\mathrm{B^{+}}$-Tree. In addition, a counter keeps track of the number of valid keys in the $\mathrm{B^{+}}$-Tree. Consequently, a $\mathrm{B^{+}}$-Tree algorithm results in a large number of write operations, which deteriorates the endurance of PCM. This restricts the usage of PCM on database servers and degrades their performance. In this paper, we propose a novel PCM-aware $\mathrm{B^{+}}$-Tree index structure, called $\mathrm{PB^{+}}$-Tree, to provide wear-leveling in PCM. According to our experiment results, $\mathrm{PB^{+}}$-Tree is much faster than the existing $\mathrm{B^{+}}$-Tree algorithms for PCM and NAND Flash memory under versatile workloads. More importantly, our scheme also greatly reduces the number of write operations compared to other $\mathrm{B^{+}}$-Tree algorithms. All of these results suggest that $\mathrm{PB^{+}}$-Tree is the $\mathrm{B^{+}}$-Tree algorithm best suited to PCM.
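    A toy comparison of the write behaviour at the heart of the idea: a conventional sorted leaf rewrites every shifted slot on insertion, while an append-only, unsorted leaf (the PCM-friendly choice) writes one cell per insertion; the write counts below are purely illustrative and are not a model of the paper's structure:

        import bisect, random

        class SortedLeaf:
            def __init__(self):
                self.keys, self.writes = [], 0
            def insert(self, key):
                pos = bisect.bisect_left(self.keys, key)
                self.keys.insert(pos, key)
                self.writes += len(self.keys) - pos   # every shifted slot is rewritten

        class UnsortedLeaf:
            def __init__(self):
                self.keys, self.writes = [], 0
            def insert(self, key):
                self.keys.append(key)                 # append to the first free slot
                self.writes += 1                      # a single cell write, no shifting
            def search(self, key):
                return key in self.keys               # linear scan inside the node

        random.seed(1)
        keys = random.sample(range(1000), 64)
        s, u = SortedLeaf(), UnsortedLeaf()
        for k in keys:
            s.insert(k); u.insert(k)
        print("sorted-leaf writes:", s.writes, "unsorted-leaf writes:", u.writes)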
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    Publication Date: 2015-08-08
    Description: by Pengyi Yang, Xiaofeng Zheng, Vivek Jayaswal, Guang Hu, Jean Yee Hwa Yang, Raja Jothi Cell signaling underlies transcription/epigenetic control of a vast majority of cell-fate decisions. A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. In a typical phosphoproteomics study, phosphorylation sites (substrates) of active kinases are quantified proteome-wide. By analyzing the activities of phosphorylation sites over a time-course, the temporal dynamics of signaling cascades can be elucidated. Since many substrates of a given kinase have similar temporal kinetics, clustering phosphorylation sites into distinctive clusters can facilitate identification of their respective kinases. Here we present a knowledge-based CLUster Evaluation (CLUE) approach for identifying the most informative partitioning of a given temporal phosphoproteomics data set. Our approach utilizes prior knowledge, annotated kinase-substrate relationships mined from the literature and curated databases, to first generate biologically meaningful partitioning of the phosphorylation sites and then determine key kinases associated with each cluster. We demonstrate the utility of the proposed approach on two time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and the insulin signaling pathway. The proposed approach will be a valuable resource in the identification and characterization of signaling networks from phosphoproteomics data.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Publication Date: 2015-08-13
    Description: by Deborah A. Striegel, Manami Hara, Vipul Periwal Pancreatic islets of Langerhans consist of endocrine cells, primarily α, β and δ cells, which secrete glucagon, insulin, and somatostatin, respectively, to regulate plasma glucose. β cells form irregular locally connected clusters within islets that act in concert to secrete insulin upon glucose stimulation. Due to the central functional significance of this local connectivity in the placement of β cells in an islet, it is important to characterize it quantitatively. However, quantification of the seemingly stochastic cytoarchitecture of β cells in an islet requires mathematical methods that can capture topological connectivity in the entire β-cell population in an islet. Graph theory provides such a framework. Using large-scale imaging data for thousands of islets containing hundreds of thousands of cells in human organ donor pancreata, we show that quantitative graph characteristics differ between control and type 2 diabetic islets. Further insight into the processes that shape and maintain this architecture is obtained by formulating a stochastic theory of β-cell rearrangement in whole islets, just as the normal equilibrium distribution of the Ornstein-Uhlenbeck process can be viewed as the result of the interplay between a random walk and a linear restoring force. Requiring that rearrangements maintain the observed quantitative topological graph characteristics strongly constrained possible processes. Our results suggest that β-cell rearrangement is dependent on its connectivity in order to maintain an optimal cluster size in both normal and T2D islets.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    Publication Date: 2015-08-15
    Description: by Ghanim Ullah, Yina Wei, Markus A Dahlem, Martin Wechselberger, Steven J Schiff Cell volume changes are ubiquitous in normal and pathological activity of the brain. Nevertheless, we know little of how cell volume affects neuronal dynamics. We here performed the first detailed study of the effects of cell volume on neuronal dynamics. By incorporating cell swelling together with dynamic ion concentrations and oxygen supply into Hodgkin-Huxley type spiking dynamics, we demonstrate the spontaneous transition between epileptic seizure and spreading depression states as the cell swells and contracts in response to changes in osmotic pressure. Our use of volume as an order parameter further revealed a dynamical definition for the experimentally described physiological ceiling that separates seizure from spreading depression, as well as predicted a second ceiling that demarcates spreading depression from anoxic depolarization. Our model highlights the neuroprotective role of glial K buffering against seizures and spreading depression, and provides novel insights into anoxic depolarization and the relevant cell swelling during ischemia. We argue that the dynamics of seizures, spreading depression, and anoxic depolarization lie along a continuum of the repertoire of the neuron membrane that can be understood only when the dynamic ion concentrations, oxygen homeostasis, and cell swelling in response to osmotic pressure are taken into consideration. Our results demonstrate the feasibility of a unified framework for a wide range of neuronal behaviors that may be of substantial importance in the understanding of, and potentially the development of universal intervention strategies for, these pathological states.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2015-08-15
    Description: by John R. Houser, Craig Barnhart, Daniel R. Boutz, Sean M. Carroll, Aurko Dasgupta, Joshua K. Michener, Brittany D. Needham, Ophelia Papoulas, Viswanadham Sridhara, Dariya K. Sydykova, Christopher J. Marx, M. Stephen Trent, Jeffrey E. Barrick, Edward M. Marcotte, Claus O. Wilke How do bacteria regulate their cellular physiology in response to starvation? Here, we present a detailed characterization of Escherichia coli growth and starvation over a time-course lasting two weeks. We have measured multiple cellular components, including RNA and proteins at deep genomic coverage, as well as lipid modifications and flux through central metabolism. Our study focuses on the physiological response of E. coli in stationary phase as a result of being starved for glucose, not on the genetic adaptation of E. coli to utilize alternative nutrients. In our analysis, we have taken advantage of the temporal correlations within and among RNA and protein abundances to identify systematic trends in gene regulation. Specifically, we have developed a general computational strategy for classifying expression-profile time courses into distinct categories in an unbiased manner. We have also developed, from dynamic models of gene expression, a framework to characterize protein degradation patterns based on the observed temporal relationships between mRNA and protein abundances. By comparing and contrasting our transcriptomic and proteomic data, we have identified several broad physiological trends in the E. coli starvation response. Strikingly, mRNAs are widely down-regulated in response to glucose starvation, presumably as a strategy for reducing new protein synthesis. By contrast, protein abundances display more varied responses. The abundances of many proteins involved in energy-intensive processes mirror the corresponding mRNA profiles while proteins involved in nutrient metabolism remain abundant even though their corresponding mRNAs are down-regulated.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Publication Date: 2015-08-15
    Description: by Shaun S. Sanders, Dale D. O. Martin, Stefanie L. Butland, Mathieu Lavallée-Adam, Diego Calzolari, Chris Kay, John R. Yates, Michael R. Hayden Palmitoylation involves the reversible posttranslational addition of palmitate to cysteines and promotes membrane binding and subcellular localization. Recent advancements in the detection and identification of palmitoylated proteins have led to multiple palmitoylation proteomics studies but these datasets are contained within large supplemental tables, making downstream analysis and data mining time-consuming and difficult. Consequently, we curated the data from 15 palmitoylation proteomics studies into one compendium containing 1,838 genes encoding palmitoylated proteins; representing approximately 10% of the genome. Enrichment analysis revealed highly significant enrichments for Gene Ontology biological processes, pathway maps, and process networks related to the nervous system. Strikingly, 41% of synaptic genes encode a palmitoylated protein in the compendium. The top disease associations included cancers and diseases and disorders of the nervous system, with Schizophrenia, HD, and pancreatic ductal carcinoma among the top five, suggesting that aberrant palmitoylation may play a pivotal role in the balance of cell death and survival. This compendium provides a much-needed resource for cell biologists and the palmitoylation field, providing new perspectives for cancer and neurodegeneration.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    Publication Date: 2015-08-15
    Description: by Liat Rockah-Shmuel, Ágnes Tóth-Petróczy, Dan S. Tawfik Systematic mappings of the effects of protein mutations are becoming increasingly popular. Unexpectedly, these experiments often find that proteins are tolerant to most amino acid substitutions, including substitutions in positions that are highly conserved in nature. To obtain a more realistic distribution of the effects of protein mutations, we applied a laboratory drift comprising 17 rounds of random mutagenesis and selection of M.HaeIII, a DNA methyltransferase. During this drift, multiple mutations gradually accumulated. Deep sequencing of the drifted gene ensembles allowed determination of the relative effects of all possible single nucleotide mutations. Despite being averaged across many different genetic backgrounds, about 67% of all nonsynonymous, missense mutations were evidently deleterious, and an additional 16% were likely to be deleterious. In the early generations, the frequency of most deleterious mutations remained high. However, by the 17th generation, their frequency was consistently reduced, and those remaining were accepted alongside compensatory mutations. The tolerance to mutations measured in this laboratory drift correlated with sequence exchanges seen in M.HaeIII’s natural orthologs. The biophysical constraints dictating purging in nature and in this laboratory drift also seemed to overlap. Our experiment therefore provides an improved method for measuring the effects of protein mutations that more closely replicates the natural evolutionary forces, and thereby a more realistic view of the mutational space of proteins.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    Publication Date: 2015-08-11
    Description: A commonly studied means of parameterizing graph problems is the deletion distance from triviality (Guo et al., Parameterized and exact computation, Springer, Berlin, pp. 162–173, 2004 ), which counts vertices that need to be deleted from a graph to place it in some class for which efficient algorithms are known. In the context of graph isomorphism, we define triviality to mean a graph with maximum degree bounded by a constant, as such graph classes admit polynomial-time isomorphism tests. We generalise deletion distance to a measure we call elimination distance to triviality, based on elimination trees or tree-depth decompositions. We establish that graph canonisation, and thus graph isomorphism, is \(\mathsf {FPT}\) when parameterized by elimination distance to bounded degree, extending results of Bouland et al. (Parameterized and exact computation, Springer, Berlin, pp. 218–230, 2012 ).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    Springer
    Publication Date: 2015-08-11
    Description: Recent advances in drift analysis have given us better and better tools for understanding random processes, including the run time of randomized search heuristics. In the setting of multiplicative drift we do not only have excellent bounds on the expected run time, but also more general results showing the strong concentration of the run time. In this paper we investigate the setting of additive drift under the assumption of strong concentration of the “step size” of the process. Under sufficiently strong drift towards the goal we show a strong concentration of the hitting time. In contrast to this, we show that in the presence of small drift a Gambler’s-Ruin-like behavior of the process overrides the influence of the drift, leading to a maximal movement of about \(\sqrt{t}\) steps within t iterations. Finally, in the presence of sufficiently strong negative drift the hitting time is superpolynomial with high probability; this corresponds to the well-known Negative Drift Theorem.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    Publication Date: 2015-08-13
    Description: by Shuai Yuan, H. Richard Johnston, Guosheng Zhang, Yun Li, Yi-Juan Hu, Zhaohui S. Qin With the rapid decline of sequencing cost, researchers today rush to embrace the whole genome sequencing (WGS) or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
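    The core idea of personalising the reference can be illustrated in a few lines: substitute known genotype alleles into the haploid reference to obtain two haplotype sequences and map reads against both (a toy sketch with 0-based positions and single-base substitutions only; this is not the RefEditor code):

        def personalize(reference, genotypes):
            # genotypes: list of (position, allele_hap1, allele_hap2) substitutions.
            hap1, hap2 = list(reference), list(reference)
            for pos, a1, a2 in genotypes:
                hap1[pos], hap2[pos] = a1, a2
            return "".join(hap1), "".join(hap2)

        ref = "ACGTACGTAC"
        geno = [(2, "G", "T"), (7, "A", "A")]   # one heterozygous and one homozygous-alt site
        print(personalize(ref, geno))           # reads would then be mapped to both haplotypes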
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2015-08-15
    Description: by Alexey A. Gritsenko, Marc Hulsman, Marcel J. T. Reinders, Dick de Ridder Translation of RNA to protein is a core process for any living organism. While for some steps of this process the effect on protein production is understood, a holistic understanding of translation still remains elusive. In silico modelling is a promising approach for elucidating the process of protein synthesis. Although a number of computational models of the process have been proposed, their application is limited by the assumptions they make. Ribosome profiling (RP), a relatively new sequencing-based technique capable of recording snapshots of the locations of actively translating ribosomes, is a promising source of information for deriving unbiased data-driven translation models. However, quantitative analysis of RP data is challenging due to high measurement variance and the inability to discriminate between the number of ribosomes measured on a gene and their speed of translation. We propose a solution in the form of a novel multi-scale interpretation of RP data that allows for deriving models with translation dynamics extracted from the snapshots. We demonstrate the usefulness of this approach by simultaneously determining for the first time per-codon translation elongation and per-gene translation initiation rates of Saccharomyces cerevisiae from RP data for two versions of the Totally Asymmetric Exclusion Process (TASEP) model of translation. We do this in an unbiased fashion, by fitting the models using only RP data with a novel optimization scheme based on Monte Carlo simulation to keep the problem tractable. The fitted models match the data significantly better than existing models and their predictions show better agreement with several independent protein abundance datasets than existing models. Results additionally indicate that the tRNA pool adaptation hypothesis is incomplete, with evidence suggesting that tRNA post-transcriptional modifications and codon context may play a role in determining codon elongation rates.
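    To make the modelling framework concrete, the sketch below runs a heavily simplified stochastic TASEP on one transcript: ribosomes initiate at a fixed rate and hop to the next codon at per-codon elongation rates (all rates are invented, each ribosome occupies a single site, and this is not the authors' fitted model):

        import random

        def simulate_tasep(elong_rates, init_rate, steps, seed=0):
            random.seed(seed)
            L = len(elong_rates)
            occupied = [False] * L
            completed = 0
            for _ in range(steps):
                if not occupied[0] and random.random() < init_rate:
                    occupied[0] = True                            # initiation at the first codon
                for i in range(L - 1, -1, -1):                    # sweep from the 3' end
                    if occupied[i] and random.random() < elong_rates[i]:
                        if i == L - 1:
                            occupied[i] = False; completed += 1   # termination
                        elif not occupied[i + 1]:
                            occupied[i], occupied[i + 1] = False, True
            return completed

        rates = [0.5] * 30 + [0.1] * 5 + [0.5] * 30               # a slow stretch creates a queue
        print("proteins produced:", simulate_tasep(rates, init_rate=0.2, steps=5000))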
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: The $k$ nearest neighbor ($k$NN) search on road networks is an important function in web mapping services. These services now deal with rapidly arriving queries issued by a massive number of users. While overlay graph-based indices can answer shortest path queries efficiently, there have been no studies on utilizing such indices to answer $k$NN queries efficiently. In this paper, we fill this research gap and present two efficient $k$NN search solutions on overlay graph-based indices. Experimental results show that our solutions offer very low query latency (0.1 ms) and require only small index sizes, even for 10-million-node networks.
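    The baseline such solutions are measured against is a Dijkstra expansion from the query location that stops once k objects have been reached; a minimal version is sketched below (illustrative only, not the overlay-index method of the paper):

        import heapq

        def knn(adj, objects, source, k):
            # adj: {v: [(u, w), ...]} weighted road graph; objects: vertices holding POIs.
            dist, heap, found = {source: 0.0}, [(0.0, source)], []
            while heap and len(found) < k:
                d, v = heapq.heappop(heap)
                if d > dist.get(v, float("inf")):
                    continue                       # stale queue entry
                if v in objects:
                    found.append((v, d))
                for u, w in adj.get(v, []):
                    nd = d + w
                    if nd < dist.get(u, float("inf")):
                        dist[u] = nd
                        heapq.heappush(heap, (nd, u))
            return found

        graph = {1: [(2, 1.0), (3, 4.0)], 2: [(1, 1.0), (3, 2.0), (4, 6.0)],
                 3: [(1, 4.0), (2, 2.0), (4, 1.5)], 4: [(2, 6.0), (3, 1.5)]}
        print(knn(graph, objects={3, 4}, source=1, k=2))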
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches for this task, namely the knowledge-based and the corpus-based approaches. However, existing approaches are more suitable for semantic similarity between words than between the more general multi-word expressions (MWEs), and they do not scale very well. Contrary to these existing techniques, we propose an efficient and effective approach for semantic similarity using a large-scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms that are MWEs or ambiguous, and significantly outperforms 12 competing methods under the Pearson Correlation Coefficient. Meanwhile, our approach is much more efficient than all competing algorithms, and can be used to compute semantic similarity at a large scale.
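    The concept-space comparison can be pictured with a toy example: each term is mapped to a weighted bag of concepts and two terms are compared by cosine similarity (the concept vectors below are invented for illustration; the paper's network is mined from billions of documents):

        import math

        def cosine(u, v):
            dot = sum(u[c] * v.get(c, 0.0) for c in u)
            nu = math.sqrt(sum(x * x for x in u.values()))
            nv = math.sqrt(sum(x * x for x in v.values()))
            return dot / (nu * nv) if nu and nv else 0.0

        concepts = {
            "apple":        {"fruit": 0.7, "company": 0.6, "tree": 0.2},
            "microsoft":    {"company": 0.9, "software": 0.4},
            "granny smith": {"fruit": 0.9, "tree": 0.3},
        }
        print(cosine(concepts["apple"], concepts["microsoft"]))     # company sense dominates
        print(cosine(concepts["apple"], concepts["granny smith"]))  # fruit sense dominates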
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Given a spatio-temporal network, a source, a destination, and a desired departure time interval, the All-departure-time Lagrangian Shortest Paths (ALSP) problem determines a set which includes the shortest path for every departure time in the given interval. ALSP is important for critical societal applications such as eco-routing. However, ALSP is computationally challenging due to the non-stationary ranking of the candidate paths across distinct departure-times. Current related work for reducing the redundant work across consecutive departure-times sharing a common solution exploits only partial information, e.g., the earliest feasible arrival time of a path. In contrast, our approach uses all available information, e.g., the entire time series of arrival times for all departure-times. This allows elimination of all knowable redundant computation based on complete information available at hand. We operationalize this idea through the concept of critical-time-points (CTP), i.e., departure-times before which ranking among candidate paths cannot change. In our preliminary work, we proposed a CTP based forward search strategy. In this paper, we propose a CTP based temporal bi-directional search for the ALSP problem via a novel impromptu rendezvous termination condition. Theoretical and experimental analysis shows that the proposed approach outperforms related approaches, particularly when there are few critical-time-points.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Computing connected components is a core operation on graph data. Since billion-scale graphs cannot be resident in memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are Hash-To-Min and PowerGraph. Hash-To-Min is the state-of-the-art disk-based distributed method, which minimizes the number of MapReduce rounds. PowerGraph is the state-of-the-art in-memory distributed system; it is typically faster than the disk-based distributed one but requires many machines to handle billion-scale graphs. In this paper, we propose an I/O efficient parallel algorithm for billion-scale graphs on a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing (DSP) model that exploits sequential disk access in terms of disk I/Os and parallel processing in terms of computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, DSP-CC, which largely improves the performance through sequential disk scans and page-level cache-conscious parallel processing. Extensive experimental results show that DSP-CC (1) computes connected components in billion-scale graphs using a limited memory size, whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and (2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method.
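    The sequential-access spirit of the approach is easy to see in a one-pass union-find over an edge list, which could equally be streamed from disk (a minimal baseline sketch, not the DSP-CC algorithm itself):

        def connected_components(num_vertices, edge_stream):
            parent = list(range(num_vertices))
            def find(x):                          # find with path halving
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            for u, v in edge_stream:              # one sequential pass over the edges
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
            return [find(v) for v in range(num_vertices)]

        edges = [(0, 1), (1, 2), (3, 4)]
        print(connected_components(6, edges))     # vertices 0-2 share a label, 3-4 share another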
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Answering why-not questions in databases promises to have wide application prospects in many areas and has thereby attracted recent attention in the database research community. This paper addresses the problem of answering these so-called why-not questions in similar graph matching for graph databases. Given a set of answer graphs of an initial query graph $q$ and a set of missing (why-not) graphs, we aim to modify $q$ into a new query graph $q^*$ such that the missing graphs are included in the new answer set of $q^*$. We present an approximate solution, since computing the optimal solution is NP-hard. In our approach, we first compute the bounded search space and the distance to be minimized for $q^*$. Then, we present a two-phase algorithm to find the new query $q^*$. In the first phase, we generate a set of candidate edges to be added/deleted into/from the initial query $q$ within the bounded search space, and in the second phase, we select a subset of candidate edges generated in the first phase to minimize the distance for $q^*$. We also demonstrate the effectiveness and efficiency of our approach by conducting extensive experiments on two real datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: How has the interdisciplinary data mining field been practiced in Network and Systems Management (NSM)? In science and technology, data mining is widely used in areas like bioinformatics, genetics, the Web, and, more recently, astroinformatics. However, its application in NSM has been limited. In this article, we provide an account of how data mining has been applied in managing networks and systems for the past four decades, presumably since its birth. We look into the field’s applications in the key NSM activities—discovery, monitoring, analysis, reporting, and domain knowledge acquisition. In the end, we discuss our perspective on the issues that are considered critical for the effective application of data mining in modern systems, which are characterized by heterogeneity and high dynamism.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: With the rapid development of location-aware mobile devices, ubiquitous Internet access and social computing technologies, a lot of users’ personal information, such as location data and social data, has become readily accessible from various mobile platforms and online social networks. The convergence of these two types of data, known as geo-social data, has enabled collaborative spatial computing that explicitly combines both location and social factors to answer useful geo-social queries for either business or social good. In this paper, we study a new type of Geo-Social K-Cover Group (GSKCG) query that, given a set of query points and a social network, retrieves a minimum user group in which each user is socially related to at least $k$ other users and the users’ associated regions (e.g., familiar regions or service regions) can jointly cover all the query points. Despite its practical usefulness, the GSKCG query problem is NP-complete. We consequently explore a set of effective pruning strategies to derive an efficient algorithm for finding the optimal solution. Moreover, we design a novel index structure tailored to our problem to further accelerate query processing. Extensive experiments demonstrate that our algorithm achieves desirable performance on real-life datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: High utility sequential pattern mining has been considered an important research problem and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility sequential pattern mining is that the search space is large and the efficiency of the solutions is directly affected by the degree to which they can eliminate candidate patterns. Therefore, the efficiency of any high utility sequential pattern mining solution depends on its ability to reduce this big search space and, as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and a pruning technique based on a Cumulated Rest of Match (CRoM) upper bound. CRoM, by defining a tighter upper bound on the utility of the candidates, allows more conservative pruning before candidate pattern generation in comparison to the existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of the child patterns based on those of their parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility sequential patterns from large scale datasets with different data characteristics, under low utility thresholds.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Ordinal classification with a monotonicity constraint is a kind of classification task in which objects with better attribute values should not be assigned to a worse decision class. Several learning algorithms have been proposed to handle this kind of task in recent years. The rank entropy-based monotonic decision tree is very representative thanks to its better robustness and generalization. Ensemble learning is an effective strategy to significantly improve the generalization ability of machine learning systems. The objective of this work is to develop a method for fusing monotonic decision trees. In order to achieve this goal, we take two factors into account: attribute reduction and the fusing principle. By introducing variable dominance rough sets, we first propose a rank-preserving attribute reduction approach for learning base classifiers, which can effectively avoid overfitting and improve classification performance. Then, we establish a fusing principle based on maximal probability for combining the base classifiers, which is used to further improve the generalization ability of the learning system. The experimental analysis shows that the proposed fusing method can significantly improve the classification performance of the learning system constructed from monotonic decision trees.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Influence maximization, defined as finding a small subset of nodes that maximizes the spread of influence in social networks, is NP-hard under both the Independent Cascade (IC) and Linear Threshold (LT) models, for which many greedy-based algorithms have been proposed with the best approximation guarantee. However, existing greedy-based algorithms are inefficient on large networks, as they demand heavy Monte-Carlo simulations of the spread functions for each node at the initial step [7]. In this paper, we establish new upper bounds to significantly reduce the number of Monte-Carlo simulations in greedy-based algorithms, especially at the initial step. We theoretically prove that the bound is tight and convergent when the summation of weights towards (or from) each node is less than 1. Based on the bound, we propose a new Upper Bound based Lazy Forward algorithm (UBLF in short) for discovering the top-k influential nodes in social networks. We test and compare UBLF with prior greedy algorithms, especially CELF [30]. Experimental results show that UBLF reduces the Monte-Carlo simulations of CELF by more than 95 percent and achieves about a 2-10 times speedup when the seed set is small.
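    For context, the lazy-forward (CELF-style) greedy baseline with Monte-Carlo spread estimation under the IC model is sketched below; UBLF's contribution is to bound these spreads so that far fewer simulations are needed, especially in the first round (the sketch is illustrative, with made-up propagation probability and simulation counts):

        import heapq, random

        def ic_spread(adj, seeds, prob, runs=200, seed=0):
            rng = random.Random(seed)
            total = 0
            for _ in range(runs):                      # Monte-Carlo estimate of the spread
                active, frontier = set(seeds), list(seeds)
                while frontier:
                    nxt = []
                    for u in frontier:
                        for v in adj.get(u, []):
                            if v not in active and rng.random() < prob:
                                active.add(v); nxt.append(v)
                    frontier = nxt
                total += len(active)
            return total / runs

        def celf(adj, k, prob=0.1):
            nodes = set(adj) | {v for vs in adj.values() for v in vs}
            heap = [(-ic_spread(adj, [v], prob), v, 0) for v in nodes]   # (neg. gain, node, round)
            heapq.heapify(heap)
            seeds, spread = [], 0.0
            while len(seeds) < k and heap:
                neg_gain, v, rnd = heapq.heappop(heap)
                if rnd == len(seeds):                  # marginal gain is up to date: take it
                    seeds.append(v); spread += -neg_gain
                else:                                  # recompute lazily and push back
                    gain = ic_spread(adj, seeds + [v], prob) - spread
                    heapq.heappush(heap, (-gain, v, len(seeds)))
            return seeds, spread

        g = {0: [1, 2, 3], 1: [2], 2: [3, 4], 3: [4, 5], 4: [5], 5: []}
        print(celf(g, k=2))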
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Feature selection has been an important research topic in data mining, because real data sets often have high-dimensional features, as in bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criteria, such that correlated features often have similar rankings. These correlated features are redundant and provide little additional mutual information for data mining. Thus, when we select a limited number of features, we hope to select the top non-redundant features such that the useful mutual information is maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance Feature Selection (mRMR) model to minimize the redundancy between sequentially selected features. However, this method used a greedy search, so the global feature redundancy was not considered and the results are not optimal. In this paper, we propose a new feature selection framework that globally minimizes the feature redundancy while maximizing the given feature ranking scores, which can come from any supervised or unsupervised method. Our new model has no parameters, which makes it especially suitable for practical data mining applications. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. Meanwhile, we introduce a new unsupervised global and local discriminative feature selection method which can be unified with the global feature redundancy minimization framework and shows superior performance.
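    A minimal greedy relevance-minus-redundancy selector (mRMR-like, here with absolute correlations standing in for mutual information) shows the sequential behaviour that the proposed framework replaces with a global optimization; all numbers are illustrative:

        import numpy as np

        def greedy_select(X, y, num_features):
            n = X.shape[1]
            relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])
            selected = [int(np.argmax(relevance))]
            while len(selected) < num_features:
                best, best_score = None, -np.inf
                for j in range(n):
                    if j in selected:
                        continue
                    redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
                    score = relevance[j] - redundancy   # reward relevance, penalise redundancy
                    if score > best_score:
                        best, best_score = j, score
                selected.append(best)
            return selected

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 6))
        X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # feature 1 duplicates feature 0
        y = X[:, 0] + X[:, 2]
        print(greedy_select(X, y, 3))                      # the redundant copy should be skipped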
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Multimedia information retrieval usually involves two key modules including effective feature representation and ranking model construction. Most existing approaches are incapable of well modeling the inherent correlations and interactions between them, resulting in the loss of the latent consensus structure information. To alleviate this problem, we propose a learning to rank approach that simultaneously obtains a set of deep linear features and constructs structure-aware ranking models in a joint learning framework. Specifically, the deep linear feature learning corresponds to a series of matrix factorization tasks in a hierarchical manner, while the learning-to-rank part concentrates on building a ranking model that effectively encodes the intrinsic ranking information by structural SVM learning. Through a joint learning mechanism, the two parts are mutually reinforced in our approach, and meanwhile their underlying interaction relationships are implicitly reflected by solving an alternating optimization problem. Due to the intrinsic correlations among different queries (i.e., similar queries for similar ranking lists), we further formulate the learning-to-rank problem as a multi-task problem, which is associated with a set of mutually related query-specific learning-to-rank subproblems. For computational efficiency and scalability, we design a MapReduce-based parallelization approach to speed up the learning processes. Experimental results demonstrate the efficiency, effectiveness, and scalability of the proposed approach in multimedia information retrieval.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: In a peer-to-peer system, a node should estimate the reputation of other peers not only on the basis of its own interactions, but also on the basis of the opinions expressed by other nodes. A reputation aggregation mechanism implements a strategy for achieving this. Reputation aggregation in peer-to-peer networks is generally a very time- and resource-consuming process. Moreover, most methods assume that, after aggregation, a node has the same reputation at every node in the network, which is not true. This paper proposes a reputation aggregation algorithm that uses a variant of the gossip algorithm called differential gossip. In this paper, the reputation estimate is considered to have two parts: one common component that is the same at every node, and the information received from immediate neighbours based on the neighbours’ direct interaction with the node. Differential gossip is fast and requires a smaller amount of resources. This mechanism allows every node to compute an independent reputation value for every other node in the network. Differential gossip trust has been investigated for a power law network formed using the preferential attachment (PA) model. The reputation computed using differential gossip trust shows a good amount of immunity to collusion. We have verified the performance of the algorithm on power law networks with sizes ranging from 100 nodes to 50,000 nodes.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
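    The record above describes gossip-based reputation aggregation only at a high level. The following Python sketch illustrates the general flavour of pairwise gossip averaging combined with a node-specific component from direct interactions; it is an illustration under simplified assumptions, not the paper's differential gossip algorithm, and all names and constants (gossip_reputation, rounds, blend) are hypothetical.

# Minimal sketch of gossip-style reputation averaging (not the paper's exact
# "differential gossip"); structure and constants are illustrative only.
import random

def gossip_reputation(neighbors, direct_scores, rounds=50, blend=0.2):
    """neighbors: dict node -> list of neighbour nodes.
    direct_scores: dict node -> dict(target -> score from direct interaction).
    Returns dict node -> dict(target -> estimated reputation)."""
    nodes = list(neighbors)
    # start from each node's own direct observations
    est = {v: dict(direct_scores.get(v, {})) for v in nodes}
    for _ in range(rounds):
        v = random.choice(nodes)
        if not neighbors[v]:
            continue
        u = random.choice(neighbors[v])
        # pairwise averaging step: v and u move toward a common estimate
        targets = set(est[v]) | set(est[u])
        for t in targets:
            avg = 0.5 * (est[v].get(t, 0.0) + est[u].get(t, 0.0))
            est[v][t] = est[u][t] = avg
        # keep a node-specific component from v's direct interactions
        for t, s in direct_scores.get(v, {}).items():
            est[v][t] = (1 - blend) * est[v][t] + blend * s
    return est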
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: This paper addresses a limitation common to most existing change-point detection methods by proposing a nonparametric, computationally efficient method. The limitation is that most works assume each observation at each time step is a single multi-dimensional vector, which does not hold in many situations. We therefore consider a setting where each observation is a collection of random variables, which we call a bag of data. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, the change-point score is derived by evaluating how the sequence of distributions fluctuates in the metric space, using a distance-based information estimator. A procedure that adaptively determines when to raise alerts is also incorporated by calculating the confidence interval of the change-point score at each time step; this avoids false alarms in highly noisy situations and enables the detection of changes of various magnitudes. A number of experimental studies and numerical examples demonstrate the generality and effectiveness of our approach on both synthetic and real datasets. (A minimal distance-based scoring sketch follows this record.)
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
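    As a rough illustration of the distance-based scoring idea described above, the sketch below compares consecutive bags of data with an energy-distance statistic and returns one score per time step. The choice of statistic and the absence of a confidence-interval alert rule are simplifications; function names are illustrative.

# Sketch of a distance-based change-point score between consecutive "bags of
# data"; the energy-distance statistic stands in for the paper's information
# estimator and the alerting rule is omitted.
import numpy as np

def energy_distance(x, y):
    """Energy distance between two 1-D samples x and y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - y[None, :]).mean()
    b = np.abs(x[:, None] - x[None, :]).mean()
    c = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * a - b - c

def change_scores(bags):
    """bags: list of 1-D arrays observed over time; returns per-step scores."""
    return [energy_distance(bags[t - 1], bags[t]) for t in range(1, len(bags))]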
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: In many applications, the top-k query is an important operation that returns a set of interesting points from a potentially huge data space. This paper shows that existing algorithms cannot process top-k queries on massive data efficiently, and proposes a novel table-scan-based algorithm, T2S, to compute top-k results on massive data efficiently. T2S first constructs a presorted table whose tuples are arranged in the order of round-robin retrieval on the sorted lists, and it maintains only a fixed number of tuples to compute the results. Early-termination checking for T2S is presented, along with an analysis of the scan depth. Selective retrieval is devised to skip tuples in the presorted table that cannot be top-k results, and the theoretical analysis proves that it reduces the number of retrieved tuples significantly. Construction and incremental-update/batch-processing methods for the structures used are also proposed. Extensive experiments on synthetic and real-life data sets show that T2S has a significant advantage over existing algorithms. (A small early-termination sketch follows this record.)
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
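    The early-termination idea behind T2S can be illustrated with a generic threshold-style scan: tuples are read in presorted order, a heap keeps the current top-k, and scanning stops once no unseen tuple can enter the result. This is a simplified sketch, not the T2S algorithm itself; score and upper_bound are assumed to be supplied by the caller.

# Generic top-k scan with early termination, in the spirit of (but simpler
# than) T2S. 'presorted' yields tuples in scan order.
import heapq
from itertools import count

def topk_scan(presorted, k, score, upper_bound):
    """score(t): exact score of tuple t.
    upper_bound(t): upper bound on the score of every tuple after t."""
    heap, tie = [], count()          # min-heap of (score, tiebreak, tuple)
    for t in presorted:
        s = score(t)
        if len(heap) < k:
            heapq.heappush(heap, (s, next(tie), t))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, next(tie), t))
        # early termination: no later tuple can beat the current k-th score
        if len(heap) == k and upper_bound(t) <= heap[0][0]:
            break
    return [(s, t) for s, _, t in sorted(heap, reverse=True)]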
  • 72
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example (GQBE), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. In the most common two-source case, it is often desirable for the final matching to be one-to-one; the database and statistical record linkage communities accomplish this by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world entity stores (that of being nearly deduplicated) and are known to provide significant improvements to precision and recall. Unfortunately, unlike the bipartite case, exact max-weight matching on multi-partite graphs is known to be NP-hard. Our two-fold algorithmic contributions approximate multi-partite max-weight matching: our first algorithm borrows optimization techniques common to Bayesian probabilistic inference; our second is a greedy approximation algorithm. In addition to a theoretical guarantee on the latter, we present comparisons on a real-world entity resolution problem from Bing that is significantly larger than those typically found in the literature, on publication data, and on a series of synthetic problems. Our results quantify significant improvements due to exploiting multiple sources, which are made possible by global one-to-one constraints linking otherwise independent matching sub-problems. We also discover that our algorithms are complementary: one is much more robust under noise, and the other is simple to implement and very fast to run. (A greedy matching sketch follows this record.)
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
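    A minimal sketch in the spirit of the greedy approximation mentioned above: candidate cross-source pairs are accepted in decreasing weight order, subject to one-to-one constraints between sources. The exact procedure and guarantees in the paper differ; the data layout below is an assumption made for illustration.

# Greedy sketch for approximate max-weight multi-partite matching.
def greedy_multipartite(pairs):
    """pairs: list of (weight, (source_a, id_a), (source_b, id_b))."""
    matched = set()   # (record, other_source) combinations already used
    chosen = []
    for w, a, b in sorted(pairs, key=lambda p: p[0], reverse=True):
        # one-to-one per source pair: a record may link to at most one
        # record in each other source
        key_a, key_b = (a, b[0]), (b, a[0])
        if key_a in matched or key_b in matched:
            continue
        matched.update((key_a, key_b))
        chosen.append((w, a, b))
    return chosen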
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Internet users can be classified into two types: (a) active users, who actively contribute to blogs, publish opinions, write comments on YouTube, tweet messages, and so on, and (b) passive consumers, who only consume Internet information without contributing to it. While the majority of current social media research deals with active-user analysis, there is very little work on understanding the dynamics of passive consumers and their influence. Our global-scale Internet measurement of user access patterns across a diverse set of Internet media services indicates conclusively that the majority of consumers are passive. In this paper, we develop a spatio-temporal mathematical model and the corresponding stochastic analysis to understand passive consumer dynamics; both discrete- and continuous-time analyses are presented. We also show how the analysis can be used to identify spatial points of influence, i.e., spatial locations that have maximum expertise or influence on a topic. The analysis takes into account the initial level of consumption at each spatial location and the influence that passive consumers at different geographic locations exert on each other. The effect of information noise is taken into account to derive fundamental limits of passive information consumption. Theoretical results are verified using real Internet measurement data. The large-scale data have been made available to other researchers.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: The objective of the paper is a contribution to data mining within the framework of the observational calculus, through introducing generalized quantifiers related to copulas. Fitting copulas to multidimensional data is an increasingly important method for analyzing dependencies, and the proposed quantifiers of observational calculus assess the results of estimating the structure of joint distributions of continuous variables by means of hierarchical Archimedean copulas. To this end, the existing theory of hierarchical Archimedean copulas has been slightly extended in the paper: It has been proven that sufficient conditions for the function defining a hierarchical Archimedean copula to be indeed a copula, which have so far been rigorously established only for the special case of fully nested Archimedean copulas, hold in general. These conditions allow us to define three new generalized quantifiers, which are then thoroughly validated on four benchmark data sets and one data set from a real-world application. The paper concludes by comparing the proposed quantifiers to a more traditional approach, maximum weight spanning trees.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 76
    Publication Date: 2015-09-11
    Description: In this work, we develop an approach for the safe distribution and parallel execution of data-centric workflows over the publish/subscribe abstraction. In essence, we design a unique representation of data-centric workflows, specifically designed to exploit the loosely coupled and distributed nature of publish/subscribe systems. Furthermore, we argue for the practicality and expressiveness of our approach by mapping a standard and industry-strength data-centric workflow model, namely, IBM Business Artifacts with Guard-Stage-Milestone (GSM), into the publish/subscribe abstraction. In short, the contributions of this work are three-fold: (1) mapping of data-centric workflows into publish/subscribe to achieve distributed and parallel execution; (2) detailed theoretical analysis of the mapping; and (3) formulation of the complexity of the optimal workflow distribution over the publish/subscribe abstraction as an NP-hard problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 77
    Public Library of Science (PLoS)
    Publication Date: 2015-09-11
    Description: by Santiago Schnell
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 78
    Publication Date: 2015-09-15
    Description: Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set \(\mathbb{T}\) of binary binets or trinets over a taxon set X, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an \(O(3^{|X|}\,\mathrm{poly}(|X|))\)-time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
  • 79
    Public Library of Science (PLoS)
    Publication Date: 2015-09-15
    Description: by Alexander Ullrich, Mathias A. Böhme, Johannes Schöneberg, Harald Depner, Stephan J. Sigrist, Frank Noé Synaptic vesicle fusion is mediated by SNARE proteins forming in between synaptic vesicle (v-SNARE) and plasma membrane (t-SNARE), one of which is Syntaxin-1A. Although exocytosis mainly occurs at active zones, Syntaxin-1A appears to cover the entire neuronal membrane. By using STED super-resolution light microscopy and image analysis of Drosophila neuro-muscular junctions, we show that Syntaxin-1A clusters are more abundant and have an increased size at active zones. A computational particle-based model of syntaxin cluster formation and dynamics is developed. The model is parametrized to reproduce Syntaxin cluster-size distributions found by STED analysis, and successfully reproduces existing FRAP results. The model shows that the neuronal membrane is adjusted in a way to strike a balance between having most syntaxins stored in large clusters, while still keeping a mobile fraction of syntaxins free or in small clusters that can efficiently search the membrane or be traded between clusters. This balance is subtle and can be shifted toward almost no clustering and almost complete clustering by modifying the syntaxin interaction energy on the order of only 1 k B T. This capability appears to be exploited at active zones. The larger active-zone syntaxin clusters are more stable and provide regions of high docking and fusion capability, whereas the smaller clusters outside may serve as flexible reserve pool or sites of spontaneous ectopic release.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 80
    Publication Date: 2015-09-15
    Description: by Stephan Köhler, Friederike Schmid, Giovanni Settanni Fibrinogen is a serum multi-chain protein which, when activated, aggregates to form fibrin, one of the main components of a blood clot. Fibrinolysis controls blood clot dissolution through the action of the enzyme plasmin, which cleaves fibrin at specific locations. Although the main biochemical factors involved in fibrin formation and lysis have been identified, a clear mechanistic picture of how these processes take place is not available yet. This picture would be instrumental, for example, for the design of improved thrombolytic or anti-haemorrhagic strategies, as well as materials with improved biocompatibility. Here, we present extensive molecular dynamics simulations of fibrinogen which reveal large bending motions centered at a hinge point in the coiled-coil regions of the molecule. This feature, likely conserved across vertebrates according to our analysis, suggests an explanation for the mechanism of exposure to lysis of the plasmin cleavage sites on the fibrinogen coiled-coil region. It also explains the conformational variability of fibrinogen observed during its adsorption on inorganic surfaces and is thought to play a major role in determining the hydrodynamic properties of fibrinogen. In addition, the simulations suggest how the dynamics of the D region of fibrinogen may contribute to the allosteric regulation of the blood coagulation cascade through a dynamic coupling between the a- and b-holes, important for fibrin polymerization, and the integrin binding site P1.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 81
    Public Library of Science (PLoS)
    Publication Date: 2015-09-15
    Description: by Sergei Maslov, Kim Sneppen Populations of species in ecosystems are often constrained by the availability of resources within their environment. In effect, this means that growth of one population needs to be balanced by a comparable reduction in the populations of others. In neutral models of biodiversity, all populations are assumed to change incrementally due to stochastic births and deaths of individuals. Here we propose and model another redistribution mechanism, driven by abrupt and severe reductions in the population of a single species that free up resources for the remaining ones. This mechanism may be relevant, for example, for communities of bacteria, with strain-specific collapses caused by invading bacteriophages, or for other ecosystems where infectious diseases play an important role. The emergent dynamics of our system is characterized by cyclic "diversity waves" triggered by collapses of globally dominating populations. The population diversity peaks at the beginning of each wave and decreases exponentially afterwards. Species abundances have a bimodal time-aggregated distribution, with the lower peak formed by populations of recently collapsed or newly introduced species and the upper peak formed by species that have not yet collapsed in the current wave. In most waves both the upper and lower peaks are composed of several smaller peaks. This self-organized hierarchical peak structure has a long-term memory transmitted across several waves. It gives rise to a scale-free tail of the time-aggregated population distribution with a universal exponent of 1.7. We show that the diversity wave dynamics is robust with respect to variations in the rules of our model, such as diffusion between multiple environments, species-specific growth and extinction rates, and bet-hedging strategies. (A toy collapse-and-redistribute simulation follows this record.)
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
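    The collapse-and-redistribute mechanism can be illustrated with a toy simulation: populations share a fixed resource, occasionally one population (chosen with probability proportional to its size) collapses, and neutral drift plus renormalisation redistributes the freed resource. All parameter values below are illustrative and are not taken from the paper.

# Toy simulation of "diversity waves" driven by population collapses.
import numpy as np

def diversity_waves(n_species=200, steps=10000, p_collapse=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    pop = np.full(n_species, 1.0 / n_species)    # fractions of total resource
    diversity = []
    for _ in range(steps):
        if rng.random() < p_collapse:
            # collapse one population, chosen proportionally to its size
            victim = rng.choice(n_species, p=pop / pop.sum())
            pop[victim] *= 1e-3
        # neutral drift: small multiplicative noise, then renormalise so the
        # freed resource is shared by the survivors
        pop *= np.exp(0.01 * rng.standard_normal(n_species))
        pop /= pop.sum()
        diversity.append(1.0 / np.sum(pop ** 2))  # inverse Simpson index
    return np.array(diversity)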
  • 82
    Public Library of Science (PLoS)
    Publication Date: 2015-09-16
    Description: by Hannah Edwards, Charlotte M. Deane Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α+β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 83
    Public Library of Science (PLoS)
    Publication Date: 2015-09-16
    Description: by Ayal Lavi, Omri Perez, Uri Ashery Neuronal microcircuits generate oscillatory activity, which has been linked to basic functions such as sleep, learning and sensorimotor gating. Although synaptic release processes are well known for their ability to shape the interaction between neurons in microcircuits, most computational models do not simulate the synaptic transmission process directly and hence cannot explain how changes in synaptic parameters alter neuronal network activity. In this paper, we present a novel neuronal network model that incorporates presynaptic release mechanisms, such as vesicle pool dynamics and calcium-dependent release probability, to model the spontaneous activity of neuronal networks. The model, which is based on modified leaky integrate-and-fire neurons, generates spontaneous network activity patterns, which are similar to experimental data and robust under changes in the model's primary gain parameters such as excitatory postsynaptic potential and connectivity ratio. Furthermore, it reliably recreates experimental findings and provides mechanistic explanations for data obtained from microelectrode array recordings, such as network burst termination and the effects of pharmacological and genetic manipulations. The model demonstrates how elevated asynchronous release, but not spontaneous release, synchronizes neuronal network activity and reveals that asynchronous release enhances utilization of the recycling vesicle pool to induce the network effect. The model further predicts a positive correlation between vesicle priming at the single-neuron level and burst frequency at the network level; this prediction is supported by experimental findings. Thus, the model is utilized to reveal how synaptic release processes at the neuronal level govern activity patterns and synchronization at the network level.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 84
    Springer
    Publication Date: 2015-09-17
    Description: We consider stochastic versions of OneMax and LeadingOnes and analyze the performance of evolutionary algorithms with and without populations on these problems. It is known that the \((1+1)\) EA on OneMax performs well in the presence of very small noise, but poorly for higher noise levels. We extend these results to LeadingOnes and to many different noise models, showing how the application of drift theory can significantly simplify and generalize previous analyses. Most surprisingly, even small populations (of size \(\varTheta(\log n)\)) can make evolutionary algorithms perform well for high noise levels, well outside the abilities of the \((1+1)\) EA. Larger population sizes are even more beneficial; we consider both parent and offspring populations. In this sense, populations are robust in these stochastic settings. (A minimal noisy-OneMax sketch follows this record.)
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
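    For readers unfamiliar with the setting, the sketch below shows a (1+1) EA on OneMax with additive Gaussian noise on each fitness evaluation. The noise model, acceptance rule and parameters are simplifications chosen for illustration; the paper analyses several noise models and population-based variants.

# (1+1) EA on OneMax with noisy fitness evaluations; parameters illustrative.
import random

def noisy_onemax(x, sigma=1.0):
    # true fitness (number of ones) plus additive Gaussian noise
    return sum(x) + random.gauss(0.0, sigma)

def one_plus_one_ea(n=100, sigma=1.0, budget=100000):
    x = [random.randint(0, 1) for _ in range(n)]
    fx = noisy_onemax(x, sigma)
    for _ in range(budget):
        y = [b ^ (random.random() < 1.0 / n) for b in x]   # standard bit mutation
        fy = noisy_onemax(y, sigma)
        if fy >= fx:                                        # accept ties
            x, fx = y, fy
        if sum(x) == n:                                     # true optimum found
            break
    return x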
  • 85
    Public Library of Science (PLoS)
    Publication Date: 2015-09-18
    Description: by Michael A. Cerullo
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 86
    Public Library of Science (PLoS)
    Publication Date: 2015-09-18
    Description: by Jasmine Foo, Lin L Liu, Kevin Leder, Markus Riester, Yoh Iwasa, Christoph Lengauer, Franziska Michor The traditional view of cancer as a genetic disease that can successfully be treated with drugs targeting mutant onco-proteins has motivated whole-genome sequencing efforts in many human cancer types. However, only a subset of mutations found within the genomic landscape of cancer is likely to provide a fitness advantage to the cell. Distinguishing such “driver” mutations from innocuous “passenger” events is critical for prioritizing the validation of candidate mutations in disease-relevant models. We design a novel statistical index, called the Hitchhiking Index, which reflects the probability that any observed candidate gene is a passenger alteration, given the frequency of alterations in a cross-sectional cancer sample set, and apply it to a mutational data set in colorectal cancer. Our methodology is based upon a population dynamics model of mutation accumulation and selection in colorectal tissue prior to cancer initiation as well as during tumorigenesis. This methodology can be used to aid in the prioritization of candidate mutations for functional validation and contributes to the process of drug discovery.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 87
    Public Library of Science (PLoS)
    Publication Date: 2015-09-18
    Description: by Héctor García Martín, Vinay Satish Kumar, Daniel Weaver, Amit Ghosh, Victor Chubukov, Aindrila Mukhopadhyay, Adam Arkin, Jay D. Keasling Current limitations in quantitatively predicting biological behavior hinder our efforts to engineer biological systems to produce biofuels and other desired chemicals. Here, we present a new method for calculating metabolic fluxes, key targets in metabolic engineering, that incorporates data from 13C labeling experiments and genome-scale models. The data from 13C labeling experiments provide strong flux constraints that eliminate the need to assume an evolutionary optimization principle such as the growth rate optimization assumption used in Flux Balance Analysis (FBA). This effective constraining is achieved by making the simple but biologically relevant assumption that flux flows from core to peripheral metabolism and does not flow back. The new method is significantly more robust than FBA with respect to errors in genome-scale model reconstruction. Furthermore, it can provide a comprehensive picture of metabolite balancing and predictions for unmeasured extracellular fluxes as constrained by 13C labeling data. A comparison shows that the results of this new method are similar to those found through 13C Metabolic Flux Analysis (13C MFA) for central carbon metabolism but, additionally, it provides flux estimates for peripheral metabolism. The extra validation gained by matching 48 relative labeling measurements is used to identify where and why several existing COnstraint Based Reconstruction and Analysis (COBRA) flux prediction algorithms fail. We demonstrate how to use this knowledge to refine these methods and improve their predictive capabilities. This method provides a reliable base upon which to improve the design of biological systems.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 88
    Springer
    Publication Date: 2015-09-19
    Description: Since Tinhofer proposed the MinGreedy algorithm for maximum cardinality matching in 1984, several experimental studies found the randomized algorithm to perform excellently for various classes of random graphs and benchmark instances. In contrast, only a few analytical results are known. We show that MinGreedy cannot improve on the trivial approximation ratio of \(\frac{1}{2}\) with high probability, even for bipartite graphs. Our hard inputs seem to require a small number of high-degree nodes. This motivates an investigation of greedy algorithms on graphs with maximum degree \(\varDelta\): we show that MinGreedy achieves a \(\frac{\varDelta-1}{2\varDelta-3}\)-approximation for graphs with \(\varDelta=3\) and for \(\varDelta\)-regular graphs, and a guarantee of \(\frac{\varDelta-1/2}{2\varDelta-2}\) for graphs with maximum degree \(\varDelta\). Interestingly, our bounds even hold for the deterministic MinGreedy that breaks all ties arbitrarily. Moreover, we investigate the limitations of the greedy paradigm, using the model of priority algorithms introduced by Borodin, Nielsen, and Rackoff. We study deterministic priority algorithms and prove a \(\frac{\varDelta-1}{2\varDelta-3}\)-inapproximability result for graphs with maximum degree \(\varDelta\); thus, these greedy algorithms do not achieve a \(\frac{1}{2}+\varepsilon\)-approximation, and in particular the \(\frac{2}{3}\)-approximation obtained by the deterministic MinGreedy for \(\varDelta=3\) is optimal in this class. For k-uniform hypergraphs we show a tight \(\frac{1}{k}\)-inapproximability bound. We also study fully randomized priority algorithms and give a \(\frac{5}{6}\)-inapproximability bound. Thus, they cannot compete with matching algorithms of other paradigms. (A small MinGreedy sketch follows this record.)
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
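    MinGreedy itself is simple to state; the sketch below repeatedly selects a node of minimum degree among the remaining nodes, matches it to a random neighbour, and removes both endpoints. Tie-breaking and the graph representation (a dict of adjacency sets) are arbitrary choices made for this illustration.

# MinGreedy-style matching heuristic on an undirected graph.
import random

def min_greedy_matching(adj):
    """adj: dict node -> set of neighbours (undirected graph)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    matching = []
    while True:
        live = [v for v, nbrs in adj.items() if nbrs]
        if not live:
            break
        u = min(live, key=lambda v: len(adj[v]))   # minimum-degree node
        w = random.choice(tuple(adj[u]))           # random neighbour of u
        matching.append((u, w))
        for x in (u, w):                           # delete both endpoints
            for y in adj.pop(x, set()):
                if y in adj:
                    adj[y].discard(x)
    return matching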
  • 89
    Springer
    Publication Date: 2015-09-19
    Description: We discuss how string sorting algorithms can be parallelized on modern multi-core shared-memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word-level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. As the base-case sorter for LCP-aware string sorting, we describe sequential LCP-insertion sort, which computes the LCP array and uses it to accelerate insertions. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines. (A compact multikey quicksort sketch follows this record.)
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
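    One of the sequential building blocks named above, multikey quicksort, is compact enough to sketch directly. The version below is a plain sequential, list-based formulation for illustration; the paper's parallel, cache- and LCP-aware variants are considerably more involved.

# Sequential multikey quicksort: partition by the character at position
# 'depth', which is shared by all strings in the current group.
def multikey_quicksort(strings, depth=0):
    if len(strings) <= 1:
        return strings
    def ch(s):                        # character code at 'depth', -1 past end
        return ord(s[depth]) if depth < len(s) else -1
    pivot = ch(strings[len(strings) // 2])
    lo = [s for s in strings if ch(s) < pivot]
    eq = [s for s in strings if ch(s) == pivot]
    hi = [s for s in strings if ch(s) > pivot]
    # strings equal at this position recurse one character deeper
    eq = eq if pivot == -1 else multikey_quicksort(eq, depth + 1)
    return multikey_quicksort(lo, depth) + eq + multikey_quicksort(hi, depth)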
  • 90
    Publication Date: 2015-09-22
    Description: by Julia C. Quindlen, Victor K. Lai, Victor H. Barocas Cutaneous mechanoreceptors transduce different tactile stimuli into neural signals that produce distinct sensations of touch. The Pacinian corpuscle (PC), a cutaneous mechanoreceptor located deep within the dermis of the skin, detects high frequency vibrations that occur within its large receptive field. The PC is comprised of lamellae that surround the nerve fiber at its core. We hypothesized that a layered, anisotropic structure, embedded deep within the skin, would produce the nonlinear strain transmission and low spatial sensitivity characteristic of the PC. A multiscale finite-element model was used to model the equilibrium response of the PC to indentation. The first simulation considered an isolated PC with fiber networks aligned with the PC’s surface. The PC was subjected to a 10 μm indentation by a 250 μm diameter indenter. The multiscale model captured the nonlinear strain transmission through the PC, predicting decreased compressive strain with proximity to the receptor’s core, as seen experimentally by others. The second set of simulations considered a single PC embedded epidermally (shallow) or dermally (deep) to model the PC’s location within the skin. The embedded models were subjected to 10 μm indentations at a series of locations on the surface of the skin. Strain along the long axis of the PC was calculated after indentation to simulate stretch along the nerve fiber at the center of the PC. Receptive fields for the epidermis and dermis models were constructed by mapping the long-axis strain after indentation at each point on the surface of the skin mesh. The dermis model resulted in a larger receptive field, as the calculated strain showed less indenter location dependence than in the epidermis model.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 91
    Public Library of Science (PLoS)
    Publication Date: 2015-09-23
    Description: by Vipin Narang, Muhamad Azfar Ramli, Amit Singhal, Pavanish Kumar, Gennaro de Libero, Michael Poidinger, Christopher Monterola Human gene regulatory networks (GRN) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, we identified the K-core decomposition, PageRank and betweenness centrality algorithms as the most effective for discovering core regulatory genes in the network, evaluated on the basis of the previously known roles of these genes in MCF-7 biology as well as their ability to explain the up- or down-expression status of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN into an easier-to-interpret layered hierarchy where more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript. (A minimal K-core peeling sketch follows this record.)
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
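    The K-core decomposition used above has a simple peeling formulation: repeatedly delete nodes of degree below k until none remain. The sketch below returns the nodes of the k-core for a graph given as a dict of adjacency sets; applying it for increasing k yields the layered hierarchy mentioned in the abstract.

# K-core by iterative peeling of low-degree nodes.
def k_core(adj, k):
    """adj: dict node -> set of neighbours; returns the node set of the k-core."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        low = [v for v, nbrs in adj.items() if len(nbrs) < k]
        changed = bool(low)
        for v in low:                      # remove v and its incident edges
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
    return set(adj)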
  • 92
    Publication Date: 2015-09-23
    Description: by Jianfei Hu, Johnathan Neiswinger, Jin Zhang, Heng Zhu, Jiang Qian Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through phosphorylation process.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 93
    Publication Date: 2015-09-23
    Description: by Vassilios Christopoulos, Paul R. Schrater Decisions involve two fundamental problems, selecting goals and generating actions to pursue those goals. While simple decisions involve choosing a goal and pursuing it, humans evolved to survive in hostile dynamic environments where goal availability and value can change with time and previous actions, entangling goal decisions with action selection. Recent studies suggest the brain generates concurrent action-plans for competing goals, using online information to bias the competition until a single goal is pursued. This creates a challenging problem of integrating information across diverse types, including both the dynamic value of the goal and the costs of action. We model the computations underlying dynamic decision-making with disparate value types, using the probability of getting the highest pay-off with the least effort as a common currency that supports goal competition. This framework predicts many aspects of decision behavior that have eluded a common explanation.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 94
    Public Library of Science (PLoS)
    Publication Date: 2015-09-23
    Description: by Chakravarthy Marella, Andrew E. Torda, Dominik Schwudke A lipidome is the set of lipids in a given organism, cell or cell compartment, and this set reflects the organism’s synthetic pathways and interactions with its environment. Recently, lipidomes of biological model organisms and cell lines were published, and the number of functional studies of lipids is increasing. In this study we propose a homology metric that can quantify systematic differences in the composition of a lipidome. Algorithms were developed to (1) consistently convert lipid structures into SMILES, (2) determine structural similarity between molecular species and (3) describe a lipidome in a chemical-space model. We tested lipid structure conversion and structure similarity metrics in detail, using sets of isomeric ceramide molecules and chemically related phosphatidylinositols. Template-based SMILES showed the best properties for representing lipid-specific structural diversity. We also show that sequence analysis algorithms are best suited to calculate distances between such template-based SMILES, and we judged the Levenshtein distance the best choice for quantifying structural changes. When all lipid molecules of the LIPIDMAPS structure database were mapped in chemical space, they automatically formed clusters corresponding to conventional chemical families. Accordingly, we mapped a pair of lipidomes into the same chemical space and determined the degree of overlap by calculating the Hausdorff distance. We named this metric the ‘Lipidome jUXtaposition (LUX) score’. First, we tested this approach for estimating lipidome similarity on four yeast strains with known genetic alterations in fatty acid synthesis. We show that the LUX score reflects the genetic relationship and growth temperature better than conventional methods, although the score is based solely on lipid structures. Next, we applied this metric to high-throughput data of larval tissue lipidomes of Drosophila. This showed that the LUX score is sufficient to cluster tissues and determine the impact of nutritional changes in an unbiased manner, despite the limited information on the underlying structural diversity of each lipidome. This study is the first effort to define a lipidome homology metric based on structures that will enrich functional association of lipids in a similar manner to measures used in genetics. Finally, we discuss the significance of the LUX score for comparative lipidome studies across species borders. (A sketch of the Levenshtein and Hausdorff distance computations follows this record.)
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
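    Two of the ingredients mentioned above, string edit distance between SMILES and the Hausdorff distance between lipidomes, can be sketched in a few lines. The code below omits SMILES templating entirely and treats lipidomes as plain sets of strings, so it illustrates the distance computations only, not the full LUX score.

# Levenshtein distance between strings and Hausdorff distance between two
# string sets under that metric.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def hausdorff(lipidome_a, lipidome_b, dist=levenshtein):
    d_ab = max(min(dist(x, y) for y in lipidome_b) for x in lipidome_a)
    d_ba = max(min(dist(x, y) for y in lipidome_a) for x in lipidome_b)
    return max(d_ab, d_ba)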
  • 95
    Publication Date: 2015-09-24
    Description: by Noah Ollikainen, René M. de Jong, Tanja Kortemme Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein–ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein-ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art “fixed backbone” design methods perform poorly on these tests, we develop a new “coupled moves” design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in both predicting ligand specificity altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein – ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 96
    Publication Date: 2015-09-25
    Description: by Pete Riley, Michal Ben-Nun, Jon A. Linker, Angelia A. Cost, Jose L. Sanchez, Dylan George, David P. Bacon, Steven Riley The potential rapid availability of large-scale clinical episode data during the next influenza pandemic suggests an opportunity for increasing the speed with which novel respiratory pathogens can be characterized. Key intervention decisions will be determined by both the transmissibility of the novel strain (measured by the basic reproductive number R0) and its individual-level severity. The 2009 pandemic illustrated that estimating individual-level severity, as described by the proportion pC of infections that result in clinical cases, can remain uncertain for a prolonged period of time. Here, we use 50 distinct US military populations during 2009 as a retrospective cohort to test the hypothesis that real-time encounter data combined with disease dynamic models can be used to bridge this uncertainty gap. Effectively, we estimated the total number of infections in multiple early-affected communities using the model and divided that number by the known number of clinical cases. Joint estimates of severity and transmissibility clustered within a relatively small region of parameter space, with 40 of the 50 populations bounded by pC, 0.0133–0.150, and R0, 1.09–2.16. These fits were obtained despite widely varying incidence profiles: some with spring waves, some with fall waves and some with both. To illustrate the benefit of specific pairing of rapidly available data and infectious disease models, we simulated a future moderate pandemic strain with pC approximately 10 times that of 2009; the results demonstrate that even before the peak had passed in the first affected population, R0 and pC could be well estimated. This study provides a clear reference in this two-dimensional space against which future novel respiratory pathogens can be rapidly assessed and compared with previous pandemics.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 97
    Public Library of Science (PLoS)
    Publication Date: 2015-09-26
    Description: by Gang Li, Karen E. Ross, Cecilia N. Arighi, Yifan Peng, Cathy H. Wu, K. Vijay-Shanker MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana . These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 98
    Publication Date: 2015-09-26
    Description: by Greg Jensen, Fabian Muñoz, Yelda Alkan, Vincent P. Ferrera, Herbert S. Terrace Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. (A betasort-inspired sketch follows this record.)
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
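    A betasort-inspired sketch is given below: each stimulus position is represented by a Beta distribution, choices compare samples from those distributions, and feedback updates winners and losers asymmetrically while the remaining stimuli relax toward their current estimates. The update constants are illustrative assumptions, not the published parameterisation.

# Betasort-inspired position estimates; constants are illustrative only.
import random

class BetaSort:
    def __init__(self, stimuli):
        self.a = {s: 1.0 for s in stimuli}
        self.b = {s: 1.0 for s in stimuli}

    def position(self, s):                       # mean of Beta(a, b)
        return self.a[s] / (self.a[s] + self.b[s])

    def choose(self, s1, s2):
        # sample an implied position for each stimulus and pick the larger
        x1 = random.betavariate(self.a[s1], self.b[s1])
        x2 = random.betavariate(self.a[s2], self.b[s2])
        return s1 if x1 > x2 else s2

    def feedback(self, chosen, other, correct):
        if correct:      # mild reinforcement of the implied order chosen > other
            self.a[chosen] += 1.0
            self.b[other] += 1.0
        else:            # errors are punished more strongly (asymmetric update)
            self.a[other] += 2.0
            self.b[chosen] += 2.0
        # every other stimulus is nudged toward its current estimate
        for s in self.a:
            if s not in (chosen, other):
                self.a[s] += 0.1 * self.position(s)
                self.b[s] += 0.1 * (1 - self.position(s))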
  • 99
    Public Library of Science (PLoS)
    Publication Date: 2015-11-22
    Description: by Alberto Romagnoni, Jérôme Ribot, Daniel Bennequin, Jonathan Touboul The layout of sensory brain areas is thought to subtend perception. The principles shaping these architectures and their role in information processing are still poorly understood. We investigate mathematically and computationally the representation of orientation and spatial frequency in cat primary visual cortex. We prove that two natural principles, local exhaustivity and parsimony of representation, would constrain the orientation and spatial frequency maps to display a very specific pinwheel-dipole singularity. This is particularly interesting since recent experimental evidence shows a dipolar structure of the spatial frequency map co-localized with pinwheels in the cat. These structures have important consequences for information processing capabilities. In particular, we show, using a computational model of visual information processing, that this architecture allows a trade-off in the local detection of orientation and spatial frequency, but this property occurs for spatial frequency selectivity sharper than reported in the literature. We validated this sharpening on high-resolution optical imaging experimental data. These results shed new light on the principles at play in the emergence of the functional architecture of cortical maps, as well as their potential role in processing information.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 100
    Publication Date: 2015-11-22
    Description: by Kimberly Glass, Michelle Girvan The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms, one that captures intrinsic functional relationships which are not evident in the hierarchical structure of the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes. (A small co-annotation network sketch follows this record.)
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
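    The alternative grouping proposed above amounts to projecting the bipartite gene-term annotation network onto terms. A minimal sketch, assuming annotations arrive as (gene, term) pairs and using a simple shared-gene threshold, is given below; the threshold and input format are assumptions made for illustration.

# Project gene-term annotations onto a term-term co-annotation network.
from collections import defaultdict
from itertools import combinations

def term_network(annotations, min_shared=1):
    """annotations: iterable of (gene, term) pairs.
    Returns dict mapping frozenset({term1, term2}) -> number of shared genes."""
    genes_per_term = defaultdict(set)
    for gene, term in annotations:
        genes_per_term[term].add(gene)
    edges = {}
    for t1, t2 in combinations(genes_per_term, 2):
        shared = len(genes_per_term[t1] & genes_per_term[t2])
        if shared >= min_shared:        # connect terms annotating common genes
            edges[frozenset((t1, t2))] = shared
    return edges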