ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
Collection
  • Books
  • Articles  (1,610)
Source
  • Latest Papers from Table of Contents or Articles in Press  (1,610)
Publisher
  • Oxford University Press  (1,610)
  • MDPI Publishing
  • National Academy of Sciences
Years
  • 2015-2019  (1,336)
  • 1990-1994  (274)
  • 1935-1939
Year
  • 2019  (1,336)
  • 1991  (274)
Topics
  • Computer Science  (1,610)
Journal
  • 1
    Publication Date: 2019-02-26
    Description: Formal ontologies are axiomatizations in a logic-based formalism. The development of formal ontologies is generating considerable research on the use of automated reasoning techniques and tools that help in ontology engineering. One of the main aims is to refine and to improve axiomatizations for enabling automated reasoning tools to efficiently infer reliable information. Defects in the axiomatization can not only cause wrong inferences, but can also hinder the inference of expected information, either by increasing the computational cost of the inference or even by preventing it. In this paper, we introduce a novel, fully automatic white-box testing framework for first-order logic (FOL) ontologies. Our methodology is based on the detection of inference-based redundancies in the given axiomatization. The application of the proposed testing method is fully automatic since (i) the automated generation of tests is guided only by the syntax of axioms and (ii) the evaluation of tests is performed by automated theorem provers (ATPs). Our proposal enables the detection of defects and serves to certify the degree of suitability—for reasoning purposes—of every axiom. We formally define the set of tests that are (automatically) generated from any axiom and prove that every test is logically related to redundancies in the axiom from which the test has been generated. We have implemented our method and used this implementation to automatically detect several non-trivial defects that were hidden in various FOL ontologies. Throughout the paper we provide illustrative examples of these defects, explain how they were found and how each proof—given by an ATP—provides useful hints on the nature of each defect. Additionally, by correcting all the detected defects, we have obtained an improved version of one of the tested ontologies: Adimen-SUMO.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 2
    Publication Date: 2019-03-08
    Description: In this paper, we provide a fairly general self-reference-free proof of the second incompleteness theorem from Tarski’s theorem on the undefinability of truth.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 3
    Publication Date: 2019-12-01
    Description: We study the symbolic model checking problem against public announcement protocol logic (PAPL), featuring protocols with public announcements, arbitrary public announcements and group announcements. Technically, symbolic models are Kripke models whose accessibility relations are presented as programs described in a dynamic logic style with propositional assignments. We highlight the relevance of such symbolic models and show that the symbolic model checking problem against PAPL is A$_{\textrm{pol}}$Exptime-complete as soon as announcement protocols allow for either arbitrary announcements or iteration of public announcements. However, when both options are discarded, the complexity drops to Pspace-complete.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 4
    Publication Date: 2019-04-10
    Description: In this paper we provide a strongly complete axiomatization of a temporal epistemic logic in which non-rigid sets of agents are allowed. Using this framework, we prove a number of properties of the blockchain protocol with respect to the given set of axioms and premises.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 5
    Publication Date: 2019-05-07
    Description: In this paper we consider modal team logic, a generalization of classical modal logic in which it is possible to describe dependence phenomena between data. We prove that most known fragments of full modal team logic allow the elimination of the so called ‘existential bisimulation quantifiers’, where the existence of a certain set is required only modulo bisimulation (i.e. not in the model itself but possibly in a bisimilar model). As a consequence, we prove that these fragments enjoy the uniform interpolation property.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 6
    Publication Date: 2019-06-06
    Description: During the last 20 years or so, a wide range of realizability interpretations of classical analysis have been developed. In many cases, these are achieved by extending the base interpreting system of primitive recursive functionals with some form of bar recursion, which realizes the negative translation of either countable or countable dependent choice. In this work, we present the many variants of bar recursion used in this context as instantiations of a parametrized form of backward recursion, and give a uniform proof that under certain conditions this recursor realizes a corresponding family of parametrized dependent choice principles. From this proof, the soundness of most of the existing bar recursive realizability interpretations of choice, including those based on the Berardi–Bezem–Coquand functional, modified realizability and the more recent products of selection functions of Escardó and Oliva, follows as a simple corollary. We achieve not only a uniform framework in which familiar realizability interpretations of choice can be compared, but show that these represent just simple instances of a large family of potential interpretations of dependent choice principles.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 7
    Publication Date: 2019-06-06
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 8
    Publication Date: 2019-02-26
    Description: I discuss certain principles for substituting strictly equivalent propositions and provide alternative axiomatizations of some standard modal systems and proofs of containment relations among some systems that have the principles, including partial proof of Steffen Lewitzka’s ‘conjecture’ that ‘S1+$\Box$SP is strictly contained between S1+SP and S3’ (2016, J. Logic Comput., 26, 1780).
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 9
    Publication Date: 2019-12-01
    Description: We present a substructural epistemic logic, based on Boolean BI, in which the epistemic modalities are parametrized on agents’ local resources. The new modalities can be seen as generalizations of the usual epistemic modalities. The logic combines Boolean BI’s resource semantics—we introduce BI and its resource semantics at some length—with epistemic agency. We illustrate the use of the logic in systems modelling by discussing some examples about access control, including semaphores, using resource tokens. We also give a labelled tableaux calculus and establish soundness and completeness with respect to the resource semantics.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 10
    Publication Date: 2019-07-30
    Description: We study the satisfiability problem for two-variable first-order logic over structures with one transitive relation. We show that the problem is decidable in 2-NExpTime for the fragment consisting of formulas where existential quantifiers are guarded by transitive atoms. As this fragment enjoys neither the finite model property nor the tree model property, to show decidability we introduce a novel model construction technique based on the infinite Ramsey theorem. We also point out why the technique is not sufficient to obtain decidability for the full two-variable logic with one transitive relation; hence, contrary to our previous claim, [FO$^2$ with one transitive relation is decidable, STACS 2013: 317-328], the status of the latter problem remains open.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 11
    Publication Date: 2019-03-05
    Description: In this paper, we use the generalized rotation construction to lift results from the lattice of subvarieties of basic hoops to some parts of the lattice of subvarieties of monoidal t-norm based logic algebras. In particular, we study splitting algebras for (the lattice of subvarieties of) varieties generated by generalized rotations of basic hoops and relevant subvarieties such as Wajsberg hoops, cancellative hoops and Gödel hoops. Finally, we show that the generalized rotation construction preserves the amalgamation property.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 12
    Publication Date: 2019-10-01
    Description: In this paper, we study Bernoulli random sequences, i.e. sequences that are Martin-Löf random with respect to a Bernoulli measure $\mu_p$ for some $p\in [0,1]$, where we allow for the possibility that $p$ is noncomputable. We focus in particular on the case in which the underlying Bernoulli parameter $p$ is proper (i.e. Martin-Löf random with respect to some computable measure). We show for every Bernoulli parameter $p$, if there is a sequence that is both proper and Martin-Löf random with respect to $\mu_p$, then $p$ itself must be proper, and explore further consequences of this result. We also study the Turing degrees of Bernoulli random sequences, showing, for instance, that the Turing degrees containing a Bernoulli random sequence do not coincide with the Turing degrees containing a Martin-Löf random sequence. Lastly, we consider several possible approaches to characterizing blind Bernoulli randomness, where the corresponding Martin-Löf tests do not have access to the Bernoulli parameter $p$, and show that these fail to characterize blind Bernoulli randomness.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 13
    Publication Date: 2019-08-23
    Description: We define a real $A$ to be low for paths in Baire space (or Cantor space) if every $\varPi^0_1$ class with an $A$-computable element has a computable element. We prove that lowness for paths in Baire space and lowness for paths in Cantor space are equivalent and, furthermore, that these notions are also equivalent to lowness for isomorphism.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 14
    Publication Date: 2019-06-06
    Description: We define notions of well-definedness and observational equivalence for programs of mixed inductive and coinductive types. These notions are defined by means of test formulas which combine structural congruence for inductive types and modal logic for coinductive types. Tests also correspond to certain evaluation contexts. We define a program to be well-defined if it is strongly normalizing under all tests, and two programs are observationally equivalent if they satisfy the same tests. We show that observational equivalence is sufficiently coarse to ensure that least and greatest fixed point types are initial algebras and final coalgebras, respectively. This yields inductive and coinductive proof principles for reasoning about program behaviour. On the other hand, we argue that observational equivalence does not identify too many terms, by showing that tests induce a topology that, on streams, coincides with the usual topology induced by the prefix metric. As one would expect, observational equivalence is, in general, undecidable, but in order to develop some practically useful heuristics we provide coinductive techniques for establishing observational normalization and observational equivalence, along with up-to techniques for enhancing these methods.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 15
    Publication Date: 2019-07-30
    Description: Coalition structure generation (CSG) is one of the main research issues in the use of coalitional games in multiagent systems, and weighted partial MaxSAT (WPM) encodings, i.e. rule relation-based WPM (RWPM) and agent relation-based WPM (AWPM), are efficient for solving the CSG problem. Existing studies show that AWPM surpasses RWPM since it achieves more compact encoding; it generates fewer variables and clauses than RWPM. However, in this paper, we focus on a special case in which the two encodings generate identical numbers of variables and clauses. Experiments show that RWPM surprisingly has a dominant advantage over AWPM, which aroused our interest. We explore the underlying reason and find that it is the redundancy in encoding transitive laws in RWPM that leads to this situation. Finally, we remove redundant clauses for transitive laws in RWPM and develop an improved RWPM with refined transitive laws to solve the CSG problem. Experiments demonstrate that the refined encoding is more compact and efficient than previous WPM encodings.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 16
    Publication Date: 2019-10-01
    Description: Logics of design have been formulated until recently to offer systematic treatments of the way systems express the relation between resources, processes and their outputs. We present a logic of systems design which explicitly formalizes this relation as a decidable checking problem on resource access and define computable efficiency and optimality properties.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 17
    Publication Date: 2019-04-10
    Description: Propositional and modal inclusion logic are formalisms that belong to the family of logics based on team semantics. This article investigates the model checking and validity problems of these logics. We identify complexity bounds for both problems, covering both lax and strict team semantics. By doing so, we come close to finalizing the programme that aims to completely classify the complexities of the basic reasoning problems for modal and propositional dependence, independence and inclusion logics.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
  • 18
    Publication Date: 2019-03-20
    Description: Information criteria (ICs) based on penalized likelihood, such as Akaike’s information criterion (AIC), the Bayesian information criterion (BIC) and sample-size-adjusted versions of them, are widely used for model selection in health and biological research. However, different criteria sometimes support different models, leading to discussions about which is the most trustworthy. Some researchers and fields of study habitually use one or the other, often without a clearly stated justification. They may not realize that the criteria may disagree. Others try to compare models using multiple criteria but encounter ambiguity when different criteria lead to substantively different answers, leading to questions about which criterion is best. In this paper we present an alternative perspective on these criteria that can help in interpreting their practical implications. Specifically, in some cases the comparison of two models using ICs can be viewed as equivalent to a likelihood ratio test, with the different criteria representing different alpha levels and BIC being a more conservative test than AIC. This perspective may lead to insights about how to interpret the ICs in more complex situations. For example, AIC or BIC could be preferable, depending on the relative importance one assigns to sensitivity versus specificity. Understanding the differences and similarities among the ICs can make it easier to compare their results and to use them to make informed decisions.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
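    Worked illustration (editor's sketch, not from the paper): with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, comparing two nested models by an information criterion acts like a likelihood-ratio test with a criterion-specific critical value; the function below computes the implied alpha levels, and the log-likelihoods and sample size are made-up numbers.

    ```python
    # Comparing nested models with AIC/BIC acts like a likelihood-ratio test:
    # AIC prefers the larger model iff LR > 2*d, BIC iff LR > d*ln(n).
    import math
    from scipy.stats import chi2

    def implied_alpha_levels(logL_small, k_small, logL_big, k_big, n):
        d = k_big - k_small                         # extra parameters in the larger model
        lr = 2.0 * (logL_big - logL_small)          # likelihood-ratio statistic
        alpha_aic = chi2.sf(2.0 * d, df=d)          # alpha implied by the AIC decision rule
        alpha_bic = chi2.sf(d * math.log(n), df=d)  # BIC rule: more conservative (smaller alpha)
        return lr, alpha_aic, alpha_bic

    lr, a_aic, a_bic = implied_alpha_levels(-520.3, 3, -517.8, 4, n=200)
    print(f"LR={lr:.2f}, AIC acts like alpha={a_aic:.3f}, BIC like alpha={a_bic:.4f}")
    ```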
  • 19
    Publication Date: 2019-06-14
    Description: We present an overview of diffusion models commonly used for quantifying the dynamics of intracellular particles (e.g. biomolecules) inside eukaryotic living cells. It is established that inference on the modes of mobility of molecules is central in cell biology since it reflects interactions between structures and determines functions of biomolecules in the cell. In that context, Brownian motion is a key component in short distance transportation (e.g. connectivity for signal transduction). Another dynamical process that has been heavily studied in the past decade is the motor-mediated transport (e.g. dynein, kinesin and myosin) of molecules. Primarily supported by the actin filament and microtubule network, it ensures spatial organization and temporal synchronization in the intracellular mechanisms and structures. Nevertheless, the complexity of internal structures and molecular processes in the living cell influences the molecular dynamics and prevents the systematic application of pure Brownian or directed motion modeling. On the one hand, cytoskeleton density will hinder the free displacement of the particle, a phenomenon called subdiffusion. On the other hand, the cytoskeleton elasticity combined with thermal bending can contribute to a phenomenon called superdiffusion. This paper discusses the basics of the diffusion modes observed in eukaryotic cells, by introducing the essential properties of these processes. Applications of diffusion models include protein trafficking and transport and membrane diffusion.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
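    Generic illustration (editor's sketch, not code from the survey): the diffusion regimes mentioned above are commonly distinguished by the exponent alpha in MSD(t) ∝ t^alpha, with alpha < 1 for subdiffusion, alpha ≈ 1 for Brownian motion and alpha > 1 for superdiffusion or directed transport; the synthetic tracks below are placeholders.

    ```python
    # Estimate the MSD exponent alpha for a 2D track; alpha separates subdiffusion,
    # Brownian motion and directed (super-diffusive) transport.
    import numpy as np

    rng = np.random.default_rng(0)

    def msd_exponent(track, max_lag=50):
        """Fit log MSD(lag) vs log lag for a track of shape (T, 2)."""
        lags = np.arange(1, max_lag)
        msd = np.array([np.mean(np.sum((track[l:] - track[:-l])**2, axis=1)) for l in lags])
        alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
        return alpha

    steps = rng.normal(0, 1.0, size=(2000, 2))
    brownian = np.cumsum(steps, axis=0)                         # pure Brownian track
    directed = np.cumsum(steps + np.array([0.5, 0.0]), axis=0)  # Brownian track with drift
    print("alpha (Brownian):", round(msd_exponent(brownian), 2))  # close to 1
    print("alpha (directed):", round(msd_exponent(directed), 2))  # approaches 2
    ```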
  • 20
    Publication Date: 2019-12-07
    Description: Modern data centers provide multiple parallel paths for end-to-end communications. Recent studies have been done on how to allocate rational paths for data flows to increase the throughput of data center networks. A centralized load balancing algorithm can improve the rationality of the path selection by using path bandwidth information. However, to ensure the accuracy of this information, current centralized load balancing algorithms monitor all the link bandwidth information in the path to determine the path bandwidth. Due to the excessive link bandwidth information monitored by the controller, much time is consumed, which is unacceptable for modern data centers. This paper proposes an algorithm called hidden Markov Model-based Load Balancing (HMMLB). HMMLB utilizes the hidden Markov Model (HMM) to select paths for data flows with fewer monitored links, a lower time cost and approximately the same network throughput rate as a traditional centralized load balancing algorithm. To generate HMMLB, this research first turns the problem of path selection into an HMM problem, then deploys a traditional centralized load balancing algorithm in the data center topology to collect training data, and finally trains the HMM with the collected data. Through simulation experiments, this paper verifies HMMLB’s effectiveness.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 21
    Publication Date: 2019-12-05
    Description: Cloud adoption has significantly increased using the infrastructure-as-a-service (IaaS) paradigm, in order to meet the growing demands of computing, storage and networking, in small as well as large enterprises. Different vendors provide their customized solutions for OpenStack deployment on bare metal or virtual infrastructure. Among these many available IaaS solutions, OpenStack stands out as being an agile and open-source platform. However, its deployment procedure is a time-consuming and complex process with a learning curve. This paper addresses the lack of basic infrastructure automation in almost all of the OpenStack deployment projects. We propose a flexible framework to automate the process of infrastructure bring-up for deployment of several OpenStack distributions, as well as resolving dependencies for a successful deployment. Our experimental results demonstrate the effectiveness of the proposed framework in terms of automation status and deployment time, reducing the time spent preparing a basic virtual infrastructure by a factor of four, on average.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 22
    Publication Date: 2019-12-09
    Description: Subdivision, triangulation, Kronecker product, corona product and many other graph operations or products play an important role in complex networks. In this paper, we study the properties of $q$-subdivision graphs, which have been applied to model complex networks. For a simple connected graph $G$, its $q$-subdivision graph $S_q(G)$ is obtained from $G$ through replacing every edge $uv$ in $G$ by $q$ disjoint paths of length 2, with each path having $u$ and $v$ as its ends. We derive explicit formulas for many quantities of $S_q(G)$ in terms of those corresponding to $G$, including the eigenvalues and eigenvectors of normalized adjacency matrix, two-node hitting time, Kemeny constant, two-node resistance distance, Kirchhoff index, additive degree-Kirchhoff index and multiplicative degree-Kirchhoff index. We also study the properties of the iterated $q$-subdivision graphs, based on which we obtain the closed-form expressions for a family of hierarchical lattices, which has been used to describe scale-free fractal networks.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
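    A minimal sketch of the $q$-subdivision operation defined above, assuming the networkx library; the middle-vertex labels are arbitrary.

    ```python
    # q-subdivision: every edge uv of G is replaced by q disjoint paths of length 2.
    import networkx as nx

    def q_subdivision(G, q):
        S = nx.Graph()
        S.add_nodes_from(G.nodes())
        for i, (u, v) in enumerate(G.edges()):
            for j in range(q):
                w = ("mid", i, j)          # middle vertex of the j-th path for edge uv
                S.add_edge(u, w)
                S.add_edge(w, v)
        return S

    G = nx.complete_graph(4)               # K4: 4 nodes, 6 edges
    S2 = q_subdivision(G, q=2)
    print(S2.number_of_nodes(), S2.number_of_edges())   # 4 + 2*6 = 16 nodes, 24 edges
    ```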
  • 23
    Publication Date: 2019-09-10
    Description: Subversion of cryptography has received wide attention, especially after the Snowden revelations in 2013. Most of the currently proposed subversion attacks essentially rely on the freedom of choosing randomness in the cryptographic protocol to hide backdoors embedded in the cryptosystems. Despite the fact that significant progress in this line of research has been made, most of it mainly considered the classical setting, while the research gap regarding subversion attacks against post-quantum cryptography remains tremendous. Inspired by this observation, we investigate a subversion attack against an existing protocol that is proven post-quantum secure. Particularly, we show an efficient way to undetectably subvert the well-known lattice-based encryption scheme proposed by Regev (STOC 2005). Our subversion enables the subverted algorithm to stealthily leak arbitrary messages to an outsider who knows the backdoor. Through theoretical analysis and experimental observations, we demonstrate that the subversion attack against the LWE encryption scheme is feasible and practical.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 24
    Publication Date: 2019-01-22
    Description: This paper describes a roadmap for the development of the SP Machine, based on the SP Theory of Intelligence and its realization in the SP Computer Model. The SP Machine will be developed initially as a software virtual machine with high levels of parallel processing, hosted on a high-performance computer. The system should help users visualize knowledge structures and processing. Research is needed into how the system may discover low-level features in speech and in images. Strengths of the SP System in the processing of natural language may be augmented, in conjunction with the further development of the SP System’s strengths in unsupervised learning. Strengths of the SP System in pattern recognition may be developed for computer vision. Work is needed on the representation of numbers and the performance of arithmetic processes. A computer model is needed of SP-Neural, the version of the SP Theory expressed in terms of neurons and their interconnections. The SP Machine has potential in many areas of application, several of which may be realized on short-to-medium timescales.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 25
    Publication Date: 2019-11-15
    Description: Efficient key revocation in Identity-based Encryption (IBE) has been both a fundamental and a critical problem when deploying an IBE system in practice. Boneh and Franklin proposed the first revocable IBE (RIBE) scheme where the size of key updates is linear in the number of users. Then, Boldyreva, Goyal and Kumar proposed the first scalable RIBE by using the tree-based approach, where the size of key updates is $O(r\log (N/r))$ and the size of every user’s long-term secret key is $O(\log N)$, with $N$ being the number of users and $r$ the number of revoked users. Recently, Qin et al. presented the notion of server-aided RIBE where the size of every user’s long-term secret key is $O(1),$ and users do not need to communicate with the Key Generator Center (KGC) during key updates. However, users must change their identities once their secret keys are revoked, as they cannot decrypt ciphertexts by using their revoked secret keys. To address the above problem, we formalize the notion of RIBE with identity reuse. In our system model, users can obtain a new secret key called the reuse secret key from the KGC when their secret keys are revoked. The decryption key can be derived from the reuse secret key and new key updates, while it cannot be derived from the revoked secret key and the new key updates. We present a concrete construction that is secure against adaptive-ID chosen plaintext attacks and decryption key exposure attacks under the $\mathsf{ADDH}1$ and $\mathsf{DDH}2$ assumptions in the standard model. Furthermore, we extend it to a server-aided RIBE scheme with the identity reuse property that is more suitable for lightweight devices.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 26
    Publication Date: 2019-11-14
    Description: Diagnosability of a multiprocessor system is an important research topic. The system and an interconnection network have an underlying topology, which is usually presented by a graph. Under the Maeng and Malek (MM) model, to diagnose the system, a node sends the same task to two of its neighbors, and then compares their responses. The MM$^{*}$ model is a special case of the MM model in which each node must test all pairs of its adjacent nodes. In 2009, Chiang and Tan (Using node diagnosability to determine $t$-diagnosability under the comparison diagnosis (cd) model. IEEE Trans. Comput., 58, 251–259) proposed a new viewpoint for fault diagnosis of the system, namely, the node diagnosability. As a new topology structure of interconnection networks, the nest graph $CK_{n}$ has many good properties. In this paper, we study the local diagnosability of $CK_{n}$ and show it has the strong local diagnosability property even if there exist $(\frac{n(n-1)}{2}-2)$ missing edges in it under the MM$^{*}$ model, and the result is optimal with respect to the number of missing edges.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 27
    Publication Date: 2019-11-29
    Description: The crossing resolution of a non-planar drawing of a graph is the value of the minimum angle formed by any pair of crossing edges. Recent experiments suggest that the larger the crossing resolution is, the easier it is to read and interpret a drawing of a graph. However, maximizing the crossing resolution turns out to be an NP-hard problem in general, and only heuristic algorithms are known that are mainly based on appropriately adjusting force-directed algorithms. In this paper, we propose a new heuristic algorithm for the crossing resolution maximization problem and we experimentally compare it against the known approaches from the literature. Our experimental evaluation indicates that the new heuristic produces drawings with better crossing resolution, but this comes at the cost of slightly higher edge-length ratio, especially when the input graph is large.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
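    Small helper illustrating the quantity being maximized (editor's sketch, not the paper's heuristic): the crossing resolution of a straight-line drawing is the minimum angle over all pairs of crossing edges; the coordinates below are made up.

    ```python
    # Crossing resolution of a straight-line drawing: minimum angle over crossing edge pairs.
    import numpy as np
    from itertools import combinations

    def crossing_angle(p1, p2, q1, q2):
        """Acute angle (degrees) between segments p1p2 and q1q2."""
        u, v = np.subtract(p2, p1), np.subtract(q2, q1)
        cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

    def segments_cross(p1, p2, q1, q2):
        def ccw(a, b, c):
            return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])
        return ccw(p1, p2, q1)*ccw(p1, p2, q2) < 0 and ccw(q1, q2, p1)*ccw(q1, q2, p2) < 0

    def crossing_resolution(pos, edges):
        angles = [crossing_angle(pos[a], pos[b], pos[c], pos[d])
                  for (a, b), (c, d) in combinations(edges, 2)
                  if segments_cross(pos[a], pos[b], pos[c], pos[d])]
        return min(angles) if angles else None

    pos = {0: (0, 0), 1: (2, 2), 2: (0, 2), 3: (2, 0)}
    print(crossing_resolution(pos, [(0, 1), (2, 3)]))   # the two diagonals cross at 90 degrees
    ```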
  • 28
    Publication Date: 2019-11-22
    Description: Due to the positive impact of ride sharing on urban traffic and the environment, it has attracted a lot of research attention recently. However, most existing research has focused on profit maximization or itinerary minimization for drivers; only rare work has covered adjustable price functions and matching algorithms for batch requests. In this paper, we propose a request matching algorithm and an adjustable price function that benefit drivers as well as passengers. Our request-matching algorithm consists of an exact search algorithm and a group search algorithm. The exact search algorithm consists of three steps. The first step is to prune some invalid groups according to the total number of passengers and the capacity of vehicles. The second step is to filter out all candidate groups according to the compatibility of requests in the same group. The third step is to obtain the most profitable group by the adjustable price function, and recommend the most profitable group to drivers. In order to enhance the efficiency of the exact search algorithm, we further design an improved group search algorithm based on the idea of the original simulated annealing. Extensive experimental results show that our method can improve the income of drivers and reduce the expense of passengers. Meanwhile, ride sharing can also keep the utilization rate of seats at 80%, while driving distance is reduced by 30%.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 29
    Publication Date: 2019-05-13
    Description: Cloud Service Providers supply services to clients in terms of their demands. They need to be constantly monitored with respect to the consensus agreements between clients and service providers. A Third Party Auditor (TPA), as a trusted organization, appears to be necessary to monitor the execution of agreements on cloud services. Using a third party as an extra component creates cost overheads for clients in a cloud environment. Thus, introducing a cost-efficient framework for a cloud environment which includes a third party is an eminent achievement to make a TPA feasible and practical in cloud environments. In this paper, we propose a TPA framework for monitoring service level agreements between cloud service providers and cloud clients using several cloud resources. This framework employs different types of service deployments from various cloud service providers, excluding the cloud service provider which is being monitored. Then, we demonstrate that the framework can mitigate the costs of a third party auditor in a cloud environment. Simulations of cost trends exhibit a cost efficiency of at least forty percent over ten years when a TPA follows our proposed framework, in comparison to other frameworks. Finally, we provide an analysis to compare characteristics of our framework with other frameworks and discuss the advantages of our proposed framework. Our results indicate that a TPA as a component of the framework not only reduces the overall costs of its presence in a cloud environment but additionally improves management efficiency and security.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 30
    Publication Date: 2019-11-15
    Description: Most graphs have this property: after removing a linear number of vertices from a graph, the surviving graph is either connected or consists of a large connected component and small components containing a small number of vertices. This property can be applied to derive fault-tolerance related network parameters: extra edge connectivity and component edge connectivity. Using this general property, we obtained the $h$-extra edge connectivity and $(h+2)$-component edge connectivity of augmented cubes, Cayley graphs generated by transposition trees, complete cubic networks (including hierarchical cubic networks), generalized exchanged hypercubes (including exchanged hypercubes) and dual-cube-like graphs (including dual cubes).
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 31
    Publication Date: 2019-10-21
    Description: We propose a quantitative metric (called relative assortativity index, RAI) to assess the extent to which a real-world network would become relatively more assortative due to link addition(s) using a link prediction technique. Our methodology is as follows: for a link prediction technique applied on a particular real-world network, we keep track of the assortativity index values incurred during the sequence of link additions until there is negligible change in the assortativity index values for successive link additions. We count the number of network instances for which the assortativity index after a link addition is greater or lower than the assortativity index prior to the link addition and refer to these counts as the relative assortativity count and relative dissortativity count, respectively. RAI is computed as (relative assortativity count − relative dissortativity count) / (relative assortativity count + relative dissortativity count). We analyzed a suite of 80 real-world networks across different domains using 3 representative neighborhood-based link prediction techniques (Preferential attachment, Adamic Adar and Jaccard coefficients [JACs]). We observe the RAI values for the JAC technique to be positive and larger for several real-world networks, while most of the biological networks exhibited positive RAI values for all the three techniques.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
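    A hedged sketch of the RAI bookkeeping described above, assuming networkx; the "add the best-scoring non-edge" loop and the karate-club test graph are simplifications for illustration, not the paper's experimental setup.

    ```python
    # RAI = (assortative steps - dissortative steps) / (assortative + dissortative steps),
    # tracked while a link predictor keeps adding its best-scoring non-edge.
    import networkx as nx

    def relative_assortativity_index(G, scorer, steps=50, tol=1e-4):
        up = down = 0
        prev = nx.degree_assortativity_coefficient(G)
        for _ in range(steps):
            u, v, _ = max(scorer(G), key=lambda t: t[2])   # best-scoring non-edge
            G.add_edge(u, v)
            cur = nx.degree_assortativity_coefficient(G)
            if abs(cur - prev) < tol:                      # negligible change: stop
                break
            up += cur > prev
            down += cur < prev
            prev = cur
        return (up - down) / (up + down) if (up + down) else 0.0

    G = nx.karate_club_graph()
    print(round(relative_assortativity_index(G, nx.jaccard_coefficient), 3))
    ```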
  • 32
    Publication Date: 2019-05-13
    Description: This paper presents SGAC (Solution de Gestion Automatisée du Consentement / automated consent management solution), a new healthcare access control model and its support tool, which manages patient wishes regarding access to their electronic health records (EHR). This paper also presents the verification of access control policies for SGAC using two first-order-logic model checkers based on distinct technologies, Alloy and ProB. The development of SGAC has been achieved within the scope of a project with the University of Sherbrooke Hospital (CHUS), and thus has been adapted to take into account regional laws and regulations applicable in Québec and Canada, as they set bounds to patient wishes: for safety reasons, under strictly defined contexts, patient consent can be overridden to protect his/her life (break-the-glass rules). Since patient wishes and those regulations can be in conflict, SGAC provides a mechanism to address this problem based on priority, specificity and modality. In order to protect patient privacy while ensuring effective caregiving in safety-critical situations, we check four types of properties: accessibility, availability, contextuality and rule effectivity. We conducted a performance comparison: an implementation of SGAC versus an implementation of another access control model, XACML, and property verification with Alloy versus ProB. The performance results show that SGAC performs better than XACML and that ProB outperforms Alloy by two orders of magnitude, thanks to its programmable approach to constraint solving.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 33
    Publication Date: 2019-05-08
    Description: Finding global optima for functions is a very important problem. Although a large number of methods have been proposed for solving this problem, more effective and efficient methods are greatly required. This paper proposes an innovative method that combines different effective techniques for speeding up the convergence to the solution and greatly improving its precision. In particular, the method uses a feedback-guided random search technique to identify the promising regions of the domains and uses a biased mapping technique to focus the search on these promising regions, without ignoring the other regions of the domains. Therefore, at any point of time, the domain of each variable is entirely covered, with much more emphasis on the promising regions. Experiments with our prototype implementation showed that our method is efficient and effective, and that it outperforms state-of-the-art techniques.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 34
    Publication Date: 2019-05-08
    Description: Spectrum-sensing as a service has been proposed and studied by many researchers over the past decade as a promising approach to support the viability of cognitive radio networks (CRNs). A spectrum-sensing service provider (SSP) provides information about spectrum occupancy to its clients that is generally more accurate than what clients can learn on their own. Two approaches are used by SSPs in their operation, the dedicated sensing infrastructure approach (sensor-aided CRN) and the crowdsensing approach. In this work, we assume a hybrid model where a dedicated sensing infrastructure is used along with crowdsensing. We study the tradeoff between sensing time paid by cognitive users to the SSP and their achievable transmission time. Our objective is to maximize the minimum achievable transmission time for any cognitive user in the network by carefully selecting the channels to be used. Two algorithms are proposed, one is based on the hill-climbing search algorithm (abbreviated HCA) and the other is a less optimal but faster greedy selection algorithm (abbreviated GSA). Results show that both HCA and GSA are within 3% of the optimal solution. Results also confirm that GSA is faster than HCA, while HCA outperforms GSA.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 35
    Publication Date: 2019-08-15
    Description: Area detection and measuring is one of the most important problems in wireless sensor networks because it mainly relates to the continuity and functionality of most routing protocols applied to the region of interest (ROI). Electronics failures, random deployment of nodes, software errors or phenomena such as fire spreading or water floods could lead to the widespread death of sensor nodes. The damage to the ROI can be controlled by detecting and calculating the area of the holes resulting from the damaged sensor networks. In this paper, a new mathematical algorithm, the wireless sensor hole detection algorithm (WHD), is developed to detect and calculate the hole areas in an ROI where the sensor nodes are spread randomly. WHD is developed for achieving quality of service in terms of power consumption and average hole detection time. The dynamic behavior of the proposed WHD depends on executing the following steps. Firstly, the WHD algorithm divides the ROI into many cells, using a grid construction to physically partition the ROI into many small individual cells. Secondly, WHD works on each cell individually by allocating the nearest three sensor nodes to each of the cell’s coordinates; by comparing their positions, WHD connects each cell’s coordinate points with the selected sensor nodes by lines that construct a group of triangles, and then calculates the areas of the resulting triangles. Repeating the previous step on all the cells, WHD can calculate and locate each hole in the ROI. The performance evaluation depends on the NS-2 simulator as a simulation technique to study and analyze the performance of the WHD algorithm. Results show that WHD outperforms, in terms of average energy consumption and average hole discovery time, the path density algorithm, the novel coverage hole discovery algorithm and distributed coverage hole detection.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
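    Generic geometry helper (an assumption for illustration, not the WHD implementation): the per-cell triangles described above can have their areas computed with the shoelace formula; the sensor coordinates below are made up.

    ```python
    # Join a cell corner to its three nearest sensors and compute the triangle area
    # with the shoelace formula; summing such areas over cells estimates covered area.
    import math

    def triangle_area(p, q, r):
        """Shoelace formula for the area of triangle pqr in the plane."""
        return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

    def nearest_sensors(point, sensors, k=3):
        return sorted(sensors, key=lambda s: math.dist(point, s))[:k]

    sensors = [(1, 1), (4, 1), (2, 4), (8, 8)]
    corner = (0, 0)
    a, b, c = nearest_sensors(corner, sensors)
    print(triangle_area(a, b, c))   # area of the triangle formed by the 3 closest sensors
    ```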
  • 36
    Publication Date: 2019-02-14
    Description: Due to the strong security and high performance of the AES block cipher, many hash functions take AES-like structures as building blocks. To evaluate the security of these AES-like structures against differential cryptanalysis, giving lower bounds on the number of active S-boxes in a differential trail is an important approach. However, the original ‘wide-trail strategy’ for AES becomes less effective for getting tight bounds for these AES-like structures, because of the different state dimensions ($M\times M^2$, instead of $M\times M$) and round functions different from those of AES. In this paper, we focus on a kind of AES-like structure with state dimensions $M\times M^2$, diffusion-optimal permutations and MixColumns transformations using MDS matrices. Inspired by the ‘wide-trail strategy’, we propose a theoretical method to count active S-boxes, by which we prove that there are at least $rB_d(B_d-1)$ active S-boxes in any $2r$ ($r\geq 3$) rounds of such an AES-like structure, where $B_d$ is the differential branch number of the MixColumns transformation and equals $M+1$. What’s more, this lower bound can be achieved by some diffusion layers. As examples, we apply our method to the LANE hash function and the 3D block cipher, and optimal lower bounds are obtained for both.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
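    Worked instance of the stated bound (editor's illustration): with an MDS MixColumns the differential branch number is $B_d = M+1$, so any $2r$ rounds ($r\geq 3$) contain at least $rB_d(B_d-1) = rM(M+1)$ active S-boxes.

    ```python
    # Lower bound on active S-boxes in 2r rounds of the AES-like structure above.
    def min_active_sboxes(M, r):
        Bd = M + 1                      # branch number of an MDS MixColumns
        return r * Bd * (Bd - 1)

    for M in (2, 4):
        print(M, [min_active_sboxes(M, r) for r in (3, 4, 5)])
    # M=4: at least 60, 80 and 100 active S-boxes in 6, 8 and 10 rounds respectively.
    ```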
  • 37
    Publication Date: 2019-02-08
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 38
    Publication Date: 2019-02-14
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 39
    Publication Date: 2019-02-08
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 40
    Publication Date: 2019-02-12
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 41
    Publication Date: 2019-01-11
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 42
    Publication Date: 2019-01-11
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 43
    Publication Date: 2019-02-14
    Description: Presently, the software industry is severely suffering from inaccurate effort estimation and inadequate, unstructured or semi-structured project history management. In fact, both are difficult to accomplish and hence badly impact software projects. We propose improvements in the effort estimation and the project history management of e-commerce projects, focusing on the Extreme Programming (XP) and Scrum methodologies and using ontology models in our software effort estimation system. The proposed system infers suitable estimates in the form of time, resources and lessons learnt, as per the project leader’s requirements, by using description logic and the HermiT reasoner. To validate our approach, we have performed a case study comprising 20 Business-to-Consumer (B2C) web projects and performed a comparative analysis of the collected efforts in both XP and Scrum contexts by applying the Mean Magnitude of Relative Error (MMRE) and PRED(25) prediction accuracy measures. Likewise, the software functional size of the e-commerce projects under study was measured using the COSMIC functional size measurement methodology. Regression analysis of the relations among actual COSMIC function points, estimated effort and actual effort spent for the projects shows better significance-F and R2 values for our approach. The comparative results show that, overall, the proposed approach provides accurate estimates and improves significantly over the planning poker and Delphi methods, by 10% and 30%, respectively.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
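    The two accuracy measures named above have standard definitions, sketched below; the effort numbers are hypothetical, not data from the case study.

    ```python
    # MMRE: mean magnitude of relative error; PRED(25): fraction of estimates whose
    # relative error is at most 25% of the actual effort.
    def mmre(actual, estimated):
        return sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)

    def pred(actual, estimated, threshold=0.25):
        hits = sum(abs(a - e) / a <= threshold for a, e in zip(actual, estimated))
        return hits / len(actual)

    actual_effort    = [120, 80, 200, 150]    # hypothetical person-hours
    estimated_effort = [100, 90, 230, 160]
    print(round(mmre(actual_effort, estimated_effort), 3))
    print(pred(actual_effort, estimated_effort))
    ```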
  • 44
    Publication Date: 2019-01-22
    Description: Web Browser Fingerprinting is a process in which users are, with high likelihood, uniquely identified by features extracted from their devices, generating an identifier key (fingerprint). Although it can be used for malicious purposes, especially regarding privacy invasion, Web Browser Fingerprinting can also be used to enhance security (e.g. as a factor in two-factor authentication). This paper investigates the use of the Web Audio API as a Web Browser Fingerprinting method capable of identifying devices. The idea is to determine whether audio can provide features capable of identifying users and devices. Our initial results show that the proposed method is capable of identifying the device’s class, based on features like the device’s type, the web browser’s version and the rendering engine.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 45
    Publication Date: 2019-01-11
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 46
    Publication Date: 2019-11-17
    Description: Although analyzing and mining user’s trajectory data can provide outstanding benefit, data owners may not be willing to upload their trajectory data because of privacy concerns. Recently, differential privacy technology has achieved a good trade-off between data utility and privacy preserving by publishing noisy outputs, and relevant schemes have been proposed for trajectory release. However, we experimentally find that a relatively accurate estimate of the true data value can still be obtained from the noisy outputs by means of a posterior estimation. But there are no practical mechanisms against current schemes to verify their effectiveness and resistance. To fill this gap, we propose a solution to evaluate the resistance performance of differential privacy on trajectory data release, including a notion of correlation-distinguishability filtering (CDF) and a privacy quantification measurement. Specifically, taking advantage of the principle of filtering that independent noise can be filtered out from correlated sequence, CDF is proposed to sanitize the noise added into the trajectory. To conduct this notion in practice, we attempt to apply a Kalman/particle filter to filter out the corresponding Gaussian/Laplace noise added by differential privacy schemes. Furthermore, to quantify the distortion of privacy strength before and after filtering, an entropy-based privacy quantification metric is proposed, which is used to measure the lost uncertainty of the true locations for an adversary. Experimental results show that the resistance performance of current approaches has a degradation to varying degrees under the filtering attack model in our solution. Moreover, the privacy quantification metric can be regarded as a unified criterion to measure the privacy strength introduced by the noise that does not conform to the form required by differential privacy.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
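    Minimal sketch of the filtering idea exploited above (an editor's illustration, not the paper's CDF mechanism): a simple 1D Kalman filter applied to a correlated trajectory perturbed with Laplace noise reduces the error of the noisy release, which is the kind of residual leakage the paper quantifies.

    ```python
    # A correlated trajectory released with Laplace noise can be partially denoised
    # by a random-walk Kalman filter, shrinking the error of the private output.
    import numpy as np

    rng = np.random.default_rng(1)
    true = np.cumsum(rng.normal(0, 0.1, 300))          # smooth, correlated trajectory
    noisy = true + rng.laplace(0, 1.0, 300)            # Laplace noise as in a DP release

    def kalman_1d(z, q=0.01, r=1.0):
        x, p = z[0], 1.0
        out = []
        for meas in z:
            p += q                                     # predict (random-walk model)
            k = p / (p + r)                            # Kalman gain
            x += k * (meas - x)                        # update with the noisy measurement
            p *= (1 - k)
            out.append(x)
        return np.array(out)

    filtered = kalman_1d(noisy)
    print("RMSE noisy   :", round(float(np.sqrt(np.mean((noisy - true) ** 2))), 3))
    print("RMSE filtered:", round(float(np.sqrt(np.mean((filtered - true) ** 2))), 3))
    ```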
  • 47
    Publication Date: 2019-11-13
    Description: Recently, much attention has been focused on designing provably secure cryptographic primitives in the presence of key leakage, including continuous leakage attacks. However, the existing constructions of (continuous) leakage-resilient certificate-based encryption (CBE) schemes are based on bilinear pairings, and their computational efficiency is low. Also, leakage of the master secret key is not considered in these previous constructions. In this paper, to achieve better performance, a new construction of a continuous leakage-resilient CBE scheme without bilinear pairings is proposed, and the chosen-ciphertext security of the designed scheme is proved based on the hardness of the classic decisional Diffie–Hellman assumption. The performance analysis shows that our method not only obtains higher computational efficiency but also enjoys better security properties: the leakage parameter of a user’s secret key has constant size, and an adversary cannot obtain any leakage on a user’s secret key from the corresponding given ciphertext. The advantage is that our proposal allows leakage attacks on multiple keys, i.e. continuous leakage resilience of the user’s secret key and bounded leakage resilience of the master secret key. Additionally, to provide leakage resilience for cloud computing, a novel data access control scheme for cloud storage services is built from our continuous leakage-resilient CBE scheme, which keeps its claimed security in the leakage setting.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 48
    Publication Date: 2019-11-17
    Description: Effective protection against cyber-attacks requires constant monitoring and analysis of system data in an IT infrastructure, such as log files and network packets, which may contain private and sensitive information. Security operation centers (SOC), which are established to detect, analyze and respond to cyber-security incidents, often utilize detection models, either for known types of attacks or for anomalies, and apply them to the system data for detection. SOCs are also motivated to keep their models private, both to capitalize on models that embody their proprietary expertise and to protect their detection strategies against adversarial machine learning. In this paper, we develop a protocol for privately evaluating detection models on the system data, in which the privacy of both the system data and the detection models is protected and information leakage is either prevented altogether or quantifiably decreased. Our main approach is to provide end-to-end encryption for the system data and detection models utilizing lattice-based cryptography that allows homomorphic operations over ciphertext. We employ recent data sets in our experiments, which demonstrate that the proposed privacy-preserving intrusion detection system is feasible in terms of execution times and bandwidth requirements and reliable in terms of accuracy.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 49
    Publication Date: 2019-04-27
    Description: Twitter is an online micro-blogging platform through which one can explore hidden, valuable and delightful information about the current context at any point in time, and which also serves as a data source for sentiment analysis. In this paper, the sentiments of a large amount of tweets generated from Twitter in the form of big data have been analyzed using machine learning algorithms. A multi-tier architecture for sentiment classification is proposed, which includes modules such as tokenization, data cleaning, preprocessing, stemming, updated lexicon, stopword and emoticon dictionaries, feature selection and a machine learning classifier. Unigrams and bigrams have been used as features, together with χ2 (Chi-squared) feature selection and Singular Value Decomposition for dimensionality reduction, with two model types (Binary and Reg), four types of scaling methods (No scaling, Standard, Signed and Unsigned) and three different vector representations (TF-IDF, Binary and Int). Accuracy is considered as the evaluation standard for the random forest and bagged trees classification methods. Sentiments were analyzed through tokenization, several stages of pre-processing and several combinations of feature vectors and classification methods, through which it was possible to achieve an accuracy of 84.14%. The obtained results show that the proposed scheme gives better accuracy when compared with existing schemes in the literature.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
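    Hedged sketch of the kind of pipeline described above (TF-IDF features, chi-squared feature selection, SVD, random forest), assuming scikit-learn; the tweets, labels and parameter values are placeholders, not the study's configuration.

    ```python
    # Unigram/bigram TF-IDF -> chi-squared selection -> SVD -> random forest.
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import RandomForestClassifier

    tweets = ["great service, loved it", "worst flight ever",
              "pretty good overall", "terrible delay again"]
    labels = [1, 0, 1, 0]

    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # unigrams + bigrams
        ("chi2", SelectKBest(chi2, k=10)),                # chi-squared feature selection
        ("svd", TruncatedSVD(n_components=2)),            # dimensionality reduction
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])
    clf.fit(tweets, labels)
    print(clf.predict(["good crew and on time", "awful experience"]))
    ```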
  • 50
    Publication Date: 2019-05-24
    Description: Breast cancer survivability has always been an important and challenging issue for researchers. Different methods, mostly based on machine learning techniques, have been utilized for the prediction of survivability among cancer patients. The most comprehensive available database of cancer incidence is SEER in the United States, which has been frequently used for different research purposes. In this paper, a new data mining study has been performed on the SEER database in order to investigate the ability of machine learning techniques for survivability prediction of breast cancer patients. To this end, the data related to breast cancer incidence have been preprocessed to remove unusable records from the dataset. Subsequently, two machine learning techniques were developed based on the Multi-Layer Perceptron (MLP) learner machine, namely MLP stacked generalization and a mixture of MLP-experts, to make predictions over the database. The machines have been evaluated using the K-fold cross-validation technique. The evaluation of the predictors revealed an accuracy of 84.32% and 83.86% by the mixture of MLP-experts and MLP stacked generalization methods, respectively. This indicates that the predictors can be used effectively for survivability prediction, suggesting time- and cost-effective treatment for breast cancer patients.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
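For orientation, here is a minimal sketch of MLP stacked generalization evaluated with K-fold cross-validation. It is not the authors' code: a synthetic dataset stands in for the preprocessed SEER records, and the hidden-layer sizes and meta-learner are arbitrary choices.

```python
# A minimal sketch of MLP stacked generalization with K-fold cross-validation
# (scikit-learn; synthetic stand-in data, not SEER).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed survivability records.
X, y = make_classification(n_samples=300, n_features=20, n_informative=10, random_state=0)

# Level-0 learners: several MLPs with different hidden-layer sizes.
base_learners = [
    (f"mlp{h}", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(h,), max_iter=1000, random_state=0)))
    for h in (16, 32, 64)
]

# Level-1 meta-learner combines the base MLPs' predictions (stacked generalization).
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression(), cv=5)

# K-fold cross-validated accuracy, as in the abstract's evaluation protocol.
scores = cross_val_score(stack, X, y, cv=3, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```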
  • 51
    Publication Date: 2019-04-16
    Description: Emotion recognition is a key research area in brain–computer interaction. With growing interest in affective computing, emotion recognition has attracted more and more attention in the past decades. Accurately locating the geometric positions of key facial landmarks is an effective way to increase the accuracy of emotion recognition systems and to reach high classification rates. In this paper, we propose a hybrid system based on wavelet networks using the 1D Fast Wavelet Transform. This system combines two approaches: a biometric distances approach, in which we propose a new technique to locate feature points, and a wrinkles approach, in which we propose a new method to locate the wrinkle regions of the face. The classification rates given by experimental results show the effectiveness of our proposed approach compared to other methods.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    Publication Date: 2019-06-04
    Description: Microblogging platforms are popular social networks in which hot topics propagate rapidly online. Real-time topic detection not only helps to understand public opinion but also brings high commercial value. We design a method for real-time microblog data analysis in order to detect both long-lasting popular events and emerging events. Firstly, a frequent-items mining algorithm over the microblog data stream is proposed to approximate word frequencies and find the words that are frequent within a given time period (a sketch of a classic streaming frequent-items summary follows this entry). Secondly, the window size for the monitored words is adjusted dynamically according to the duration and evolution of events. Lastly, new topics and trends in existing topics are detected using a dynamic clustering algorithm based on the vector space model. Experimental results show that the proposed algorithms improve performance in terms of running time and accuracy.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
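The abstract does not specify which frequent-items algorithm it uses, so the sketch below shows a classic stand-in, the Misra-Gries summary, which approximates word frequencies over a one-pass stream with bounded memory.

```python
# Illustrative only: a Misra-Gries summary as one standard way to approximate
# frequent items (here, words) over a data stream using bounded memory.
def misra_gries(stream, k):
    """Track up to k-1 candidate frequent items over a one-pass stream."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters  # counts are underestimates, off by at most len(stream) / k


if __name__ == "__main__":
    words = ("flood warning city flood rescue flood city concert "
             "flood rescue city flood music").split()
    print(misra_gries(words, k=4))   # surfaces heavy hitters such as 'flood' and 'city'
```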
  • 53
    Publication Date: 2019-04-26
    Description: Automated classification of magnetic resonance brain images (MRIs) is a hot topic in the field of medical and biomedical imaging. Various methods have been suggested recently to improve this technology. In this paper, to reduce the complexity involved in medical images and to improve the classification of MRIs, a novel 3D magnetic resonance (MR) brain image classifier using kernel principal component analysis (KPCA) and support vector machines (SVMs) is proposed. Experiments are carried out using a deep multiple kernel SVM (DMK-SVM) and a regular SVM. An algorithm called SVM–KPCA is put forward; its main task is to classify a brain MRI as either a normal or a pathological brain image. The algorithm first adopts the discrete wavelet transform to extract features from images. Secondly, KPCA is applied to decrease the dimensionality of the features. An SVM is then applied to the reduced data (a minimal KPCA-plus-SVM sketch follows this entry). A K-fold cross-validation strategy is used to avoid overfitting and to improve the generalization of the SVM–KPCA algorithm. Three databases are used to validate the suggested SVM–KPCA method. Three conclusions are obtained from this work. First, KPCA is highly effective at increasing the classifier’s performance compared with similar algorithms working on the proposed database. Second, the SVM–KPCA algorithm performs well in differentiating between the two classes of medical images. Third, the approach is robust and might be utilized for other MRIs. This suggests a significant role for computer-aided diagnosis systems in clinical practice.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
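The sketch below shows the general KPCA-plus-SVM pattern with K-fold cross-validation. It is not the SVM–KPCA implementation: a synthetic feature matrix stands in for the wavelet features extracted from brain MRIs, and the kernel parameters are arbitrary.

```python
# A minimal KPCA + SVM sketch (scikit-learn); synthetic features stand in for
# discrete-wavelet-transform features from brain MRIs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=64, n_informative=12, random_state=0)

model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=10, kernel="rbf", gamma=0.01),  # non-linear dimensionality reduction
    SVC(kernel="rbf", C=1.0),                              # normal vs. pathological decision
)

# K-fold cross-validation, as the abstract describes, to limit overfitting.
print(np.mean(cross_val_score(model, X, y, cv=5, scoring="accuracy")))
```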
  • 54
    Publication Date: 2019-05-04
    Description: The development of two- and three-dimensional (2D&3D) display technology has attracted the attention of researchers in recent years. This research is conducted to reveal in detail the effects of 2D in comparison with 3D technology on human brain waves. The impact of 2D&3D video watching is studied using electroencephalography (EEG) brain signals. A group of eight healthy volunteers with an average age of 31 ± 3.06 years participated in this three-stage test. EEG signal recording consisted of three stages: a short relaxation period (a), display of a 2D video (b) and a short resting period during which recording continued (c), after which the trial ended. Exactly the same steps were repeated for the 3D video. Power spectral density (PSD) based on the short-time Fourier transform (STFT) was used to analyze the brain signals of 2D&3D video viewers. After testing all the EEG frequency bands, the delta and theta bands were extracted as the features (a band-power extraction sketch follows this entry). Partial least squares regression (PLSR) and support vector machine (SVM) classification algorithms were used to classify the EEG signals obtained during 2D&3D video watching. Successful classification results were obtained by selecting the correct combinations of effective channels representing the brain regions.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
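As an illustration of the feature-extraction step described above, the sketch below computes delta and theta band power from a single EEG channel via a Welch power spectral density estimate. The sampling rate, band edges and synthetic signal are assumptions; the study's actual preprocessing and channel selection are not reproduced here.

```python
# Illustrative only: delta/theta band power from one (synthetic) EEG channel,
# the kind of feature that could then be fed to PLSR or SVM classifiers.
import numpy as np
from scipy.signal import welch

fs = 256                                  # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
# Synthetic EEG-like trace: 2 Hz (delta) + 6 Hz (theta) components plus noise.
x = 20 * np.sin(2 * np.pi * 2 * t) + 10 * np.sin(2 * np.pi * 6 * t) + np.random.randn(t.size)

freqs, psd = welch(x, fs=fs, nperseg=fs * 2)   # windowed, averaged power spectral density


def band_power(freqs, psd, lo, hi):
    """Integrate the PSD over a frequency band."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[mask], freqs[mask])


delta = band_power(freqs, psd, 0.5, 4.0)   # delta band
theta = band_power(freqs, psd, 4.0, 8.0)   # theta band
print(f"delta power: {delta:.1f}, theta power: {theta:.1f}")
# A vector of such band powers per channel would form the classifier input.
```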
  • 55
    Publication Date: 2019-11-26
    Description: Blind signatures are an important and useful tool in designing digital cash schemes and electronic voting protocols. Ring signatures, on the other hand, provide anonymity of the signer within a ring of users. To fit some real-life applications, it is useful to combine both primitives into a blind ring signature scheme that offers all of their features. In this paper, we propose, for the first time, a post-quantum blind ring signature scheme. Our scheme is constructed from multivariate public key cryptography, which is one of the main candidates for post-quantum cryptography.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2019-01-04
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2019-03-16
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2019-11-18
    Description: With the development of Lie theory, Lie groups have attained profound significance in several branches of mathematics and physics. In Lie theory, the matrix exponential plays a crucial role in connecting Lie groups and Lie algebras. Meanwhile, as finite analogues of Lie groups, finite groups of Lie type have potential applications in cryptography due to their distinctive mathematical structure. In this paper, we put forward a novel idea of designing cryptosystems based on Lie theory. First, combining the discrete logarithm problem with the group factorization problem, we propose several new intractability assumptions based on the matrix exponential in finite groups of Lie type (a toy illustration of matrix exponentiation as a group operation follows this entry). Subsequently, in analogy with Boyen’s scheme (Asiacrypt 2007), we design a public-key encryption scheme based on the non-abelian factorization problem in finite groups of Lie type. Finally, our proposal is proved to be indistinguishable under adaptive chosen-ciphertext attacks in the random oracle model. It is encouraging that our scheme also has the potential to resist attacks based on Shor’s quantum algorithm.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
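The sketch below is emphatically not the paper's construction. It is only a toy Diffie-Hellman-style exchange built from matrix exponentiation over a small matrix group modulo a prime, included to make the matrix-exponential ingredient of the intractability assumptions concrete; the modulus, generator and exponents are tiny and insecure.

```python
# Illustrative only: a toy key exchange from matrix exponentiation mod a small prime.
# NOT the paper's scheme; parameters are insecure and chosen for readability.
P = 97                                    # small prime modulus (demo only)


def mat_mul(a, b, p=P):
    return [
        [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % p,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % p],
        [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % p,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % p],
    ]


def mat_pow(m, e, p=P):
    """Square-and-multiply matrix exponentiation modulo p."""
    result = [[1, 0], [0, 1]]             # identity matrix
    while e:
        if e & 1:
            result = mat_mul(result, m, p)
        m = mat_mul(m, m, p)
        e >>= 1
    return result


# Public generator matrix G (invertible mod P, det = 1).
G = [[2, 1], [1, 1]]

# Powers of the same fixed matrix commute, so both parties derive G^(a*b).
a_secret, b_secret = 1234, 5678
A = mat_pow(G, a_secret)                  # Alice publishes G^a
B = mat_pow(G, b_secret)                  # Bob publishes G^b
shared_alice = mat_pow(B, a_secret)
shared_bob = mat_pow(A, b_secret)
assert shared_alice == shared_bob         # both equal G^(a*b)
print(shared_alice)
```

The commutativity of powers of a single matrix is what makes the toy exchange consistent; the paper's actual schemes rely on richer non-abelian structure and different hardness assumptions.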
  • 59
    Publication Date: 2019-11-17
    Description: Suicide is a major health issue nowadays and has become one of the leading causes of death. Many negative emotions, such as anxiety, depression and stress, can lead to suicide. By identifying individuals with suicidal ideation beforehand, the risk of them completing suicide can be reduced. Social media is increasingly becoming a powerful platform on which people around the world share emotions and thoughts. Moreover, this platform can in some ways act as a catalyst that invokes and incites suicidal ideation. The objective of this proposal is to use social media as a tool that can aid in preventing suicide. Data are collected from Twitter, a social networking site, using features related to suicidal ideation. The tweets are preprocessed according to the semantics of the identified features and then converted into probabilistic values so that they can be used by machine learning and ensemble learning algorithms. Different machine learning algorithms, such as Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Decision Tree, Logistic Regression and Support Vector Machine, were applied to the data to predict and identify trends of suicidal ideation. The proposed work is further evaluated with ensemble approaches such as Random Forest, AdaBoost and a Voting Ensemble to assess the improvement.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2019-05-27
    Description: Summary We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci. Availability and implementation ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2019-12-10
    Description: Summary Subpathways, which are defined as local gene subregions within a biological pathway, have been reported to be associated with the occurrence and development of cancer. The recent subpathway identification tools generally identify differentially expressed subpathways between normal and cancer samples. psSubpathway is a novel systems biology R-based software package that enables flexible identification of phenotype-specific subpathways in a cancer dataset with multiple categories (such as multiple subtypes and developmental stages of cancer). The operation modes include extraction of subpathways from pathway networks, inference with subpathway activities in the context of gene expression data, identification of subtype-specific subpathways, identification of dynamic-changed subpathways associated with the cancer developmental stage and visualization of subpathway activities of samples in different phenotypes. Its capabilities enable psSubpathway to find specific abnormal subpathways in the datasets with multi-phenotype categories and to fill the gaps in the recent tools. psSubpathway may identify more specific biomarkers to facilitate the development of tailored treatment for patients with cancer. Availability and implementation The package is implemented in R and available under GPL-2 license from the CRAN website (https://cran.r-project.org/web/packages/psSubpathway/). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2019-01-21
    Description: Motivation Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation The source code and executables of SVIM are available on Github: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2019-09-10
    Description: Motivation Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2019-07-01
    Description: Motivation The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining or single-cell sequencing are often unavailable such that we rely on computational methods to estimate the immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically. A general guideline leading the research community through cell type deconvolution is missing. Results We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune- and stromal cells from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions to provide an exhaustive evaluation across seven computational methods over nine cell types and ∼1800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs at high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts should be dedicated to refining cell population definitions and finding reliable signatures. Availability and implementation A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2019-12-16
    Description: Motivation High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies to be undertaken containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Publication Date: 2019-08-29
    Description: Motivation Epidemiologic, clinical and translational studies are increasingly generating multiplatform omics data. Methods that can integrate across multiple high-dimensional data types while accounting for differential patterns are critical for uncovering novel associations and underlying relevant subgroups. Results We propose an integrative model to estimate latent unknown clusters (LUCID) aiming to distinguish unique genomic, exposure and informative biomarker/omic effects while jointly estimating subgroups relevant to the outcome of interest. Simulation studies indicate that we can obtain consistent estimates reflective of the true simulated values, accurately estimate subgroups and recapitulate subgroup-specific effects. We also demonstrate the use of the integrated model for future prediction of risk subgroups and phenotypes. We apply this approach to two real data applications to highlight the integration of genomic, exposure and metabolomic data. Availability and Implementation The LUCID method is implemented through the LUCIDus R package available on CRAN (https://CRAN.R-project.org/package=LUCIDus). Supplementary information Supplementary materials are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2019-04-16
    Description: Motivation A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. Each mutational process is known to lead to characteristic mutations, and the preference pattern of a mutational process for particular mutations is called a ‘mutation signature.’ Identification of mutation signatures is an important task for the elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures remains unclear. Results In this study, we present a novel method for estimating the number of mutation signatures—latent Dirichlet allocation with variational Bayes inference (VB-LDA)—where variational lower bounds are utilized for finding a plausible number of mutation patterns (a minimal LDA-on-mutation-counts sketch follows this entry). In addition, we performed cluster analyses for the estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures to real mutation data revealed many interesting mutation signatures that have not been previously reported. Availability and implementation All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and python scripts utilized in this study can be downloaded on the Internet (https://github.com/qkirikigaku/MS_LDA). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
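To make the modelling idea concrete: scikit-learn's LatentDirichletAllocation is also fitted with variational inference, so it can illustrate how signatures and per-sample exposures are recovered from a samples-by-mutation-types count matrix. This is not the authors' MS_LDA code, and the count matrix below is synthetic.

```python
# Illustrative only (not the authors' VB-LDA implementation): LDA with variational
# inference applied to a synthetic samples x mutation-types count matrix.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_samples, n_mutation_types = 100, 96     # e.g. 96 trinucleotide substitution categories

# Synthetic catalog: two hidden "signatures" mixed in random proportions per sample.
signatures = rng.dirichlet(np.ones(n_mutation_types), size=2)
exposures = rng.dirichlet(np.ones(2), size=n_samples)
counts = rng.poisson(500 * exposures @ signatures)

lda = LatentDirichletAllocation(n_components=2, learning_method="batch", random_state=0)
exposures_hat = lda.fit_transform(counts)                               # per-sample activities
signatures_hat = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # signatures

# The variational bound underlies the model score, which can be compared across
# different choices of the number of signatures.
print("approximate variational score:", lda.score(counts))
```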
  • 68
    Publication Date: 2019-07-01
    Description: Motivation There exist several large genomic and metagenomic data collection efforts, including GenomeTrakr and MetaSub, which are routinely updated with new data. To analyze such datasets, memory-efficient methods to construct and store the colored de Bruijn graph were developed. Yet, a problem that has not been considered is constructing the colored de Bruijn graph in a scalable manner that allows new data to be added without reconstruction. This problem is important for large public datasets, where both scalability and the ability to update the construction are needed. Results We create a method for constructing the colored de Bruijn graph for large datasets that is based on partitioning the data into smaller datasets, building the colored de Bruijn graph using an FM-index based representation, and succinctly merging these representations to build a single graph (a toy illustration of colored de Bruijn graph merging follows this entry). The last step, merging succinctly, is the algorithmic challenge which we solve in this article. We refer to the resulting method as VariMerge. This construction method also allows the graph to be updated with new data. We validate our approach and show it produces a three-fold reduction in working space when constructing a colored de Bruijn graph for 8000 strains. Lastly, we compare VariMerge to other competing methods—including Vari, Rainbowfish, Mantis, Bloom Filter Trie, the method of Almodaresi et al. and Multi-BRWT—and illustrate that VariMerge is the only method that is capable of building the colored de Bruijn graph for 16 000 strains in a manner that allows it to be updated. Competing methods either did not scale to this large a dataset or do not allow additions without reconstruction. Availability and implementation VariMerge is available at https://github.com/cosmo-team/cosmo/tree/VARI-merge under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
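The toy sketch below only conveys the data structure and the merge idea: a colored de Bruijn graph maps each k-mer to the set of samples ("colors") containing it, and two such graphs can be merged by unioning color sets without revisiting the reads. VariMerge itself works on succinct FM-index-based representations, which this plain dictionary version does not attempt to reproduce.

```python
# A toy, pure-Python colored de Bruijn graph and a naive merge; the sample names,
# sequences and k are made up for illustration.
from collections import defaultdict


def build_colored_dbg(samples, k=4):
    """samples: dict color_name -> list of sequences. Returns {kmer: set(colors)}."""
    graph = defaultdict(set)
    for color, sequences in samples.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                graph[seq[i:i + k]].add(color)
    return graph


def merge_colored_dbg(g1, g2):
    """Merge two colored de Bruijn graphs by unioning the color sets per k-mer."""
    merged = defaultdict(set)
    for g in (g1, g2):
        for kmer, colors in g.items():
            merged[kmer] |= colors
    return merged


if __name__ == "__main__":
    old = build_colored_dbg({"strainA": ["ACGTACGT"], "strainB": ["ACGTTTGT"]})
    new = build_colored_dbg({"strainC": ["TTGTACGT"]})      # newly added data
    combined = merge_colored_dbg(old, new)
    print(combined["ACGT"])   # a k-mer shared by all three strains
```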
  • 69
    Publication Date: 2019-07-05
    Description: Motivation Protein structure refinement is an important step of protein structure prediction. Existing approaches have generally used a single scoring function combined with a Monte Carlo method or a Molecular Dynamics algorithm. One-dimensional optimization of a single energy function may, without a constraint, take the structure too far away. The basic motivation of our study is to reduce the bias caused by minimizing only a single energy function, given the great diversity of protein structures. Results We report a new Artificial Intelligence-based protein structure Refinement method called AIR. Its fundamental idea is to use multiple energy functions as multiple objectives in an effort to correct the potential inaccuracy of a single function. A multi-objective particle swarm optimization algorithm-based structure refinement is designed, where each structure is considered a particle in the protocol. Over the refinement iterations, the particles move around. The quality of the particles in each iteration is evaluated by three energy functions, and the non-dominated particles are placed into a set called the Pareto set (a minimal non-dominated-filtering sketch follows this entry). After enough iterations, particles from the Pareto set are screened and a subset of the top solutions is output as the final refined structures. The multi-objective energy function optimization strategy designed in the AIR protocol provides a different constraint view of the structure, by extending the one-dimensional optimization to a new three-dimensional space optimization driven by the multi-objective particle swarm optimization engine. Experimental results on CASP11 and CASP12 refinement targets and blind tests in CASP13 turn out to be promising. Availability and implementation The AIR is available online at: www.csbio.sjtu.edu.cn/bioinf/AIR/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
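The selection step mentioned above, keeping only non-dominated candidates under several objectives, is sketched generically below. This is not the AIR implementation; the energy values are made up, and the filter is the textbook Pareto-dominance test rather than the paper's full particle swarm protocol.

```python
# Illustrative only: extract the Pareto set of non-dominated candidates given several
# objective values per candidate (lower is better), as in multi-objective refinement.
import numpy as np


def pareto_set(scores):
    """scores: (n_candidates, n_objectives) array, lower is better.
    Returns indices of candidates not dominated by any other candidate."""
    n = scores.shape[0]
    keep = []
    for i in range(n):
        others = np.delete(scores, i, axis=0)
        dominated = np.any(np.all(others <= scores[i], axis=1) &
                           np.any(others < scores[i], axis=1))
        if not dominated:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Each row: three energy-function values for one candidate structure.
    energies = np.array([[1.0, 2.0, 3.0],
                         [2.0, 1.0, 3.5],
                         [1.5, 2.5, 2.9],
                         [2.0, 3.0, 4.0]])   # the last row is dominated by the first
    print(pareto_set(energies))              # -> [0, 1, 2]
```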
  • 70
    Publication Date: 2019-06-19
    Description: Motivation Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Publication Date: 2019-04-30
    Description: Motivation Methods for reconstructing developmental trajectories from time-series single-cell RNA-Seq (scRNA-Seq) data can be largely divided into two categories. The first, often referred to as pseudotime ordering methods, are deterministic and rely on dimensionality reduction followed by an ordering step. The second learns a probabilistic branching model to represent the developmental process. While both types have been successful, each suffers from shortcomings that can impact their accuracy. Results We developed a new method based on continuous-state HMMs (CSHMMs) for representing and modeling time-series scRNA-Seq data. We define the CSHMM model and provide efficient learning and inference algorithms which allow the method to determine both the structure of the branching process and the assignment of cells to these branches. Analyzing several developmental single-cell datasets, we show that the CSHMM method accurately infers branching topology and correctly and continuously assigns cells to paths, improving upon prior methods proposed for this task. Analysis of genes based on the continuous cell assignment identifies known and novel markers for different cell types. Availability and implementation Software and Supporting website: www.andrew.cmu.edu/user/chiehl1/CSHMM/ Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Publication Date: 2019-12-03
    Description: Motivation Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNAs as biomarkers gives rise to a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detection of circRNA–disease associations by biological experiments alone is often blind, small in scale, costly and time consuming. Therefore, there is an urgent need for reliable computational methods to rapidly infer the potential circRNA–disease associations on a large scale and to provide the most promising candidates for biological experiments. Results In this article, we propose an efficient computational method based on multi-source information combined with a deep convolutional neural network (CNN) to predict circRNA–disease associations. The method first fuses multi-source information including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, then extracts hidden deep features through the CNN and finally sends them to the extreme learning machine classifier for prediction. The 5-fold cross-validation results show that the proposed method achieves 87.21% prediction accuracy with 88.50% sensitivity at the area under the curve of 86.67% on the CIRCR2Disease dataset. In comparison with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we also obtained experimental support for prediction results by searching published literature. As a result, 7 of the top 15 circRNA–disease pairs with the highest scores were confirmed by literature. These results demonstrate that the proposed model is a suitable method for predicting circRNA–disease associations and can provide reliable candidates for biological experiments. Availability and implementation The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Publication Date: 2019-11-28
    Description: Motivation Although long-read sequencing technologies can produce genomes with long contiguity, they suffer from high error rates. Thus, we developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled with long reads. This new tool consists of two interlinked modules that are designed to score and count K-mers from high quality short reads, and to polish genome assemblies containing large numbers of base errors. Results When evaluated for speed and efficiency using a human and a plant (Arabidopsis thaliana) genome, NextPolish outperformed Pilon by correcting sequence errors faster, and with a higher correction accuracy. Availability and implementation NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Publication Date: 2019-03-01
    Description: Summary Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient datasets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art pre-processing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and implementation FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE and https://doi.org/10.17605/OSF.IO/RF6QK, and provides vignettes for documentation and application both online and in the Supplementary Files 2 and 3. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2019-08-13
    Description: Motivation The identification of enhancer–promoter interactions (EPIs), especially condition-specific ones, is important for the study of gene transcriptional regulation. Existing experimental approaches for EPI identification are still expensive, and available computational methods either do not consider or have low performance in predicting condition-specific EPIs. Results We developed a novel computational method called EPIP to reliably predict EPIs, especially condition-specific ones. EPIP is capable of predicting interactions in samples with limited data as well as in samples with abundant data. Tested on more than eight cell lines, EPIP reliably identifies EPIs, with an average area under the receiver operating characteristic curve of 0.95 and an average area under the precision–recall curve of 0.73. Tested on condition-specific EPIs, EPIP correctly identified 99.26% of them. Compared with two recently developed methods, EPIP outperforms both in accuracy. Availability and implementation The EPIP tool is freely available at http://www.cs.ucf.edu/˜xiaoman/EPIP/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Publication Date: 2019-05-09
    Description: Motivation One of the most successful methods for predicting the properties of chemical compounds is the quantitative structure–activity relationship (QSAR) methods. The prediction accuracy of QSAR models has recently been greatly improved by employing deep learning technology. Especially, newly developed molecular featurizers based on graph convolution operations on molecular graphs significantly outperform the conventional extended connectivity fingerprints (ECFP) feature in both classification and regression tasks, indicating that it is critical to develop more effective new featurizers to fully realize the power of deep learning techniques. Motivated by the fact that there is a clear analogy between chemical compounds and natural languages, this work develops a new molecular featurizer, FP2VEC, which represents a chemical compound as a set of trainable embedding vectors. Results To implement and test our new featurizer, we build a QSAR model using a simple convolutional neural network (CNN) architecture that has been successfully used for natural language processing tasks such as sentence classification task. By testing our new method on several benchmark datasets, we demonstrate that the combination of FP2VEC and CNN model can achieve competitive results in many QSAR tasks, especially in classification tasks. We also demonstrate that the FP2VEC model is especially effective for multitask learning. Availability and implementation FP2VEC is available from https://github.com/wsjeon92/FP2VEC. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Publication Date: 2019-07-02
    Description: Motivation Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy. Results In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM), to jointly infer GRNs under two conditions, and then to identify the difference between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a lung cancer dataset and a gastric cancer dataset with FSSEM inferred differential GRNs in cancer versus normal tissues, whose highest-degree genes have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions. Availability and implementation The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    Publication Date: 2019-12-10
    Description: Summary Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures. Availability and implementation Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker. A minimal usage sketch follows this entry.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
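The sketch below shows the basic Logomaker workflow described above: build a position-by-character matrix as a pandas DataFrame and render it with `logomaker.Logo`. The toy probability matrix is made up, and the use of the `.ax` attribute follows the package's documented examples as best recalled, so treat those details as assumptions rather than a verified listing.

```python
# A minimal usage sketch, assuming the package is installed (e.g. "pip install logomaker").
import pandas as pd
import matplotlib.pyplot as plt
import logomaker

# Toy probability matrix: rows are positions, columns are DNA characters.
prob_df = pd.DataFrame(
    [[0.70, 0.10, 0.10, 0.10],
     [0.10, 0.70, 0.10, 0.10],
     [0.10, 0.10, 0.70, 0.10],
     [0.25, 0.25, 0.25, 0.25]],
    columns=list("ACGT"),
)

logo = logomaker.Logo(prob_df)           # render the logo as native matplotlib objects
logo.ax.set_xlabel("position")           # style via the underlying Axes (assumed attribute)
logo.ax.set_ylabel("probability")
plt.show()
```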
  • 79
    Publication Date: 2019-07-26
    Description: Motivation Nowadays, virtual screening (VS) plays a major role in the process of drug development. Nonetheless, an accurate estimation of binding affinities, which is crucial at all stages, is not trivial and may require target-specific fine-tuning. Furthermore, drug design also requires improved predictions for putative secondary targets among which is Estrogen Receptor alpha (ERα). Results VS based on combinations of Structure-Based VS (SBVS) and Ligand-Based VS (LBVS) is gaining momentum to improve VS performances. In this study, we propose an integrated approach using ligand docking on multiple structural ensembles to reflect receptor flexibility. Then, we investigate the impact of the two different types of features (structure-based and ligand molecular descriptors) on affinity predictions using a random forest algorithm. We find that ligand-based features have lower predictive power (rP = 0.69, R2 = 0.47) than structure-based features (rP = 0.78, R2 = 0.60). Their combination maintains high accuracy (rP = 0.73, R2 = 0.50) on the internal test set, but it shows superior robustness on external datasets. Further improvement and extending the training dataset to include xenobiotics, leads to a novel high-throughput affinity prediction method for ERα ligands (rP = 0.85, R2 = 0.71). The presented prediction tool is provided to the community as a dedicated satellite of the @TOME server in which one can upload a ligand dataset in mol2 format and have the ligands docked and their affinities predicted. Availability and implementation http://edmon.cbs.cnrs.fr. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2019-07-01
    Description: Motivation Peptidic natural products (PNPs) are considered a promising compound class that has many applications in medicine. Recently developed mass spectrometry-based pipelines are transforming PNP discovery into a high-throughput technology. However, the current computational methods for PNP identification via database search of mass spectra are still in their infancy and could be substantially improved. Results Here we present NPS, a statistical learning-based approach for scoring PNP–spectrum matches. We incorporated NPS into two leading PNP discovery tools and benchmarked them on millions of natural product mass spectra. The results demonstrate more than 45% increase in the number of identified spectra and 20% more found PNPs at a false discovery rate of 1%. Availability and implementation NPS is available as a command line tool and as a web application at http://cab.spbu.ru/software/NPS. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    Publication Date: 2019-01-22
    Description: Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2019-11-05
    Description: Motivation To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology. Results We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API, and SPARQL endpoint that leverage the chemical structure data, nomenclature, and classification that Rhea and ChEBI provide. Availability and Implementation UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org. Supplementary information None.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    Publication Date: 2019-02-07
    Description: Motivation The growing number of annotated biological sequences available makes it possible to learn genotype-phenotype relationships from data with increasingly high accuracy. When large quantities of labeled samples are available for training a model, convolutional neural networks can be used to predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance on medium- or small-scale datasets is limited, which calls for new data-efficient approaches. Results We introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method enjoys the ability of convolutional neural networks to learn data representations that are adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences. Availability and implementation Source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2019-03-28
    Description: Motivation The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm. Results Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared to the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners, designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner capabilities, as it does not modify or replace the alignment step. Availability and implementation https://github.com/CMU-SAFARI/Shouji. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    Publication Date: 2019-02-13
    Description: Motivation Network alignment (NA) finds conserved regions between two networks. NA methods optimize node conservation (NC) and edge conservation. Dynamic graphlet degree vectors are a state-of-the-art dynamic NC measure, used within the fastest and most accurate NA method for temporal networks: DynaWAVE. Here, we use graphlet-orbit transitions (GoTs), a different graphlet-based measure of temporal node similarity, as a new dynamic NC measure within DynaWAVE, resulting in GoT-WAVE. Results On synthetic networks, GoT-WAVE improves DynaWAVE’s accuracy by 30% and speed by 64%. On real networks, when optimizing only dynamic NC, the methods are complementary. Furthermore, only GoT-WAVE supports directed edges. Hence, GoT-WAVE is a promising new temporal NA algorithm, which efficiently optimizes dynamic NC. We provide a user-friendly user interface and source code for GoT-WAVE. Availability and implementation http://www.dcc.fc.up.pt/got-wave/ Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    Publication Date: 2019-01-14
    Description: Motivation Single cell RNA-Seq (scRNA-Seq) facilitates the characterization of cell type heterogeneity and developmental processes. Further study of single cell profiles across different conditions enables the understanding of biological processes and underlying mechanisms at the sub-population level. However, developing proper methodology to compare multiple scRNA-Seq datasets remains challenging. Results We have developed ClusterMap, a systematic method and workflow to facilitate the comparison of scRNA-seq profiles across distinct biological contexts. Using hierarchical clustering of the marker genes of each sub-group, ClusterMap matches the sub-types of cells across different samples and provides ‘similarity’ as a metric to quantify the quality of the match. We introduce a purity tree cut method designed specifically for this matching problem. We use Circos plot and regrouping method to visualize the results concisely. Furthermore, we propose a new metric ‘separability’ to summarize sub-population changes among all sample pairs. In the case studies, we demonstrate that ClusterMap has the ability to provide us further insight into the different molecular mechanisms of cellular sub-populations across different conditions. Availability and implementation ClusterMap is implemented in R and available at https://github.com/xgaoo/ClusterMap. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    Publication Date: 2019-10-24
    Description: Motivation Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive to the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation This source code of SINC is available at https://www.nd.edu/∼jli9/SINC.zip. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2019-07-01
    Description: Motivation Hi-C is a genome-wide technology for investigating 3D chromatin conformation by measuring physical contacts between pairs of genomic regions. The resolution of Hi-C data directly impacts the effectiveness and accuracy of downstream analysis such as identifying topologically associating domains (TADs) and meaningful chromatin loops. High resolution Hi-C data are valuable resources which implicate the relationship between 3D genome conformation and function, especially linking distal regulatory elements to their target genes. However, high resolution Hi-C data across various tissues and cell types are not always available due to the high sequencing cost. It is therefore indispensable to develop computational approaches for enhancing the resolution of Hi-C data. Results We proposed hicGAN, an open-sourced framework, for inferring high resolution Hi-C data from low resolution Hi-C data with generative adversarial networks (GANs). To the best of our knowledge, this is the first study to apply GANs to 3D genome analysis. We demonstrate that hicGAN effectively enhances the resolution of low resolution Hi-C data by generating matrices that are highly consistent with the original high resolution Hi-C matrices. A typical scenario of usage for our approach is to enhance low resolution Hi-C data in new cell types, especially where the high resolution Hi-C data are not available. Our study not only presents a novel approach for enhancing Hi-C data resolution, but also provides fascinating insights into disclosing complex mechanism underlying the formation of chromatin contacts. Availability and implementation We release hicGAN as an open-sourced software at https://github.com/kimmo1019/hicGAN. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
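    Training a super-resolution GAN such as the one in the record above requires paired low- and high-resolution contact maps; a common way to obtain such pairs is to downsample reads from a high-resolution matrix. The sketch below shows only that downsampling step (binomial thinning of a hypothetical matrix); the GAN itself and the hicGAN code are not reproduced.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical symmetric high-resolution Hi-C contact matrix of read counts.
    n = 200
    upper = np.triu(rng.poisson(3.0, size=(n, n)))
    hi_res = upper + np.triu(upper, 1).T

    # Simulate a low-coverage experiment by keeping each read with probability 1/16.
    thinned = rng.binomial(upper, 1.0 / 16.0)
    low_res = thinned + np.triu(thinned, 1).T

    print("reads kept:", int(low_res.sum()), "of", int(hi_res.sum()))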
  • 89
    Publication Date: 2019-07-01
    Description: Motivation At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has O(n^5) running time for datasets with n species. Results Here we present a new method called ‘TreeMerge’ that improves on NJMerge in two ways: it is guaranteed to return a tree, and it has dramatically faster running time within the same divide-and-conquer framework, only O(n^2) time. We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets on which they would otherwise fail, given 64 GB of memory and a 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All. Availability and implementation TreeMerge is publicly available on GitHub (http://github.com/ekmolloy/treemerge). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    Publication Date: 2019-04-02
    Description: Motivation Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets to the action of a few hidden factors (metagenes). MF algorithms have never been compared on the reproducibility of their outputs across similar independent datasets, a knowledge gap that can have a crucial impact when predictions made in one study are generalized to others. Results We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts from evolutionary bioinformatics, we design a novel framework based on Reciprocal Best Hit (RBH) graphs to benchmark the MF methods for their ability to produce generalizable components. We show that a particular protocol of application of independent component analysis (ICA), accompanied by a stabilization procedure, leads to a significant increase in between-datasets reproducibility. Moreover, we show that the signals detected through this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool for performing the stabilized ICA-based RBH meta-analysis. We apply this methodology to the study of colorectal cancer (CRC), for which 14 independent transcriptomic datasets could be collected. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or as robust, tumor-type-specific transcriptomic signatures of tumoral cells or the tumoral microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping. Availability and implementation The RBH construction tool is available from http://goo.gl/DzpwYp Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
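    The record above benchmarks matrix factorizations by linking components of independent datasets through a Reciprocal Best Hit (RBH) graph. Below is a hedged sketch of one set of RBH edges computed with scikit-learn's FastICA on two synthetic datasets that share hidden sources; the stabilization procedure and the authors' tool are not reproduced.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(2)
    k = 5
    S = rng.laplace(size=(k, 500))                                       # shared gene-level sources
    X1 = rng.normal(size=(60, k)) @ S + 0.1 * rng.normal(size=(60, 500))
    X2 = rng.normal(size=(80, k)) @ S + 0.1 * rng.normal(size=(80, 500))

    L1 = FastICA(n_components=k, random_state=0, max_iter=1000).fit(X1).components_
    L2 = FastICA(n_components=k, random_state=0, max_iter=1000).fit(X2).components_

    # Absolute Pearson correlation of gene loadings (ICA component signs are arbitrary).
    C = np.abs(np.corrcoef(L1, L2)[:k, k:])

    best_12, best_21 = C.argmax(axis=1), C.argmax(axis=0)
    rbh = [(i, j) for i, j in enumerate(best_12) if best_21[j] == i]
    print("reciprocal best hits (component index pairs):", rbh)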
  • 91
    Publication Date: 2019-10-09
    Description: Motivation Synapses are essential to neural signal transmission. Therefore, quantification of synapses and related neurites from images is vital to gain insights into the underlying pathways of brain functionality and diseases. Despite the wide availability of synaptic punctum imaging data, several issues impede satisfactory quantification of these structures by current tools. First, the antibodies used for labeling synapses are not perfectly specific to synapses; they may also be present in neurites or other cell compartments. Second, the brightness of different neurites and synaptic puncta is heterogeneous due to variation in antibody concentration and synapse-intrinsic differences. Third, images often have a low signal-to-noise ratio due to constraints of experimental facilities and the availability of sensitive antibodies. These issues make the detection of synapses challenging and necessitate a new tool to easily and accurately quantify them. Results We present an automatic, probability-principled synapse detection algorithm and integrate it into our synapse quantification tool SynQuant. Derived from the theory of order statistics, our method controls the false discovery rate and improves the power of detecting synapses. SynQuant is unsupervised, works for both 2D and 3D data, and can handle multiple staining channels. Through extensive experiments on one synthetic and three real datasets with ground-truth annotation or manual labeling, SynQuant was demonstrated to outperform specialized unsupervised synapse detection tools as well as generic spot detection methods. Availability and implementation Java source code, a Fiji plug-in, and test data are available at https://github.com/yu-lab-vt/SynQuant. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
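    SynQuant, per the record above, controls the false discovery rate when calling puncta. The sketch below shows only a generic Benjamini-Hochberg FDR step applied to hypothetical per-candidate p-values; SynQuant's actual order-statistics test is not reproduced here.

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of detections with FDR controlled at level alpha."""
        p = np.asarray(pvals, dtype=float)
        order = np.argsort(p)
        m = len(p)
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        keep = np.zeros(m, dtype=bool)
        if below.any():
            # Reject everything up to the largest rank passing the BH threshold.
            keep[order[: np.max(np.nonzero(below)[0]) + 1]] = True
        return keep

    # Hypothetical p-values for candidate synaptic puncta.
    print(benjamini_hochberg([1e-6, 0.003, 0.04, 0.2, 0.51, 0.0009]))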
  • 92
    Publication Date: 2019-01-14
    Description: Motivation Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve on the effectiveness of current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation. Results We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best-model selection in recent critical assessment of protein structure prediction (CASP) and loop-modeling benchmarks. Compared with existing methods, our side-chain-independent potential has lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function. Availability and implementation http://chaconlab.org/modeling/korp. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
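    KORP, as described above, is an inverse-Boltzmann (‘knowledge-based’) potential over a six-dimensional residue-pair geometry. The sketch below illustrates the inverse-Boltzmann idea only, in one dimension (distance), with synthetic observed and reference samples; it is drastically simpler than KORP's 6D joint distribution.

    import numpy as np

    rng = np.random.default_rng(3)
    obs = rng.normal(6.5, 1.5, size=5000)     # hypothetical residue-pair distances in "native" structures
    ref = rng.uniform(2.0, 15.0, size=5000)   # hypothetical reference-state distances

    bins = np.linspace(2.0, 15.0, 27)
    p_obs, _ = np.histogram(obs, bins=bins, density=True)
    p_ref, _ = np.histogram(ref, bins=bins, density=True)

    eps = 1e-9
    energy = -np.log((p_obs + eps) / (p_ref + eps))   # E(r) = -ln(P_obs / P_ref)

    centers = 0.5 * (bins[:-1] + bins[1:])
    for r, e in list(zip(centers, energy))[:5]:
        print(f"r = {r:4.1f} A, E = {e:+.2f}")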
  • 93
    Publication Date: 2019-01-07
    Description: Motivation The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cpf1 system has been successfully applied in genome editing. However, the target efficiency of the CRISPR-Cpf1 system varies among different guide RNA (gRNA) sequences. Results In this study, we reanalyzed published CRISPR-Cpf1 gRNA data and found many sequence and structural features related to target efficiency. With the aid of Random Forest feature selection, a support vector machine model was created to predict the target efficiency of any given gRNA. We have developed the first CRISPR-Cpf1 web service application, CRISPR-DT (CRISPR DNA Targeting), to help users design optimal gRNAs for the CRISPR-Cpf1 system by considering both target efficiency and specificity. CRISPR-DT will empower researchers in genome editing. Availability and implementation CRISPR-DT, mainly implemented in Perl, PHP and JavaScript, is freely available at http://bioinfolab.miamioh.edu/CRISPR-DT. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
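    The record above combines Random Forest feature selection with a support vector machine to predict gRNA target efficiency. Below is a hedged scikit-learn sketch of that two-step recipe on synthetic features; the real gRNA descriptors, dataset and tuned hyperparameters of CRISPR-DT are not reproduced.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 40))                                        # hypothetical gRNA features
    y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.3, size=300)   # hypothetical efficiencies

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Step 1: rank features with a Random Forest and keep the top 10.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    top = np.argsort(rf.feature_importances_)[::-1][:10]

    # Step 2: fit an SVM regressor on the selected features.
    svm = SVR(kernel="rbf", C=1.0).fit(X_tr[:, top], y_tr)
    print("R^2 on held-out gRNAs:", round(r2_score(y_te, svm.predict(X_te[:, top])), 3))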
  • 94
    Publication Date: 2019-12-24
    Description: Summary Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences, and all downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, a thorough comparison of the performance of these tools has not been conducted. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) of V, D and J human genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility of the complementarity-determining region 3 (CDR3) amino acid output ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses greatly depend on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platform. Availability and implementation All tools utilized in the paper are free for academic use. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
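    One of the reproducibility measures in the record above is agreement of CDR3 amino-acid calls between tools. Below is a hedged pandas sketch of that comparison on made-up per-read outputs; the column names and values are illustrative, not the native formats of IMGT/HighV-QUEST, IgBLAST or MiXCR.

    import pandas as pd

    # Hypothetical per-read CDR3 calls from two annotation tools.
    tool_a = pd.DataFrame({"read_id": ["r1", "r2", "r3", "r4"],
                           "cdr3_aa": ["CARDYW", "CARGGW", "CSSF", "CARDLW"]})
    tool_b = pd.DataFrame({"read_id": ["r1", "r2", "r3", "r5"],
                           "cdr3_aa": ["CARDYW", "CARGAW", "CSSF", "CARYYW"]})

    merged = tool_a.merge(tool_b, on="read_id", suffixes=("_a", "_b"))
    agreement = (merged["cdr3_aa_a"] == merged["cdr3_aa_b"]).mean()
    print(f"reads annotated by both tools: {len(merged)}")
    print(f"CDR3 amino-acid agreement: {agreement:.1%}")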
  • 95
    Publication Date: 2019-03-01
    Description: Motivation The development of single-cell RNA-sequencing (scRNA-seq) provides a new perspective for studying biological problems at the single-cell level. One of the key issues in scRNA-seq analysis is to resolve the heterogeneity and diversity of cells, that is, to cluster the cells into several groups. However, many existing clustering methods were designed to analyze bulk RNA-seq data, so new clustering methods tailored to scRNA-seq are urgently needed. Moreover, the high noise in scRNA-seq data poses many challenges for computational methods. Results In this study, we propose a novel scRNA-seq cell type detection method based on similarity learning, called SinNLRR. The method is motivated by the self-expression property of cells within the same group. Specifically, we impose a non-negative and low-rank structure on the similarity matrix. We apply the alternating direction method of multipliers to solve the optimization problem and propose an adaptive penalty selection method to avoid sensitivity to the parameters. The learned similarity matrix can be combined with spectral clustering, with t-distributed stochastic neighbor embedding for visualization, and with the Laplacian score for prioritizing marker genes. In contrast to other scRNA-seq clustering methods, our method achieves more robust and accurate results on different datasets. Availability and implementation Our MATLAB implementation of SinNLRR is available at https://github.com/zrq0123/SinNLRR. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
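    SinNLRR, per the record above, learns a non-negative low-rank similarity matrix by ADMM and feeds it to spectral clustering. The sketch below keeps only that last step, substituting a clipped correlation matrix for the learned similarity on synthetic cells; it is a simplified stand-in, not the MATLAB implementation.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(5)
    # Hypothetical log-expression matrix: 90 cells from three groups x 200 genes.
    centers = 2.0 * rng.normal(size=(3, 200))
    cells = np.vstack([c + rng.normal(size=(30, 200)) for c in centers])

    # Non-negative similarity: Pearson correlation between cells, negatives clipped to zero.
    S = np.clip(np.corrcoef(cells), 0.0, None)
    np.fill_diagonal(S, 0.0)

    labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                                random_state=0).fit_predict(S)
    print("cells per cluster:", np.bincount(labels))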
  • 96
    Publication Date: 2019-12-12
    Description: Motivation A G-quadruplex is a DNA or RNA structure in which four guanine-rich regions are held together by base pairing between guanine nucleotides in coordination with potassium ions. G-quadruplexes are increasingly seen as a biologically important component of genomes. Their detection in vivo is problematic; however, sequencing and spectrometric techniques exist for their in vitro detection. We previously devised the pqsfinder algorithm for the identification of putative quadruplex sequences (PQS), implemented it in C++ and published it as an R/Bioconductor package. We looked for ways to optimize pqsfinder for faster and more user-friendly sequence analysis. Results We identified two weak points where pqsfinder could be optimized. We modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded. To accommodate the needs of a broader range of users, we created a website for the submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder. Availability and implementation https://pqsfinder.fi.muni.cz, https://bioconductor.org/packages/pqsfinder. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
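    For orientation, the simplest possible putative quadruplex sequence (PQS) search is the canonical regular expression of four G-runs separated by short loops, shown below; pqsfinder's recursive, score-based algorithm tolerates imperfections and is far more sensitive than this regex.

    import re

    # Four runs of >= 3 guanines separated by loops of 1-7 nucleotides.
    PQS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")

    seq = "TTAGGGTTAGGGTTAGGGTTAGGGAACCCAT"
    for m in PQS.finditer(seq):
        print(m.start(), m.end(), m.group())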
  • 97
    Publication Date: 2019-05-17
    Description: Motivation Accurate genome-wide identification of N4-methylcytosine (4mC) modifications can provide insights into their biological functions and mechanisms. Machine learning methods have recently become effective approaches for the computational identification of 4mC sites in genomes. Unfortunately, existing methods cannot achieve satisfactory performance, owing to the lack of effective DNA feature representations capable of capturing the characteristics of 4mC modifications. Results In this work, we developed a new predictor named 4mcPred-IFL, aiming to identify 4mC sites. To represent and capture discriminative features, we propose an iterative feature representation algorithm that learns informative features from several sequential models in a supervised iterative mode. Our analysis shows that the feature representations learned by our algorithm capture the discriminative distribution characteristics between 4mC sites and non-4mC sites, enlarging the decision margin between the positives and negatives in feature space. Additionally, by evaluating and comparing our predictor with state-of-the-art predictors on benchmark datasets, we demonstrate that it identifies 4mC sites more accurately. Availability and implementation A user-friendly web server implementing the proposed 4mcPred-IFL is freely accessible at http://server.malab.cn/4mcPred-IFL. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
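    The record above describes iterative feature representation learning: features are repeatedly augmented with the outputs of supervised models. Below is a hedged sketch of one such loop using one-hot sequence features, logistic regression and out-of-fold probabilities; the actual feature encodings and models of 4mcPred-IFL are not reproduced.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def onehot(seq):
        table = {"A": 0, "C": 1, "G": 2, "T": 3}
        x = np.zeros((len(seq), 4))
        for i, base in enumerate(seq):
            x[i, table[base]] = 1.0
        return x.ravel()

    rng = np.random.default_rng(6)
    seqs = ["".join(rng.choice(list("ACGT"), 41)) for _ in range(200)]  # hypothetical 41-nt windows
    y = rng.integers(0, 2, size=200)                                    # hypothetical 4mC / non-4mC labels

    X = np.array([onehot(s) for s in seqs])
    for it in range(3):
        # Out-of-fold probabilities become an extra feature column for the next round.
        prob = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                                 cv=5, method="predict_proba")[:, 1]
        X = np.hstack([X, prob[:, None]])
        print(f"iteration {it + 1}: feature dimension = {X.shape[1]}")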
  • 98
    Publication Date: 2019-02-06
    Description: Motivation Increasing evidence has shown that nucleotide modifications on cytosine, such as methylation and hydroxymethylation, greatly impact the binding of transcription factors (TFs). However, there is a lack of motif-finding algorithms able to search for motifs containing modified bases. In this study, we expand our previous motif-finding pipeline Epigram to provide systematic de novo motif discovery and performance evaluation for methylated DNA motifs. Results mEpigram outperforms both MEME and DREME at finding modified motifs in simulated data that mimics various motif enrichment scenarios. Furthermore, we were able to identify methylated motifs in Arabidopsis DNA affinity purification sequencing (DAP-seq) data that were previously demonstrated to contain such motifs. When applied to TF ChIP-seq and DNA methylome data in H1 and GM12878 cells, our method successfully identified novel methylated motifs that can be recognized by the TFs or their co-factors. We also observed spacing constraints between the canonical motif of a TF and the newly discovered methylated motifs, which suggests coordinated recognition of these cis-elements by collaborating proteins. Availability and implementation The mEpigram program is available at http://wanglab.ucsd.edu/star/mEpigram. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
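    The expanded-alphabet idea behind methylated-motif discovery can be shown in miniature: treat methylated cytosine as a fifth letter and build a position weight matrix over the enlarged alphabet. The toy encoding (‘M’), the aligned sites and the pseudocount below are illustrative assumptions, not the mEpigram pipeline.

    import numpy as np

    ALPHABET = "ACGTM"                                    # 'M' marks 5-methylcytosine in this toy encoding
    sites = ["TAMGTCA", "TACGTCA", "TAMGTGA", "TAMGTCA"]  # hypothetical aligned binding sites

    counts = np.ones((len(ALPHABET), len(sites[0])))      # pseudocount of 1 per letter and position
    for s in sites:
        for pos, letter in enumerate(s):
            counts[ALPHABET.index(letter), pos] += 1

    pwm = counts / counts.sum(axis=0)                     # column-wise letter probabilities
    np.set_printoptions(precision=2)
    print(pwm)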
  • 99
    Publication Date: 2019-04-15
    Description: Motivation Image augmentation is a frequently used technique in computer vision and has seen increased interest with the rise of deep learning. Its usefulness is increasingly recognized because deep neural networks require large amounts of data to train, and because in certain fields, such as biomedical imaging, large amounts of labelled data are difficult to come by or expensive to produce. In biomedical imaging, features specific to this domain need to be addressed. Results Here we present the Augmentor software package for image augmentation. It provides a stochastic, pipeline-based approach to image augmentation with a number of features that are relevant to biomedical imaging, such as z-stack augmentation and randomized elastic distortions. The software has been designed to be highly extensible, meaning that an operation specific to a highly specialized task can easily be added to the library, even at runtime. Although it has been designed as a general software library, it has features that are particularly relevant to biomedical imaging and the techniques required for this domain. Availability and implementation Augmentor is a Python package made available under the terms of the MIT licence. Source code can be found on GitHub at https://github.com/mdbloice/Augmentor and installation is via the pip package manager. A Julia version of the package, developed in parallel by Christof Stocker, is also available at https://github.com/Evizero/Augmentor.jl.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
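    A minimal pipeline in the spirit of the package's documented quick-start is sketched below; the source directory, probabilities and magnitudes are placeholders, and parameter names should be checked against the current Augmentor documentation before use.

    import Augmentor

    p = Augmentor.Pipeline("data/cell_images")            # hypothetical directory of source images
    p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
    p.flip_left_right(probability=0.5)
    p.random_distortion(probability=0.3, grid_width=4, grid_height=4, magnitude=4)
    p.sample(500)                                         # write 500 augmented images to disk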
  • 100
    Publication Date: 2019-06-28
    Description: Motivation Alternative splicing contributes to the functional diversity of protein species, and the proteoforms translated from alternatively spliced isoforms of a gene actually execute biological functions. Computationally predicting the functions of genes has been studied for decades. However, how to distinguish the functional annotations of individual isoforms, which are essential for understanding developmental abnormalities and cancers, is rarely explored. The main bottleneck is that functional annotations of isoforms are generally unavailable, and functional genomic databases universally store annotations at the gene level. Results We propose IsoFun to accomplish isoform function prediction based on bi-random walks on a heterogeneous network. IsoFun first constructs an isoform functional association network based on the expression profiles of isoforms derived from multiple RNA-seq datasets. Next, IsoFun uses the available Gene Ontology (GO) annotations of genes, gene–gene interactions and the relations between genes and isoforms to construct a heterogeneous network. After this, IsoFun performs a tailored bi-random walk on the heterogeneous network to predict the association between GO terms and isoforms, thus accomplishing the prediction of GO annotations for isoforms. Experimental results show that IsoFun significantly outperforms state-of-the-art algorithms, improving the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) by 17% and 44% at the gene level, respectively. We further validated the performance of IsoFun on the genes ADAM15 and BCL2L1: IsoFun accurately differentiates the functions of the respective isoforms of these two genes. Availability and implementation The code of IsoFun is available at http://mlda.swu.edu.cn/codes.php?name=IsoFun. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
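    The record above rests on a bi-random walk over a heterogeneous network linking isoforms and GO terms. Below is a generic bi-random-walk iteration on random stand-in matrices to show the propagation scheme; the network construction, normalization and parameters of IsoFun itself are not reproduced.

    import numpy as np

    rng = np.random.default_rng(7)

    def row_normalize(a):
        s = a.sum(axis=1, keepdims=True)
        s[s == 0] = 1.0
        return a / s

    n_iso, n_go = 50, 20
    W_iso = row_normalize(rng.random((n_iso, n_iso)))        # isoform functional-association network
    W_go = row_normalize(rng.random((n_go, n_go)))           # GO-term similarity network
    F0 = (rng.random((n_iso, n_go)) < 0.05).astype(float)    # seed isoform-GO associations

    alpha, F = 0.6, F0.copy()
    for _ in range(30):
        # Propagate on the isoform side and the GO side, keeping a pull toward the seeds.
        left = alpha * W_iso @ F + (1 - alpha) * F0
        right = alpha * F @ W_go.T + (1 - alpha) * F0
        F = 0.5 * (left + right)

    print("top-scoring GO term index for isoform 0:", int(F[0].argmax()))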