ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2014-12-13
    Description: Replication in herpesvirus genomes is a major public health concern, as the viruses multiply rapidly during the lytic phase of infection, which causes maximum damage to the host cells. Earlier research has established that sites of replication origin are dominated by a high concentration of rare palindrome sequences of DNA. Computational methods based on scoring have been devised to determine the concentration of palindromes. In this paper, we propose both extraction and localization of rare palindromes in an automated manner. The Discrete Cosine Transform (DCT-II), widely used in image compression, is utilized here to extract palindromic sequences based on their reverse complementary symmetry property. We formulate a novel approach to localize the rare palindrome clusters by devising a Minimum Quadratic Entropy (MQE) measure based on Renyi’s Quadratic Entropy (RQE) function. Experimental results over a large number of herpesvirus genomes show that RQE-based scoring of rare palindromes has a higher order of sensitivity and fewer false alarms in detecting concentrations of rare palindromes and thereby sites of replication origin.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
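The reverse complementary symmetry named in the description above can be illustrated with a naive window scan. This is a minimal sketch only, not the DCT-II extraction or the MQE localization of the paper; the names `reverse_complement`, `is_rc_palindrome`, `find_rc_palindromes` and the `min_len` parameter are hypothetical.

```python
# Naive illustration of reverse-complement palindromes in DNA.
# Assumption: sequences contain only the upper-case bases A, C, G, T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_rc_palindrome(seq):
    """A DNA palindrome reads the same as its own reverse complement."""
    return len(seq) > 0 and seq == reverse_complement(seq)


def find_rc_palindromes(genome, min_len=6):
    """Return (start, length) for every window of at least min_len bases
    that equals its reverse complement (brute force, for illustration)."""
    hits = []
    n = len(genome)
    for start in range(n):
        for length in range(min_len, n - start + 1):
            if is_rc_palindrome(genome[start:start + length]):
                hits.append((start, length))
    return hits


if __name__ == "__main__":
    # GAATTC is its own reverse complement.
    print(is_rc_palindrome("GAATTC"))
    print(find_rc_palindromes("TTGAATTCAA"))
```

The brute-force scan is far too slow for whole genomes; it is meant only to make the symmetry property concrete.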
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: The Tikhonov regularized nonnegative matrix factorization (TNMF) is an NMF objective function that enforces smoothness on the computed solutions, and has been successfully applied to many problem domains including text mining, spectral data analysis, and cancer clustering. There is, however, an issue that is still insufficiently addressed in the development of TNMF algorithms, i.e., how to develop mechanisms that can learn the regularization parameters directly from the data sets. The common approach is to use fixed values based on a priori knowledge about the problem domains. However, from the study of linear inverse problems it is known that the quality of the solutions of Tikhonov regularized least squares problems depends heavily on the choice of appropriate regularization parameters. Since least squares problems are the building blocks of the NMF, it can be expected that a similar situation also applies to the NMF. In this paper, we propose two formulas to automatically learn the regularization parameters from the data set based on the L-curve approach. We also develop a convergent algorithm for the TNMF based on the additive update rules. Finally, we demonstrate the use of the proposed algorithm in cancer clustering tasks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches $1$. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 5
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: The existence of various types of correlations among the expressions of a group of biologically significant genes poses challenges in developing effective methods of gene expression data analysis. The initial focus of computational biologists was to work with only absolute and shifting correlations. However, researchers have found that the ability to handle shifting-and-scaling correlation enables them to extract more biologically relevant and interesting patterns from gene microarray data. In this paper, we introduce an effective shifting-and-scaling correlation measure named Shifting and Scaling Similarity (SSSim), which can detect highly correlated gene pairs in any gene expression data. We also introduce a technique named Intensive Correlation Search (ICS) biclustering algorithm, which uses SSSim to extract biologically significant biclusters from a gene expression data set. The technique performs satisfactorily with a number of benchmarked gene expression data sets when evaluated in terms of functional categories in Gene Ontology database.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 6
    Publication Date: 2014-12-13
    Description: Attractors in gene regulatory networks represent cell types or states of cells. In systems biology and synthetic biology, it is important to generate gene regulatory networks with desired attractors. In this paper, we focus on a singleton attractor, which is also called a fixed point. Using a Boolean network (BN) model, we consider the problem of finding Boolean functions such that the system has desired singleton attractors and has no undesired singleton attractors. To solve this problem, we propose a matrix-based representation of BNs. Using this representation, the problem of finding Boolean functions can be rewritten as an Integer Linear Programming (ILP) problem and a Satisfiability Modulo Theories (SMT) problem. Furthermore, the effectiveness of the proposed method is shown by a numerical example on a WNT5A network, which is related to melanoma. The proposed method provides a basic method for the design of gene regulatory networks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
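The notion of a singleton attractor (fixed point) in the description above can be made concrete by brute-force enumeration. This is a minimal sketch only: it checks a given network rather than solving the design problem, and it does not use the ILP or SMT formulations of the paper. The function name `singleton_attractors` and the three-node `toy_network` are hypothetical.

```python
from itertools import product


def singleton_attractors(update_functions):
    """Return every state that maps to itself under a synchronous update.

    update_functions[i] takes the full state (a tuple of 0/1 values)
    and returns the next value of node i.
    """
    n = len(update_functions)
    fixed_points = []
    for state in product((0, 1), repeat=n):
        next_state = tuple(int(f(state)) for f in update_functions)
        if next_state == state:
            fixed_points.append(state)
    return fixed_points


# Hypothetical toy network with three nodes (not taken from the paper).
toy_network = [
    lambda s: s[1],            # x0' = x1
    lambda s: s[0],            # x1' = x0
    lambda s: s[0] and s[1],   # x2' = x0 AND x1
]

if __name__ == "__main__":
    print(singleton_attractors(toy_network))
```

Enumeration visits all 2^n states, so it is only practical for very small networks.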
  • 7
    Publication Date: 2014-12-13
    Description: In this paper, we study Copy Number Variation (CNV) data. The underlying process generating CNV segments is generally assumed to be memory-less, giving rise to an exponential distribution of segment lengths. In this paper, we provide evidence from cancer patient data, which suggests that this generative model is too simplistic, and that segment lengths follow a power-law distribution instead. We conjecture a simple preferential attachment generative model that provides the basis for the observed power-law distribution. We then show how an existing statistical method for detecting cancer driver genes can be improved by incorporating the power-law distribution in the null model.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 8
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalise, compare, classify, and interpret protein structures. Traditional methods to abstract protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This results in ignoring a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction to support efficient and effective comparison of protein folding patterns on a large scale.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 9
    Publication Date: 2014-12-13
    Description: The organization of global protein interaction networks (PINs) has been extensively studied and heatedly debated. We revisited this issue in the context of the analysis of dynamic organization of a PIN in the yeast cell cycle. Statistically significant bimodality was observed when analyzing the distribution of the differences in expression peak between periodically expressed partners. A close look at their behavior revealed that date and party hubs derived from this analysis have some distinct features. There are no significant differences between them in terms of protein essentiality, expression correlation and semantic similarity derived from gene ontology (GO) biological process hierarchy. However, date hubs exhibit significantly greater values than party hubs in terms of semantic similarity derived from both GO molecular function and cellular component hierarchies. Relating to three-dimensional structures, we found that both single- and multi-interface proteins could become date hubs coordinating multiple functions performed at different times while party hubs are mainly multi-interface proteins. Furthermore, we constructed and analyzed a PPI network specific to the human cell cycle and highlighted that the dynamic organization in human interactome is far more complex than the dichotomy of hubs observed in the yeast cell cycle.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 12
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 13
    Publication Date: 2014-12-13
    Description: The articles in this special section were presented at the 2012 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS 2012) that was held in Washington DC from December 2nd to 4th.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Disk additions to a RAID-6 storage system can increase the I/O parallelism and expand the storage capacity simultaneously. To regain load balance among all disks, old and new, RAID-6 scaling requires moving certain data blocks onto newly added disks. Existing approaches to RAID-6 scaling, restricted by preserving a round-robin data distribution, require migrating all the data, which makes RAID-6 scaling expensive. In this paper, we propose RS6, a new approach to accelerating RDP RAID-6 scaling by reducing disk I/Os and XOR operations. First, RS6 minimizes the number of data blocks to be moved while maintaining a uniform data distribution across all data disks. Second, RS6 piggybacks parity updates during data migration to reduce the cost of maintaining consistent parities. Third, RS6 selects parameters of data migration so as to reduce disk I/Os for parity updates. Our mathematical analysis indicates that RS6 provides uniform data distribution, minimal data migration, and fast data addressing. We also conducted extensive simulation experiments to quantitatively characterize the properties of RS6. The results show that, compared with existing “moving-everything” Round-Robin approaches, RS6 reduces the number of blocks to be moved by 60.0%–88.9%, and reduces the migration time by 40.27%–69.88%.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: This paper focuses on designing a distributed medium access control algorithm for fairly sharing network resources among contending stations in an 802.11 wireless network. Because the notion of fairness is not universal and a rigorous analysis of the relationships among the four most popular types of fairness criteria is lacking, we first mathematically prove that there exist certain connections between these types of fairness criteria. We then propose an efficient medium access algorithm that aims at achieving time fairness and throughput enhancement in a fully distributed manner. The core idea of our proposed algorithm is that each station needs to select an appropriate contention window size so as to fairly share the channel occupancy time and maximize the throughput under the time fairness constraint. The derivation of the proper contention window size is addressed rigorously. We evaluate the performance of our proposed algorithm through an extensive simulation study, and the evaluation results demonstrate that our proposed algorithm leads to nearly perfect time fairness, high throughput, and low collision overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: This paper investigates the limits of adaptive voltage scaling (AVS) applied to commercial FPGAs which do not specifically support voltage adaptation. An adaptive power architecture based on a modified design flow is created with in-situ detectors and dynamic reconfiguration of clock management resources. AVS is a power-saving technique that enables a device to regulate its own voltage and frequency based on workload, process and operating conditions in a closed-loop configuration. It results in significantly improved energy profiles compared with dynamic voltage frequency scaling (DVFS), in which the device uses a number of pre-calculated valid working points. The results of deploying AVS in FPGAs with in-situ detectors show power and energy savings exceeding 85 percent compared with nominal voltage operation at the same frequency. The in-situ detector approach compares favorably with critical path replication based on delay lines since it avoids the need for cumbersome and error-prone delay line calibration.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 17
    Publication Date: 2014-12-14
    Description: Background: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. Results: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. Conclusions: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 18
    Publication Date: 2014-12-18
    Description: Background: Identification of individual components in complex mixtures is an important and sometimes daunting task in several research areas like metabolomics and natural product studies. NMR spectroscopy is an excellent technique for analysis of mixtures of organic compounds and gives a detailed chemical fingerprint of most individual components above the detection limit. For the identification of individual metabolites in metabolomics, correlation or covariance between peaks in 1H NMR spectra has previously been successfully employed. Similar correlation of 2D 1H-13C Heteronuclear Single Quantum Correlation spectra was recently applied to investigate the structure of heparin. In this paper, we demonstrate how a similar approach can be used to identify metabolites in human biofluids (post-prostatic palpation urine). Results: From 50 1H-13C Heteronuclear Single Quantum Correlation spectra, 23 correlation plots resembling pure metabolites were constructed. The identities of these metabolites were confirmed by comparing the correlation plots with reported NMR data, mostly from the Human Metabolome Database. Conclusions: Correlation plots prepared by statistically correlating 1H-13C Heteronuclear Single Quantum Correlation spectra from human biofluids provide unambiguous identification of metabolites. The correlation plots highlight cross-peaks belonging to each individual compound, not limited by long-range magnetization transfer as conventional NMR experiments are.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 19
    Publication Date: 2014-12-18
    Description: Background: Alternative Splicing (AS) as a post-transcription regulation mechanism is an important application of RNA-seq studies in eukaryotes. A number of software packages and computational methods have been developed for detecting AS. Most of the methods, however, are designed and tested on animal data, such as human and mouse. Plant genes differ from those of animals in many ways, e.g., the average intron size and preferred AS types. These differences may require different computational approaches and raise questions about the effectiveness of existing methods on plant data. The goal of this paper is to benchmark existing computational differential splicing (or transcription) detection methods so that biologists can choose the most suitable tools to accomplish their goals. Results: This study compares eight popular, publicly available software packages for differential splicing analysis using both simulated and real Arabidopsis thaliana RNA-seq data. All of the software is freely available. The study examines the effect of varying AS ratio, read depth, dispersion pattern, AS types, sample sizes and the influence of annotation. Using real data, the study looks at the consistency between the packages and verifies a subset of the detected AS events using PCR studies. Conclusions: No single method performs the best in all situations. The accuracy of annotation has a major impact on which method should be chosen for AS analysis. DEXSeq performs well in the simulated data when the AS signal is relatively strong and annotation is accurate. Cufflinks achieves a better tradeoff between precision and recall and turns out to be the best one when incomplete annotation is provided. Some methods perform inconsistently for different AS types. Complex AS events that combine several simple AS events pose problems for most methods, especially for MATS. MATS stands out in the analysis of real RNA-seq data when all the AS events being evaluated are simple AS events.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 20
    Publication Date: 2014-11-07
    Description: Background: PGxClean is a new web application that performs quality control analyses for data produced by the Affymetrix DMET chip or other candidate gene technologies. Importantly, the software does not assume that variants are biallelic single-nucleotide polymorphisms, but can be used on the variety of variant characteristics included on the DMET chip. Once quality control analyses have been completed, the associated PGxClean-Viz web application performs principal component analyses and provides tools for characterizing and visualizing population structure. Findings: The PGxClean web application accepts genotype data from the Affymetrix DMET chip or the PLINK PED format with genotypes annotated as (A,C,G,T or 1,2,3,4). Options for removing missing data and calculating genotype and allele frequencies are offered. Data can be subdivided by cohort characteristics, such as family ID, sex, phenotype, or case-control status. Once the data has been processed through the PGxClean web application, the output files can be entered into the PGxClean-Viz web application for performing principal component analysis to visualize population substructure. Conclusions: The PGxClean software provides rapid quality-control processing, data analysis, and data visualization for the Affymetrix DMET chip or other candidate gene technologies while improving on common analysis platforms by not assuming that variants are biallelic. The web application is available at www.pgxclean.com.
    Electronic ISSN: 1756-0381
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In microprocessor-based systems, such as the cloud computing infrastructure, high reliability is essential. As multiprocessor systems become more widespread and increasingly complex, system-level diagnosis will increasingly be adopted to determine their robustness. In this paper, we consider a pessimistic diagnostic strategy for hypermesh multiprocessor systems under the PMC model. The pessimistic strategy is a diagnostic process whereby all faulty processors are correctly identified and at most one fault-free processor may be misjudged to be a faulty processor. We first determine the pessimistic diagnosability of a hypermesh to be $2n(k-1)-k$. We then propose an efficient pessimistic diagnostic algorithm to identify at most $2n(k-1)-k$ faults in $O(N)$ time, where $k$ is the radix, $n$ is the number of dimensions, and $N = k^n$ is the total number of processors. This result is superior to the best precise diagnostic algorithm, which runs in $O(N \log_k N)$ time. Furthermore, the Cartesian product network, a subgraph of the hypermesh and the proposed algorithm can be employed to determine faults in the product network.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
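    The two closed-form quantities quoted in the abstract above can be evaluated directly. The following is a minimal sketch; the function names and the input guard (n >= 1, k >= 2) are assumptions for illustration, since the abstract does not state the parameter range for which the result holds.

```python
def pessimistic_diagnosability(n: int, k: int) -> int:
    """Evaluate 2n(k-1) - k, the bound quoted in the abstract.

    n is the number of dimensions and k is the radix. The guard below is
    an illustrative assumption, not a condition taken from the paper.
    """
    if n < 1 or k < 2:
        raise ValueError("expected n >= 1 and k >= 2")
    return 2 * n * (k - 1) - k


def processor_count(n: int, k: int) -> int:
    """Total number of processors N = k^n."""
    return k ** n


if __name__ == "__main__":
    for n, k in [(2, 3), (3, 4)]:
        print(n, k, processor_count(n, k), pessimistic_diagnosability(n, k))
```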
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In a top-$k$ Geometric Intersection Query (top-$k$ GIQ) problem, a set of $n$ weighted, geometric objects in $\mathbb{R}^d$ is to be pre-processed into a compact data structure so that for any query geometric object, $q$, and integer $k>0$, the $k$ largest-weight objects intersected by $q$ can be reported efficiently. While the top-$k$ problem has been studied extensively for non-geometric problems (e.g., recommender systems), the geometric version has received little attention. This paper gives a general technique to solve any top-
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 23
    Publication Date: 2014-11-08
    Description: Rule induction based on rough set theory (RST) has received much attention recently, since it may generate a minimal set of rules from the decision system for real-life applications by using attribute reduction and approximations. The decision system may vary with time, e.g., through variation of objects, attributes and attribute values. The reduction and approximations of the decision system may alter under Attribute Values' Coarsening and Refining (AVCR), a kind of variation of attribute values, which simultaneously alters the decision rules. This paper aims at dynamic maintenance of decision rules with respect to AVCR. The definition of the minimal discernibility attribute set is first proposed, which aims to improve the efficiency of attribute reduction in RST. Then, principles of updating decision rules in case of AVCR are discussed. Furthermore, rough set-based methods for updating decision rules in the inconsistent decision system are proposed. The complexity analysis and extensive experiments on UCI data sets have verified the effectiveness and efficiency of the proposed methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 24
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A major mining task for binary matrices is the extraction of approximate top-$k$ patterns that are able to concisely describe the input data. The top-$k$ pattern discovery problem is commonly stated as an optimization one, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms, and discuss PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the goodness of the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, in order to qualitatively evaluate the usefulness of the discovered patterns, we exploited PaNDa+ to detect overlapping communities in a bipartite network. The results show that PaNDa+ is able to discover high-quality patterns in both synthetic and real-world datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Mining network evolution has emerged as an intriguing research topic in many domains such as data mining, social networks, and machine learning. While the bulk of research has focused on mining evolutionary patterns of homogeneous networks (e.g., networks of friends), most real-world networks are heterogeneous, containing objects of different types, such as authors, papers, venues, and terms in a bibliographic network. Modeling co-evolution of multityped objects can capture richer information than modeling single-typed objects alone. For example, studying co-evolution of authors, venues, and terms in a bibliographic network can reveal the evolution of research areas better than examining the co-author network or the term network alone. In this paper, we study mining co-evolution of multityped objects in a special type of heterogeneous network, called a star network, and examine how the multityped objects influence each other in the network evolution. A hierarchical Dirichlet process mixture model-based evolution model is proposed, which detects the co-evolution of multityped objects in the form of multityped cluster evolution in dynamic star networks. An efficient inference algorithm is provided to learn the proposed model. Experiments on several real networks (DBLP, Twitter, and Delicious) validate the effectiveness of the model and the scalability of the algorithm.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 26
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In this work, we define cost-free learning (CFL) formally in comparison with cost-sensitive learning (CSL). The main difference between them is that a CFL approach seeks optimal classification results without requiring any cost information, even in the class imbalance problem. In fact, several CFL approaches exist in the related studies, such as sampling and some criteria-based approaches. However, to our best knowledge, none of the existing CFL and CSL approaches are able to process the abstaining classifications properly when no information is given about errors and rejects. Based on information theory, we propose a novel CFL which seeks to maximize normalized mutual information of the targets and the decision outputs of classifiers. Using the strategy, we can handle binary/multi-class classifications with/without abstaining. Significant features are observed from the new strategy. While the degree of class imbalance is changing, the proposed strategy is able to balance the errors and rejects accordingly and automatically. Another advantage of the strategy is its ability of deriving optimal rejection thresholds for abstaining classifications and the “equivalent” costs in binary classifications. The connection between rejection thresholds and ROC curve is explored. Empirical investigation is made on several benchmark data sets in comparison with other existing approaches. The classification results demonstrate a promising perspective of the strategy in machine learning.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
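    As an illustration of the objective named in the abstract above, the sketch below computes a normalized mutual information score from a confusion matrix. It assumes normalization by the entropy of the targets, I(T;Y)/H(T), which is one common choice; the abstract does not say which normalization the authors use. The matrix layout (rows are true classes, columns are classifier decisions, with an optional extra column for rejects) and the function name are likewise assumptions.

```python
from math import log2


def normalized_mutual_information(confusion):
    """Return I(T;Y) / H(T) for a confusion matrix of counts.

    Rows index the true classes T; columns index the decisions Y.
    An extra column may hold rejected (abstained) samples.
    Normalizing by H(T) is an assumption made for this sketch.
    """
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]

    # Entropy of the targets.
    h_t = -sum((r / total) * log2(r / total) for r in row_sums if r)

    # Mutual information between targets and decisions.
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:
                p_joint = count / total
                p_indep = (row_sums[i] / total) * (col_sums[j] / total)
                mi += p_joint * log2(p_joint / p_indep)

    return mi / h_t if h_t else 0.0


if __name__ == "__main__":
    # Two classes plus a reject column (third column).
    print(normalized_mutual_information([[40, 5, 5], [2, 6, 2]]))
```

    Under this normalization the score is 1 when the decisions determine the targets exactly and 0 when they are statistically independent of them.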
  • 27
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Multivariate time series are common in many application domains, particularly in industrial processes with a large number of sensors installed for process monitoring and control. Often, such data encapsulate complex relations among individual series. This paper presents a new type of patterns in multivariate time series, referred to as temporal associations, to capture a wide range of local relations along and across individual series. A scalable algorithm is developed to discover frequent associations by incorporating (1) redundancy pruning of patterns in single time series and (2) two conditions to avoid over-counting the occurrences of associations, thus greatly reducing the space and runtime complexity of the discovery process. A statistical significance measure is also introduced for ranking and post-pruning discovered associations. To evaluate the proposed method, synthetic data sets and a real world data set taken from the time series mining repository as well as a large data set obtained from a delayed coking plant are used. The experiments demonstrated that the discovered associations capture the local relations in multiple time series and that the proposed method is scalable to large data sets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, whose inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel way for short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
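    To make the notion of a biterm in the abstract above concrete, the sketch below extracts unordered word pairs from each short text and aggregates them over a corpus. It covers only the co-occurrence extraction step, not topic inference. Treating the whole short text as one context window, the whitespace tokenizer, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair co-occurring in one short text.

    Each pair is stored as a sorted tuple so that (a, b) and (b, a)
    count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over a whole corpus of short texts."""
    counts = Counter()
    for doc in docs:
        # Naive lowercase whitespace tokenization (an assumption).
        counts.update(extract_biterms(doc.lower().split()))
    return counts


if __name__ == "__main__":
    docs = ["apple banana", "banana apple cherry"]
    for biterm, count in corpus_biterm_counts(docs).most_common():
        print(biterm, count)
```

    Pooling the pairs across the corpus is what yields corpus-level co-occurrence statistics even when each individual text is only a few words long.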
  • 29
    Publication Date: 2014-11-08
    Description: In the literature about association analysis, many interestingness measures have been proposed to assess the quality of obtained association rules in order to select a small set of the most interesting among them. In the particular case of hierarchically organized items and generalized association rules connecting them, a measure that dealt appropriately with the hierarchy would be advantageous. Here we present the further developments of a new class of such hierarchical interestingness measures and compare them with a large set of conventional measures and with three hierarchical pruning methods from the literature. The aim is to find interesting pairwise generalized association rules connecting the concepts of multiple ontologies. Interested in the broad empirical evaluation of interestingness measures, we compared the rules obtained by 37 methods on four real world data sets against predefined ground truth sets of associations. To this end, we adopted a framework of instance-based ontology matching and extended the set of performance measures by two novel measures: relation learning recall and precision which take into account hierarchical relationships.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large data sets containing long time series or time series of different lengths. For many of the data sets studied, classification performance exceeded that of conventional instance-based classifiers, including one nearest neighbor classifiers using euclidean distances and dynamic time warping and, most importantly, the features selected provide an understanding of the properties of the data set, insight that can guide further scientific investigation.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The explosive usage of social media produces massive amount of unlabeled and high-dimensional data. Feature selection has been proven to be effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task due to the absence of label information based on which feature relevance is often assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection, e.g., social media data is inherently linked, which makes invalid the independent and identically distributed assumption, bringing about new challenges to unsupervised feature selection algorithms. In this paper, we investigate a novel problem of feature selection for social media data in an unsupervised scenario. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how the relations extracted from linked data can be exploited to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We systematically design and conduct systemic experiments to evaluate the proposed framework on data sets from real-world social media websites. The empirical study demonstrates the effectiveness and potential of our proposed framework.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The discovery of process models from event logs has emerged as one of the crucial problems for enabling the continuous support in the life-cycle of an information system. However, in a decade of process discovery research, the algorithms and tools that have appeared are known to have strong limitations in several dimensions. The size of the logs and the formal properties of the model discovered are the two main challenges nowadays. In this paper we propose the use of numerical abstract domains for tackling these two problems, for the particular case of the discovery of Petri nets. First, numerical abstract domains enable the discovery of general process models, requiring no knowledge (e.g., the bound of the Petri net to derive) for the discovery algorithm. Second, by using divide and conquer techniques we are able to control the size of the process discovery problems. The methods proposed in this paper have been implemented in a prototype tool and experiments are reported illustrating the significance of this fresh view of the process discovery problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 33
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Given a real world graph, how should we lay-out its edges? How can we compress it? These questions are closely related, and the typical approach so far is to find clique-like communities, like the ‘cavemen graph’, and compress them. We show that the block-diagonal mental image of the ‘cavemen graph’ is the wrong paradigm, in full agreement with earlier results that real world graphs have no good cuts. Instead, we propose to envision graphs as a collection of hubs connecting spokes, with super-hubs connecting the hubs, and so on, recursively. Based on the idea, we propose the SlashBurn method to recursively split a graph into hubs and spokes connected only by the hubs. We also propose techniques to select the hubs and give an ordering to the spokes, in addition to the basic SlashBurn. We give theoretical analysis of the proposed hub selection methods. Our view point has several advantages: (a) it avoids the ‘no good cuts’ problem, (b) it gives better compression, and (c) it leads to faster execution times for matrix-vector operations, which are the back-bone of most graph processing tools. Through experiments, we show that SlashBurn consistently outperforms other methods for all data sets, resulting in better compression and faster running time. Moreover, we show that SlashBurn with the appropriate spokes ordering can further improve compression while hardly sacrificing the running time.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In this paper, we introduce "task trail" to understand user search behaviors. We define a task to be an atomic user information need, whereas a task trail represents all user activities within that particular task, such as query reformulations and URL clicks. Previously, web search logs have been studied mainly at session or query level, where users may submit several queries within one task and handle several tasks within one session. Although previous studies have addressed the problem of task identification, little is known about the advantage of using task over session or query for search applications. In this paper, we conduct extensive analyses and comparisons to evaluate the effectiveness of task trails in several search applications: determining user satisfaction, predicting user search interests, and suggesting related queries. Experiments on large-scale data sets of a commercial search engine show that: (1) Task trail performs better than session and query trails in determining user satisfaction; (2) Task trail increases webpage utilities of end users compared to session and query trails; (3) Task trails are comparable to query trails but more sensitive than session trails in measuring different ranking functions; (4) Query terms from the same task are more topically consistent with each other than query terms from different tasks; (5) Query suggestion based on task trail is a good complement to query suggestions based on session trail and click-through bipartite. The findings in this paper verify the need for extracting task trails from web search logs and enhance applications in search and recommendation systems.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: This paper studies the problem of mining named entity translations by aligning comparable corpora. Current state-of-the-art approaches mine a translation pair by aligning an entity graph in one language to another based on node similarity or propagated similarity of related entities. However, they, building on the assumption of “symmetry”, quickly deteriorate on “weakly” comparable corpora with some asymmetry. In this paper, we pursue two directions for overcoming relation and entity asymmetry respectively. The first approach starts from weakly comparable corpora (for high recall) then ensures precision by selective propagation only to entities of symmetric relations. The second approach starts from parallel corpora (for high precision) then enhances recall by extending the translation matrix based on node similarity and contextual similarity. Our experimental results on English-Chinese corpora show that both approaches are effective and complementary. Our combined approach outperforms the best-performing baseline in terms of F1-score by up to 0.28.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The knowledge remembered by the human body and reflected by the dexterity of body motion is called embodied knowledge. In this paper, we propose a new method using singular value decomposition for extracting embodied knowledge from the time-series data of the motion. We compose a matrix from the time-series data and use the left singular vectors of the matrix as the patterns of the motion and the singular values as a scalar, by which each corresponding left singular vector affects the matrix. Two experiments were conducted to validate the method. One is a gesture recognition experiment in which we categorize gesture motions by two kinds of models with indexes of similarity and estimation that use left singular vectors. The proposed method obtained a higher correct categorization ratio than principal component analysis (PCA) and correlation efficiency (CE). The other is an ambulation evaluation experiment in which we distinguished the levels of walking disability. The first singular values derived from the walking acceleration were suggested to be a reliable criterion to evaluate walking disability. Finally we discuss the characteristic and significance of the embodied knowledge extraction using the singular value decomposition proposed in this paper.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 37
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Edit distance is widely used for measuring the similarity between two strings. As a primitive operation, edit distance based string similarity search is to find strings in a collection that are similar to a given query string using edit distance. Existing approaches for answering such string similarity queries follow the filter-and-verify framework by using various indexes. Typically, most approaches assume that indexes and data sets are maintained in main memory. To overcome this limitation, in this paper, we propose B$^+$-tree based approaches to answer edit distance based string similarity queries, and hence, our approaches can be easily integrated into existing RDBMSs. In general, we answer string similarity search using pruning techniques employed in the metric space in that edit distance is a metric. First, we split the string collection into partitions according to a set of reference strings. Then, we index strings in all partitions using a single B$^+$-tree based on the distances of these strings to their corresponding reference strings. Finally, we propose two approaches to efficiently answer range and KNN queries, respectively, based on the B$^+$-tree. We prove that the optimal partitioning of the data set is an NP-hard problem, and therefore propose a heuristic approach for selecting the reference strings greedily and present an optimal partition assignment strategy to minimize the expected number of strings that need to be verified during the query evaluation. Through extensive experiments over a variety of real data sets, we demonstrate that our B$^+$-tree based approaches provide superior performance over state-of-the-art techniques on both range and KNN queries in most cases.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
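    The metric-space pruning idea described in the abstract above can be sketched with a single reference string. Because edit distance satisfies the triangle inequality, any string s with |d(q, r) - d(s, r)| > tau cannot be within distance tau of the query q, so it can be skipped without verification. This is a simplified illustration and not the authors' method: a sorted in-memory list stands in for the B$^+$-tree, only one partition is used, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def build_index(strings, reference):
    """Sort strings by distance to the reference (stand-in for a B+-tree)."""
    return sorted((edit_distance(s, reference), s) for s in strings)


def range_query(index, reference, query, tau):
    """Return all indexed strings within edit distance tau of the query."""
    dq = edit_distance(query, reference)
    keys = [d for d, _ in index]
    # Filter: only keys in [dq - tau, dq + tau] can satisfy the bound.
    lo = bisect_left(keys, dq - tau)
    hi = bisect_right(keys, dq + tau)
    # Verify: compute the true distance for the surviving candidates.
    return [s for _, s in index[lo:hi] if edit_distance(query, s) <= tau]


if __name__ == "__main__":
    words = ["kitten", "sitting", "mitten", "bitten", "kitchen"]
    index = build_index(words, "kitten")
    print(range_query(index, "kitten", "sitten", 1))
```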
  • 38
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A top-$k$ query retrieves the best $k$ tuples by assigning scores for each tuple in a target relation with respect to a user-specific scoring function. This paper studies the problem of constructing an indexing structure for supporting top-$k$ queries over varying scoring functions and retrieval sizes. The existing research efforts can be categorized into three approaches: list-, layer-, and view-based approaches. In this paper, we mainly focus on the layer-based approach that pre-materializes tuples into consecutive multiple layers. We first propose a dual-resolution layer that consists of coarse-level and fine-level layers. Specifically, we build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. To make our proposed dual-resolution layer scalable, we then address the following optimization directions: 1) index construction; 2) disk-based storage scheme; 3) the design of the virtual layer; and 4) index maintenance for tuple updates. Our evaluation results show that our proposed method is more scalable than the state-of-the-art methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
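    The layer-based idea mentioned in the abstract above can be illustrated by peeling a relation into successive skyline layers and answering a top-$k$ query from the first $k$ layers only. This sketch covers the coarse-level skyline layers alone, not the fine-level convex-skyline sublayers or any of the optimizations the abstract lists. It assumes larger attribute values are better and a linear scoring function with non-negative weights, under which every tuple outside the first $k$ layers is dominated by at least $k$ tuples scoring no lower. All names are hypothetical.

```python
import heapq


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(points):
    """Tuples not dominated by any other tuple (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_layers(points):
    """Peel off skylines repeatedly: layer 1, then the skyline of the rest."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = skyline(remaining)
        layers.append(layer)
        taken = set(layer)
        remaining = [p for p in remaining if p not in taken]
    return layers


def top_k(layers, weights, k):
    """Answer a top-k query by scoring only tuples in the first k layers."""
    def score(p):
        return sum(w * x for w, x in zip(weights, p))

    candidates = [p for layer in layers[:k] for p in layer]
    return heapq.nlargest(k, candidates, key=score)


if __name__ == "__main__":
    data = [(1, 9), (5, 5), (9, 1), (4, 4), (2, 2), (8, 0), (0, 8)]
    layers = skyline_layers(data)
    print(layers)
    print(top_k(layers, (0.7, 0.3), 2))
```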
  • 39
    Publication Date: 2014-11-09
    Description: Background: The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements. Results: MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility. Conclusions: To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 40
    Publication Date: 2014-11-05
    Description: Background: The major histocompatibility complex (MHC) is responsible for presenting antigens (epitopes) on the surface of antigen-presenting cells (APCs). When pathogen-derived epitopes are presented by MHC class II on an APC surface, T cells may be able to trigger a specific immune response. Prediction of MHC-II epitopes is particularly challenging because the open binding cleft of the MHC-II molecule allows epitopes to bind beyond the peptide binding groove; therefore, the molecule is capable of accommodating peptides of variable length. Among the methods proposed to predict MHC-II epitopes, artificial neural networks (ANNs) and support vector machines (SVMs) are the most effective methods. We propose a novel classification algorithm to predict MHC-II called sparse representation via l1-minimization. Results: We obtained a collection of experimentally confirmed MHC-II epitopes from the Immune Epitope Database and Analysis Resource (IEDB) and applied our l1-minimization algorithm. To benchmark the performance of our proposed algorithm, we compared our predictions against an SVM classifier. We measured sensitivity, specificity and accuracy; then we used Receiver Operating Characteristic (ROC) analysis to evaluate the performance of our method. The prediction performance of the l1-minimization algorithm on MHC-II epitopes was generally comparable and, in some cases, superior to the standard SVM classification method, and overcame the lack of robustness of other methods with respect to outliers. While our method consistently favored DPPS encoding with the alleles tested, SVM showed a slightly better accuracy when "11-factor" encoding was used. Conclusions: l1-minimization has accuracy similar to that of SVM, and has additional advantages, such as overcoming the lack of robustness with respect to outliers. With l1-minimization no model selection dependency is involved.
    Electronic ISSN: 1756-0381
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 41
    Publication Date: 2014-12-13
    Description: Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme data set from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. 
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: In Cyber-Physical Networked Systems (CPNS), the adversary can inject false measurements into the controller through compromised sensor nodes, which not only threaten the security of the system, but also consume network resources. To deal with this issue, a number of en-route filtering schemes have been designed for wireless sensor networks. However, these schemes either lack resilience to the number of compromised nodes or depend on the statically configured routes and node localization, which are not suitable for CPNS. In this paper, we propose a Polynomial-based Compromise-Resilient En-route Filtering scheme (PCREF), which can filter false injected data effectively and achieve a high resilience to the number of compromised nodes without relying on static routes and node localization. PCREF adopts polynomials instead of Message Authentication Codes (MACs) for endorsing measurement reports to achieve resilience to attacks. Each node stores two types of polynomials: authentication polynomial and check polynomial, derived from the primitive polynomial, and used for endorsing and verifying the measurement reports. Through extensive theoretical analysis and experiments, our data shows that PCREF achieves better filtering capacity and resilience to the large number of compromised nodes in comparison to the existing schemes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 43
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: The Resistive Random Access Memory (RRAM) is a new type of non-volatile memory based on the resistive memory device. Researchers are currently moving from resistive device development to memory circuit design and implementation, hoping to fabricate memory chips that can be deployed in the market in the near future. However, so far the low manufacturing yield is still a major issue. In this paper, we propose defect and fault models specific to RRAM, i.e., the Over-Forming (OF) defect and the Read-One-Disturb (R1D) fault. We then propose a March algorithm to cover these defects and faults in addition to the conventional RAM faults, which is called March C*. We also develop a novel squeeze-search scheme to identify the OF defect, which leads to the Stuck-At Fault (SAF). The proposed test algorithm is applied to a first-cut 4-Mb HfO2-based RRAM test chip. Results show that OF defects and R1D faults do exist in the RRAM chip. We also identify specific failure patterns from the test results, which are shown to be induced by multiple short defects between bit-lines. By identifying the defects and faults, designers and process engineers can improve the RRAM yield in a more cost-effective way.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
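As background for the March-test entry above (item 43), the following is a minimal sketch of the conventional March C- test run against a simulated memory with stuck-at cells. It is not the paper's March C*, whose element sequence the abstract does not give, and it does not model the Over-Forming defect or the Read-One-Disturb fault. The `FaultyMemory` class and its `stuck_at` parameter are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Optional


class FaultyMemory:
    """Hypothetical bit-addressable memory model with optional stuck-at cells."""

    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None):
        self.size = size
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> value the cell is stuck at

    def write(self, addr: int, value: int) -> None:
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(mem: FaultyMemory) -> List[int]:
    """Run conventional March C- and return the addresses whose reads mismatched."""
    failing = set()
    up = range(mem.size)
    down = range(mem.size - 1, -1, -1)

    def element(order, expected, new_value):
        # One march element: optional read-and-compare, then optional write, per cell.
        for addr in order:
            if expected is not None and mem.read(addr) != expected:
                failing.add(addr)
            if new_value is not None:
                mem.write(addr, new_value)

    element(up, None, 0)    # any order:  w0
    element(up, 0, 1)       # ascending:  r0, w1
    element(up, 1, 0)       # ascending:  r1, w0
    element(down, 0, 1)     # descending: r0, w1
    element(down, 1, 0)     # descending: r1, w0
    element(up, 0, None)    # any order:  r0
    return sorted(failing)


if __name__ == "__main__":
    print(march_c_minus(FaultyMemory(8, stuck_at={3: 1, 5: 0})))
```

Because every cell is read while expecting both 0 and 1, a stuck-at-0 or stuck-at-1 cell fails at least one read. RRAM-specific faults of the kind the abstract describes would need additional elements.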
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: In response to the increasing ubiquity of multicore processors, applications are usually designed or deployed to make each core busy. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that an increase in the number of cores can actually lead to reduced performance (i.e., scalability collapse). Existing lock implementations have disadvantages in scalability, power consumption, and energy efficiency. In this paper, we observe that the number of tasks requesting a lock has a significant correlation with the occurrence of scalability collapse. Based on this observation, a lock implementation that allows tasks waiting for a lock to either spin or enter a power-saving state based on the number of requesters is proposed. Our lock protocol is called requester-based lock and is implemented in the Linux kernel to replace its default spin lock. Based on the results of a sensitivity analysis, we find that the best policy, in practice, for a task waiting for a lock to be granted is to enter the power-saving state immediately after noticing the lock cannot be acquired. Our requester-based lock scheme is evaluated using intensive benchmarking on AMD 32-core and Intel 40-core systems. Experimental results suggest that our lock avoids scalability collapse completely for most applications and shows better scalability, power consumption, and energy efficiency than previous work. Besides, the requester-based lock is extensible: using it together with other kinds of spin locks can provide better scalability and energy efficiency.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 45
    Publication Date: 2014-12-13
    Description: Functionally equivalent web services can be composed to form more reliable service-oriented systems. However, the choice of fault tolerance strategy can have a significant effect on the quality-of-service (QoS) of the resulting service-oriented systems. In this paper, we investigate the problem of selecting an optimal fault tolerance strategy for building reliable service-oriented systems. We formulate the user requirements as local and global constraints and model the selection of fault tolerance strategy as an optimization problem. A heuristic algorithm is proposed to efficiently solve the optimization problem. Fault tolerance strategy selection for semantically related tasks is also investigated in this paper. Large-scale real-world experiments are conducted to illustrate the benefits of the proposed approach. The experimental results show that our problem modeling approach and the proposed selection algorithm make it feasible to manage the fault tolerance of complex service-oriented systems both efficiently and effectively.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: This paper describes an end-to-end system implementation of a transactional memory (TM) programming model on top of the hardware transactional memory (HTM) of the Blue Gene/Q machine. The TM programming model supports most C/C++ programming constructs using a best-effort HTM and the help of a complete software stack including the compiler, the kernel, and the TM runtime. An extensive evaluation of the STAMP and the RMS-TM benchmark suites on BG/Q is the first of its kind in understanding characteristics of running TM workloads on real hardware TM. The study reveals several interesting insights on the overhead and the scalability of BG/Q HTM with respect to sequential execution, coarse-grain locking, and software TM.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Radio Frequency Identification (RFID) technology has been widely used in inventory management in many scenarios, e.g., warehouses, retail stores, hospitals, etc. This paper investigates a challenging problem of complete identification of missing tags in large-scale RFID systems. Although this problem has attracted extensive attention from academia and industry, the existing work can hardly satisfy the stringent real-time requirements. In this paper, a Slot Filter-based Missing Tag Identification (SFMTI) protocol is proposed to reconcile some expected collision slots into singleton slots and filter out the expected empty slots as well as the unreconcilable collision slots, thereby achieving improved time efficiency. A theoretical analysis is conducted to minimize the execution time of the proposed SFMTI. We then propose a cost-effective method to extend SFMTI to multi-reader scenarios. Extensive simulation experiments and performance results demonstrate that the proposed SFMTI protocol outperforms the most promising Iterative ID-free Protocol (IIP) by reducing nearly 45% of the required execution time, and is just within a factor of 1.18 from the lower bound of the minimum execution time.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Memristor-based memory technology, also referred to as resistive RAM (RRAM), is one of the emerging memory technologies that could potentially replace conventional semiconductor memories such as SRAM, DRAM, and flash. Existing research on such novel circuits focuses mainly on the integration between CMOS and non-CMOS, fabrication techniques, and reliability improvement. However, research on (manufacturing) test for yield and quality improvement is still in its infancy. This paper presents fault analysis and modeling for open defects based on electrical simulation, introduces fault models, and proposes test approaches for RRAMs. The fault analysis reveals that unique faults occur in addition to some conventional memory faults, and the detection of such unique faults cannot be guaranteed with just the application of traditional march tests. The paper also presents a new Design-for-Testability (DfT) concept to facilitate the detection of the unique faults. Two DfT schemes are developed by exploiting the access time duration and supply voltage level of the RRAM cells, and their simulation results show that the fault coverage can be increased with minor circuit modification. As the fault behavior may vary due to process variations, the DfT schemes are extended to be programmable to track the changes and further improve the fault/defect coverage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Runtime power management using dynamic voltage and frequency scaling (DVFS) has been extensively studied for video processing applications. But there is only a little work on game power management although gaming applications are now widely run on battery-operated portable devices like mobile phones. Taking a cue from video power management, where PID controllers have been successfully used, they were recently applied to game workload prediction and DVFS. However, the use of hand-tuned PID controller gains on relatively short game plays left open questions on the robustness of the controller and the sensitivity of prediction quality on the choice of the gain values. In this paper, we try to systematically answer these questions. We first show that from the space of PID controller gain values, only a small subset leads to good game quality and power savings. Further, the choice of this set highly depends on the scene and the game application. For most gain values the controller becomes unstable, which can lead to large oscillations in the processor’s frequency setting and thereby poor results. We then study a number of time series models, such as a Least Mean Squares (LMS) Linear Predictor and its generalizations in the form of Autoregressive Moving Average (ARMA) models. These models learn most of the relevant model parameters iteratively as the game progresses, thereby dramatically reducing the complexity of manual parameter estimation. This makes them deployable in real setups, where all game plays and even game applications are not a priori known. We have evaluated each of these models (PID, LMS, and ARMA) for a variety of games—ranging from Quake II to more recent closed-source games such as Crysis, Need for Speed—Shift and World in Conflict—with very encouraging results. 
To the best of our knowledge, this is the first work that systematically explores (a) the feasibility of manually tuning PID controller parameters for power management, (b) time series models for workload prediction for gaming applications, and (c) power management for closed-source games.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
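The game power-management entry above (item 49) mentions a Least Mean Squares (LMS) linear predictor that learns its parameters as the game progresses. The sketch below shows the textbook LMS update applied to one-step-ahead prediction of a workload series. The function name, the filter order, and the step size are illustrative assumptions, not values from the paper, and the workload is assumed to be scaled to roughly unit magnitude so that the fixed step size stays stable.

```python
from typing import List, Sequence


def lms_one_step_predictions(workload: Sequence[float],
                             order: int = 2,
                             step: float = 0.1) -> List[float]:
    """Predict workload[t] from the previous `order` samples with an LMS filter.

    Returns one prediction per sample from index `order` onward. The weights
    start at zero and are adapted after each observed sample.
    """
    weights = [0.0] * order
    predictions = []
    for t in range(order, len(workload)):
        window = workload[t - order:t][::-1]  # most recent sample first
        predicted = sum(w * x for w, x in zip(weights, window))
        predictions.append(predicted)
        error = workload[t] - predicted
        # Standard LMS rule: move each weight along error * input.
        weights = [w + step * error * x for w, x in zip(weights, window)]
    return predictions


if __name__ == "__main__":
    # A constant (normalized) per-frame workload: predictions approach 1.0.
    print(lms_one_step_predictions([1.0] * 50)[-1])
```

A DVFS policy could then choose the lowest frequency whose capacity covers the predicted workload. That mapping is outside what the abstract specifies and is not shown here.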
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Ciphertext Policy Attribute-Based Encryption (CP-ABE) enforces expressive data access policies and each policy consists of a number of attributes. Most existing CP-ABE schemes incur a very large ciphertext size, which increases linearly with respect to the number of attributes in the access policy. Recently, Herranz proposed a construction of CP-ABE with constant ciphertext. However, Herranz do not consider the recipients’ anonymity and the access policies are exposed to potential malicious attackers. On the other hand, existing privacy preserving schemes protect the anonymity but require bulky, linearly increasing ciphertext size. In this paper, we proposed a new construction of CP-ABE, named Privacy Preserving Constant CP-ABE (denoted as PP-CP-ABE) that significantly reduces the ciphertext to a constant size with any given number of attributes. Furthermore, PP-CP-ABE leverages a hidden policy construction such that the recipients’ privacy is preserved efficiently. As far as we know, PP-CP-ABE is the first construction with such properties. Furthermore, we developed a Privacy Preserving Attribute-Based Broadcast Encryption (PP-AB-BE) scheme. Compared to existing Broadcast Encryption (BE) schemes, PP-AB-BE is more flexible because a broadcasted message can be encrypted by an expressive hidden access policy, either with or without explicitly specifying the receivers. Moreover, PP-AB-BE significantly reduces the storage and communication overhead to the order of $O(\log N)$, where $N$ is the system size. Also, we proved, using information theoretical approaches, that PP-AB-BE attains minimal bound on storage overhead for each user to cover all possible subgroups in the communication system.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 51
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Recent mobile devices adopt high-performance processors to support various functions. As a side effect, higher performance inevitably leads to power density increase, eventually resulting in thermal problems. In order to alleviate the thermal problems, off-the-shelf mobile devices rely on dynamic voltage-frequency scaling (DVFS)-based dynamic thermal management (DTM) schemes. Unfortunately, in the DVFS-based DTM schemes, an excessive number of DTM operations worsen not only performance but also power efficiency. In this paper, we propose a temperature-aware DVFS scheme for Android-based mobile devices to optimize power or performance depending on the option. We evaluate our scheme in the off-the-shelf mobile device. Our evaluation results show that our scheme saves energy consumption by 12.7%, on average, when we use the power optimizing option. Our scheme also enhances the performance by 6.3%, on average, by using the performance optimizing scheme, still reducing the energy consumption by 6.7%.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 52
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 53
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Han propose a new method for parallel decimal multiplication with redundant partial products. They compare the performance of their multiplier with some previous relevant works, based on analytical and synthesis results. We have noted that the claimed critical delay path in (IEEE Trans. Computers, vol. 62, no. 5, pp. 956–968, May 2013) is faster than the actual critical delay path. Therefore, comparison results seem to be deceptive. For example, our accurate analytical evaluation devaluated the claimed speed advantage over the multiplier of (Microelectronics J., vol. 40, no. 10, pp. 1471–1481, Oct. 2009). Furthermore, we synthesized both multipliers, to show synthesis results confirm those of analytical evaluation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 54
    Publication Date: 2014-12-16
    Description: Background: Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, analysis, and sharing. A bioinformatics infrastructure for data storage and access, and user-friendly web-based tool for analysis and sharing output is needed to make GS more practical for breeders. Results: We have developed a web-based tool, called solGS, for predicting genomic estimated breeding values (GEBVs) of individuals, using a Ridge-Regression Best Linear Unbiased Predictor (RR-BLUP) model. It has an intuitive web-interface for selecting a training population for modeling and estimating genomic estimated breeding values of selection candidates. It estimates phenotypic correlation and heritability of traits and selection indices of individuals. Raw data is stored in a generic database schema, Chado Natural Diversity, co-developed by multiple database groups. Analysis output is graphically visualized and can be interactively explored online or downloaded in text format. An instance of its implementation can be accessed at the NEXTGEN Cassava breeding database, http://cassavabase.org/solgs. Conclusions: solGS enables breeders to store raw data and estimate GEBVs of individuals online, in an intuitive and interactive workflow. It can be adapted to any breeding program.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 55
    Publication Date: 2014-12-16
    Description: Background: According to Regulation (EU) No 619/2011, trace amounts of non-authorised genetically modified organisms (GMO) in feed are tolerated within the EU if certain prerequisites are met. Tolerable traces must not exceed the so-called 'minimum required performance limit' (MRPL), which was defined according to the mentioned regulation to correspond to 0.1% mass fraction per ingredient. Therefore, not yet authorised GMO (and some GMO whose approvals have expired) have to be quantified at very low level following the qualitative detection in genomic DNA extracted from feed samples. As the results of quantitative analysis can imply severe legal and financial consequences for producers or distributors of feed, the quantification results need to be utterly reliable. Results: We developed a statistical approach to investigate the experimental measurement variability within one 96-well PCR plate. This approach visualises the frequency distribution as zygosity-corrected relative content of genetically modified material resulting from different combinations of transgene and reference gene Cq values. One application of it is the simulation of the consequences of varying parameters on measurement results. Parameters could be, for example, replicate numbers or baseline and threshold settings; measurement results could be, for example, median (class) and relative standard deviation (RSD). All calculations can be done using the built-in functions of Excel without any need for programming. The developed Excel spreadsheets are available (see section 'Availability of supporting data' for details). In most cases, the combination of four PCR replicates for each of the two DNA isolations already resulted in a relative standard deviation of 15% or less. Conclusions: The aims of the study are scientifically based suggestions for minimisation of uncertainty of measurement especially in (but not limited to) the field of GMO quantification at low concentration levels. 
Four PCR replicates for each of the two DNA isolations seem to be a reasonable minimum number to narrow down the possible spread of results.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
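The GMO quantification entry above (item 55) describes combining transgene and reference-gene Cq replicates and summarising the spread by median and relative standard deviation (RSD). The sketch below illustrates that idea under strong simplifying assumptions that do not come from the abstract: relative content is taken as 2^(Cq_reference - Cq_transgene), which assumes 100% PCR efficiency, and no zygosity correction or calibration curve is applied. The function names are hypothetical, and the authors' own implementation is an Excel spreadsheet rather than code.

```python
from itertools import product
from statistics import mean, median, stdev
from typing import List, Sequence, Tuple


def relative_contents(transgene_cq: Sequence[float],
                      reference_cq: Sequence[float]) -> List[float]:
    """Relative content for every pairing of a transgene and a reference replicate.

    Assumes perfect doubling per cycle (delta-Cq method); this is a simplification.
    """
    return [2.0 ** (ref - tg) for tg, ref in product(transgene_cq, reference_cq)]


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, RSD in percent) of the combined values."""
    return median(values), 100.0 * stdev(values) / mean(values)


if __name__ == "__main__":
    # Four hypothetical replicates per target on one plate.
    tg = [34.1, 34.4, 33.9, 34.2]
    ref = [24.0, 24.1, 23.9, 24.0]
    med, rsd = summarize(relative_contents(tg, ref))
    print(f"median relative content: {med:.5f}, RSD: {rsd:.1f}%")
```

Varying the number of replicates passed in is one way to explore how replicate count affects the RSD, in the spirit of the simulation the abstract mentions.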
  • 56
    Publication Date: 2014-12-16
    Description: Background: The latest generations of Single Nucleotide Polymorphism (SNP) arrays allow the study of copy-number variations in addition to genotyping measures. Results: MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic markers from multi-patient copy number and SNP data profiles. It provides wrappers from commonly used packages to streamline their repeated (sometimes difficult) manipulation, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is performed with an automatic choice of parameters involved in the wrapped packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given outcome. Conclusions: MPAgenomics provides an easy tool to analyze data from SNP arrays in R. The R-package MPAgenomics is available on CRAN.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 57
    Publication Date: 2014-12-16
    Description: Background: With the ever increasing use of computational models in the biosciences, the need to share models and reproduce the results of published studies efficiently and easily is becoming more important. To this end, various standards have been proposed that can be used to describe models, simulations, data or other essential information in a consistent fashion. These constitute various separate components required to reproduce a given published scientific result. Results: We describe the Open Modeling EXchange format (OMEX). Together with the use of other standard formats from the Computational Modeling in Biology Network (COMBINE), OMEX is the basis of the COMBINE Archive, a single file that supports the exchange of all the information necessary for a modeling and simulation experiment in biology. An OMEX file is a ZIP container that includes a manifest file, listing the content of the archive, an optional metadata file adding information about the archive and its content, and the files describing the model. The content of a COMBINE Archive consists of files encoded in COMBINE standards whenever possible, but may include additional files defined by an Internet Media Type. Several tools that support the COMBINE Archive are available, either as independent libraries or embedded in modeling software. Conclusions: The COMBINE Archive facilitates the reproduction of modeling and simulation experiments in biology by embedding all the relevant information in one file. Having all the information stored and exchanged at once also helps in building activity logs and audit trails. We anticipate that the COMBINE Archive will become a significant help for modellers, as the domain moves to larger, more complex experiments such as multi-scale models of organs, digital organisms, and bioengineering.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
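The COMBINE Archive entry above (item 57) describes an OMEX file as a ZIP container holding a manifest that lists the archive's content. The following sketch builds such a container in memory and reads the manifest back using only the Python standard library. The manifest file name (`manifest.xml`), the XML namespace, the element and attribute names, and the format URI in the usage example are assumptions based on the published OMEX conventions and should be checked against the specification. A complete manifest normally carries more entries than this simplified one.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumed namespace of the OMEX manifest; verify against the specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"


def build_archive(files: Dict[str, Tuple[str, bytes]]) -> bytes:
    """Build a ZIP container from {location: (format_uri, content_bytes)}."""
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    for location, (format_uri, _) in files.items():
        ET.SubElement(root, f"{{{OMEX_NS}}}content",
                      {"location": location, "format": format_uri})
    manifest = ET.tostring(root, encoding="utf-8")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", manifest)
        for location, (_, data) in files.items():
            archive.writestr(location, data)
    return buffer.getvalue()


def list_contents(archive_bytes: bytes) -> List[Tuple[str, str]]:
    """Return (location, format) pairs declared in the archive's manifest."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read("manifest.xml"))
    return [(el.get("location"), el.get("format"))
            for el in root.iter(f"{{{OMEX_NS}}}content")]


if __name__ == "__main__":
    data = build_archive({
        "model.xml": ("http://identifiers.org/combine.specifications/sbml",
                      b"<sbml/>"),
    })
    print(list_contents(data))
```

Reading the manifest first, rather than scanning file extensions, is what lets a tool decide how to interpret each file in the archive.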
  • 58
    Publication Date: 2014-12-16
    Description: Background: Management of diabetes mellitus is complex and involves controlling multiple risk factors that may lead to complications. Given that patients provide most of their own diabetes care, patient self-management training is an important strategy for improving quality of care. Web-based interventions have the potential to bridge gaps in diabetes self-care and self-management. The objective of this study was to determine the effect of a web-based patient self-management intervention on psychological (self-efficacy, quality of life, self-care) and clinical (blood pressure, cholesterol, glycemic control, weight) outcomes. Methods: For this cohort study we used repeated-measures modelling and qualitative individual interviews. We invited patients with type 2 diabetes to use a self-management website and asked them to complete questionnaires assessing self-efficacy (primary outcome) every three weeks for nine months before and nine months after they received access to the website. We collected clinical outcomes at three-month intervals over the same period. We conducted in-depth interviews at study conclusion to explore acceptability, strengths and weaknesses, and mediators of use of the website. We analyzed the data using a qualitative descriptive approach and inductive thematic analysis. Results: Eighty-one participants (mean age 57.2 years, standard deviation 12) were included in the analysis. The self-efficacy score did not improve significantly more than expected after nine months (absolute change 0.12; 95% confidence interval -0.028, 0.263; p = 0.11), nor did clinical outcomes. Website usage was limited (average 0.7 logins/month). Analysis of the interviews (n = 21) revealed four themes: 1) mediators of website use; 2) patterns of website use, including role of the blog in driving site traffic; 3) feedback on website; and 4) potential mechanisms for website effect. 
Conclusions: A self-management website for patients with type 2 diabetes did not improve self-efficacy. Website use was limited. Although its perceived reliability, availability of a blog and emailed reminders drew people to the website, participants' struggles with type 2 diabetes, competing priorities in their lives, and website accessibility were barriers to its use. Future interventions should aim to integrate the intervention seamlessly into the daily routine of end users such that it is not seen as yet another chore.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 59
    Publication Date: 2014-12-15
    Description: Background: With the advent of low cost, fast sequencing technologies metagenomic analyses are made possible. The large data volumes gathered by these techniques and the unpredictable diversity captured in them are still, however, a challenge for computational biology. Results: In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE's taxonomic assignments is achieved by a special machine learning architecture, which is well suited to model data collections that are intrinsically hierarchical. We report reasonably good classification accuracy for ranks down to order, observed in a study on real world data (Acid Mine Drainage, Cow Rumen). Conclusion: We show that the execution time of this approach is orders of magnitude shorter than competitive approaches and that accuracy is comparable. The tool is presented to the public as a web application (url: https://ani.cebitec.uni-bielefeld.de/ake/, username: bmc, password: bmcbioinfo).
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 60
    Publication Date: 2014-12-15
    Description: Background: Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. Results: We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither of the preprocessing methods affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. Conclusions: We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
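The difference between masking and trimming described in the entry above (item 60) can be illustrated with a short sketch. This is not the authors' Perl script. The quality threshold of 20 and the choice to trim only from the 3' end are assumptions made for illustration, since the abstract does not specify either.

```python
from typing import Sequence


def mask_read(bases: str, qualities: Sequence[int], min_quality: int = 20) -> str:
    """Replace each base whose quality is below the threshold with 'N'.

    The read keeps its original length, so base positions are preserved.
    """
    return "".join(base if quality >= min_quality else "N"
                   for base, quality in zip(bases, qualities))


def trim_read(bases: str, qualities: Sequence[int], min_quality: int = 20) -> str:
    """Remove low-quality bases from the 3' end (one common trimming variant).

    The read becomes shorter; low-quality bases inside the read are kept.
    """
    end = len(bases)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return bases[:end]


if __name__ == "__main__":
    read = "ACGTAC"
    quals = [30, 10, 30, 30, 5, 8]  # hypothetical Phred scores
    print(mask_read(read, quals))   # ANGTNN
    print(trim_read(read, quals))   # ACGT
```

In this example, masking hides the low-quality base at position 2 while preserving read length, whereas 3'-end trimming leaves that base in place and shortens the read.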
  • 61
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Learn about metasystems, their characteristics, and the challenges IT professionals and systems engineers face in designing and managing such systems.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Before writing a single line of code, software engineers can increase application assurance by instituting the practice recommendations articulated in their enterprise architecture. Many Common Weakness Enumerations (CWEs) can be addressed in the architecture and design phases of the development life cycle. Architectural and design flaws found late in the SDLC can be costly to repair; often, these flaws are so baked into the application that they're resistant to code patches. The only viable response might be to catalogue their existence for a later redesign of the application. Moreover, patches to flaws can inject additional defects as well as alert adversaries to the existence of these flaws.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Cloud computing provides flexibility and agility to meet growing business needs in a dynamic and competitive landscape. Banking, financial services, and insurance sector organizations are interested in exploring cloud services as a technology, provided that security and privacy are ensured. One solution is a community cloud, in which cloud services are targeted for organizations with common objectives and security controls. The Indian Banking Community Cloud (IBCC) initiative of the Institute for Development and Research in Banking Technology in Hyderabad, India, provides cloud-based services exclusively for Indian banks. In this article, the authors describe the IBCC architecture, along with its implementation details, cloud services offered, security and disaster-recovery aspects, deployment challenges, and future work. This article is part of a special issue on advancing cloud computing.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Motivated by a continually increasing demand for applications that depend on machine comprehension of text-based content, researchers in both academia and industry have developed innovative solutions for automated information extraction from text. In this article, the authors focus on a subset of such tools--semantic taggers--that not only extract and disambiguate entities mentioned in the text but also identify topics that unambiguously describe the text's main themes. The authors offer insight into the process of semantic tagging, the capabilities and specificities of today's semantic taggers, and also indicate some of the criteria to be considered when choosing a tagger.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: The objectives of datacenter consolidation are cost reduction and superior services. A datacenter consolidation plan includes minimizing investments in IT infrastructure and buildings and reducing power consumption related to cooling. Such a process requires scalable planning and implementation. Virtualization is the most popular and cost-effective technology for datacenter consolidation. In this article, the author runs a cost-benefit analysis of virtualization and datacenter consolidation using the Global Virtual datacenter online calculator and VMware's ROI TCO (return on investment/total cost of ownership) calculator version 3.0.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: The adoption of various converging trends in IT, such as cloud computing, the Internet of Things (IoT), crypto-currency, autonomous systems, and big data, challenges traditional notions of program management and highlights the importance of computational networks.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Provides a listing of current staff, committee members and society officers.
    Print ISSN: 1520-9202
    Electronic ISSN: 1941-045X
    Topics: Computer Science
  • 68
    Publication Date: 2014-12-09
    Description: Background: Online cancer information can support patients in making treatment decisions. However, such information may not be adequately tailored to the patient's perspective, particularly if healthcare professionals do not sufficiently engage patient groups when developing online information. We applied qualitative user testing during the development of a patient information website on stereotactic ablative radiotherapy (SABR), a new guideline-recommended curative treatment for early-stage lung cancer. Methods: We recruited 27 participants who included patients referred for SABR and their relatives. A qualitative user test of the website was performed in 18 subjects, followed by an additional evaluation by users after website redesign (N = 9). We primarily used the 'thinking aloud' approach and semi-structured interviewing. Qualitative data analysis was performed to assess the main findings reported by the participants. Results: Study participants preferred receiving different information than had been provided initially. Problems identified with the online information related to comprehending medical terminology, understanding the scientific evidence regarding SABR, and appreciating the side-effects associated with SABR. Following redesign of the website, participants reported fewer problems with understanding content, and some additional recommendations for better online information were identified. Conclusions: Our findings indicate that input from patients and their relatives allows for a more comprehensive and usable website for providing treatment information. Such a website can facilitate improved patient participation in treatment decision-making for cancer.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 69
    Publication Date: 2014-12-01
    Description: Background: The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. Technologies like mass spectrometry are commonly being used in proteomic research. Mass spectrometry signals show the proteomic profiles of the individuals under study at a given time. These profiles correspond to the recording of a large number of proteins, much larger than the number of individuals. These variables come in addition to or to complete classical clinical variables. The objective of this study is to evaluate and compare the predictive ability of new and existing models combining mass spectrometry data and classical clinical variables. This study was conducted in the context of binary prediction. Results: To achieve this goal, simulated data as well as a real dataset dedicated to the selection of proteomic markers of steatosis were used to evaluate the methods. The proposed methods meet the challenge of high-dimensional data and the selection of predictive markers by using penalization methods (Ridge, Lasso) and dimension reduction techniques (PLS), as well as a combination of both strategies through sparse PLS in the context of a binary class prediction. The methods were compared in terms of mean classification rate and their ability to select the true predictive values. These comparisons were done on clinical-only models, mass-spectrometry-only models and combined models. Conclusions: It was shown that models which combine both types of data can be more efficient than models that use only clinical or mass spectrometry data when the sample size of the dataset is large enough.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 70
    Publication Date: 2014-12-01
    Description: Background: In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners' entries and a regular-expression-based temporality module. Results: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
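    Example (illustrative sketch, not taken from the article): trigger-based negation detection of the kind the abstract above describes can be approximated with a short scan over the tokens of one sentence. The English trigger and termination word lists, the function name, and the restriction to single-word terms are all assumptions for this sketch; the published ContextD trigger set is Dutch and considerably richer.

```python
import re

# Hypothetical English word lists, chosen only for illustration.
NEGATION_TRIGGERS = {"no", "without", "denies"}
TERMINATION_TERMS = {"but", "however"}


def negated_terms(sentence, terms):
    """Return the single-word terms that fall inside a negation scope.

    A scope opens at a negation trigger and closes at a termination
    term or at the end of the sentence.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    wanted = set(terms)
    negated = set()
    in_scope = False
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            in_scope = True
        elif tok in TERMINATION_TERMS:
            in_scope = False
        elif in_scope and tok in wanted:
            negated.add(tok)
    return negated


result = negated_terms("Patient denies fever but reports cough.",
                       ["fever", "cough"])
print(result)
```

    Here "fever" is marked as negated, while "cough" is not, because the termination term "but" closes the scope opened by "denies".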
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: In creating the Open Networking Foundation's conformance testing program for the OpenFlow networking specification, economic, technological, and market drivers must be harmonized, allowing for the simultaneous development of consumer confidence, industry competition, and trustworthy product validation.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 72
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Recently, Apple admitted that revealing photos of celebrities had been released on the Internet due to security breaches associated with its iCloud and Find My iPhone systems.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: An integrity level defines a required level of confidence that a system satisfies critical properties related to relevant risk criteria. However, integrity level terms and definitions differ across industry sectors, and this hampers a common understanding and application of integrity levels.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Describes the above-named upcoming conference event. May include topics to be covered or calls for papers.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Software-defined networking opens up new possibilities for architectures based on open source components, promising improved orchestration and agility, lower operational costs, and--most important--a wave of innovation. The Web extra at http://youtu.be/pdG2btcyyK8 is a video in which authors Christian Esteve Rothenberg, Roy Chua, and Thomas Nadeau present a slideshow and discuss how software-defined networking opens up new possibilities for architectures based on open-source components, promising improved orchestration and agility, lower operational costs, and a new wave of innovation.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Why do dynamic power-management technologies that dramatically improve datacenter server energy efficiency continue to go unleveraged?
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: To adequately address climate change, we need novel data-science methods that account for the spatiotemporal and physical nature of climate phenomena. Only then will we be able to move from statistical analysis to scientific insights.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Amateur software developers might lack precise technical skills, but they bring detailed knowledge of their environments to the table. The first Web extra at http://youtu.be/r-kIJQu4iDQ is an audio recording of author David Alan Grier reading his Errant Hashtag column in which he discusses how amateur software developers might lack precise technical skills but bring detailed knowledge of their environments to the table. The second Web extra at http://youtu.be/EDKeN9mVfwk is an audio recording of author David Alan Grier discussing a recent report on electronic voting by the Atlantic Council, a Washington DC think tank, that shows that e-voting is still a risk that citizens of democracies and engineers should take into account.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: A summary of articles recently published in IEEE Computer Society magazines.
    Print ISSN: 0018-9162
    Electronic ISSN: 1558-0814
    Topics: Computer Science
  • 80
    Publication Date: 2014-12-06
    Description: Background: Early recognition of severe sepsis and septic shock is challenging. The aim of this study was to determine the diagnostic accuracy of an electronic alert system in detecting severe sepsis or septic shock among emergency department (ED) patients. Methods: An electronic sepsis alert system was developed as a part of a quality-improvement project for severe sepsis and septic shock. The system screened all adult ED patients for a combination of systemic inflammatory response syndrome and organ dysfunction criteria (hypotension, hypoxemia or lactic acidosis). This study included all patients older than 14 years who presented to the ED of a tertiary care academic medical center from Oct. 1, 2012 to Jan. 31, 2013. As a comparator, emergency medicine physicians or the critical care physician identified the patients with severe sepsis or septic shock. In the ED, vital signs were manually entered into the hospital electronic health record every hour in the critical care area and every two hours in other areas. We also calculated the time from the alert to the intensive care unit (ICU) referral. Results: Of the 49,838 patients who presented to the ED, 222 (0.4%) were identified to have severe sepsis or septic shock. The electronic sepsis alert had a sensitivity of 93.18% (95% CI, 88.78%-96.00%), specificity of 98.44% (95% CI, 98.33%-98.55%), positive predictive value of 20.98% (95% CI, 18.50%-23.70%) and negative predictive value of 99.97% (95% CI, 99.95%-99.98%) for severe sepsis and septic shock. The alert preceded ICU referral by a median of 4.02 hours (Q1-Q3: 1.25-8.55). Conclusions: Our study shows that the electronic sepsis alert tool has high sensitivity and specificity in recognizing severe sepsis and septic shock, which may improve early recognition and management.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
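    Example (illustrative sketch, not taken from the article): the four accuracy measures reported in the abstract above follow from a 2x2 confusion matrix. The counts below are hypothetical and are not the study's data; they are chosen only to show how a rare condition can combine high sensitivity and specificity with a low positive predictive value.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # alerts among true cases
        "specificity": tn / (tn + fp),  # no alert among non-cases
        "ppv": tp / (tp + fp),          # true cases among alerts
        "npv": tn / (tn + fn),          # non-cases among non-alerts
    }


# Hypothetical counts: 100 true cases among 10,000 patients.
m = diagnostic_metrics(tp=90, fp=400, fn=10, tn=9500)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

    With these assumed counts, sensitivity is 90% and specificity about 96%, yet fewer than one in five alerts corresponds to a true case.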
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: The past decade has seen a dramatic increase in the amount of data captured and made available to scientists for research. This increase amplifies the difficulty scientists face in finding the data most relevant to their information needs. In prior work, we hypothesized that Information Retrieval-style ranked search can be applied to data sets to help a scientist discover the most relevant data amongst the thousands of data sets in many formats, much like text-based ranked search helps users make sense of the vast number of Internet documents. To test this hypothesis, we explored the use of ranked search for scientific data using an existing multi-terabyte observational archive as our test-bed. In this paper, we investigate whether the concept of varying relevance, and therefore ranked search, applies to numeric data; that is, are data sets enough like documents for Information Retrieval techniques and evaluation measures to apply? We present a user study that demonstrates that data set similarity resonates with users as a basis for relevance and, therefore, for ranked search. We evaluate a prototype implementation of ranked search over data sets with a second user study and demonstrate that ranked search improves a scientist's ability to find needed data.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 83
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: In recent years, probabilistic data management has received a lot of attention due to several applications that deal with uncertain data: RFID systems, sensor networks, data cleaning, scientific and biomedical data management, and approximate schema mappings. Query evaluation is a challenging problem in probabilistic databases, proved to be #P-hard. A general method for query evaluation is based on the lineage of the query and reduces the query evaluation problem to computing the probability of a propositional formula. The main approaches proposed in the literature to approximate the confidence computation of probabilistic queries are based on Monte Carlo simulation or on formula compilation into decision diagrams (e.g., d-trees). The former executes a polynomial, but too large, number of iterations, while the latter is polynomial for easy queries, but may be exponential in the worst case. We designed a new optimized Monte Carlo algorithm that drastically reduces the number of iterations and proposed an efficient parallel version that we implemented on GPU. Thanks to the elevated degree of parallelism provided by the GPU, combined with the linear speedup of our algorithm, we managed to reduce significantly the long running time required by a sequential Monte Carlo algorithm. Experimental results show that our algorithm is so efficient as to be comparable with the formula compilation approach, but with the significant advantage of avoiding exponential behavior.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
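    Example (illustrative sketch, not taken from the article): the reduction mentioned in the abstract above, from query confidence to the probability of a propositional lineage formula, can be illustrated with a naive sequential Monte Carlo estimator. This is the plain baseline sampler, not the authors' optimized or GPU algorithm. The DNF encoding (a list of sets of tuple identifiers, all positive literals), the assumption of independent tuples, and every name below are choices made for this sketch.

```python
import random


def mc_confidence(dnf, probs, n_samples=20000, seed=0):
    """Estimate P(lineage is true) by sampling possible worlds.

    dnf   : list of clauses, each a set of tuple ids that must all be present
    probs : dict mapping tuple id -> independent existence probability
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one possible world: each tuple is present with its probability.
        world = {t for t, p in probs.items() if rng.random() < p}
        # The DNF lineage holds if any clause is fully contained in the world.
        if any(clause <= world for clause in dnf):
            hits += 1
    return hits / n_samples


# Hypothetical lineage (t1 AND t2) OR t3 over three independent tuples.
probs = {"t1": 0.5, "t2": 0.4, "t3": 0.8}
dnf = [{"t1", "t2"}, {"t3"}]

estimate = mc_confidence(dnf, probs)
exact = 1 - (1 - 0.5 * 0.4) * (1 - 0.8)  # clauses share no tuples
print(estimate, exact)
```

    The exact value is available here only because the two clauses share no tuples; in general the exact computation is the #P-hard part that sampling sidesteps.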
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Graph-based ranking models have been widely applied in the information retrieval area. In this paper, we focus on a well-known graph-based model: the Ranking on Data Manifold model, or Manifold Ranking (MR). In particular, it has been successfully applied to content-based image retrieval because of its outstanding ability to discover the underlying geometrical structure of the given image database. However, manifold ranking is computationally very expensive, which significantly limits its applicability to large databases, especially when the queries are outside the database (new samples). We propose a novel scalable graph-based ranking model called Efficient Manifold Ranking (EMR), which tries to address the shortcomings of MR from two main perspectives: scalable graph construction and efficient ranking computation. Specifically, we build an anchor graph on the database instead of a traditional $k$-nearest neighbor graph, and design a new form of adjacency matrix utilized to speed up the ranking. An approximate method is adopted for efficient out-of-sample retrieval. Experimental results on some large scale image databases demonstrate that EMR is a promising method for real world retrieval applications.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Product quantization-based approaches are effective for encoding high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes by the precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named optimized cartesian K-means (ock-means), to better encode the data points for more accurate approximate nearest neighbor search. In ock-means, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from different sub codebooks in each subspace, which are optimally generated with regard to the minimization of the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on the standard real-life data sets demonstrate the superiority over state-of-the-art approaches for approximate nearest neighbor search.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
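    Example (illustrative sketch, not taken from the article): the traditional scheme that the abstract above contrasts with ock-means, one sub codeword per subspace plus lookup-table distances, can be written in a few lines. The sub codebooks are hand-picked toy values rather than learned ones, and all function names are assumptions for this sketch; it does not implement ock-means itself.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Index of the nearest sub codeword in each subspace."""
    d = len(vec) // len(codebooks)  # dimensions per subspace
    code = []
    for m, book in enumerate(codebooks):
        sub = vec[m * d:(m + 1) * d]
        dists = [sq_dist(sub, codeword) for codeword in book]
        code.append(dists.index(min(dists)))
    return code


def lookup_tables(query, codebooks):
    """Per-subspace distances from the query to every sub codeword."""
    d = len(query) // len(codebooks)
    return [[sq_dist(query[m * d:(m + 1) * d], codeword) for codeword in book]
            for m, book in enumerate(codebooks)]


def approx_dist(code, tables):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(tables[m][k] for m, k in enumerate(code))


# Toy setup: 4-D vectors, 2 subspaces, 2 sub codewords each (assumed values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
x = [0.9, 1.1, 0.1, 0.8]
code = encode(x, codebooks)
tables = lookup_tables([0.0, 0.0, 0.0, 1.0], codebooks)
d_approx = approx_dist(code, tables)
print(code, d_approx)
```

    The lookup tables are computed once per query, so each database point costs only one table read per subspace.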
  • 86
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks—triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 87
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Description: Identifying which text corpus leads in the context of a topic presents a great challenge of considerable interest to researchers. Recent research into lead-lag analysis has mainly focused on estimating the overall leads and lags between two corpora. However, real-world applications have a dire need to understand lead-lag patterns both globally and locally. In this paper, we introduce TextPioneer, an interactive visual analytics tool for investigating lead-lag across corpora from the global level to the local level. In particular, we extend an existing lead-lag analysis approach to derive two-level results. To convey multiple perspectives of the results, we have designed two visualizations, a novel hybrid tree visualization that couples a radial space-filling tree with a node-link diagram and a twisted-ladder-like visualization. We have applied our method to several corpora and the evaluation shows promise, especially in support of text comparison at different levels of detail.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 88
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-06
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 89
    Publication Date: 2014-01-14
    Description: Background: Gene selection is an important part of microarray data analysis because it provides information that can lead to a better mechanistic understanding of an investigated phenomenon. At the same time, gene selection is very difficult because of the noisy nature of microarray data. As a consequence, gene selection is often performed with machine learning methods. The Random Forest method is particularly well suited for this purpose. In this work, four state-of-the-art Random Forest-based feature selection methods were compared in a gene selection context. The analysis focused on the stability of selection because, although it is necessary for determining the significance of results, it is often ignored in similar studies. Results: The comparison of post-selection accuracy in the validation of Random Forest classifiers revealed that all investigated methods were equivalent in this context. However, the methods substantially differed with respect to the number of selected genes and the stability of selection. Of the analysed methods, the Boruta algorithm predicted the most genes as potentially important. Conclusions: The post-selection classifier error rate, which is a frequently used measure, was found to be a potentially deceptive measure of gene selection quality. When the number of consistently selected genes was considered, the Boruta algorithm was clearly the best. Although it was also the most computationally intensive method, the Boruta algorithm's computational demands could be reduced to levels comparable to those of other algorithms by replacing the Random Forest importance with a comparable measure from Random Ferns (a similar but simplified classifier). Despite their design assumptions, the minimal-optimal selection methods were found to select a high fraction of false positives.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 90
    Publication Date: 2014-01-15
    Description: Background: The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. Results: We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion: kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
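    Illustrative note on record 90: the idea of computing the Kruskal-Wallis statistic for many marker-trait pairs with matrix multiplications can be sketched as follows. This is not the kruX code; the function name `kruskal_wallis_matrix`, the argument layout, and the genotype coding 0..n_groups-1 are assumptions made for illustration, and the sketch assumes no tied expression values (no tie correction) and does not compute p-values.

```python
import numpy as np

def kruskal_wallis_matrix(expr, geno, n_groups=3):
    """Kruskal-Wallis H for every marker-trait pair (hypothetical helper).

    expr: (traits, samples) array of expression values, assumed tie-free.
    geno: (markers, samples) integer array with genotype codes 0..n_groups-1.
    Returns an array of shape (markers, traits).
    """
    n = expr.shape[1]
    # Rank each trait across samples (1..n); double argsort is valid without ties.
    ranks = expr.argsort(axis=1).argsort(axis=1) + 1.0
    total = np.zeros((geno.shape[0], expr.shape[0]))
    for g in range(n_groups):
        indicator = (geno == g).astype(float)          # (markers, samples)
        size = indicator.sum(axis=1, keepdims=True)    # group size per marker
        rank_sum = indicator @ ranks.T                 # (markers, traits)
        # Add R_g^2 / n_g, skipping genotype groups that are empty for a marker.
        total += np.divide(rank_sum ** 2, size,
                           out=np.zeros_like(rank_sum), where=size > 0)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

expr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
geno = np.array([[0, 0, 1, 1, 2, 2],
                 [0, 0, 0, 1, 1, 1]])
h = kruskal_wallis_matrix(expr, geno)
```

    Each genotype group contributes one matrix product, so all rank sums for all pairs are obtained in `n_groups` multiplications rather than one test per pair.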
  • 91
    Publication Date: 2014-01-19
    Description: Background: Glioblastoma is the most aggressive primary central nervous tumor and carries a very poor prognosis. Invasion precludes effective treatment and virtually assures tumor recurrence. In the current study, we applied analytical and bioinformatics approaches to identify a set of microRNAs (miRs) from several different human glioblastoma cell lines that exhibit significant differential expression between migratory (edge) and migration-restricted (core) cell populations. The hypothesis of the study is that differential expression of miRs provides an epigenetic mechanism to drive cell migration and invasion. Results: Our research data comprise gene expression values for a set of 805 human miRs collected from matched pairs of migratory and migration-restricted cell populations from seven different glioblastoma cell lines. We identified 62 down-regulated and 2 up-regulated miRs that exhibit significant differential expression in the migratory (edge) cell population compared to matched migration-restricted (core) cells. We then conducted target prediction and pathway enrichment analysis with these miRs to investigate potential associated gene and pathway targets. Several miRs in the list appear to directly target apoptosis related genes. The analysis identifies a set of genes that are predicted by 3 different algorithms, further emphasizing the potential validity of these miRs to promote glioblastoma. Conclusions: The results of this study identify a set of miRs with potential for decreased expression in invasive glioblastoma cells. The verification of these miRs and their associated targeted proteins provides new insights for further investigation into therapeutic interventions. The methodological approaches employed here could be applied to the study of other diseases to provide biomedical researchers and clinicians with increased opportunities for therapeutic interventions.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 92
    Publication Date: 2014-01-21
    Description: Background: The comparative modeling approach to protein structure prediction inherently relies on a template structure. Before building a model, such a template protein has to be found and aligned with the query sequence. Any error made at this stage may dramatically affect the quality of the result. There is a need, therefore, to develop accurate and sensitive alignment protocols. Results: BioShell threading software is a versatile tool for aligning protein structures, protein sequences or sequence profiles and query sequences to template structures. The software is also capable of suboptimal alignment generation. It can be executed as an application from the UNIX command line, or as a set of Java classes called from a script or a Java application. The implemented Monte Carlo search engine greatly facilitates the development and benchmarking of new alignment scoring schemes even when the functions exhibit non-deterministic polynomial-time complexity. Conclusions: Numerical experiments indicate that the new threading application offers template detection abilities and provides much better alignments than other methods. The package along with documentation and examples is available at: http://bioshell.pl/threading3d
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 93
    Publication Date: 2014-01-15
    Description: Background: Breast cancer risk reduction has the potential to decrease the incidence of the disease, yet remains underused. We report on the development of a web-based tool that provides automated risk assessment and personalized decision support designed for collaborative use between patients and clinicians. Methods: Under Institutional Review Board approval, we evaluated the decision tool through a patient focus group, usability testing, and provider interviews (including breast specialists, primary care physicians, genetic counselors). This included demonstrations and data collection at two scientific conferences (2009 International Shared Decision Making Conference, 2009 San Antonio Breast Cancer Symposium). Results: Overall, the evaluations were favorable. The patient focus group evaluations and usability testing (N = 34) provided qualitative feedback about format and design; 88% of these participants found the tool useful and 94% found it easy to use. 91% of the providers (N = 23) indicated that they would use the tool in their clinical setting. Conclusion: BreastHealthDecisions.org represents a new approach to breast cancer prevention care and a framework for high quality preventive healthcare. The ability to integrate risk assessment and decision support in real time will allow for informed, value-driven, and patient-centered breast cancer prevention decisions. The tool is being further evaluated in the clinical setting.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 94
    Publication Date: 2014-01-16
    Description: Background: Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles. Results: The datasets are comprised of 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. Our study implemented a total of seven signal ranking algorithms. We then compared these different ranking algorithms before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach can detect many true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm combined signals from both FAERS and MEDLINE, significantly improving the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8 fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels. Conclusions: We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that combining information from FAERS and the biomedical literature on a large scale can significantly contribute to drug safety surveillance.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 95
    Publication Date: 2014-01-16
    Description: Background: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result. Results: By including correlated features as inputs, existing methods only rely on one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features, and improves the raw predictions so that they least violate the constraints. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85-100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177-W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094-2095, 2008]), in a way that takes into account the respective strengths and weaknesses, and does not require any change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods. Conclusions: The proposed framework is able to improve the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and our method is able to integrate them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general, and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. On the other hand, the alternative strategies are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321-W326, 2004].
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 96
    Publication Date: 2014-01-14
    Description: Background: Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. Results: We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign creates a novel interactive logo plot for inclusion in web pages. These interactive logos enable scrolling, zooming, and inspection of underlying values. Skylign can avoid sampling bias in sequence alignments by down-weighting redundant sequences and by combining observed counts with informed priors. It also simplifies the representation of gap parameters, and can optionally scale letter heights based on alternate calculations of the conservation of a position. Conclusion: Skylign is available as a website, a scriptable web service with a RESTful interface, and as a software package for download. Skylign's interactive logos are easily incorporated into a web page with just a few lines of HTML markup. Skylign may be found at http://skylign.org.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 97
    Publication Date: 2014-01-15
    Description: Background: Gene set analysis (GSA) is useful in deducing biological significance of gene lists using a priori defined gene sets such as gene ontology (GO) or pathways. Phenotypic annotation is sparse for human genes, but is far more abundant for other model organisms such as mouse, fly, and worm. Often, GSA needs to be done highly interactively by combining or modifying gene lists or inspecting gene-gene interactions in a molecular network. Description: We developed gsGator, a web-based platform for functional interpretation of gene sets with useful features such as cross-species GSA, simultaneous analysis of multiple gene sets, and a fully integrated network viewer for visualizing both GSA results and molecular networks. An extensive set of gene annotation information is amassed including GO & pathways, genomic annotations, protein-protein interaction, transcription factor-target (TF-target), miRNA targeting, and phenotype information for various model organisms. By combining the functionalities of Set Creator, Set Operator and Network Navigator, users can perform highly flexible and interactive GSA by creating a new gene list from any combination of existing gene sets (intersection, union and difference) or by expanding genes interactively along molecular networks such as protein-protein interaction and TF-target. We also demonstrate the utility of our interactive and cross-species GSA implemented in gsGator by several usage examples for interpreting genome-wide association study (GWAS) results. gsGator is freely available at http://gsGator.ewha.ac.kr. Conclusions: Interactive and cross-species GSA in gsGator greatly extends the scope and utility of GSA, leading to novel insights via conserved functional gene modules across different species.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 98
    Publication Date: 2014-01-15
    Description: Background: Interpretation of binding modes of protein-small ligand complexes from 3D structure data is essential for understanding selective ligand recognition by proteins. It is often performed by visual inspection and sometimes largely depends on a priori knowledge about typical interactions such as hydrogen bonds and pi-pi stacking. Because it can introduce some biases due to scientists' subjective perspectives, more objective viewpoints considering a wide range of interactions are required. Description: In this paper, we present a web server for analyzing protein-small ligand interactions on the basis of patterns of atomic contacts, or "interaction patterns" obtained from the statistical analyses of 3D structures of protein-ligand complexes in our previous study. This server can guide visual inspection by providing information about interaction patterns for each atomic contact in 3D structures. Users can visually investigate what atomic contacts in user-specified 3D structures of protein-small ligand complexes are statistically overrepresented. This server consists of two main components: "Complex Analyzer," and "Pattern Viewer." The former provides a 3D structure viewer with annotations of interacting amino acid residues, ligand atoms, and interacting pairs of these. In the annotations of interacting pairs, assignment to an interaction pattern of each contact and statistical preferences of the patterns are presented. The "Pattern Viewer" provides details of each interaction pattern. Users can see visual representations of probability density functions of interactions, and a list of protein-ligand complexes showing similar interactions. Conclusions: Users can interactively analyze protein-small ligand binding modes with statistically determined interaction patterns rather than relying on a priori knowledge of the users, by using our new web server named GIANT that is freely available at http://giant.hgc.jp/.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 99
    Publication Date: 2014-01-16
    Description: Background: Different methods have been proposed for analyzing differentially expressed (DE) genes in microarray data. Methods based on statistical tests that incorporate expression level variability are used more commonly than those based on fold change (FC). However, FC based results are more reproducible and biologically relevant. Results: We propose a new method based on fold change rank ordering statistics (FCROS). We exploit the variation in calculated FC levels using combinatorial pairs of biological conditions in the datasets. A statistic is associated with the ranks of the FC values for each gene, and the resulting probability is used to identify the DE genes within an error level. The FCROS method is deterministic, requires a low computational runtime and also solves the problem of multiple tests which usually arises with microarray datasets. Conclusion: We compared the performance of FCROS with those of other methods using synthetic and real microarray datasets. We found that FCROS is well suited for DE gene identification from noisy datasets when compared with existing FC based methods.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
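    Illustrative note on record 99: the rank-ordering idea in the abstract (fold changes computed for all control/test sample pairs, then ranked per pair) can be sketched roughly as below. This is only a simplified reading of the abstract, not the FCROS implementation: the function `mean_fc_rank` and its arguments are hypothetical, the step that turns the averaged rank into a probability is omitted, and tied fold changes are ranked in arbitrary order.

```python
import numpy as np

def mean_fc_rank(control, test):
    """Average normalized fold-change rank per gene (hypothetical helper).

    control: (genes, n_control) positive expression values.
    test:    (genes, n_test) positive expression values.
    Returns one value per gene in (0, 1]; values near 1 suggest consistent
    up-regulation, values near 0 consistent down-regulation.
    """
    n_genes = control.shape[0]
    rank_sum = np.zeros(n_genes)
    n_pairs = 0
    for i in range(control.shape[1]):
        for j in range(test.shape[1]):
            fold_change = test[:, j] / control[:, i]
            # Rank genes by fold change within this sample pair (1 = smallest).
            ranks = fold_change.argsort().argsort() + 1
            rank_sum += ranks / n_genes
            n_pairs += 1
    return rank_sum / n_pairs

control = np.ones((3, 2))
test = np.array([[4.0, 4.0], [1.0, 1.0], [0.25, 0.25]])
scores = mean_fc_rank(control, test)
```

    Because only ranks are averaged, a gene scores high or low only if its fold change is consistently extreme across sample pairs, which is the property the abstract attributes to rank-based FC statistics.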
  • 100
    Publication Date: 2014-01-19
    Description: Background: Physiologic signals, such as cardiac interbeat intervals, exhibit complex fluctuations. However, capturing important dynamical properties, including nonstationarities, may not be feasible from conventional time series graphical representations. Methods: We introduce a simple-to-implement visualisation method, termed dynamical density delay mapping ("D3-Map" technique) that provides an animated representation of a system's dynamics. The method is based on a generalization of conventional two-dimensional (2D) Poincare plots, which are scatter plots where each data point, x(n), in a time series is plotted against the adjacent one, x(n + 1). First, we divide the original time series, x(n) (n = 1,..., N), into a sequence of segments (windows). Next, for each segment, a three-dimensional (3D) Poincare surface plot of x(n), x(n + 1), h[x(n),x(n + 1)] is generated, in which the third dimension, h, represents the relative frequency of occurrence of each (x(n),x(n + 1)) point. This 3D Poincare surface is then chromatised by mapping the relative frequency h values onto a colour scheme. We also generate a colourised 2D contour plot from each time series segment using the same colourmap scheme as for the 3D Poincare surface. Finally, the original time series graph, the colourised 3D Poincare surface plot, and its projection as a colourised 2D contour map for each segment, are animated to create the full "D3-Map." Results: We first exemplify the D3-Map method using the cardiac interbeat interval time series from a healthy subject during sleeping hours. The animations uncover complex dynamical changes, such as transitions between states, and the relative amount of time the system spends in each state. We also illustrate the utility of the method in detecting hidden temporal patterns in the heart rate dynamics of a patient with atrial fibrillation. The videos, as well as the source code, are made publicly available. Conclusions: Animations based on density delay maps provide a new way of visualising dynamical properties of complex systems not apparent in time series graphs or standard Poincare plot representations. Trainees in a variety of fields may find the animations useful as illustrations of fundamental but challenging concepts, such as nonstationarity and multistability. For investigators, the method may facilitate data exploration.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
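    Illustrative note on record 100: the windowing and density steps described in the Methods can be sketched as follows. This is not the authors' published source code; the function names, the bin count, and the window and step sizes are assumptions chosen for illustration, the relative frequency h is approximated by a binned 2D histogram, and plotting and animation are left out. The series is assumed to be non-constant.

```python
import numpy as np

def delay_density(segment, bins, value_range):
    """Relative frequency h of (x(n), x(n+1)) pairs on a 2D grid (hypothetical helper)."""
    x_now, x_next = segment[:-1], segment[1:]
    counts, _, _ = np.histogram2d(x_now, x_next, bins=bins, range=value_range)
    return counts / counts.sum()

def windowed_densities(series, window, step, bins=20):
    """One density map per sliding window; each map would be one animation frame."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    # Use a common axis range so that frames are directly comparable.
    value_range = [[lo, hi], [lo, hi]]
    starts = range(0, len(series) - window + 1, step)
    return [delay_density(series[s:s + window], bins, value_range) for s in starts]

series = np.sin(np.linspace(0.0, 20.0, 200))
frames = windowed_densities(series, window=50, step=25, bins=10)
```

    Each returned array could then be rendered as a coloured surface or contour plot with any plotting library, one frame per window.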