ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (35,356)
  • 2020-2024
  • 2010-2014  (35,356)
  • 1985-1989
  • 1950-1954
  • 1945-1949
  • 2014  (21,340)
  • 2010  (14,016)
  • Computer Science  (35,356)
  • 1
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 2
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 3
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 4
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 5
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 6
    Publication Date: 2014-01-01
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 7
    Publication Date: 2014-11-08
    Print ISSN: 1867-4828
    Electronic ISSN: 1869-0238
    Topics: Computer Science
    Published by Springer
  • 8
    Publication Date: 2014-12-13
    Description: Replication in herpesvirus genomes is a major public health concern, as the viruses multiply rapidly during the lytic phase of infection, which causes maximum damage to the host cells. Earlier research has established that sites of replication origin are dominated by a high concentration of rare palindrome sequences of DNA. Computational methods are devised based on scoring to determine the concentration of palindromes. In this paper, we propose both extraction and localization of rare palindromes in an automated manner. Discrete Cosine Transform (DCT-II), a widely recognized image compression algorithm, is utilized here to extract palindromic sequences based on their reverse complementary symmetry property. We formulate a novel approach to localize the rare palindrome clusters by devising a Minimum Quadratic Entropy (MQE) measure based on the Renyi’s Quadratic Entropy (RQE) function. Experimental results over a large number of herpesvirus genomes show that the RQE-based scoring of rare palindromes has a higher order of sensitivity and fewer false alarms in detecting concentrations of rare palindromes and thereby sites of replication origin.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses a microarray experiment to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: The Tikhonov regularized nonnegative matrix factorization (TNMF) is an NMF objective function that enforces smoothness on the computed solutions, and has been successfully applied to many problem domains including text mining, spectral data analysis, and cancer clustering. There is, however, an issue that is still insufficiently addressed in the development of TNMF algorithms, i.e., how to develop mechanisms that can learn the regularization parameters directly from the data sets. The common approach is to use fixed values based on a priori knowledge about the problem domains. However, from the study of linear inverse problems it is known that the quality of the solutions of Tikhonov regularized least squares problems depends heavily on the choice of appropriate regularization parameters. Since least squares problems are the building blocks of the NMF, it can be expected that a similar situation also applies to the NMF. In this paper, we propose two formulas to automatically learn the regularization parameters from the data set based on the L-curve approach. We also develop a convergent algorithm for the TNMF based on the additive update rules. Finally, we demonstrate the use of the proposed algorithm in cancer clustering tasks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Analysis of probability distributions conditional on species trees has demonstrated the existence of anomalous ranked gene trees (ARGTs), ranked gene trees that are more probable than the ranked gene tree that accords with the ranked species tree. Here, to improve the characterization of ARGTs, we study enumerative and probabilistic properties of two classes of ranked labeled species trees, focusing on the presence or avoidance of certain subtree patterns associated with the production of ARGTs. We provide exact enumerations and asymptotic estimates for cardinalities of these sets of trees, showing that as the number of species increases without bound, the fraction of all ranked labeled species trees that are ARGT-producing approaches 1. This result extends beyond earlier existence results to provide a probabilistic claim about the frequency of ARGTs.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 12
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: The existence of various types of correlations among the expressions of a group of biologically significant genes poses challenges in developing effective methods of gene expression data analysis. The initial focus of computational biologists was to work with only absolute and shifting correlations. However, researchers have found that the ability to handle shifting-and-scaling correlation enables them to extract more biologically relevant and interesting patterns from gene microarray data. In this paper, we introduce an effective shifting-and-scaling correlation measure named Shifting and Scaling Similarity (SSSim), which can detect highly correlated gene pairs in any gene expression data. We also introduce a technique named Intensive Correlation Search (ICS) biclustering algorithm, which uses SSSim to extract biologically significant biclusters from a gene expression data set. The technique performs satisfactorily with a number of benchmarked gene expression data sets when evaluated in terms of functional categories in Gene Ontology database.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 13
    Publication Date: 2014-12-13
    Description: Attractors in gene regulatory networks represent cell types or states of cells. In systems biology and synthetic biology, it is important to generate gene regulatory networks with desired attractors. In this paper, we focus on a singleton attractor, which is also called a fixed point. Using a Boolean network (BN) model, we consider the problem of finding Boolean functions such that the system has desired singleton attractors and has no undesired singleton attractors. To solve this problem, we propose a matrix-based representation of BNs. Using this representation, the problem of finding Boolean functions can be rewritten as an Integer Linear Programming (ILP) problem and a Satisfiability Modulo Theories (SMT) problem. Furthermore, the effectiveness of the proposed method is shown by a numerical example on a WNT5A network, which is related to melanoma. The proposed method provides a basic method for the design of gene regulatory networks.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 14
    Publication Date: 2014-12-13
    Description: In this paper, we study Copy Number Variation (CNV) data. The underlying process generating CNV segments is generally assumed to be memory-less, giving rise to an exponential distribution of segment lengths. We provide evidence from cancer patient data which suggests that this generative model is too simplistic, and that segment lengths follow a power-law distribution instead. We conjecture a simple preferential attachment generative model that provides the basis for the observed power-law distribution. We then show how an existing statistical method for detecting cancer driver genes can be improved by incorporating the power-law distribution in the null model.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalise, compare, classify, and interpret protein structures. Traditional methods to abstract protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This results in ignoring a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction to support efficient and effective comparison of protein folding patterns on a large scale.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 16
    Publication Date: 2014-12-13
    Description: The organization of global protein interaction networks (PINs) has been extensively studied and heatedly debated. We revisited this issue in the context of the analysis of dynamic organization of a PIN in the yeast cell cycle. Statistically significant bimodality was observed when analyzing the distribution of the differences in expression peak between periodically expressed partners. A close look at their behavior revealed that date and party hubs derived from this analysis have some distinct features. There are no significant differences between them in terms of protein essentiality, expression correlation and semantic similarity derived from gene ontology (GO) biological process hierarchy. However, date hubs exhibit significantly greater values than party hubs in terms of semantic similarity derived from both GO molecular function and cellular component hierarchies. Relating to three-dimensional structures, we found that both single- and multi-interface proteins could become date hubs coordinating multiple functions performed at different times while party hubs are mainly multi-interface proteins. Furthermore, we constructed and analyzed a PPI network specific to the human cell cycle and highlighted that the dynamic organization in human interactome is far more complex than the dichotomy of hubs observed in the yeast cell cycle.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 18
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 19
    Publication Date: 2014-12-13
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 20
    Publication Date: 2014-12-13
    Description: The articles in this special section were presented at the 2012 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS 2012) that was held in Washington DC from December 2nd to 4th.
    Print ISSN: 1545-5963
    Electronic ISSN: 1557-9964
    Topics: Biology , Computer Science
    Published by Institute of Electrical and Electronics Engineers (IEEE) on behalf of The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: Disk additions to a RAID-6 storage system can increase the I/O parallelism and expand the storage capacity simultaneously. To regain load balance among all disks, old and new, RAID-6 scaling requires moving certain data blocks onto newly added disks. Existing approaches to RAID-6 scaling, restricted by preserving a round-robin data distribution, require migrating all the data, which makes RAID-6 scaling expensive. In this paper, we propose RS6, a new approach to accelerating RDP RAID-6 scaling by reducing disk I/Os and XOR operations. First, RS6 minimizes the number of data blocks to be moved while maintaining a uniform data distribution across all data disks. Second, RS6 piggybacks parity updates during data migration to reduce the cost of maintaining consistent parities. Third, RS6 selects parameters of data migration so as to reduce disk I/Os for parity updates. Our mathematical analysis indicates that RS6 provides uniform data distribution, minimal data migration, and fast data addressing. We also conducted extensive simulation experiments to quantitatively characterize the properties of RS6. The results show that, compared with existing “moving-everything” Round-Robin approaches, RS6 reduces the number of blocks to be moved by 60.0%–88.9%, and reduces the migration time by 40.27%–69.88%.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: This paper focuses on designing a distributed medium access control algorithm for fairly sharing network resources among contending stations in an 802.11 wireless network. Because the notion of fairness is not universal and a rigorous analysis of the relationships among the four most popular types of fairness criteria is lacking, we first mathematically prove that there exist certain connections between these types of fairness criteria. We then propose an efficient medium access algorithm that aims at achieving time fairness and throughput enhancement in a fully distributed manner. The core idea of our proposed algorithm is that each station needs to select an appropriate contention window size so as to fairly share the channel occupancy time and maximize the throughput under the time fairness constraint. The derivation of the proper contention window size is addressed rigorously. We evaluate the performance of our proposed algorithm through an extensive simulation study, and the evaluation results demonstrate that our proposed algorithm leads to nearly perfect time fairness, high throughput, and low collision overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-12-13
    Description: This paper investigates the limits of adaptive voltage scaling (AVS) applied to commercial FPGAs which do not specifically support voltage adaptation. An adaptive power architecture based on a modified design flow is created with in-situ detectors and dynamic reconfiguration of clock management resources. AVS is a power-saving technique that enables a device to regulate its own voltage and frequency based on workload, process and operating conditions in a closed-loop configuration. It results in significantly improved energy profiles compared with dynamic voltage frequency scaling (DVFS), in which the device uses a number of pre-calculated valid working points. The results of deploying AVS in FPGAs with in-situ detectors show power and energy savings exceeding 85 percent compared with nominal voltage operation at the same frequency. The in-situ detector approach compares favorably with critical path replication based on delay lines since it avoids the need for cumbersome and error-prone delay line calibration.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 24
    Publication Date: 2014-12-14
    Description: Background: Biomedical ontologies are increasingly instrumental in the advancement of biological research, primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology from data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. Results: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. Conclusions: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 25
    Publication Date: 2014-12-14
    Description: The multiple sequence alignment (MSA) problem has become relevant to several areas in bioinformatics, from finding sequence families, detecting structural homologies of protein/DNA sequences, and determining functions of protein/DNA sequences to predicting patients' diseases by comparing DNAs of patients in disease discovery. The MSA is an NP-hard problem. In this paper, two new methods based on a cultural algorithm, namely the method of musical composition (MMC), for the solution of the MSA problem are introduced. The performance of the first and second versions was evaluated and analyzed on 26 and 12 different benchmark alignments, respectively. Test instances were taken from BAliBASE 3.0. Alignment accuracies are computed using the QSCORE program, which is a quality scoring program that compares two multiple sequence alignments. Numerical results on the tackled instances indicate that the performance levels of the proposed versions of the MMC are promising. In particular, the experimental results show that the second version found the best alignment reported in the specialized literature in 25% of the tested instances. Besides, for 50% of the tested instances, the second version achieved the second best alignment. Finally, the significance of the numerical results was analyzed according to the Wilcoxon rank-sum test, which indicated that the second proposed version is statistically similar to some state-of-the-art techniques for the MSA problem.
    Print ISSN: 0010-485X
    Electronic ISSN: 1436-5057
    Topics: Computer Science
    Published by Springer
  • 26
    Publication Date: 2014-12-14
    Description: In an amplify-and-forward cooperative network, a closed-form expression of the a priori distribution of the complex-valued gain of the global relay channel is intractable, so that a priori information is often not exploited for estimating this gain. Here, we present two iterative channel gain and noise variance estimation algorithms that make use of a priori channel information and exploit the presence of not only pilot symbols but also unknown data symbols. These algorithms are approximations of maximum a posteriori estimation and linear minimum mean-square error estimation, respectively. A substantially reduced frame error rate is achieved as compared to the case where only pilot symbols are used in the estimation.
    Print ISSN: 1687-1472
    Electronic ISSN: 1687-1499
    Topics: Electrical Engineering, Measurement and Control Technology , Computer Science
    Published by Springer
  • 27
    IOS Press
    Publication Date: 2014-12-14
    Description: This paper describes the Ontologies of Linguistic Annotation (OLiA) as one of the data sets currently available as part of the Linguistic Linked Open Data (LLOD) cloud. Within the LLOD cloud, the OLiA ontologies serve as a reference hub for annotation terminology for linguistic phenomena across a broad range of languages. They have been used to facilitate interoperability and information integration of linguistic annotations in corpora, NLP pipelines, and lexical-semantic resources, and to mediate their linking with multiple community-maintained terminology repositories.
    Content Type: Journal Article
    DOI: 10.3233/SW-140167
    Authors: Christian Chiarcos and Maria Sukhareva, Applied Computational Linguistics (ACoLi), Department of Computer Science and Mathematics, Goethe-University Frankfurt am Main, Germany, http://acoli.cs.uni-frankfurt.de
    Journal: Semantic Web
    Print ISSN: 1570-0844
    Electronic ISSN: 2210-4968
    Topics: Computer Science
    Published by IOS Press
  • 28
    IOS Press
    Publication Date: 2014-12-14
    Description: This paper describes the publication and linking of (parts of) PAROLE SIMPLE CLIPS (PSC), a large scale Italian lexicon, to the Semantic Web and the Linked Data cloud using the lemon model. The main challenge of the conversion is discussed, namely the reconciliation between the PSC semantic structure, which contains richly encoded semantic information following the qualia structure of the Generative Lexicon theory, and the lemon view of lexical sense as a reified pairing of a lexical item and a concept in an ontology. The result is two datasets: one consists of a list of lemon lexical entries with their lexical properties, relations and senses; the other consists of a list of OWL individuals representing the referents for the lexical senses. These OWL individuals are linked to each other by a set of semantic relations and mapped onto the SIMPLE OWL ontology of higher level semantic types.
    Content Type: Journal Article
    DOI: 10.3233/SW-140168
    Authors: Riccardo Del Gratta, Francesca Frontini, Fahad Khan and Monica Monachini, Istituto Di Linguistica Computazionale ‘A. Zampolli’, Consiglio Nazionale delle Ricerche, Via Moruzzi 1, Pisa, Italy. E-mail: first.last@ilc.cnr.it
    Journal: Semantic Web
    Print ISSN: 1570-0844
    Electronic ISSN: 2210-4968
    Topics: Computer Science
    Published by IOS Press
  • 29
    Publication Date: 2014-12-15
    Description: Numerous statistical methods have been published for designing and analyzing microarray projects. Traditional genome-wide microarray platforms (such as Affymetrix, Illumina, and DASL) measure the expression level of tens of thousands of genes. Since the sets of genes included in these array chips are selected by the manufacturers, the number of genes associated with a specific disease outcome is limited and a large portion of the genes are not associated. nCounter is a new technology by NanoString to measure the expression of a selected number (up to 800) of genes. The list of genes for nCounter chips can be selected by customers. Because the number of genes is limited and the price increases with the number of selected genes, the genes for nCounter chips are carefully selected from among those discovered in previous studies, usually using traditional high-throughput platforms, and only a small number of definitely unassociated genes, called control genes, are included to standardize the overall expression level across different chips. Furthermore, nCounter chips measure the expression level of each gene as a count observation, while the traditional high-throughput platforms produce continuous observations. Due to these differences, some statistical methods developed for the design and analysis of high-throughput projects may need modification or may be inappropriate for nCounter projects. In this paper, we discuss statistical methods that can be used for designing and analyzing nCounter projects.
    Electronic ISSN: 1176-9351
    Topics: Computer Science , Medicine
    Published by Libertas Academica
  • 30
    Publication Date: 2014-12-17
    Description: Nowadays, with the advance of technology, many applications generate huge amounts of data streams at very high speed. Examples include network traffic, web click streams, video surveillance, and sensor networks. Data stream mining has become a hot research topic. Its goal is to extract hidden knowledge/patterns from continuous data streams. Unlike traditional data mining where the dataset is static and can be repeatedly read many times, data stream mining algorithms face many challenges and have to satisfy constraints such as bounded memory, single-pass, real-time response, and concept-drift detection. This paper presents a comprehensive survey of the state-of-the-art data stream mining algorithms with a focus on clustering and classification because of their ubiquitous usage. It identifies mining constraints, proposes a general model for data stream mining, and depicts the relationship between traditional data mining and data stream mining. Furthermore, it analyzes the advantages as well as limitations of data stream algorithms and suggests potential areas for future research.
    Print ISSN: 0219-1377
    Electronic ISSN: 0219-3116
    Topics: Computer Science
    Published by Springer
  • 31
    Publication Date: 2014-12-18
    Description: A computational approach for estimating the overall, population, and individual cancer hazard rates was developed. The population rates characterize a risk of getting cancer of a specific site/type, occurring within an age-specific group of individuals from a specified population during a distinct time period. The individual rates characterize an analogous risk but only for the individuals susceptible to cancer. The approach uses a novel regularization and anchoring technique to solve an identifiability problem that occurs while determining the age, period, and cohort (APC) effects. These effects are used to estimate the overall rate, and to estimate the population and individual cancer hazard rates. To estimate the APC effects, as well as the population and individual rates, a new web-based computing tool, called the CancerHazard@Age, was developed. The tool uses data on the past and current history of cancer incidences collected during a long time period from the surveillance databases. The utility of the tool was demonstrated using data on the female lung cancers diagnosed during 1975–2009 in nine geographic areas within the USA. The developed tool can be applied equally well to process data on other cancer sites. The data obtained by this tool can be used to develop novel carcinogenic models and strategies for cancer prevention and treatment, as well as to project future cancer burden.
    Electronic ISSN: 1176-9351
    Topics: Computer Science , Medicine
    Published by Libertas Academica
  • 32
    Publication Date: 2014-12-18
    Description: To be successful, cybercriminals must figure out how to scale their scams. They duplicate content on new websites, often staying one step ahead of defenders who shut down past schemes. For some scams, such as phishing and counterfeit goods shops, the duplicated content remains nearly identical. In others, such as advanced-fee fraud and online Ponzi schemes, the criminal must alter content so that it appears different in order to evade detection by victims and law enforcement. Nevertheless, similarities often remain, in terms of the website structure or content, since making truly unique copies does not scale well. In this paper, we present a novel optimized combined clustering method that links together replicated scam websites, even when the criminal has taken steps to hide connections. We present automated methods to extract key website features, including rendered text, HTML structure, file structure, and screenshots. We describe a process to automatically identify the best combination of such attributes to most accurately cluster similar websites together. To demonstrate the method's applicability to cybercrime, we evaluate its performance against two collected datasets of scam websites: fake escrow services and high-yield investment programs (HYIPs). We show that our method groups similar websites together more accurately than existing general-purpose consensus clustering methods.
    Print ISSN: 1687-4161
    Electronic ISSN: 1687-417X
    Topics: Computer Science
    Published by Springer
  • 33
    Publication Date: 2014-12-18
    Description: Decentralized many-to-many negotiation for resource allocation in Cloud and multi-agent systems presents numerous challenges, including those related to the buyer strategy, which is the focus of the present paper. Current approaches for deriving the resources that each bid must request are not optimal in all market situations. For this reason, we propose a hybrid negotiation strategy that combines two negotiation modes, which generate the required resources of each bid in parallel: the first is an existing fixed negotiation strategy and the second is a learning selection strategy over the buyer's agreement space. Moreover, simulation results show that acting dynamically in the marketplace, by appropriately adjusting the buyer's resource provisioning times and by calling for proposals to hand over contracted resources in order to break some deadlocks involving buyers' tasks, achieves better performance in terms of both social welfare and buyer utility.
    Content Type: Journal Article
    Pages: 165-183
    DOI: 10.3233/MGS-140221
    Authors: Mohamed Raouf Habes (Department of Computer Science, University of Badji Mokhtar, Annaba, Algeria); Habiba Belleili-Souici (Department of Computer Science, University of Badji Mokhtar, Annaba, Algeria); Laurent Vercouter (INSA de Rouen, Saint-Etienne du Rouvray, France)
    Journal: Multiagent and Grid Systems, Volume 10, Number 3 / 2014
    Print ISSN: 1574-1702
    Electronic ISSN: 1875-9076
    Topics: Computer Science
    Published by IOS Press
  • 34
    Publication Date: 2014-12-18
    Description: In future energy systems, peaks in the daily electricity generation and consumption are expected to increase. The "smart grid" concept aims to maintain high levels of efficiency in the energy system by establishing distributed intelligence. Software agents (operating on devices with unknown computational capabilities) can implement dynamic and autonomous decision making about energy usage and generation, e.g. in domestic households, farms or offices. To reach satisfactory levels of efficiency and reliability, it is crucial to include planning-ahead of the energy-involving activities. Market mechanisms are a promising approach for large-scale coordination problems about energy supply and demand, but existing electricity markets either do not involve planning-ahead sufficiently or require a high level of sophistication and computing power from participants, which is not suitable for smart grid settings. This paper proposes a new market mechanism for smart grids, ABEM (Ahead- and Balancing Energy Market). ABEM performs an ahead market and a last-minute balancing market, where planning-ahead in the ahead market supports both binding ahead-commitments and reserve capacities in bids (which can be submitted as price functions). These features of planning-ahead reflect the features in modern wholesale electricity markets. However, constructing bids in ABEM is straightforward and fast. We also provide a model of a market with the features mentioned above, which a strategic agent can use to construct a bid (e.g. in ABEM), using a decision-theoretic approach. We evaluate ABEM experimentally in various stochastic scenarios and show favourable outcomes in comparison with a benchmark mechanism. 
    Content Type: Journal Article
    Pages: 137-163
    DOI: 10.3233/MGS-140220
    Authors: Nicolas Höning (Centrum Wiskunde en Informatica, Amsterdam, The Netherlands); Han La Poutré (Centrum Wiskunde en Informatica, Amsterdam, The Netherlands)
    Journal: Multiagent and Grid Systems, Volume 10, Number 3 / 2014
    Print ISSN: 1574-1702
    Electronic ISSN: 1875-9076
    Topics: Computer Science
    Published by IOS Press
  • 35
    Publication Date: 2014-12-18
    Description: The Belief-Desire-Intention (BDI) model of agency provides a powerful technique for describing goal-based behavior for both individual agents and, more recently, agent teams. Numerous frameworks have been developed since the model was first proposed in the early 1980s. However, none of these frameworks has explicitly represented intention, which has meant that intention-based reasoning has had no direct framework support. Given the importance of this in the development of practical agent applications, we consider this to be a major shortcoming of existing frameworks. This paper explores how explicitly represented goals can be used both as a unifying modeling concept for the management of intention and as the basis for a BDI framework. The exploration is grounded both in software, through a recently developed BDI framework called GORITE, and in application, through an execution system for a robotic assembly cell. Both are discussed in detail.
    Content Type: Journal Article
    Pages: 119-136
    DOI: 10.3233/MGS-140219
    Authors: D. Jarvis (School of Engineering and Technology, Central Queensland University, Brisbane, Queensland, Australia); J. Jarvis (School of Engineering and Technology, Central Queensland University, Brisbane, Queensland, Australia); R. Rönnquist (Intendico Pty. Ltd., Carlton, Australia)
    Journal: Multiagent and Grid Systems, Volume 10, Number 3 / 2014
    Print ISSN: 1574-1702
    Electronic ISSN: 1875-9076
    Topics: Computer Science
    Published by IOS Press
  • 36
    Publication Date: 2014-12-18
    Description: Background: Identification of individual components in complex mixtures is an important and sometimes daunting task in several research areas, such as metabolomics and natural product studies. NMR spectroscopy is an excellent technique for analysis of mixtures of organic compounds and gives a detailed chemical fingerprint of most individual components above the detection limit. For the identification of individual metabolites in metabolomics, correlation or covariance between peaks in 1H NMR spectra has previously been successfully employed. Similar correlation of 2D 1H-13C Heteronuclear Single Quantum Correlation spectra was recently applied to investigate the structure of heparin. In this paper, we demonstrate how a similar approach can be used to identify metabolites in human biofluids (post-prostatic palpation urine). Results: From 50 1H-13C Heteronuclear Single Quantum Correlation spectra, 23 correlation plots resembling pure metabolites were constructed. The identities of these metabolites were confirmed by comparing the correlation plots with reported NMR data, mostly from the Human Metabolome Database. Conclusions: Correlation plots prepared by statistically correlating 1H-13C Heteronuclear Single Quantum Correlation spectra from human biofluids provide unambiguous identification of metabolites. The correlation plots highlight cross-peaks belonging to each individual compound and are not limited by long-range magnetization transfer as conventional NMR experiments are.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 37
    Publication Date: 2014-12-18
    Description: Background: The study of alternative splicing (AS), a post-transcriptional regulation mechanism, is an important application of RNA-seq in eukaryotes. A number of software packages and computational methods have been developed for detecting AS. Most of the methods, however, are designed and tested on animal data, such as human and mouse. Plant genes differ from those of animals in many ways, e.g., the average intron size and preferred AS types. These differences may require different computational approaches and raise questions about the effectiveness of existing methods on plant data. The goal of this paper is to benchmark existing computational differential splicing (or transcription) detection methods so that biologists can choose the most suitable tools to accomplish their goals. Results: This study compares eight popular, publicly available software packages for differential splicing analysis using both simulated and real Arabidopsis thaliana RNA-seq data. All packages are freely available. The study examines the effect of varying AS ratio, read depth, dispersion pattern, AS types, sample sizes and the influence of annotation. Using real data, the study looks at the consistency between the packages and verifies a subset of the detected AS events using PCR studies. Conclusions: No single method performs best in all situations. The accuracy of annotation has a major impact on which method should be chosen for AS analysis. DEXSeq performs well on the simulated data when the AS signal is relatively strong and annotation is accurate. Cufflinks achieves a better tradeoff between precision and recall and turns out to be the best choice when incomplete annotation is provided. Some methods perform inconsistently for different AS types. Complex AS events that combine several simple AS events pose problems for most methods, especially for MATS. MATS stands out in the analysis of real RNA-seq data when all the AS events being evaluated are simple AS events.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 38
    Publication Date: 2014-11-06
    Print ISSN: 0937-6429
    Electronic ISSN: 1861-8936
    Topics: Computer Science , Economics
    Published by Springer
  • 39
    Publication Date: 2014-11-06
    Description: Domain-specific modeling languages (DSMLs) promise clear advantages over general-purpose modeling languages. Their design, however, involves an essential challenge. To achieve sizeable economies of scale, it is advisable to develop DSMLs that can be used across a wide range of applications. At the same time, the gain in modeling productivity in the individual case argues for language concepts that are tailored to individual requirements. This paper presents a novel approach to conceptual modeling and to the construction of information systems that is inspired by the actual use of technical languages: multilevel modeling. In contrast to traditional language architectures such as the "Meta Object Facility" (MOF), it is based on a recursive architecture that allows an arbitrary number of classification levels and thereby permits the design of language hierarchies, from a reference DSML down to "local" DSMLs. In this way, not only is the essential conflict in designing DSMLs substantially mitigated, but the reuse and integration of software components in general is also promoted. In addition, the approach enables a reduction of model complexity by partially dissolving the dichotomy between specialization and instantiation. Furthermore, it integrates a metamodeling language with the metamodel of a metaprogramming language, which makes executable models possible. The specification of the language architecture is complemented by a presentation of application scenarios that illustrate the potential of multilevel modeling, and by a critical discussion of its particular characteristics.
    Print ISSN: 0937-6429
    Electronic ISSN: 1861-8936
    Topics: Computer Science , Economics
    Published by Springer
  • 40
    Publication Date: 2014-11-07
    Description: Motivation: Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. Results: We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7–25 times faster than a standard iterative algorithm. Availability and implementation: Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. Contact: jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
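Note on the bit-parallel idea in record 40: the abstract says bit-parallelism has been applied to the longest common subsequence (LCS) problem. The following Python sketch shows that simpler LCS case only, using a well-known bit-vector recurrence; it is not BitPAl and does not handle general integer scoring. The function name `lcs_length_bitparallel` is made up for illustration, and Python's unbounded integers stand in for the machine words a C implementation would use.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    One bit per character of `a`; each character of `b` updates the whole
    column with a few word-level operations (AND, OR, complement, addition).
    """
    m = len(a)
    mask = (1 << m) - 1  # keeps only the m low bits

    # match[c] has bit i set when a[i] == c
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: nothing matched yet
    for ch in b:
        eq = match.get(ch, 0)
        # Bit-vector update: the addition propagates carries through runs
        # of set bits, the OR restores bits at non-matching positions.
        v = ((v + (v & eq)) | (v & ~eq)) & mask

    # Each zero bit among the low m bits counts one unit of LCS length.
    return m - bin(v).count("1")
```

For example, `lcs_length_bitparallel("AGGTAB", "GXTXAYB")` evaluates to 4 (the subsequence `GTAB`). The inner loop does a constant number of integer operations per character of `b`, which is where the speed-up over a cell-by-cell table comes from when `a` fits in a few machine words.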
  • 41
    Publication Date: 2014-11-07
    Description: Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in recent years. However, although highly accurate, these computational models lack the computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. Availability and implementation: The source code can be downloaded at http://www.heiderlab.de Contact: d.heider@wz-straubing.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 42
    Published by Oxford University Press
    Publication Date: 2014-11-07
    Description: We present a new method to incrementally construct the FM-index for both short and long sequence reads, up to the size of a genome. It is the first algorithm that can build the index while implicitly sorting the sequences in the reverse (complement) lexicographical order without a separate sorting step. The implementation is among the fastest for indexing short reads and the only one that practically works for reads averaging kilobases in length. Availability and implementation: https://github.com/lh3/ropebwt2 Contact: hengli@broadinstitute.org
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 43
    Publication Date: 2014-11-07
    Description: AliView is an alignment viewer and editor designed to meet the requirements of next-generation sequencing era phylogenetic datasets. AliView handles alignments of unlimited size in the formats most commonly used, i.e. FASTA, Phylip, Nexus, Clustal and MSF. The intuitive graphical interface makes it easy to inspect, sort, delete, merge and realign sequences as part of the manual filtering process of large datasets. AliView also works as an easy-to-use alignment editor for small as well as large datasets. Availability and implementation: AliView is released as open-source software under the GNU General Public License, version 3.0 (GPLv3), and is available at GitHub (www.github.com/AliView). The program is cross-platform and extensively tested on Linux, Mac OS X and Windows systems. Downloads and help are available at http://ormbunkar.se/aliview Contact: anders.larsson@ebc.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 44
    Publication Date: 2014-11-07
    Description: Motivation: The ability to accurately read the order of nucleotides in DNA and RNA is fundamental for modern biology. Errors in next-generation sequencing can lead to many artifacts, from erroneous genome assemblies to mistaken inferences about RNA editing. Uneven coverage in datasets also contributes to false corrections. Result: We introduce Trowel, a massively parallelized and highly efficient error correction module for Illumina read data. Trowel both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum. With high-quality k-mers and relevant base information, Trowel achieves high accuracy for different short read sequencing applications. The latency in the data path has been significantly reduced because of efficient data access and data structures. In performance evaluations, Trowel was highly competitive with other tools regardless of coverage, genome size, read length and fragment size. Availability and implementation: Trowel is written in C++ and is provided under the General Public License v3.0 (GPLv3). It is available at http://trowel-ec.sourceforge.net. Contact: euncheon.lim@tue.mpg.de or weigel@tue.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 45
    Publication Date: 2014-11-07
    Description: The application of protein–protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. Availability and Implementation: MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. Contact: akiyama@cs.titech.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 46
    Publication Date: 2014-11-07
    Description: Background: PGxClean is a new web application that performs quality control analyses for data produced by the Affymetrix DMET chip or other candidate gene technologies. Importantly, the software does not assume that variants are biallelic single-nucleotide polymorphisms, but can be used on the variety of variant characteristics included on the DMET chip. Once quality control analyses have been completed, the associated PGxClean-Viz web application performs principal component analyses and provides tools for characterizing and visualizing population structure. Findings: The PGxClean web application accepts genotype data from the Affymetrix DMET chip or the PLINK PED format with genotypes annotated as (A,C,G,T or 1,2,3,4). Options for removing missing data and calculating genotype and allele frequencies are offered. Data can be subdivided by cohort characteristics, such as family ID, sex, phenotype, or case-control status. Once the data have been processed through the PGxClean web application, the output files can be entered into the PGxClean-Viz web application for performing principal component analysis to visualize population substructure. Conclusions: The PGxClean software provides rapid quality-control processing, data analysis, and data visualization for the Affymetrix DMET chip or other candidate gene technologies while improving on common analysis platforms by not assuming that variants are biallelic. The web application is available at www.pgxclean.com.
    Electronic ISSN: 1756-0381
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 47
    Publication Date: 2014-11-07
    Description: Motivation: The identification of active transcriptional regulatory elements is crucial to understand regulatory networks driving cellular processes such as cell development and the onset of diseases. It has recently been shown that chromatin structure information, such as DNase I hypersensitivity (DHS) or histone modifications, significantly improves cell-specific predictions of transcription factor binding sites. However, no method has so far successfully combined both DHS and histone modification data to perform active binding site prediction. Results: We propose here a method based on hidden Markov models to integrate DHS and histone modification occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity and specificity and better area-under-the-curve statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can be easily applied to new cell types and allows flexible downstream analysis such as de novo motif finding. Availability and implementation: Our framework is available as part of the Regulatory Genomics Toolbox. The software information and all benchmarking data are available at http://costalab.org/wp/dh-hmm. Contact: ivan.costa@rwth-aachen.de or eduardo.gusmao@rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 48
    Publication Date: 2014-11-07
    Description: Motivation: A proper target or marker is essential in any diagnosis (e.g. an infection or cancer). An ideal diagnostic target should be both conserved in and unique to the pathogen. Currently, these targets can only be identified manually, which is time-consuming and usually error-prone. Because of the increasingly frequent occurrences of emerging epidemics and multidrug-resistant ‘superbugs’, a rapid diagnostic target identification process is needed. Results: A new method that can identify uniquely conserved regions (UCRs) as candidate diagnostic targets for a selected group of organisms solely from their genomic sequences has been developed and successfully tested. Using a sequence-indexing algorithm to identify UCRs and a k-mer integer-mapping model for computational efficiency, this method has successfully identified UCRs within the bacteria domain for 15 test groups, including pathogenic, probiotic, commensal and extremophilic bacterial species or strains. Based on the identified UCRs, new diagnostic primer sets were designed, and their specificity and efficiency were tested by polymerase chain reaction amplifications from both pure isolates and samples containing mixed cultures. Availability and implementation: The UCRs identified for the 15 bacterial species are now freely available at http://ucr.synblex.com. The source code of the programs used in this study is accessible at http://ucr.synblex.com/bacterialIdSourceCode.d.zip Contact: yazhousun@synblex.com Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
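Note on the k-mer integer mapping mentioned in record 48: the abstract names a k-mer integer-mapping model but gives no details, so the following Python sketch only illustrates one common way to do such a mapping (two bits per base) together with a naive set-based filter for k-mers that occur in every target sequence and in no other sequence. The names `kmer_codes`, `decode_kmer` and `candidate_unique_kmers` are hypothetical, and this is not the authors' sequence-indexing algorithm.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def kmer_codes(seq: str, k: int) -> set:
    """Return the set of integer codes of all k-mers in seq.

    Windows containing a character other than A, C, G, T are skipped.
    """
    codes = set()
    mask = (1 << (2 * k)) - 1  # keeps the 2*k low bits
    code = 0
    valid = 0  # number of consecutive valid bases seen so far
    for ch in seq.upper():
        bits = BASE_TO_BITS.get(ch)
        if bits is None:
            code, valid = 0, 0  # restart after an ambiguous base
            continue
        code = ((code << 2) | bits) & mask  # slide the window by one base
        valid += 1
        if valid >= k:
            codes.add(code)
    return codes


def decode_kmer(code: int, k: int) -> str:
    """Turn an integer code back into its k-mer string."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


def candidate_unique_kmers(targets, others, k: int) -> set:
    """k-mers shared by every target and absent from all other sequences.

    Assumes `targets` contains at least one sequence.
    """
    shared = set.intersection(*(kmer_codes(s, k) for s in targets))
    for s in others:
        shared -= kmer_codes(s, k)
    return shared
```

Storing k-mers as integers lets the conserved-and-unique test reduce to set intersection and difference on small fixed-width keys instead of string comparisons; a real pipeline would also need to consider reverse complements and memory limits, which this sketch ignores.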
  • 49
    Publication Date: 2014-11-07
    Description: Motivation: A popular method for classification of protein domain movements apportions them into two main types: those with a ‘hinge’ mechanism and those with a ‘shear’ mechanism. The intuitive assignment of domain movements to these classes has limited the number of domain movements that can be classified in this way. Furthermore, whether intended or not, the term ‘shear’ is often interpreted to mean a relative translation of the domains. Results: Numbers of occurrences of four different types of residue contact changes between domains were optimally combined by logistic regression using the training set of domain movements intuitively classified as hinge and shear to produce a predictor for hinge and shear. This predictor was applied to give a 10-fold increase in the number of examples over the number previously available with a high degree of precision. It is shown that overall a relative translation of domains is rare, and that there is no difference between hinge and shear mechanisms in this respect. However, the shear set contains significantly more examples of domains having a relative twisting movement than the hinge set. The angle of rotation is also shown to be a good discriminator between the two mechanisms. Availability and implementation: Results are free to browse at http://www.cmp.uea.ac.uk/dyndom/interface/ . Contact: sjh@cmp.uea.ac.uk . Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 50
    Publication Date: 2014-11-07
    Description: Motivation: Recent studies on human disease have revealed that aberrant interaction between proteins probably underlies a substantial number of human genetic diseases. This suggests a need to investigate the inheritance mode of disease in terms of protein interaction and, on that basis, to revisit our conceptual understanding of several properties of the inheritance mode of human disease. Results: We observed a strong correlation between the number of protein interactions and the likelihood of a gene causing any dominant disease or multiple dominant diseases, whereas no correlation was observed between protein interaction and the likelihood of a gene causing recessive diseases. We found that dominant diseases are more likely to be associated with disruption of important interactions. These findings suggest that inheritance mode should be understood in terms of protein interaction. We therefore reviewed the previous studies and refined an interaction model of inheritance mode, and then used new evidence to confirm that this model is largely reasonable. Building on these findings, we found that the inheritance mode of human genetic diseases can be predicted using protein interaction. By integrating the systems biology perspectives with the classical disease genetics paradigm, our study provides some new insights into genotype–phenotype correlations. Contact: haodapeng@ems.hrbmu.edu.cn or biofomeng@hotmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 51
    Publication Date: 2014-11-07
    Description: Recently, several high-profile studies collected cell viability data from panels of cancer cell lines treated with many drugs applied at different concentrations. Such drug sensitivity data for cancer cell lines provide suggestive treatments for different types and subtypes of cancer. Visualization of these datasets can reveal patterns that may not be obvious by examining the data without such efforts. Here we introduce Drug/Cell-line Browser (DCB), an online interactive HTML5 data visualization tool for interacting with three of the recently published datasets of cancer cell lines/drug-viability studies. DCB uses clustering and canvas visualization of the drugs and the cell lines, as well as a bar graph that summarizes drug effectiveness for the tissue of origin or the cancer subtypes for single or multiple drugs. DCB can help in understanding drug response patterns and prioritizing drug/cancer cell line interactions by tissue of origin or cancer subtype. Availability and implementation: DCB is an open source Web-based tool that is freely available at: http://www.maayanlab.net/LINCS/DCB Contact: avi.maayan@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 52
    Publication Date: 2014-11-07
    Description: In recent years, Bayesian networks have become an important modeling method for decision-making problems in real-world applications. This paper considers learning the parameters of a fuzzy Bayesian network (BN) from imprecise (fuzzy) observations, where imprecise observations refer specifically to triangular fuzzy numbers. To achieve this, an extension to fuzzy probability theory based on imprecise observations is proposed, which employs both the "truth" concept of Yager and the Extension Principle in fuzzy set theory. Some examples are given to demonstrate the proposed concepts. The aim is to estimate the joint fuzzy probability and the conditional probability tables (CPTs) of a Bayesian network from imprecise observations. Two real-world datasets, Car Evaluation Database (CED) and Extending Credibility (EC), are employed, in which some attributes have crisp (exact) observations and others have fuzzy observations. The parameters estimated for the CED network using our extension are shown in tables. Then, using Kullback-Leibler divergence, two scenarios are considered to show that fuzzy parameters preserve more knowledge than crisp parameters. This also holds when there are only a small number of observations. Finally, to compare a network with fuzzy parameters against one with crisp parameters, prediction accuracy results are provided, which show improved predictions.
    Content Type: Journal Article
    Pages: 167-180
    DOI: 10.3233/KES-140296
    Authors: Mostafa Ghazizadeh-Ahsaee, Ferdowsi University of Mashhad, Mashhad, Iran; Mahmoud Naghibzadeh, Ferdowsi University of Mashhad, Mashhad, Iran; Bahram Sadeghpour Gildeh, Ferdowsi University of Mashhad, Mashhad, Iran
    Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
    Journal Volume: Volume 18
    Journal Issue: Volume 18, Number 3 / 2014
    Print ISSN: 1327-2314
    Electronic ISSN: 1875-8827
    Topics: Computer Science
    Published by IOS Press
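    Illustration (not from the record): the abstract relies on two standard building blocks, triangular fuzzy numbers and the Kullback-Leibler divergence. A minimal sketch of both follows, using their textbook definitions. The function names are hypothetical, and the paper's fuzzy parameter-learning procedure itself is not reproduced here.

```python
import math


def triangular_membership(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c), assuming a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge from a up to the peak at b
    return (c - x) / (c - b)       # falling edge from the peak at b down to c


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for two discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

    A divergence of zero means the two distributions are identical, and larger values mean that q describes p less well.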
  • 53
    Publication Date: 2014-11-07
    Description: This paper presents an interactive verifier for logic programs. These logic programs are constructed by a schema-based method. Each program is associated with proof schemes due to the program development method. The correctness proof of a program is guided by its associated proof schemes. The main components of the verifier are the prover, which carries out the proof steps; the knowledge base (KB), which includes representations of all theories and transformation rules; the KB update component, which supports updating the KB; and the graphical user interface (GUI). The emphasis in the design of this proof checker is on effective guidance of the proof based on the activated proof schemes and on performance by the verifier of tedious, trivial and time-consuming tasks. The difficult proof decisions are taken by the user; the proof checker then applies them. The design of the interface is based on providing the user with the required support for the proof of a theorem and for the update of the KB. This system is an effective and useful tool for the interactive verification of non-trivial logic programs.
    Content Type: Journal Article
    Pages: 143-156
    DOI: 10.3233/KES-140294
    Authors: Emmanouil Marakakis, Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion, Greece; Haridimos Kondylakis, Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion, Greece; Nikos Papadakis, Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion, Greece
    Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
    Journal Volume: Volume 18
    Journal Issue: Volume 18, Number 3 / 2014
    Print ISSN: 1327-2314
    Electronic ISSN: 1875-8827
    Topics: Computer Science
    Published by IOS Press
  • 54
    Publication Date: 2014-11-07
    Description: We address the issue of preference recommendation, with the aim of more reliable recommender systems. A general recommender system provides a collection of items, or the several best of them, on the basis of a fixed preference constraint. However, users' real preferences may be so complicated that conventional recommender systems become unreliable. In real-world applications, e.g. travel planning or hotel selection, specific constraints may be involved, such as a limited travel time or an appropriate budget for hotel accommodation. Motivated by these applications, we investigate constrained preference recommendation (CPR), of which two main types are studied: Threshold-CPR (T-CPR) and Range-CPR (R-CPR). We first analyze and define the related problems. Then, we provide solutions and illustrate the problem-solving procedure. The results are compared with other representative techniques. Finally, we extend the study to general CPR problems: first, we provide a normalized system for constraint representation; second, we use the proposed normalized system to analyze representative literature.
    Content Type: Journal Article
    Pages: 157-165
    DOI: 10.3233/KES-140295
    Authors: Anming Li, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China; Junyi Chai, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
    Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
    Journal Volume: Volume 18
    Journal Issue: Volume 18, Number 3 / 2014
    Print ISSN: 1327-2314
    Electronic ISSN: 1875-8827
    Topics: Computer Science
    Published by IOS Press
  • 55
    Publication Date: 2014-11-07
    Description: The purpose of security checks at airports is to achieve a reduction in the risk of malevolent attacks on the aviation system. The introduction of new security measures aims at reducing this perceived level of risk, and often takes place as a direct reaction to (attempted) attacks. This procedure means that offenders remain one step ahead of security agents. The aim of the approach presented here is to overcome this shortfall by supporting decision-making in the context of airport security by a systematically created knowledge base. The combination of two well-accepted methods – scenario analysis and structural complexity management – supports a structured knowledge acquisition process that serves as a basis for the proactive identification of system weaknesses. Furthermore, this combination of methods can be applied to the search for optimisation potentials concerned with possible future threats. The basis for the approach is composed of threat scenario components, security measures and dependencies between these elements. A Multiple-Domain Matrix is applied for system modelling. Clustering of threat scenarios and intensity of relations to security measures are used for analysis. The interpretation of findings makes use of portfolio representations.
    Content Type: Journal Article
    Pages: 191-200
    DOI: 10.3233/KES-140300
    Authors: Mara Cole, Bauhaus Luftfahrt e.V., Munich, Germany; Maik Maurer, Institute of Product Development, Technische Universität München, Garching, Germany
    Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
    Journal Volume: Volume 18
    Journal Issue: Volume 18, Number 3 / 2014
    Print ISSN: 1327-2314
    Electronic ISSN: 1875-8827
    Topics: Computer Science
    Published by IOS Press
  • 56
    Publication Date: 2014-11-07
    Description: This paper proposes a generalized Bayesian inference nets model (GBINM), which helps developers construct self-adaptive Bayesian inference nets for various applications, and a new approach to defining and assigning the statistical parameters that Bayesian inference nodes need in order to calculate the propagation of probabilities and address uncertainties. GBINM and the proposed approach are applied to design an intelligent medical system to diagnose cardiovascular diseases. Thousands of site-sampled clinical data are used for designing and testing the resulting system. The preliminary diagnostic results show that the proposed methodology has salient validity and effectiveness.
    Content Type: Journal Article
    Pages: 181-190
    DOI: 10.3233/KES-140299
    Authors: Booma Devi Sekar, Department of ECE, Faculty of Science and Technology, University of Macau, Macau, China; Mingchui Dong, Department of ECE, Faculty of Science and Technology, University of Macau, Macau, China
    Journal: International Journal of Knowledge-Based and Intelligent Engineering Systems
    Journal Volume: Volume 18
    Journal Issue: Volume 18, Number 3 / 2014
    Print ISSN: 1327-2314
    Electronic ISSN: 1875-8827
    Topics: Computer Science
    Published by IOS Press
  • 57
    Publication Date: 2014-11-07
    Description: This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows, from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
    Content Type: Journal Article
    Pages: -
    DOI: 10.3233/SPR-140406
    Authors: Maciej Malawski, Department of Computer Science AGH, Kraków, Poland; Kamil Figiela, Department of Computer Science AGH, Kraków, Poland; Marian Bubak, Department of Computer Science AGH, Kraków, Poland; Ewa Deelman, USC Information Sciences Institute, Marina del Rey, CA, USA; Jarek Nabrzyski, Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
    E-mails: malawski@agh.edu.pl, kfigiela@agh.edu.pl, bubak@agh.edu.pl, deelman@isi.edu, naber@nd.edu
    Journal: Scientific Programming
    Online ISSN: 1875-919X
    Print ISSN: 1058-9244
    Topics: Computer Science
    Published by IOS Press
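    Illustration (not from the record): the abstract combines hourly billing, levels of identical tasks and a deadline constraint. The sketch below shows how these interact for a single level on one instance type. It is a brute-force toy under simplifying assumptions (every VM is billed for the same rounded-up duration, no data transfer cost) and is not the AMPL/CMPL model from the paper. All names and parameters are hypothetical.

```python
import math


def level_cost(num_tasks, task_hours, num_vms, price_per_hour):
    """Return (cost, makespan) for one level of identical tasks on identical VMs.

    Simplifying assumption: all VMs are billed for the makespan rounded up
    to whole hours, which mimics hourly billing.
    """
    tasks_per_vm = math.ceil(num_tasks / num_vms)   # busiest VM's task count
    makespan = tasks_per_vm * task_hours
    billed_hours = math.ceil(makespan)
    return num_vms * billed_hours * price_per_hour, makespan


def cheapest_vm_count(num_tasks, task_hours, max_vms, price_per_hour, deadline_hours):
    """Try every VM count up to max_vms; return (vms, cost) for the cheapest
    option that meets the deadline, or None if no option does."""
    best = None
    for n in range(1, max_vms + 1):
        cost, makespan = level_cost(num_tasks, task_hours, n, price_per_hour)
        if makespan <= deadline_hours and (best is None or cost < best[1]):
            best = (n, cost)
    return best
```

    With 10 half-hour tasks, at most 4 VMs, a price of 1.0 per hour and a 2-hour deadline, three VMs are the cheapest feasible choice: four VMs finish sooner but are billed for the same two rounded-up hours each.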
  • 58
    Publication Date: 2014-11-07
    Description: by Ryan Tasseff, Anjali Bheda-Malge, Teresa DiColandrea, Charles C. Bascom, Robert J. Isfort, Richard Gelinas The hair cycle is a dynamic process where follicles repeatedly move through phases of growth, retraction, and relative quiescence. This process is an example of temporal and spatial biological complexity. Understanding of the hair cycle and its regulation would shed light on many other complex systems relevant to biological and medical research. Currently, a systematic characterization of gene expression and summarization within the context of a mathematical model is not yet available. Given the cyclic nature of the hair cycle, we felt it was important to consider a subset of genes with periodic expression. To this end, we combined several mathematical approaches with high-throughput, whole mouse skin, mRNA expression data to characterize aspects of the dynamics and the possible cell populations corresponding to potentially periodic patterns. In particular two gene clusters, demonstrating properties of out-of-phase synchronized expression, were identified. A mean field, phase coupled oscillator model was shown to quantitatively recapitulate the synchronization observed in the data. Furthermore, we found only one configuration of positive-negative coupling to be dynamically stable, which provided insight on general features of the regulation. Subsequent bifurcation analysis was able to identify and describe alternate states based on perturbation of system parameters. A 2-population mixture model and cell type enrichment was used to associate the two gene clusters to features of background mesenchymal populations and rapidly expanding follicular epithelial cells. Distinct timing and localization of expression was also shown by RNA and protein imaging for representative genes. 
Taken together, the evidence suggests that synchronization between expanding epithelial and background mesenchymal cells may be maintained, in part, by inhibitory regulation, and potential mediators of this regulation were identified. Furthermore, the model suggests that impairing this negative regulation will drive a bifurcation which may represent transition into a pathological state such as hair miniaturization.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 59
    Publication Date: 2014-11-07
    Description: by Varsha Dhankani, J. Nathan Kutz, Joshua T. Schiffer Herpes simplex virus-2 (HSV-2) is a chronic reactivating infection that leads to recurrent shedding episodes in the genital tract. A minority of episodes are prolonged, and associated with development of painful ulcers. However, currently available tools poorly predict viral trajectories and timing of reactivations in infected individuals. We employed principal components analysis (PCA) and singular value decomposition (SVD) to interpret HSV-2 genital tract shedding time series data, as well as simulation output from a stochastic spatial mathematical model. Empirical and model-derived time-series data gathered over >30 days consist of multiple complex episodes that could not be reduced to a manageable number of descriptive features with PCA and SVD. However, single HSV-2 shedding episodes, even those with prolonged duration and complex morphologies consisting of multiple erratic peaks, were consistently described using a maximum of four dominant features. Modeled and clinical episodes had equivalent distributions of dominant features, implying similar dynamics in real and simulated episodes. We applied linear discriminant analysis (LDA) to simulation output and identified that local immune cell density at the viral reactivation site had a predictive effect on episode duration, though longer term shedding suggested chaotic dynamics and could not be predicted based on spatial patterns of immune cell density. These findings suggest that HSV-2 shedding patterns within an individual are impossible to predict over weeks or months, and that even highly complex single HSV-2 episodes can only be partially predicted based on spatial distribution of immune cell density.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 60
    Public Library of Science (PLoS)
    Publication Date: 2014-11-07
    Description: by Marcin J. Skwark, Daniele Raimondi, Mirco Michel, Arne Elofsson Given sufficiently large protein families, and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and thus are interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts independently of the number of aligned sequences, residue separation or secondary structure type, but is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inferences, in comparison to state-of-the-art methods using machine learning, PconsC2 is superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 61
    Publication Date: 2014-11-07
    Description: by Veronika Boskova, Sebastian Bonhoeffer, Tanja Stadler Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. 
In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
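    Illustration (not from the record): the abstract compares methods by coverage, i.e. how often the highest posterior density (HPD) interval contains the true parameter value, and reports the complementary error. A minimal sketch of that bookkeeping follows. The function names and the example intervals are made up for illustration and are not results from the study.

```python
def coverage(intervals, true_value):
    """Fraction of (low, high) intervals that contain the true value."""
    inside = sum(1 for low, high in intervals if low <= true_value <= high)
    return inside / len(intervals)


def coverage_error(intervals, true_value):
    """Fraction of intervals that miss the true value (1 - coverage)."""
    return 1.0 - coverage(intervals, true_value)


# Made-up intervals from four hypothetical re-estimations of a growth rate
# whose true value is 1.0; two of the four intervals contain it.
example_intervals = [(0.8, 1.2), (0.9, 1.5), (1.1, 1.4), (0.5, 0.95)]
example_coverage = coverage(example_intervals, 1.0)
```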
  • 62
    Publication Date: 2014-11-07
    Description: by Junaid Hassan, Linda L. Bergaust, I. David Wheat, Lars R. Bakken In response to impending anoxic conditions, denitrifying bacteria sustain respiratory metabolism by producing enzymes for reducing nitrogen oxyanions/-oxides (NOₓ) to N₂ (denitrification). Since denitrifying bacteria are non-fermentative, the initial production of denitrification proteome depends on energy from aerobic respiration. Thus, if a cell fails to synthesise a minimum of denitrification proteome before O₂ is completely exhausted, it will be unable to produce it later due to energy-limitation. Such entrapment in anoxia is recently claimed to be a major phenomenon in batch cultures of the model organism Paracoccus denitrificans on the basis of measured e⁻-flow rates to O₂ and NOₓ. Here we constructed a dynamic model and explicitly simulated actual kinetics of recruitment of the cells to denitrification to directly and more accurately estimate the recruited fraction (). Transcription of nirS is pivotal for denitrification, for it triggers a cascade of events leading to the synthesis of a full-fledged denitrification proteome. The model is based on the hypothesis that nirS has a low probability (, h⁻¹) of initial transcription, but once initiated, the transcription is greatly enhanced through positive feedback by NO, resulting in the recruitment of the transcribing cell to denitrification. We assume that the recruitment is initiated as [O₂] falls below a critical threshold and terminates (assuming energy-limitation) as [O₂] exhausts. With  = 0.005 h⁻¹, the model robustly simulates observed denitrification kinetics for a range of culture conditions. The resulting (fraction of the cells recruited to denitrification) falls within 0.038–0.161. In contrast, if the recruitment of the entire population is assumed, the simulated denitrification kinetics deviate grossly from those observed. 
The phenomenon can be understood as a ‘bet-hedging strategy’: switching to denitrification is a gain if anoxic spell lasts long but is a waste of energy if anoxia turns out to be a ‘false alarm’.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 63
    Publication Date: 2014-11-07
    Description: by Thomas R. Caulfield, Fabienne C. Fiesel, Elisabeth L. Moussaud-Lamodière, Daniel F. A. R. Dourado, Samuel C. Flores, Wolfdieter Springer Loss-of-function mutations in PINK1 or PARKIN are the most common causes of autosomal recessive Parkinson's disease. Both gene products, the Ser/Thr kinase PINK1 and the E3 Ubiquitin ligase Parkin, functionally cooperate in a mitochondrial quality control pathway. Upon stress, PINK1 activates Parkin and enables its translocation to and ubiquitination of damaged mitochondria to facilitate their clearance from the cell. Though PINK1-dependent phosphorylation of Ser65 is an important initial step, the molecular mechanisms underlying the activation of Parkin's enzymatic functions remain unclear. Using molecular modeling, we generated a complete structural model of human Parkin at all atom resolution. At steady state, the Ub ligase is maintained inactive in a closed, auto-inhibited conformation that results from intra-molecular interactions. Evidently, Parkin has to undergo major structural rearrangements in order to unleash its catalytic activity. As a spark, we have modeled PINK1-dependent Ser65 phosphorylation in silico and provide the first molecular dynamics simulation of Parkin conformations along a sequential unfolding pathway that could release its intertwined domains and enable its catalytic activity. We combined free (unbiased) molecular dynamics simulation, Monte Carlo algorithms, and minimal-biasing methods with cell-based high content imaging and biochemical assays. Phosphorylation of Ser65 results in widening of a newly defined cleft and dissociation of the regulatory N-terminal UBL domain. This motion propagates through further opening conformations that allow binding of an Ub-loaded E2 co-enzyme. Subsequent spatial reorientation of the catalytic centers of both enzymes might facilitate the transfer of the Ub moiety to charge Parkin. 
Our structure-function study provides the basis to elucidate regulatory mechanisms and activity of the neuroprotective Parkin. This may open up new avenues for the development of small molecule Parkin activators through targeted drug design.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 64
    Publication Date: 2014-11-08
    Description: Background: Accurate prediction of cancer prognosis based on gene expression data is generally difficult, and identifying robust prognostic markers for cancer remains a challenging problem. Recent studies have shown that modular markers, such as pathway markers and subnetwork markers, can provide better snapshots of the underlying biological mechanisms by incorporating additional biological information, thereby leading to more accurate cancer classification. Results: In this paper, we propose a novel method for simultaneously identifying robust synergistic subnetwork markers that can accurately predict cancer prognosis. The proposed method utilizes an efficient message-passing algorithm called affinity propagation, based on which we identify groups, or subnetworks, of discriminative and synergistic genes, whose protein products are closely located in the protein-protein interaction (PPI) network. Unlike other existing subnetwork marker identification methods, our proposed method can simultaneously identify multiple nonoverlapping subnetwork markers that can synergistically predict cancer prognosis. Conclusions: Evaluation results based on multiple breast cancer datasets demonstrate that the proposed message-passing approach can identify robust subnetwork markers in the human PPI network, which have higher discriminative power and better reproducibility compared to those identified by previous methods. The identified subnetwork markers can lead to better cancer classifiers with improved overall performance and consistency across independent cancer datasets.
    Print ISSN: 1687-4145
    Electronic ISSN: 1687-4153
    Topics: Biology , Computer Science
    Published by Springer
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In microprocessor-based systems, such as cloud computing infrastructure, high reliability is essential. As multiprocessor systems become more widespread and increasingly complex, system-level diagnosis will increasingly be adopted to determine their robustness. In this paper, we consider a pessimistic diagnostic strategy for hypermesh multiprocessor systems under the PMC model. The pessimistic strategy is a diagnostic process whereby all faulty processors are correctly identified and at most one fault-free processor may be misjudged to be a faulty processor. We first determine the pessimistic diagnosability of a hypermesh to be $2n(k-1)-k$. We then propose an efficient pessimistic diagnostic algorithm to identify at most $2n(k-1)-k$ faults in $O(N)$ time, where $k$ is the radix, $n$ is the number of dimensions, and $N = k^n$ is the total number of processors. This result is superior to the best precise diagnostic algorithm, which runs in $O(N \log_k N)$ time. Furthermore, the Cartesian product network is a subgraph of the hypermesh, and the proposed algorithm can be employed to determine faults in the product network.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
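    Illustration (not from the record): the two closed-form quantities in the abstract, the processor count N = k^n and the pessimistic diagnosability 2n(k-1)-k, can be evaluated directly. The helper names are hypothetical, and any side conditions on k and n that the paper may impose are not reproduced here.

```python
def hypermesh_size(k, n):
    """Total number of processors N = k**n for radix k and n dimensions."""
    return k ** n


def pessimistic_diagnosability(k, n):
    """Pessimistic diagnosability 2n(k-1) - k, as stated in the abstract."""
    return 2 * n * (k - 1) - k


# Worked example: radix 4, 3 dimensions.
example_processors = hypermesh_size(4, 3)            # 4**3
example_faults = pessimistic_diagnosability(4, 3)    # 2*3*3 - 4
```

    For this example the hypermesh has 64 processors, and the stated bound allows up to 14 faults to be identified.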
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In a top-$k$ Geometric Intersection Query (top-$k$ GIQ) problem, a set of $n$ weighted, geometric objects in $\mathbb{R}^d$ is to be pre-processed into a compact data structure so that for any query geometric object, $q$, and integer $k>0$, the $k$ largest-weight objects intersected by $q$ can be reported efficiently. While the top-$k$ problem has been studied extensively for non-geometric problems (e.g., recommender systems), the geometric version has received little attention. This paper gives a general technique to solve any top-
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 67
    Publication Date: 2014-11-08
    Description: Rule induction based on rough set theory (RST) has received much attention recently since it can generate a minimal set of rules from a decision system for real-life applications by using attribute reduction and approximations. The decision system may vary with time, e.g., through variation of objects, attributes, and attribute values. The reduction and approximations of the decision system may change under Attribute Values’ Coarsening and Refining (AVCR), a kind of variation of attribute values, which in turn alters the decision rules. This paper aims at the dynamic maintenance of decision rules w.r.t. AVCR. First, the definition of a minimal discernibility attribute set is proposed, which aims to improve the efficiency of attribute reduction in RST. Then, principles of updating decision rules in case of AVCR are discussed. Furthermore, rough set-based methods for updating decision rules in an inconsistent decision system are proposed. Complexity analysis and extensive experiments on UCI data sets have verified the effectiveness and efficiency of the proposed methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A major mining task for binary matrices is the extraction of approximate top-$k$ patterns that are able to concisely describe the input data. The top-$k$ pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms and discuss PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the goodness of the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, in order to qualitatively evaluate the usefulness of the discovered patterns, we exploited PaNDa+ to detect overlapping communities in a bipartite network. The results show that PaNDa+ is able to discover high-quality patterns in both synthetic and real-world datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Mining network evolution has emerged as an intriguing research topic in many domains such as data mining, social networks, and machine learning. While the bulk of research has focused on mining evolutionary patterns of homogeneous networks (e.g., networks of friends), most real-world networks are heterogeneous, containing objects of different types, such as authors, papers, venues, and terms in a bibliographic network. Modeling co-evolution of multityped objects can capture richer information than modeling single-typed objects alone. For example, studying co-evolution of authors, venues, and terms in a bibliographic network can reveal the evolution of research areas better than examining the co-author network or term network alone. In this paper, we study mining co-evolution of multityped objects in a special type of heterogeneous network, called a star network, and examine how the multityped objects influence each other in the network evolution. A hierarchical Dirichlet process mixture model-based evolution model is proposed, which detects the co-evolution of multityped objects in the form of multityped cluster evolution in dynamic star networks. An efficient inference algorithm is provided to learn the proposed model. Experiments on several real networks (DBLP, Twitter, and Delicious) validate the effectiveness of the model and the scalability of the algorithm.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In this work, we define cost-free learning (CFL) formally in comparison with cost-sensitive learning (CSL). The main difference between them is that a CFL approach seeks optimal classification results without requiring any cost information, even in the class imbalance problem. In fact, several CFL approaches exist in the related studies, such as sampling and some criteria-based approaches. However, to the best of our knowledge, none of the existing CFL and CSL approaches is able to process abstaining classifications properly when no information is given about errors and rejects. Based on information theory, we propose a novel CFL strategy which seeks to maximize the normalized mutual information between the targets and the decision outputs of classifiers. Using this strategy, we can handle binary/multi-class classifications with/without abstaining. Significant features are observed from the new strategy. As the degree of class imbalance changes, the proposed strategy is able to balance the errors and rejects accordingly and automatically. Another advantage of the strategy is its ability to derive optimal rejection thresholds for abstaining classifications and the “equivalent” costs in binary classifications. The connection between rejection thresholds and the ROC curve is explored. An empirical investigation is made on several benchmark data sets in comparison with other existing approaches. The classification results demonstrate a promising perspective of the strategy in machine learning.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
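    The objective named in the abstract above can be illustrated with a minimal sketch. It assumes one particular normalization, NI = I(T; Y) / H(T), computed from a confusion matrix whose rows are true classes and whose columns are decision outputs (an extra column may hold rejects). The abstract does not state which normalization the paper uses, and the function name is hypothetical.

```python
import math

def normalized_mutual_information(confusion):
    """NI = I(T; Y) / H(T) for a count matrix (rows: true classes, columns: outputs).

    Assumption: normalizing by the target entropy H(T); other normalizations exist.
    """
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]

    # Entropy of the target distribution, in bits.
    h_t = -sum((r / total) * math.log2(r / total) for r in row_sums if r > 0)

    # Mutual information between targets and outputs, in bits.
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:
                p_ij = count / total
                mi += p_ij * math.log2(count * total / (row_sums[i] * col_sums[j]))

    return mi / h_t if h_t > 0 else 0.0

perfect = normalized_mutual_information([[50, 0], [0, 50]])
uninformative = normalized_mutual_information([[25, 25], [25, 25]])
# Third column is a reject option: rejecting everything conveys no information.
all_rejected = normalized_mutual_information([[0, 0, 50], [0, 0, 50]])
```

    Under this normalization a perfect classifier scores 1, while both random guessing and rejecting every sample score 0, which is why no explicit error or reject costs are needed to compare classifiers.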
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Multivariate time series are common in many application domains, particularly in industrial processes with a large number of sensors installed for process monitoring and control. Often, such data encapsulate complex relations among individual series. This paper presents a new type of pattern in multivariate time series, referred to as temporal associations, to capture a wide range of local relations along and across individual series. A scalable algorithm is developed to discover frequent associations by incorporating (1) redundancy pruning of patterns in single time series and (2) two conditions to avoid over-counting the occurrences of associations, thus greatly reducing the space and runtime complexity of the discovery process. A statistical significance measure is also introduced for ranking and post-pruning discovered associations. To evaluate the proposed method, synthetic data sets, a real-world data set taken from the time series mining repository, and a large data set obtained from a delayed coking plant are used. The experiments demonstrate that the discovered associations capture the local relations in multiple time series and that the proposed method is scalable to large data sets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 72
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Short texts are popular on today’s web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, whose inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperforms the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms on both time efficiency and topic learning.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
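    As a rough illustration of the "biterms" mentioned in the abstract above, the sketch below extracts unordered word pairs that co-occur within each short text and pools them over the corpus. It covers only this preprocessing step, not the topic inference itself. Whitespace tokenization and the function names are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) co-occurring in one short text."""
    # Sorting each pair makes ("a", "b") and ("b", "a") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

def corpus_biterm_counts(docs):
    """Pool biterms over all documents, giving corpus-level co-occurrence counts."""
    counts = Counter()
    for doc in docs:
        # Assumption: lowercase whitespace tokenization is enough for the demo.
        counts.update(extract_biterms(doc.lower().split()))
    return counts

docs = ["apple iphone launch", "iphone launch event", "apple pie recipe"]
counts = corpus_biterm_counts(docs)
```

    Because the counts are aggregated over the whole corpus, a pair such as ("iphone", "launch") accumulates evidence from several documents even though each individual text is too short to provide reliable document-level statistics.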
  • 73
    Publication Date: 2014-11-08
    Description: In the literature about association analysis, many interestingness measures have been proposed to assess the quality of obtained association rules in order to select a small set of the most interesting among them. In the particular case of hierarchically organized items and generalized association rules connecting them, a measure that dealt appropriately with the hierarchy would be advantageous. Here we present the further developments of a new class of such hierarchical interestingness measures and compare them with a large set of conventional measures and with three hierarchical pruning methods from the literature. The aim is to find interesting pairwise generalized association rules connecting the concepts of multiple ontologies. Interested in the broad empirical evaluation of interestingness measures, we compared the rules obtained by 37 methods on four real world data sets against predefined ground truth sets of associations. To this end, we adopted a framework of instance-based ontology matching and extended the set of performance measures by two novel measures: relation learning recall and precision which take into account hierarchical relationships.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on very large data sets containing long time series or time series of different lengths. For many of the data sets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distances and dynamic time warping. Most importantly, the features selected provide an understanding of the properties of the data set, insight that can guide further scientific investigation.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The explosive usage of social media produces massive amounts of unlabeled and high-dimensional data. Feature selection has been proven to be effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task due to the absence of the label information on which feature relevance is often assessed. The unique characteristics of social media data further complicate the already challenging problem of unsupervised feature selection: for example, social media data is inherently linked, which invalidates the independent and identically distributed assumption and brings new challenges to unsupervised feature selection algorithms. In this paper, we investigate a novel problem of feature selection for social media data in an unsupervised scenario. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how the relations extracted from linked data can be exploited to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We design and conduct systematic experiments to evaluate the proposed framework on data sets from real-world social media websites. The empirical study demonstrates the effectiveness and potential of our proposed framework.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The discovery of process models from event logs has emerged as one of the crucial problems for enabling the continuous support in the life-cycle of an information system. However, in a decade of process discovery research, the algorithms and tools that have appeared are known to have strong limitations in several dimensions. The size of the logs and the formal properties of the model discovered are the two main challenges nowadays. In this paper we propose the use of numerical abstract domains for tackling these two problems, for the particular case of the discovery of Petri nets. First, numerical abstract domains enable the discovery of general process models, requiring no knowledge (e.g., the bound of the Petri net to derive) for the discovery algorithm. Second, by using divide and conquer techniques we are able to control the size of the process discovery problems. The methods proposed in this paper have been implemented in a prototype tool and experiments are reported illustrating the significance of this fresh view of the process discovery problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Given a real-world graph, how should we lay out its edges? How can we compress it? These questions are closely related, and the typical approach so far is to find clique-like communities, like the ‘cavemen graph’, and compress them. We show that the block-diagonal mental image of the ‘cavemen graph’ is the wrong paradigm, in full agreement with earlier results that real-world graphs have no good cuts. Instead, we propose to envision graphs as a collection of hubs connecting spokes, with super-hubs connecting the hubs, and so on, recursively. Based on this idea, we propose the SlashBurn method to recursively split a graph into hubs and spokes connected only by the hubs. We also propose techniques to select the hubs and give an ordering to the spokes, in addition to the basic SlashBurn. We give a theoretical analysis of the proposed hub selection methods. Our viewpoint has several advantages: (a) it avoids the ‘no good cuts’ problem, (b) it gives better compression, and (c) it leads to faster execution times for matrix-vector operations, which are the backbone of most graph processing tools. Through experiments, we show that SlashBurn consistently outperforms other methods for all data sets, resulting in better compression and faster running time. Moreover, we show that SlashBurn with the appropriate spokes ordering can further improve compression while hardly sacrificing the running time.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: In this paper, we introduce the “task trail” to understand user search behaviors. We define a task to be an atomic user information need, whereas a task trail represents all user activities within that particular task, such as query reformulations and URL clicks. Previously, web search logs have been studied mainly at the session or query level, where users may submit several queries within one task and handle several tasks within one session. Although previous studies have addressed the problem of task identification, little is known about the advantage of using tasks over sessions or queries for search applications. In this paper, we conduct extensive analyses and comparisons to evaluate the effectiveness of task trails in several search applications: determining user satisfaction, predicting user search interests, and suggesting related queries. Experiments on large-scale data sets of a commercial search engine show that: (1) task trails perform better than session and query trails in determining user satisfaction; (2) task trails increase webpage utilities of end users compared to session and query trails; (3) task trails are comparable to query trails but more sensitive than session trails in measuring different ranking functions; (4) query terms from the same task are more topically consistent with each other than query terms from different tasks; (5) query suggestion based on task trails is a good complement to query suggestions based on session trails and the click-through bipartite graph. The findings in this paper verify the need to extract task trails from web search logs and enhance applications in search and recommendation systems.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: This paper studies the problem of mining named entity translations by aligning comparable corpora. Current state-of-the-art approaches mine a translation pair by aligning an entity graph in one language to another based on node similarity or propagated similarity of related entities. However, because they build on the assumption of “symmetry”, they quickly deteriorate on “weakly” comparable corpora with some asymmetry. In this paper, we pursue two directions for overcoming relation and entity asymmetry, respectively. The first approach starts from weakly comparable corpora (for high recall), then ensures precision by selective propagation only to entities of symmetric relations. The second approach starts from parallel corpora (for high precision), then enhances recall by extending the translation matrix based on node similarity and contextual similarity. Our experimental results on English-Chinese corpora show that both approaches are effective and complementary. Our combined approach outperforms the best-performing baseline in terms of F1-score by up to 0.28.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 80
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: The knowledge remembered by the human body and reflected by the dexterity of body motion is called embodied knowledge. In this paper, we propose a new method using singular value decomposition for extracting embodied knowledge from the time-series data of the motion. We compose a matrix from the time-series data and use the left singular vectors of the matrix as the patterns of the motion and the singular values as a scalar, by which each corresponding left singular vector affects the matrix. Two experiments were conducted to validate the method. One is a gesture recognition experiment in which we categorize gesture motions by two kinds of models with indexes of similarity and estimation that use left singular vectors. The proposed method obtained a higher correct categorization ratio than principal component analysis (PCA) and correlation efficiency (CE). The other is an ambulation evaluation experiment in which we distinguished the levels of walking disability. The first singular values derived from the walking acceleration were suggested to be a reliable criterion to evaluate walking disability. Finally we discuss the characteristic and significance of the embodied knowledge extraction using the singular value decomposition proposed in this paper.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: Edit distance is widely used for measuring the similarity between two strings. As a primitive operation, edit distance based string similarity search is to find strings in a collection that are similar to a given query string using edit distance. Existing approaches for answering such string similarity queries follow the filter-and-verify framework by using various indexes. Typically, most approaches assume that indexes and data sets are maintained in main memory. To overcome this limitation, in this paper, we propose B$^+$-tree based approaches to answer edit distance based string similarity queries, and hence, our approaches can be easily integrated into existing RDBMSs. In general, we answer string similarity search using pruning techniques employed in the metric space in that edit distance is a metric. First, we split the string collection into partitions according to a set of reference strings. Then, we index strings in all partitions using a single B$^+$-tree based on the distances of these strings to their corresponding reference strings. Finally, we propose two approaches to efficiently answer range and KNN queries, respectively, based on the B$^+$-tree. We prove that the optimal partitioning of the data set is an NP-hard problem, and therefore propose a heuristic approach for selecting the reference strings greedily and present an optimal partition assignment strategy to minimize the expected number of strings that need to be verified during the query evaluation. Through extensive experiments over a variety of real data sets, we demonstrate that our B$^+$-tree based approaches provide superior performance over state-of-the-art techniques on both range and KNN queries in most cases.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
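    The metric-space pruning described in the abstract above can be sketched as follows. This is a simplified in-memory illustration, not the paper's index: a sorted list per partition stands in for the single B+-tree, reference strings are supplied by hand rather than selected greedily, and all function names are hypothetical. It relies only on the triangle inequality, which holds because edit distance is a metric.

```python
import bisect

def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def build_index(strings, references):
    """Assign each string to its nearest reference; sort each partition by that distance."""
    partitions = {r: [] for r in references}
    for s in strings:
        dist, ref = min((edit_distance(s, r), r) for r in references)
        partitions[ref].append((dist, s))
    index = {}
    for ref, entries in partitions.items():
        entries.sort()
        index[ref] = ([d for d, _ in entries], [s for _, s in entries])
    return index

def range_query(index, query, tau):
    """Return all indexed strings within edit distance tau of the query."""
    results = []
    for ref, (dists, strs) in index.items():
        d_qr = edit_distance(query, ref)
        # Triangle inequality: d(q, s) <= tau implies |d(q, r) - d(s, r)| <= tau,
        # so only strings with d(s, r) in [d_qr - tau, d_qr + tau] are candidates.
        lo = bisect.bisect_left(dists, d_qr - tau)
        hi = bisect.bisect_right(dists, d_qr + tau)
        # Verify the surviving candidates with the exact distance.
        results.extend(s for s in strs[lo:hi] if edit_distance(query, s) <= tau)
    return sorted(results)

strings = ["kitten", "sitting", "mitten", "bitten", "apple", "apply", "ample"]
index = build_index(strings, references=["kitten", "apple"])
matches = range_query(index, "fitten", tau=1)
```

    The filter step only narrows the candidate set; the verify step guarantees that the result is identical to a brute-force scan.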
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2014-11-08
    Description: A top-$k$ query retrieves the best $k$ tuples by assigning scores for each tuple in a target relation with respect to a user-specific scoring function. This paper studies the problem of constructing an indexing structure for supporting top-$k$ queries over varying scoring functions and retrieval sizes. The existing research efforts can be categorized into three approaches: list-, layer-, and view-based approaches. In this paper, we mainly focus on the layer-based approach that pre-materializes tuples into consecutive multiple layers. We first propose a dual-resolution layer that consists of coarse-level and fine-level layers. Specifically, we build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. To make our proposed dual-resolution layer scalable, we then address the following optimization directions: 1) index construction; 2) disk-based storage scheme; 3) the design of the virtual layer; and 4) index maintenance for tuple updates. Our evaluation results show that our proposed method is more scalable than the state-of-the-art methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
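    A minimal sketch of the layer-based idea in the abstract above, covering only the coarse-level layers built from skylines (the fine-level convex-skyline sublayers and the storage and maintenance optimizations are omitted). It assumes smaller attribute values are better and a linear scoring function with positive weights; these conventions and the function names are assumptions for illustration.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every attribute and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Tuples that are not dominated by any other tuple."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def skyline_layers(points):
    """Peel off skylines repeatedly: layer 1 is the skyline, layer 2 the skyline of the rest, ..."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = skyline(remaining)
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers

def top_k(layers, weights, k):
    """Answer a top-k query (lowest weighted sum first) by reading only the first k layers."""
    # A tuple in layer i is dominated by a chain of i - 1 tuples in the earlier
    # layers, each scoring no worse, so the k best tuples lie in the first k layers.
    candidates = [p for layer in layers[:k] for p in layer]
    score = lambda p: sum(w * x for w, x in zip(weights, p))
    return sorted(candidates, key=score)[:k]

points = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 5), (10, 10)]
layers = skyline_layers(points)
best_two = top_k(layers, weights=(1, 1), k=2)
```

    The layers are computed once and are independent of the weights, which is what lets one index serve varying scoring functions and retrieval sizes.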
  • 83
    Publication Date: 2014-11-09
    Description: Semantic Web services frameworks provide the means to automatically discover, rank, compose and invoke services according to user requirements and preferences. However, current preference models offer limited expressiveness and are tightly coupled with the underlying discovery and ranking mechanisms. Furthermore, these mechanisms present performance, interoperability and integration issues that prevent the uptake of semantic technologies in these scenarios. In this work, we discuss three interrelated contributions on preference modeling, discovery optimization, and flexible, integrated ranking, specifically tackling the identified challenges in those areas using a lightweight approach.
    Content Type: Journal Article
    DOI: 10.3233/AIC-140644
    Authors: José María García, University of Seville, Sevilla, Spain. E-mail: josemgarcia@us.es
    In: AI Communications
    Print ISSN: 0921-7126
    Electronic ISSN: 1875-8452
    Topics: Computer Science
    Published by IOS Press
  • 84
    Publication Date: 2014-11-09
    Description: Background: The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements. Results: MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility. Conclusions: To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 85
    Publication Date: 2014-11-09
    Description: This work proposes novel methodologies to improve the use of Light Detection And Ranging (LiDAR) for environmental purposes, especially for thematic mapping (LiDAR only or fused with other remote sensors) and the estimation of forest variables. The methodologies make use of well-known techniques from soft computing (machine learning and evolutionary computation) and their adaptation to develop LiDAR-derived products.
    Content Type: Journal Article
    DOI: 10.3233/AIC-140643
    Authors: Jorge Garcia-Gutierrez, Department of Computer Languages and Systems, University of Seville, Seville, Spain. E-mail: jorgarcia@us.es
    In: AI Communications
    Print ISSN: 0921-7126
    Electronic ISSN: 1875-8452
    Topics: Computer Science
    Published by IOS Press
  • 86
    Publication Date: 2014-11-11
    Print ISSN: 0340-1200
    Electronic ISSN: 1432-0770
    Topics: Biology , Computer Science , Physics
    Published by Springer
  • 87
    Publication Date: 2014-11-11
    Electronic ISSN: 1867-0202
    Topics: Computer Science , Economics
    Published by Springer
  • 88
    Springer
    In: Computing
    Publication Date: 2014-11-11
    Description: When optimizing performance on a GPU, control flow divergence among the threads of a warp can become a performance bottleneck. In our hand-coded GPU stencil computation optimization, to remove the control flow divergence caused by the conventional mapping between global memory and shared memory, we devise a new mapping mechanism that models the coalesced memory accesses of GPU threads and the aligned ghost zone overheads, removing the conditional statements for the boundary stencil computation points of each XY-tile and thereby improving performance. In addition, we load only one XY-tile into registers in every stencil computation iteration and apply common sub-expression elimination and software prefetching to reduce overheads. Finally, a detailed performance evaluation demonstrates that, with our optimizations, global memory access traffic is close to the idealized lower bound: the traffic per computed XY-tile point is roughly 6 % and 4 % above the idealized lower bound of 8 bytes per point (which does not account for ghost zone overheads) on Tesla C2050 and Kepler K20X, respectively.
    Print ISSN: 0010-485X
    Electronic ISSN: 1436-5057
    Topics: Computer Science
    Published by Springer
  • 89
    Publication Date: 2014-11-04
    Description: by The PLOS Computational Biology Staff
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 90
    Publication Date: 2014-11-04
    Description: by The PLOS Computational Biology Staff
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 91
    Publication Date: 2014-11-05
    Description: Background: The major histocompatibility complex (MHC) is responsible for presenting antigens (epitopes) on the surface of antigen-presenting cells (APCs). When pathogen-derived epitopes are presented by MHC class II on an APC surface, T cells may be able to trigger a specific immune response. Prediction of MHC-II epitopes is particularly challenging because the open binding cleft of the MHC-II molecule allows epitopes to bind beyond the peptide binding groove; therefore, the molecule is capable of accommodating peptides of variable length. Among the methods proposed to predict MHC-II epitopes, artificial neural networks (ANNs) and support vector machines (SVMs) are the most effective. We propose a novel classification algorithm to predict MHC-II epitopes, called sparse representation via l1-minimization. Results: We obtained a collection of experimentally confirmed MHC-II epitopes from the Immune Epitope Database and Analysis Resource (IEDB) and applied our l1-minimization algorithm. To benchmark the performance of our proposed algorithm, we compared our predictions against an SVM classifier. We measured sensitivity, specificity and accuracy; then we used Receiver Operating Characteristic (ROC) analysis to evaluate the performance of our method. The prediction performance of the l1-minimization algorithm on MHC-II epitopes was generally comparable and, in some cases, superior to the standard SVM classification method, and it overcame the lack of robustness of other methods with respect to outliers. While our method consistently favored DPPS encoding with the alleles tested, SVM showed slightly better accuracy when "11-factor" encoding was used. Conclusions: l1-minimization has accuracy similar to that of SVM and has additional advantages, such as overcoming the lack of robustness with respect to outliers. With l1-minimization no model selection dependency is involved.
    Electronic ISSN: 1756-0381
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 92
    Publication Date: 2014-11-05
    Description: Reasoning that overexpression of multiple E2F-responsive genes might be a useful marker for RB1 dysfunction, we compiled a list of E2F-responsive genes from the literature and evaluated their expression in publicly available gene expression microarray data of patients with breast cancer, serous ovarian cancer, and prostate cancer. In breast cancer, a group of tumors was identified, each of which simultaneously overexpressed multiple E2F-responsive genes. Seventy percent of these genes were concerned with cell cycle progression, DNA repair, or mitosis. These E2F-responsive gene overexpressing (ERGO) tumors frequently exhibited additional evidence of Rb/E2F axis dysfunction, were mostly triple negative, and preferentially overexpressed multiple basal cytokeratins, suggesting that they overlapped substantially with the basal-like tumor subset. ERGO tumors were also identified in serous ovarian cancer and prostate cancer. In these cancer types, there was no evidence for a tumor subset comparable to the breast cancer basal-like subset. A core group of about 30 E2F-responsive genes were overexpressed in all three cancer types. Thus, it appears that disorders of the Rb/E2F axis can arise at multiple organ sites and produce tumors that simultaneously overexpress multiple E2F-responsive genes.
    Electronic ISSN: 1176-9351
    Topics: Computer Science , Medicine
    Published by Libertas Academica
  • 93
    Publication Date: 2014-11-05
    Description: A variety of applications, such as information extraction, intrusion detection and protein fold recognition, can be expressed as sequences of discrete events or elements (rather than unordered sets of features); that is, there is an order dependence among the elements composing each data instance. These applications may be modeled as classification problems, in which case the classifier should exploit sequential interactions among the elements so that the ordering relationship among them is properly captured. Dominant approaches to this problem include: (i) learning Hidden Markov Models, (ii) exploiting frequent sequences extracted from the data and (iii) computing string kernels. Such approaches, however, are computationally hard and vulnerable to noise, especially if the data shows long range dependencies (i.e., long subsequences are necessary in order to model the data). In this paper we provide simple algorithms that build highly effective sequential classifiers. Our algorithms are based on enumerating approximately contiguous subsequences from the training set on a demand-driven basis, exploiting a lightweight and flexible subsequence matching function and an innovative subsequence enumeration strategy called pattern silhouettes, making our learning algorithms fast and the corresponding classifiers robust to noisy data. Our empirical results on a variety of datasets indicate that the best trade-off between accuracy and learning time is usually obtained by limiting the length of the subsequences by a factor of \(\log {n}\), which leads to an \(O(n\log {n})\) learning cost (where \(n\) is the length of the sequence being classified). Finally, we show that, in most cases, our classifiers are faster than existing solutions (sometimes by orders of magnitude), while also providing significant accuracy improvements in most of the evaluated cases.
    Print ISSN: 1384-5810
    Electronic ISSN: 1573-756X
    Topics: Computer Science
    Published by Springer
  • 94
    Publication Date: 2014-11-05
    Description: An important challenge in the domain of vehicular ad hoc networks (VANET) is the scalability of data dissemination. Under dense traffic conditions, the large number of communicating vehicles can easily result in a congested wireless channel. In that situation, delays and packet losses increase to a level where the VANET cannot be applied for road safety applications anymore. This paper introduces scalable data dissemination in vehicular ad hoc networks (SDDV), a holistic solution to this problem. It is composed of several techniques spread across the different layers of the protocol stack. Simulation results are presented that illustrate the severity of the scalability problem when applying common state-of-the-art techniques and parameters. Starting from such a baseline solution, optimization techniques are gradually added to SDDV until the scalability problem is entirely solved. Besides the performance evaluation based on simulations, the paper ends with an evaluation of the final SDDV configuration on real hardware. Experiments including 110 nodes are performed on the iMinds w-iLab.t wireless lab. The results of these experiments confirm the results obtained in the corresponding simulations.
    Print ISSN: 1687-1472
    Electronic ISSN: 1687-1499
    Topics: Electrical Engineering, Measurement and Control Technology , Computer Science
    Published by Springer
  • 95
    Publication Date: 2014-11-05
    Description: Adaptive spatial multiplexing (SM) Multiple Input Multiple Output (MIMO) techniques enhance the spectral efficiency of wideband wireless communications in favorable radio channel conditions. In this study, we show the benefits of combining a traditional fixed-beam scheme with multiuser opportunistic radio resource allocation in the space-time-frequency domains. This combination is feasible in wideband Orthogonal Frequency Division Multiple Access (OFDMA) systems such as the Universal Mobile Telecommunications System (UMTS) Long Term Evolution (LTE). The study indicates that, in frequency-selective radio channels, an orthogonal fixed-beam approach benefits clearly more from opportunistic radio resource allocation than a conventional antenna domain approach. We found that fixed-beam radio link performance is enhanced in wideband adaptive radio links because orthogonal beams reduce the correlation between the MIMO channels in the allocated sub-bands. Moreover, beamforming gains yield higher eigenvalues of the MIMO channel matrix in the opportunistically selected sub-bands. We show that fixed-beam deployment changes the distribution of the MIMO channel correlation values and eigenvalues in a manner that enhances opportunistic multiuser gains in wideband time-frequency-selective MIMO radio channels. Simulation results with 2 x 2 spatial multiplexing in an OFDMA uplink indicate that the proposed beam domain scheme gives up to 80% data throughput gain over the corresponding antenna domain scheme in a pedestrian microcell environment.
    Print ISSN: 1687-1472
    Electronic ISSN: 1687-1499
    Topics: Electrical Engineering, Measurement and Control Technology , Computer Science
    Published by Springer
  • 96
    In: Future Internet 2014, 6, 689
    Publication Date: 2014-11-05
    Description: The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Collect-All Archives to Community Memories) architecture and implementation, which uses semantic information, such as entities, topics and events, complemented with information from the Social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease access and to allow retrieval based on conditions that involve high-level concepts.
    Electronic ISSN: 1999-5903
    Topics: Computer Science
    Published by MDPI Publishing
  • 97
    Publication Date: 2014-11-05
    Description: There is excitement within the algorithms community about a new partitioning method introduced by Yaroslavskiy. This algorithm renders Quicksort slightly faster than when it runs under classic partitioning methods. We show that this improved performance in Quicksort is not sustained in Quickselect, a variant of Quicksort for finding order statistics. We investigate the number of comparisons made by Quickselect to find a key with a randomly selected rank under Yaroslavskiy’s algorithm. This grand averaging is a smoothing operator over all individual distributions for specific fixed order statistics. We give the exact grand average. The grand distribution of the number of comparisons (when suitably scaled) is given as the fixed-point solution of a distributional equation of a contraction in the Zolotarev metric space. Our investigation shows that Quickselect under older partitioning methods slightly outperforms Quickselect under Yaroslavskiy’s algorithm for an order statistic of a random rank. Similar results are obtained for extremal order statistics, where again we find the exact average and the distribution of the number of comparisons (when suitably scaled). Both limiting distributions are those of perpetuities (a sum of products of independent mixed continuous random variables).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer
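To make the setting of the abstract above concrete, the following is a minimal sketch of Quickselect driven by a two-pivot partition. It is an illustration only: the partition below is a simplified dual-pivot scheme in the spirit of Yaroslavskiy's method, not his exact comparison order, so it does not reproduce the comparison counts analyzed in the paper. The function names `dual_pivot_partition` and `quickselect` are hypothetical.

```python
def dual_pivot_partition(a, lo, hi):
    """Partition a[lo..hi] in place around two pivots p <= q.

    Returns (lt, gt) such that a[lt] == p, a[gt] == q,
    everything left of lt is < p, everything right of gt is > q,
    and everything strictly between lies in [p, q].
    """
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    lt, gt = lo + 1, hi - 1
    i = lo + 1
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1  # the swapped-in element is still unexamined
        else:
            i += 1
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]  # move pivot p to its final position
    a[hi], a[gt] = a[gt], a[hi]  # move pivot q to its final position
    return lt, gt


def quickselect(items, k):
    """Return the element of rank k (0-based) without fully sorting."""
    a = list(items)
    if not 0 <= k < len(a):
        raise IndexError("rank out of range")
    lo, hi = 0, len(a) - 1
    while lo < hi:
        lt, gt = dual_pivot_partition(a, lo, hi)
        if k < lt:
            hi = lt - 1
        elif k == lt:
            return a[lt]
        elif k < gt:
            lo, hi = lt + 1, gt - 1
        elif k == gt:
            return a[gt]
        else:
            lo = gt + 1
    return a[k]
```

Each round keeps only the one of the three segments that contains rank `k`, which is where Quickselect differs from Quicksort: Quicksort recurses into all three segments.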
  • 98
    Publication Date: 2014-11-05
    Description: Data mining is the process of determining new, unanticipated, valuable patterns from existing databases by considering historical and recent developments in statistics, artificial intelligence, and machine learning. It can help companies focus on the most important information in their data warehouses. Association rule mining is one of the most highly researched and popular data mining techniques for finding associations between items in a set. It is frequently used in marketing, advertising, and inventory control. Typically, association rules only consider items in transactions (positive association rules). They do not consider items that do not occur together, which can be used to create rules that are also useful for market basket analysis. Also, existing algorithms often generate too many candidate itemsets when mining the data and scan the database multiple times. To resolve these issues in association rule mining algorithms, we propose SARIC (set particle swarm optimization for association rules using the itemset range and correlation coefficient). Our method uses set particle swarm optimization to generate association rules from a database and considers both positive and negative occurrences of attributes. SARIC applies the itemset range and correlation coefficient so that we do not need to specify the minimum support and confidence, because it automatically determines them quickly and objectively. We verified the efficiency of SARIC using two differently sized databases. Our simulation results demonstrate that SARIC generates more promising results than Apriori, Eclat, HMINE, and a genetic algorithm.
    Print ISSN: 0219-1377
    Electronic ISSN: 0219-3116
    Topics: Computer Science
    Published by Springer
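The positive and negative association rules mentioned in the abstract above rest on the standard notions of support and confidence. The sketch below computes both for a rule with a single-item consequent. It uses only these textbook definitions and is not the SARIC algorithm; the helper names and the toy `baskets` data are assumptions made for illustration.

```python
def support(transactions, present, absent=()):
    """Fraction of transactions that contain every item in `present`
    and none of the items in `absent`."""
    present, absent = set(present), set(absent)
    hits = sum(1 for t in transactions if present <= t and not (absent & t))
    return hits / len(transactions)


def confidence(transactions, antecedent, item, negative=False):
    """Confidence of `antecedent -> item`, or of `antecedent -> not item`
    when `negative` is True."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    if negative:
        joint = support(transactions, antecedent, absent=[item])
    else:
        joint = support(transactions, set(antecedent) | {item})
    return joint / base


# Hypothetical toy data: each transaction is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "milk", "butter"},
]
positive = confidence(baskets, {"bread"}, "milk")
negative = confidence(baskets, {"bread"}, "milk", negative=True)
```

For a fixed antecedent and a single-item consequent, the positive and negative confidences sum to 1, since every transaction containing the antecedent either contains the item or does not.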
  • 99
    Publication Date: 2014-11-05
    Description: The wideband code division multiple access (WCDMA) network planning problem requires determining the locations and configuration parameters of the base stations (BSs) so as to maximize capacity and minimize installation cost. This problem can be formulated as a complex set covering problem. Unlike in classical set covering problems, the coverage area of each BS is unknown in advance. As a result, the selection of each BS location and its configuration parameters depends on the locations and configuration parameters of the neighboring BSs. Accordingly, we construct a competition and cooperation model based on the re-covered area of the BSs to measure the relationship among the BSs. Then, an efficient genetic operation based on this model is proposed to generate new high-quality solutions. Further, four BS configuration parameters, i.e., the antenna height, antenna tilt, sector orientation and pilot signal power, are taken into account as well. Since there are too many combinations of configuration parameter levels, an encoding method based on orthogonal design is presented to reduce the search space. Subsequently, we merge the proposed encoding method and genetic operation into the multiobjective evolutionary algorithm based on decomposition (MOEA/D-M2M) to solve the WCDMA network planning problem. Simulation results show the efficacy of the proposed encoding and genetic operation in comparison with existing counterparts.
    Print ISSN: 0219-1377
    Electronic ISSN: 0219-3116
    Topics: Computer Science
    Published by Springer
  • 100
    Publication Date: 2014-12-19
    Description: We study an online version of the linear Fisher market. In this market there are \(m\) buyers and a set of \(n\) divisible goods to be allocated to the buyers. The utility that buyer \(i\) derives from good \(j\) is \(u_{ij}\). Given an allocation \(\hat{U}\) in which buyer \(i\) has utility \(\hat{U}_i\), we study a quality measure that is based on taking an average of the ratios \(U_{i}/\hat{U}_i\) with respect to any other allocation \(U\). The market equilibrium allocation is the optimal solution with respect to this measure. Our setting is online, so each good must be allocated without any knowledge of the upcoming goods. We design an online algorithm for the problem that is worse than any other solution by only a logarithmic factor with respect to this quality measure, and in particular competes with the market equilibrium allocation. We prove a tight lower bound which shows that our algorithm is optimal up to constants. Our algorithm uses a primal-dual convex programming scheme. To the best of our knowledge, this is the first time that such a scheme has been used in the online framework.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Topics: Computer Science , Mathematics
    Published by Springer