ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

feed icon rss

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
    Electronic Resource
    Electronic Resource
    Springer
    Data mining and knowledge discovery 2 (1998), S. 195-224 
    ISSN: 1573-756X
    Keywords: association rules ; data mining and relational databases
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Data mining evolved as a collection of applicative problems and efficient solution algorithms relative to rather peculiar problems, all focused on the discovery of relevant information hidden in databases of huge dimensions. In particular, one of the most investigated topics is the discovery of association rules. This work proposes a unifying model that enables a uniform description of the problem of discovering association rules. The model provides a SQL-like operator, named X⇒Y, which is capable of expressing all the problems presented so far in the literature concerning the mining of association rules. We demonstrate the expressive power of the new operator by means of several examples, some of which are classical, while some others are fully original and correspond to novel and unusual applications. We also present the operational semantics of the operator by means of an extended relational algebra.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    Electronic Resource
    Electronic Resource
    Springer
    Distributed and parallel databases 1 (1993), S. 337-382 
    ISSN: 1573-7578
    Keywords: Recursion ; parallel algorithms ; query optimization ; deductive databases
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Electronic Resource
    Electronic Resource
    Springer
    Journal of intelligent information systems 7 (1996), S. 129-149 
    ISSN: 1573-7675
    Keywords: database design ; active databases ; design support environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract The lack of tools for rule generation, analysis, and run-time monitoring appears one of the main obstacles to the widespreading of active database applications. This paper describes a complete tool environment for assisting the design of active rules applications; the tools were developed at Politecnico di Milano in the context of the IDEA Project, a 4-years Esprit project sponsored by the European Commission which was launched in June 1992. We describe tools for active rule generation, analysis, debugging, and browsing; rules are defined in Chimera, a conceptual design model and language for the specification of active rules applications. We also introduce a tool for mapping from Chimera into Oracle, a relational product supporting triggers. Most of the tools described in this paper are fully implemented and currently in operation (beta-testing) within the companies participating to the IDEA Project, with the exception of two of them (called Argonaut-V and Pandora), which will be completed by the end of 1996.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Electronic Resource
    Electronic Resource
    Springer
    Journal of systems integration 7 (1997), S. 327-347 
    ISSN: 1573-8787
    Keywords: active rule debugging ; active databases ; design support environment
    Source: Springer Online Journal Archives 1860-2000
    Topics: Computer Science
    Notes: Abstract Especially during the design and tuning of active rules, it is possible that rule execution enters an endless loop, where rules “cascade” by triggering each other indefinitely, so that their processing does not terminate. Commercial systems detect this situation in a simple way, by keeping counters on the number or depth of cascading rules, and suspending an execution when the counters exceed given thresholds. However, the setting of these counters is quite critical: too low thresholds may cause the halting of rule processing in absence of loops, too high thresholds may reveal a loop only after an expensive processing. In this paper, we propose a technique for revealing loops, which is based on recognizing that a given situation has already occurred in the past and therefore will occur an infinite number of times in the future. We exploit this property to develop cycle monitors, which check at run time that critical rule sequences, detected at compile time, do not repeat forever. We describe the run-time monitoring environment of Chimera, an active DBMS prototype currently under development at the Politecnico di Milano, and we illustrate with a concrete applicative example the results obtained with the cycle monitoring technique.
    Type of Medium: Electronic Resource
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2016-12-01
    Print ISSN: 1046-2023
    Electronic ISSN: 1095-9130
    Topics: Biology , Medicine
    Published by Elsevier
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2020-09-12
    Description: Next Generation Sequencing technologies have produced a substantial increase of publicly available genomic data and related clinical/biospecimen information. New models and methods to easily access, integrate and search them effectively are needed. An effort was made by the Genomic Data Commons (GDC), which defined strict procedures for harmonizing genomic and clinical data of cancer, and created the GDC data portal with its application programming interface (API). In this work, we enhance GDC harmonization by applying a state of the art data model (called Genomic Data Model) made of two components: the genomic data, in Browser Extensible Data (BED) format, and the related metadata, in a tab-delimited key-value format. Furthermore, we extend the GDC genomic data with information extracted from other public genomic databases (e.g., GENCODE, HGNC and miRBase). For metadata, we implemented automatic procedures to extract and normalize them, recognizing and eliminating redundant ones, from both Clinical/Biospecimen Supplements and GDC Data Model, that are present on the two sources of GDC (i.e., data portal and API). We developed and released the OpenGDC software, which is able to extract, integrate, extend, and standardize genomic and clinical data of The Cancer Genome Atlas (TCGA) from the GDC. Additionally, we created a publicly accessible repository, containing such homogenized and enhanced TCGA data (resulting in about 1.3 TB). Our approach, implemented in the OpenGDC software, provides a step forward to the effective and efficient management of big genomic and clinical data of cancer. The strong usability of our data model and utility of our work is demonstrated through the application of the GenoMetric Query Language (GMQL) on the transformed TCGA data from the GDC, achieving promising results, facilitating information retrieval and knowledge discovery analyses.
    Electronic ISSN: 2076-3417
    Topics: Natural Sciences in General
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2017-04-01
    Print ISSN: 0020-0255
    Electronic ISSN: 1872-6291
    Topics: Computer Science
    Published by Elsevier
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2020-07-07
    Description: Motivation With the spreading of biological and clinical uses of next-generation sequencing (NGS) data, many laboratories and health organizations are facing the need of sharing NGS data resources and easily accessing and processing comprehensively shared genomic data; in most cases, primary and secondary data management of NGS data is done at sequencing stations, and sharing applies to processed data. Based on the previous single-instance GMQL system architecture, here we review the model, language and architectural extensions that make the GMQL centralized system innovatively open to federated computing. Results A well-designed extension of a centralized system architecture to support federated data sharing and query processing. Data is federated thanks to simple data sharing instructions. Queries are assigned to execution nodes; they are translated into an intermediate representation, whose computation drives data and processing distributions. The approach allows writing federated applications according to classical styles: centralized, distributed or externalized. Availability The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/ Contact {arif.canakoglu, pietro.pinoli}@polimi.it Summary
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2020-10-12
    Description: ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described from their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    Publication Date: 2019-11-08
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...