ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Electronic Resource

An Extension to SQL for Mining Association Rules (1998)

Meo, Rosa ; Psaila, Giuseppe ; Ceri, Stefano

Springer

Data mining and knowledge discovery 2 (1998), pp. 195-224

ISSN: 1573-756X

Keywords: association rules ; data mining and relational databases

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract Data mining evolved as a collection of application problems, together with efficient algorithms for solving rather specific problems, all focused on discovering relevant information hidden in very large databases. In particular, one of the most investigated topics is the discovery of association rules. This work proposes a unifying model that enables a uniform description of the problem of discovering association rules. The model provides a SQL-like operator, named MINE RULE, which is capable of expressing all the problems presented so far in the literature concerning the mining of association rules. We demonstrate the expressive power of the new operator by means of several examples, some of which are classical, while others are entirely original and correspond to novel and unusual applications. We also present the operational semantics of the operator by means of an extended relational algebra.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1009774406717

2

Electronic Resource

A survey of parallel execution strategies for transitive closure and logic programs (1993)

Cacace, Filippo ; Ceri, Stefano ; Houtsma, Maurice

Springer

Distributed and parallel databases 1 (1993), pp. 337-382

ISSN: 1573-7578

Keywords: Recursion ; parallel algorithms ; query optimization ; deductive databases

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited to parallel execution, and parallelism may yield large efficiency gains. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other on the optimization of more general Datalog queries. Although the subareas seem radically different because of the approaches and formalisms used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods by discussing various forms of parallelism. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of the methods.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF01264013

3

Electronic Resource

Support environment for active rule design (1996)

Baralis, Elena ; Ceri, Stefano ; Fraternali, Piero ; [et al.]

Springer

Journal of intelligent information systems 7 (1996), pp. 129-149

ISSN: 1573-7675

Keywords: database design ; active databases ; design support environment

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract The lack of tools for rule generation, analysis, and run-time monitoring appears to be one of the main obstacles to the widespread adoption of active database applications. This paper describes a complete tool environment for assisting the design of active rule applications; the tools were developed at Politecnico di Milano in the context of the IDEA Project, a four-year Esprit project sponsored by the European Commission and launched in June 1992. We describe tools for active rule generation, analysis, debugging, and browsing; rules are defined in Chimera, a conceptual design model and language for the specification of active rule applications. We also introduce a tool for mapping from Chimera into Oracle, a relational product supporting triggers. Most of the tools described in this paper are fully implemented and currently in operation (beta-testing) within the companies participating in the IDEA Project, with the exception of two of them (called Argonaut-V and Pandora), which will be completed by the end of 1996.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1007/BF00127779

4

Electronic Resource

Debugging and Run-time Monitoring of Active Rules (1997)

Baralis, Elena ; Ceri, Stefano ; Paraboschi, Stefano

Springer

Journal of systems integration 7 (1997), pp. 327-347

ISSN: 1573-8787

Keywords: active rule debugging ; active databases ; design support environment

Source: Springer Online Journal Archives 1860-2000

Topics: Computer Science

Notes: Abstract Especially during the design and tuning of active rules, rule execution may enter an endless loop, in which rules “cascade” by triggering each other indefinitely, so that their processing does not terminate. Commercial systems detect this situation in a simple way, by keeping counters on the number or depth of cascading rules and suspending execution when the counters exceed given thresholds. However, setting these thresholds is critical: thresholds that are too low may halt rule processing even in the absence of loops, while thresholds that are too high may reveal a loop only after expensive processing. In this paper, we propose a technique for revealing loops based on recognizing that a given situation has already occurred in the past and will therefore occur an infinite number of times in the future. We exploit this property to develop cycle monitors, which check at run time that critical rule sequences, detected at compile time, do not repeat forever. We describe the run-time monitoring environment of Chimera, an active DBMS prototype currently under development at the Politecnico di Milano, and we illustrate the results obtained with the cycle monitoring technique through a concrete application example.

Type of Medium: Electronic Resource

URL: http://dx.doi.org/10.1023/A:1008283421564

5

Unknown

Modeling and interoperability of heterogeneous genomic big data for integrative processing and querying (2016)

Masseroli, Marco ; Kaitoua, Abdulrahman ; Pinoli, Pietro ; [et al.]

Elsevier

In: Methods: A Companion to Methods in Enzymology. 2016; 111: 3-11. Published 2016 Dec 01. doi: 10.1016/j.ymeth.2016.09.002.

Publication Date: 2016-12-01

Print ISSN: 1046-2023

Electronic ISSN: 1095-9130

Topics: Biology , Medicine

Published by Elsevier

6

Unknown

OpenGDC: Unifying, Modeling, Integrating Cancer Genomic Data and Clinical Metadata (2020)

Cappelli, Eleonora ; Cumbo, Fabio ; Bernasconi, Anna ; [et al.]

Molecular Diversity Preservation International

In: Applied Sciences. 2020; 10(18): 6367. Published 2020 Sep 12. doi: 10.3390/app10186367.

Publication Date: 2020-09-12

Description: Next Generation Sequencing technologies have produced a substantial increase in publicly available genomic data and related clinical/biospecimen information. New models and methods are needed to access, integrate and search these data easily and effectively. An effort was made by the Genomic Data Commons (GDC), which defined strict procedures for harmonizing genomic and clinical data of cancer, and created the GDC data portal with its application programming interface (API). In this work, we enhance GDC harmonization by applying a state-of-the-art data model (called Genomic Data Model) made of two components: the genomic data, in Browser Extensible Data (BED) format, and the related metadata, in a tab-delimited key-value format. Furthermore, we extend the GDC genomic data with information extracted from other public genomic databases (e.g., GENCODE, HGNC and miRBase). For metadata, we implemented automatic procedures that extract and normalize them from both the Clinical/Biospecimen Supplements and the GDC Data Model, which are available from the two GDC sources (i.e., data portal and API), recognizing and eliminating redundant entries. We developed and released the OpenGDC software, which is able to extract, integrate, extend, and standardize genomic and clinical data of The Cancer Genome Atlas (TCGA) from the GDC. Additionally, we created a publicly accessible repository containing such homogenized and enhanced TCGA data (about 1.3 TB in total). Our approach, implemented in the OpenGDC software, is a step forward toward the effective and efficient management of big genomic and clinical data of cancer. The usability of our data model and the utility of our work are demonstrated through the application of the GenoMetric Query Language (GMQL) to the transformed TCGA data from the GDC, with promising results that facilitate information retrieval and knowledge discovery analyses.

Electronic ISSN: 2076-3417

Topics: Natural Sciences in General

Published by Molecular Diversity Preservation International

7

Unknown

Indexing Next-Generation Sequencing data (2017)

Jalili, Vahid ; Matteucci, Matteo ; Masseroli, Marco ; [et al.]

Elsevier

In: Information Sciences. 2017; 384: 90-109. Published 2017 Apr 01. doi: 10.1016/j.ins.2016.08.085.

Publication Date: 2017-04-01

Print ISSN: 0020-0255

Electronic ISSN: 1872-6291

Topics: Computer Science

Published by Elsevier

8

Unknown

Federated sharing and processing of genomic datasets for tertiary data analysis (2020)

Canakoglu, Arif ; Pinoli, Pietro ; Gulino, Andrea ; [et al.]

Oxford University Press

In: Briefings in Bioinformatics. 2020; Published 2020 Jul 07. doi: 10.1093/bib/bbaa091. [early online release]

Publication Date: 2020-07-07

Description: Motivation: With the spread of biological and clinical uses of next-generation sequencing (NGS) data, many laboratories and health organizations face the need to share NGS data resources and to easily access and process comprehensively shared genomic data; in most cases, primary and secondary data management of NGS data is done at sequencing stations, and sharing applies to processed data. Building on the previous single-instance GMQL system architecture, here we review the model, language and architectural extensions that open the centralized GMQL system to federated computing. Results: A well-designed extension of a centralized system architecture to support federated data sharing and query processing. Data are federated through simple data-sharing instructions. Queries are assigned to execution nodes; they are translated into an intermediate representation, whose computation drives the distribution of data and processing. The approach allows writing federated applications according to classical styles: centralized, distributed or externalized. Availability: The federated genomic data management system is freely available for non-commercial use as an open source project at http://www.bioinformatics.deib.polimi.it/FederatedGMQLsystem/ Contact: {arif.canakoglu, pietro.pinoli}@polimi.it

Print ISSN: 1467-5463

Electronic ISSN: 1477-4054

Topics: Biology , Computer Science

Published by Oxford University Press

9

Unknown

ViruSurf: an integrated database to investigate viral sequences (2020)

Canakoglu, Arif ; Pinoli, Pietro ; Bernasconi, Anna ; [et al.]

Oxford University Press

In: Nucleic Acids Research. 2020; Published 2020 Oct 12. doi: 10.1093/nar/gkaa846. [early online release]

Publication Date: 2020-10-12

Description: ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (RefSeq, GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from all four sources, but ViruSurf also contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola and Dengue. The database is centered on sequences, described along their biological, technological and organizational dimensions. In addition, the analytical dimension characterizes each sequence in terms of its annotations and variants. The web interface makes it simple to express complex search queries; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and on selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press