ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases (2020)

Azeroual, Otmane ; Lewoniewski, Włodzimierz

Molecular Diversity Preservation International

In: Algorithms. 2020; 13(5): 107. Published 2020 Apr 26. doi: 10.3390/a13050107.

add to mindlist on the mindlist

Details

Publication Date: 2020-04-26

Description: The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to be able to recognize and assess their quality. Only then is it possible to compile a result from the available data that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discussed the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed with Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we programmed the methods and algorithms as code, but presented it in the form of pseudocode in this paper to measure the quality related to objective data quality dimensions such as completeness, correctness, consistency, and timeliness. This was prepared as a macro service so that the users can use the measurement results with the program code to make a statement about their scientific publications metadata so that the management can rely on high-quality data when making decisions.

Electronic ISSN: 1999-4893

Topics: Computer Science

Published by Molecular Diversity Preservation International

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Data measurement in research information systems: metrics for the evaluation of data quality (2018)

Azeroual, Otmane ; Saake, Gunter ; Wastl, Jürgen

Springer

In: Scientometrics. 2018; 115(3): 1271-1290. Published 2018 Apr 06. doi: 10.1007/s11192-018-2735-5.

add to mindlist on the mindlist

Details

Publication Date: 2018-04-06

Print ISSN: 0138-9130

Electronic ISSN: 1588-2861

Topics: Information Science and Librarianship , Nature of Science, Research, Systems of Higher Education, Museum Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

ETL Best Practices for Data Quality Checks in RIS Databases (2019)

Azeroual, Otmane ; Saake, Gunter ; Abuosba, Mohammad

Molecular Diversity Preservation International

In: Informatics. 2019; 6(1): 10. Published 2019 Mar 05. doi: 10.3390/informatics6010010.

add to mindlist on the mindlist

Details

Publication Date: 2019-03-05

Description: The topic of data integration from external data sources or independent IT-systems has received increasing attention recently in IT departments as well as at management level, in particular concerning data integration in federated database systems. An example of the latter are commercial research information systems (RIS), which regularly import, cleanse, transform and prepare the analysis research information of the institutions of a variety of databases. In addition, all these so-called steps must be provided in a secured quality. As several internal and external data sources are loaded for integration into the RIS, ensuring information quality is becoming increasingly challenging for the research institutions. Before the research information is transferred to a RIS, it must be checked and cleaned up. An important factor for successful or competent data integration is therefore always the data quality. The removal of data errors (such as duplicates and harmonization of the data structure, inconsistent data and outdated data, etc.) are essential tasks of data integration using extract, transform, and load (ETL) processes. Data is extracted from the source systems, transformed and loaded into the RIS. At this point conflicts between different data sources are controlled and solved, as well as data quality issues during data integration are eliminated. Against this background, our paper presents the process of data transformation in the context of RIS which gains an overview of the quality of research information in an institution’s internal and external data sources during its integration into RIS. In addition, the question of how to control and improve the quality issues during the integration process in RIS will be addressed.

Electronic ISSN: 2227-9709

Topics: Computer Science

Published by Molecular Diversity Preservation International

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

Treatment of Bad Big Data in Research Data Management (RDM) Systems (2020)

Azeroual, Otmane

Molecular Diversity Preservation International

In: Big Data and Cognitive Computing. 2020; 4(4): 29. Published 2020 Oct 18. doi: 10.3390/bdcc4040029.

add to mindlist on the mindlist

Details

Publication Date: 2020-10-18

Description: Databases such as research data management systems (RDMS) provide the research data in which information is to be searched for. They provide techniques with which even large amounts of data can be evaluated efficiently. This includes the management of research data and the optimization of access to this data, especially if it cannot be fully loaded into the main memory. They also provide methods for grouping and sorting and optimize requests that are made to them so that they can be processed efficiently even when accessing large amounts of data. Research data offer one thing above all: the opportunity to generate valuable knowledge. The quality of research data is of primary importance for this. Only flawless research data can deliver reliable, beneficial results and enable sound decision-making. Correct, complete and up-to-date research data are therefore essential for successful operational processes. Wrong decisions and inefficiencies in day-to-day operations are only the tip of the iceberg, since the problems with poor data quality span various areas and weaken entire university processes. Therefore, this paper addresses the problems of data quality in the context of RDMS and tries to shed light on the solution for ensuring data quality and to show a way to fix the dirty research data that arise during its integration before it has a negative impact on business success.

Electronic ISSN: 2504-2289

Topics: Computer Science

Published by Molecular Diversity Preservation International

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19 (2021)

Azeroual, Otmane ; Fabre, Renaud

Molecular Diversity Preservation International

In: Big Data and Cognitive Computing. 2021; 5(1): 12. Published 2021 Mar 09. doi: 10.3390/bdcc5010012.

add to mindlist on the mindlist

Details

Publication Date: 2021-03-09

Description: Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having briefly recalled the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper, we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize it as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and its IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we notice that they are at work with a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in opposite contexts of models of partial submodels and of models of final exact systems. In part four, we remark that in both these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits with requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions. To the best of our knowledge, they appear to be the most suitable to overcome COVID-19 massive information challenges.

Electronic ISSN: 2504-2289

Topics: Computer Science

Published by Molecular Diversity Preservation International

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext