ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Efficient Indexing of Top-k Entities in Systems of Engagement with Extensions for Geo-tagged Entities (2021)

Mondal, Anirban ; Kakkar, Ayaan ; Padhariya, Nilesh ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(4): 411-433. Published 2021 Oct 11. doi: 10.1007/s41019-021-00173-1.

Publication Date: 2021-10-11

Description: Next-generation enterprise management systems are beginning to be developed based on the Systems of Engagement (SOE) model. We view an SOE as a set of entities. Each entity is modeled by a single parent document with dynamic embedded links (i.e., child documents) that contain multi-modal information about the entity from various networks. Since entities in an SOE are generally queried using keywords, our goal is to efficiently retrieve the top-k entities related to a given keyword-based query by considering the relevance scores of both their parent and child documents. Furthermore, we extend this problem to the case where the entities are geo-tagged. The main contributions of this work are fourfold. First, it proposes an efficient bitmap-based approach for quickly identifying the candidate set of entities whose parent documents contain all queried keywords. A variant of this approach is also proposed to reduce memory consumption by exploiting skew in keyword popularity. Second, it proposes the two-tier HI-tree index, which uses both hashing and inverted indexes, for efficient document relevance score lookups. Third, it proposes an R-tree-based approach that extends the aforementioned approaches to the case where the entities are geo-tagged. Fourth, it performs comprehensive experiments with both real and synthetic datasets to demonstrate that the proposed schemes are effective in providing good top-k result recall within acceptable query response times.
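The bitmap-based candidate step can be illustrated with a minimal Python sketch. It assumes one bitmap per keyword, where bit i is set if the parent document of entity i contains that keyword; intersecting the bitmaps of all queried keywords then yields the candidate set. The function names and the whitespace tokenization are hypothetical, and the sketch omits the memory-saving variant, the HI-tree index and relevance scoring.

```python
from functools import reduce


def build_keyword_bitmaps(parent_docs):
    """Map each keyword to an int bitmap: bit i is set if parent document i contains it."""
    bitmaps = {}
    for entity_id, text in enumerate(parent_docs):
        for keyword in set(text.lower().split()):  # naive whitespace tokenization
            bitmaps[keyword] = bitmaps.get(keyword, 0) | (1 << entity_id)
    return bitmaps


def candidate_entities(bitmaps, query_keywords):
    """Return ids of entities whose parent document contains every queried keyword."""
    masks = [bitmaps.get(k.lower(), 0) for k in query_keywords]
    if not masks:
        return []
    combined = reduce(lambda a, b: a & b, masks)  # bitwise AND = set intersection
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


docs = ["red apple fruit", "green apple", "red car"]
print(candidate_entities(build_keyword_bitmaps(docs), ["red", "apple"]))  # prints [0]
```

Python integers serve as arbitrary-length bitsets here, so the intersection is a single AND per keyword; a production index would more likely use compressed bitmaps.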

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

2

Context-Based Resilience in Cyber-Physical Production System (2021)

Bagozi, Ada ; Bianchini, Devis ; Antonellis, Valeria De

Springer

In: Data Science and Engineering. 2021; 6(4): 434-454. Published 2021 Oct 11. doi: 10.1007/s41019-021-00172-2.

Publication Date: 2021-10-11

Description: Cyber-physical systems (CPS) are hybrid networked cyber and engineered physical elements that record data (e.g. using sensors), analyse them using connected services, influence physical processes and interact with human actors using multi-channel interfaces. Examples of CPS interacting with humans in industrial production environments are the so-called cyber-physical production systems (CPPS), where operators supervise the industrial machines according to the human-in-the-loop paradigm. In this scenario, the research challenges for implementing CPPS resilience, that is, promptly reacting to faults, concern: (i) the complex structure of CPPS, which cannot be addressed as a monolithic system but must be treated as a dynamic ecosystem of individual CPS interacting with and influencing each other; (ii) the volume, velocity and variety of data (Big Data) on which resilience is based, which call for novel methods and techniques to ensure recovery procedures; and (iii) the involvement of human factors in these systems. In this paper, we address the design of resilient cyber-physical production systems (R-CPPS) in digital factories by facing these challenges. Specifically, each component of the R-CPPS is modelled as a smart machine, that is, a cyber-physical system equipped with a set of recovery services, a Sensor Data API used to collect sensor data acquired from the physical side for monitoring the component behaviour, and an operator interface for displaying detected anomalous conditions and notifying on-field operators of the necessary recovery actions. A context-based mediator at shop-floor level is in charge of ensuring resilience by gathering data from the CPPS, selecting the proper recovery actions and invoking the corresponding recovery services on the target CPS. Finally, data summarisation and relevance evaluation techniques are used to support the identification of anomalous conditions in the presence of the high volume and velocity of data collected through the Sensor Data API. The approach is validated in a real food-industry case study.

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

3

A Crowd-Powered Task Generation Method for Study of Struggling Search (2021)

Xu, Luyan ; Zhou, Xuan

Springer

In: Data Science and Engineering. 2021; 6(4): 472-484. Published 2021 Sep 09. doi: 10.1007/s41019-021-00171-3.

Publication Date: 2021-09-09

Description: Evaluating interactive search systems and studying users’ struggling search behaviors require a significant number of search tasks. However, generating such tasks is inherently difficult, as each task is supposed to trigger struggling search behavior rather than simple search behavior. To the best of our knowledge, there is no commonly used task set for research on struggling search. Moreover, the ever-changing landscape of information needs would render old task sets less suitable, if not unusable, for evaluation. To deal with this problem, we propose a crowd-powered task generation method and develop a platform to efficiently generate struggling search tasks on the basis of online wikis such as Wikipedia. Our experiments and analysis show that the generated tasks are able to emulate struggling search behaviors consisting of “repeated similar queries” and “quick-back clicks”, and that topically diverse, high-quality and difficult tasks can be created using this method. For the benefit of the community, we have publicly released the task generation platform TaskGenie, a set of 80 topically diverse struggling search tasks with “baselines”, and the corresponding anonymized user behavior logs.

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

4

FairLOF: Fairness in Outlier Detection (2021)

P, Deepak ; Abraham, Savitha Sam

Springer

In: Data Science and Engineering. 2021; 6(4): 485-499. Published 2021 Aug 29. doi: 10.1007/s41019-021-00169-x.

Publication Date: 2021-08-29

Description: An outlier detection method may be considered fair over specified sensitive attributes if its results are not skewed toward particular groups defined on such attributes. In this paper, we consider the task of fair outlier detection. Our focus is on fair outlier detection over multiple multi-valued sensitive attributes (e.g., gender, race, religion, nationality and marital status, among others), a task that has broad applications across modern data scenarios. We propose a fair outlier detection method, FairLOF, that is inspired by the popular LOF formulation for neighborhood-based outlier detection. We outline ways in which unfairness could be induced within LOF and develop three heuristic principles to enhance fairness, which form the basis of the FairLOF method. As this is a novel task, we develop an evaluation framework for fair outlier detection and use it to benchmark FairLOF on the quality and fairness of its results. Through an extensive empirical evaluation over real-world datasets, we illustrate that FairLOF achieves significant improvements in fairness, in some cases at the cost of only marginal degradation in result quality, as measured against the fairness-agnostic LOF method. We also show that a generalization of our method, named FairLOF-Flex, opens possibilities for further deepening fairness in outlier detection beyond what FairLOF offers.
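For reference, the fairness-agnostic LOF baseline that FairLOF builds on can be sketched in plain Python. This is the standard Local Outlier Factor, not FairLOF: it does not implement any of the three fairness heuristics. It assumes at least k+1 distinct points and keeps exactly k neighbours per point (ties broken by index), a simplification of the original definition, which includes every point within the k-distance.

```python
import math


def lof_scores(points, k):
    """Standard Local Outlier Factor for each point (scores well above 1 suggest outliers)."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        # Sort the other points by distance; ties are broken by index and only k are kept.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn = dists[:k]
        neighbours.append([j for _, j in knn])
        k_dist.append(knn[-1][0])  # k-distance of point i

    def reach_dist(i, j):
        # Reachability distance of point i from its neighbour j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(pts, k=2)  # the isolated point (10, 10) gets by far the largest score
```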

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

5

Set-Based Adaptive Distributed Differential Evolution for Anonymity-Driven Database Fragmentation (2021)

Ge, Yong-Feng ; Cao, Jinli ; Wang, Hua ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(4): 380-391. Published 2021 Aug 21. doi: 10.1007/s41019-021-00170-4.

Publication Date: 2021-08-21

Description: By breaking sensitive associations between attributes, database fragmentation can protect the privacy of outsourced data storage. Database fragmentation algorithms need prior knowledge of the sensitive associations in the target database, which they set as the optimization objective; their effectiveness is therefore limited by that prior knowledge. Inspired by the anonymity degree measurement in anonymity techniques such as k-anonymity, this paper defines an anonymity-driven database fragmentation problem. For this problem, a set-based adaptive distributed differential evolution (S-ADDE) algorithm is proposed. S-ADDE adopts an island model to maintain population diversity. Two set-based operators, i.e., set-based mutation and set-based crossover, are designed to transfer traditional differential evolution from the continuous domain to the discrete domain of the anonymity-driven database fragmentation problem. Moreover, in the set-based mutation operator, each individual’s mutation strategy is adaptively selected according to its performance. The experimental results demonstrate that S-ADDE significantly outperforms the compared approaches and verify the effectiveness of the proposed operators.
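The anonymity degree that motivates this problem can be illustrated with the classic k-anonymity measure: a table is k-anonymous with respect to a set of quasi-identifier columns if every combination of their values occurs in at least k rows. The sketch below computes that k for a candidate fragment (a subset of columns). The table, column names and function name are hypothetical, and this is neither the S-ADDE objective nor the algorithm itself.

```python
from collections import Counter


def anonymity_degree(rows, quasi_identifiers):
    """Largest k for which the rows are k-anonymous on the given columns (0 if empty)."""
    if not rows:
        return 0
    # Group rows by their combination of quasi-identifier values.
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return min(groups.values())  # size of the smallest equivalence class


# Hypothetical table; each candidate fragment exposes only a subset of its columns.
rows = [
    {"zip": "1000", "age": 30, "disease": "flu"},
    {"zip": "1000", "age": 30, "disease": "cold"},
    {"zip": "2000", "age": 40, "disease": "flu"},
    {"zip": "2000", "age": 40, "disease": "flu"},
    {"zip": "2000", "age": 41, "disease": "cold"},
]
print(anonymity_degree(rows, ["zip"]))         # prints 2
print(anonymity_degree(rows, ["zip", "age"]))  # prints 1
```

Exposing more columns together in one fragment can only split equivalence classes further, which is why the choice of fragmentation affects the achievable anonymity degree.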

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

6

Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach (2021)

Abburi, Harika ; Parikh, Pulkit ; Chhaya, Niyati ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(4): 359-379. Published 2021 Aug 17. doi: 10.1007/s41019-021-00168-y.

Publication Date: 2021-08-17

Description: Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the battle against sexism, since it can promote successful evaluations by gender studies researchers and government representatives engaged in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, our 23-class problem formulation covers substantially more categories of sexism than any related prior work. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques, tailor-made for the multi-label nature of the problem, that use unlabeled samples to augment the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, both independently and in conjunction with the diversity-based method. In addition to data augmentation methods, we develop a neural model that combines a biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism at different levels of granularity. We also devise a loss function that exploits any label confidence scores associated with the data. Several of the proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

7

Image Preprocessing in Classification and Identification of Diabetic Eye Diseases (2021)

Sarki, Rubina ; Ahmed, Khandakar ; Wang, Hua ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(4): 455-471. Published 2021 Aug 17. doi: 10.1007/s41019-021-00167-z.

Publication Date: 2021-08-17

Description: Diabetic eye disease (DED) is a group of eye problems that affect diabetic patients. Identifying DED in retinal fundus images is crucial because early diagnosis and treatment can minimize the risk of visual impairment. The retinal fundus image plays a significant role in early DED classification and identification. Developing an accurate diagnostic model from retinal fundus images depends heavily on image quality and quantity. This paper presents a methodical study of the significance of image processing for DED classification. The proposed automated classification framework for DED was built in several steps: image quality enhancement, image segmentation (region of interest), image augmentation (geometric transformation), and classification. The best results were obtained by combining traditional image processing methods with a newly built convolutional neural network (CNN) architecture; this combination achieved the highest accuracy on the DED classification problems. The experiments showed adequate accuracy, specificity, and sensitivity.
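The geometric-transformation augmentation step can be sketched as generating flipped and rotated copies of each image. This minimal Python sketch treats an image as a list of pixel rows and produces horizontal and vertical flips plus 90, 180 and 270 degree rotations. The particular set of transforms and the function names are assumptions, since the abstract does not list which geometric transformations were used.

```python
def hflip(img):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror an image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original plus flipped and rotated (90, 180, 270 degree) copies."""
    variants = [[row[:] for row in img], hflip(img), vflip(img)]
    rotated = img
    for _ in range(3):
        rotated = rot90(rotated)
        variants.append(rotated)
    return variants


print(len(augment([[1, 2], [3, 4]])))  # prints 6
```

In practice an image library or NumPy array operations would do the same job on real fundus images; plain lists keep the sketch dependency-free.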

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

8

Predicting Geolocation of Tweets: Using Combination of CNN and BiLSTM (2021)

Mahajan, Rhea ; Mansotra, Vibhakar

Springer

In: Data Science and Engineering. 2021; 6(4): 402-410. Published 2021 Jul 08. doi: 10.1007/s41019-021-00165-1.

Publication Date: 2021-07-08

Description: Twitter is one of the most popular micro-blogging and social networking platforms, where users post their opinions, preferences, activities, thoughts, views, etc., in the form of tweets of up to 280 characters. To study and analyse the social behavior and activities of users across a region, it is necessary to identify the location of each tweet. This paper aims to predict the city-level geolocation of real-time tweets collected over a period of 30 days, using a combination of a convolutional neural network and a bidirectional long short-term memory network that extracts both features within the tweets and features associated with the tweets. We have also compared our results with previous baseline models, and our experiments show a significant improvement over the baseline methods, achieving an accuracy of 92.6% with a median error of 22.4 km at city-level prediction.
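The two reported metrics can be illustrated as follows: city-level accuracy as the fraction of tweets whose predicted city matches the true city, and median error as the median great-circle distance between predicted and true coordinates. This sketch assumes the haversine formula with a mean Earth radius of about 6371 km; the abstract does not specify how distances were computed, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def city_accuracy(predicted_cities, true_cities):
    """Fraction of tweets whose predicted city label equals the true label."""
    hits = sum(p == t for p, t in zip(predicted_cities, true_cities))
    return hits / len(true_cities)


def median_error_km(predicted_coords, true_coords):
    """Median great-circle error between predicted and true coordinates."""
    return median(haversine_km(p, t) for p, t in zip(predicted_coords, true_coords))
```

The median is less sensitive than the mean to a few wildly mislocated tweets, which is one reason it is commonly reported for geolocation.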

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

9

Top-k Competitive Location Selection over Moving Objects (2021)

Liu, Ping ; Wang, Meng ; Cui, Jiangtao ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(4): 392-401. Published 2021 May 31. doi: 10.1007/s41019-021-00157-1.

Publication Date: 2021-05-31

Description: The location selection (LS) problem identifies an optimal site at which to place a new facility such that its influence on given objects is maximized. With the proliferation of GPS-enabled mobile devices, LS studies have made progress for moving objects. However, state-of-the-art LS techniques over moving objects assume that the new facility has no competitor, which is too restrictive and unrealistic for real-world business. In this paper, we study Competitive Location Selection over Moving objects (CLS-M), which takes into account competition against existing facilities in mobile scenarios. We present a competition-based influence score model to evaluate the influence of a candidate. To solve the problem, we propose an influence pruning algorithm that prunes objects that are either influenced by inferior candidates or affected by no candidate. An experimental study over two real-world datasets demonstrates that the proposed algorithm outperforms state-of-the-art LS techniques in terms of efficiency.

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer

10

Efficient Personalized Influential Community Search in Large Networks (2021)

Wu, Yanping ; Zhao, Jun ; Sun, Renjie ; [et al.]

Springer

In: Data Science and Engineering. 2021; 6(3): 310-322. Published 2021 Apr 29. doi: 10.1007/s41019-021-00163-3.

Publication Date: 2021-04-29

Description: Community search, which aims to retrieve important communities (i.e., subgraphs) for a given query vertex, has been widely studied in the literature. Recently, considerable research has been conducted on detecting influential communities, where each vertex in the network is associated with an influence value. Nevertheless, there is a paucity of work that can support personalized requirements. In this paper, we propose a new problem, namely maximal personalized influential community search. Given a graph G, an integer k and a query vertex u, we aim to obtain the most influential community for u by leveraging the k-core concept. To handle large networks efficiently, two algorithms, a top-down algorithm and a bottom-up algorithm, are developed. In real-life applications, a large number of queries may be issued; therefore, an optimal index-based approach is proposed to meet the online requirement. In many scenarios, users may want to find multiple communities for a given query. Thus, we further extend the proposed techniques to the top-r case, i.e., retrieving the r communities with the largest influence values for a given query. Finally, we conduct extensive experiments on 6 real-world networks to demonstrate the advantages of the proposed techniques.
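The k-core concept used here can be illustrated with the standard peeling procedure: repeatedly delete vertices whose degree is below k, then take the connected component that contains the query vertex u. The sketch also scores a community by the minimum influence value among its vertices, a definition common in the influential-community literature but assumed here, since the abstract does not state the paper's exact measure. It is not the paper's top-down, bottom-up or index-based algorithm, and the function names are hypothetical.

```python
from collections import deque


def connected_k_core(adj, k, u):
    """Vertices of the connected k-core component containing u (empty set if none).

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    if u not in adj:
        return set()
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # Peeling: repeatedly delete vertices whose remaining degree is below k.
    queue = deque(v for v, d in degree.items() if d < k)
    removed = set(queue)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] < k:
                    removed.add(w)
                    queue.append(w)
    if u in removed:
        return set()
    # Breadth-first search from u over the surviving vertices.
    component, frontier = {u}, deque([u])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in removed and w not in component:
                component.add(w)
                frontier.append(w)
    return component


def community_influence(component, influence):
    """Assumed measure: the minimum influence value among the community's vertices."""
    return min(influence[v] for v in component) if component else None


adj = {1: {2, 3, 4, 5}, 2: {1, 3, 4, 5}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1, 2, 6}, 6: {5}}
print(sorted(connected_k_core(adj, 3, 1)))  # prints [1, 2, 3, 4]
```

Peeling runs in time linear in the number of edges, which is why k-core based definitions scale to large networks.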

Print ISSN: 2364-1185

Electronic ISSN: 2364-1541

Topics: Computer Science

Published by Springer
