ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Attribute interaction aware matrix factorization method for recommendation (2021)

Wan, Yongquan ; Zhu, Lihua ; Yan, Cairong ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1115-1130. Published 2021 Sep 15. doi: 10.3233/ida-205407.

Details

Publication Date: 2021-09-15

Description: Matrix factorization (MF) models are effective and easy to extend, and are widely used in industry for tasks such as rating prediction and item recommendation. The basic MF model is relatively simple. In practical applications, side information such as attributes or implicit feedback is often combined to improve accuracy by modifying the model and optimizing the algorithm. In this paper, we propose an attribute interaction-aware matrix factorization (AIMF) method for recommendation tasks. We partition the original rating matrix into different sub-matrices according to the attribute interactions, train each sub-matrix independently, and merge all the latent vectors to generate the final score. Since the generated sub-matrices vary in size, an adaptive regularization coefficient optimization strategy and an adaptive latent vector dimension optimization strategy are proposed for sub-matrix training, and a variety of latent vector merging methods are put forward. AIMF has two advantages. First, when the original rating matrix is particularly large, both the training time complexity of an MF-based model and the cost of updating the model increase; in AIMF, because each sub-matrix is usually much smaller than the original rating matrix, the training time complexity is greatly reduced when parallel computing is used. Second, AIMF does not need to modify the matrix factorization model in order to incorporate attributes and their interaction information and thereby improve performance. The experimental results on the two classic public datasets MovieLens 1M and MovieLens 100k show that AIMF can not only effectively improve the accuracy of recommendation, but also make full use of parallel computing technology to improve training efficiency without modifying the matrix factorization model.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

2

Predictive modelling of hospital readmission: Evaluation of different preprocessing techniques on machine learning classifiers (2021)

Miswan, Nor Hamizah ; Chan, Chee Seng ; Ng, Chong Guan

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1073-1098. Published 2021 Sep 15. doi: 10.3233/ida-205468.

Details

Publication Date: 2021-09-15

Description: Hospital readmission is a major cost for healthcare systems worldwide. If patients with a higher potential of readmission could be identified at the start, existing resources could be used more efficiently, and appropriate plans could be implemented to reduce the risk of readmission. Therefore, it is important to identify the right target patients. Medical data is usually noisy, incomplete, and inconsistent. Hence, before developing a prediction model, it is crucial to set up the modelling pipeline efficiently so that improved predictive performance is achieved. The current study aims to analyse the impact of different preprocessing methods on the performance of different machine learning classifiers. The preprocessing methods applied in previous hospital readmission studies were compared, and the most common approaches, such as missing value imputation, feature selection, data balancing, and feature scaling, were highlighted. The hyperparameters were selected using Bayesian optimisation. The different preprocessing pipelines were assessed using various performance metrics and computational costs. The results indicated that the preprocessing approaches helped improve the model’s prediction of hospital readmission.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

3

Robust face recognition based on a new Kernel-PCA using RRQR factorization (2021)

Maafiri, Ayyad ; Chougdali, Khalid

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1233-1245. Published 2021 Sep 15. doi: 10.3233/ida-205377.

Details

Publication Date: 2021-09-15

Description: In the last ten years, many variants of principal component analysis (PCA) have been suggested to combat the curse of dimensionality. Recently, A. Sharma et al. proposed a numerically stable algorithm based on Householder QR decomposition (HQR), called QR PCA. This approach improves on the performance of PCA computed via singular value decomposition (SVD) in terms of computational complexity. In this paper, we propose a new algorithm called RRQR PCA, which enhances the performance of QR PCA by exploiting the Rank-Revealing QR Factorization (RRQR). We have also improved the recognition rate of RRQR PCA by developing a nonlinear extension of it. In addition, a new robust RBF Lp-norm kernel is proposed in order to reduce the effect of outliers and noise. Extensive experiments on two well-known standard face databases, ORL and FERET, show that the proposed algorithm is more robust than conventional PCA, 2DPCA, PCA-L1, WTPCA-L1, LDA, and 2DLDA in terms of face recognition accuracy.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

4

Virtual samples based robust block-diagonal dictionary learning for face recognition (2021)

Wang, Shuangxi ; Ge, Hongwei ; Yang, Jinlong ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1273-1290. Published 2021 Sep 15. doi: 10.3233/ida-205466.

Details

Publication Date: 2021-09-15

Description: How to learn an over-complete dictionary from a limited number of face samples remains an open question, and the inherent attributes of the samples are underutilized. Besides, the recognition performance may be adversely affected by noise (and outliers), and a linear classifier based on strict binary labels is not appropriate for face recognition. To solve the above problems, we propose a virtual samples based robust block-diagonal dictionary learning method for face recognition. In the proposed model, the original samples and virtual samples are combined to solve the small sample size problem, and both the structure constraint and the low rank constraint are exploited to preserve the intrinsic attributes of the samples. In addition, the fidelity term can effectively reduce the negative effects of noise (and outliers), and ε-dragging is utilized to improve the performance of the linear classifier. Finally, extensive experiments are conducted in comparison with many state-of-the-art methods on benchmark face datasets, and the experimental results demonstrate the efficacy of the proposed method.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

5

GOW-Stream: A novel approach of graph-of-words based mixture model for semantic-enhanced text stream clustering (2021)

Vo, Tham ; Do, Phuc

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1211-1231. Published 2021 Sep 15. doi: 10.3233/ida-205443.

Details

Publication Date: 2021-09-15

Description: Recently, the rapid growth of social networks and online news resources on the Internet has made text stream clustering an important application in multiple domains (e.g., text retrieval diversification, social event detection, and text summarization). Unlike traditional static text clustering, the text stream clustering task faces specific key challenges related to the rapid change of topics/clusters and the high velocity of incoming streaming document batches. Recent well-known model-based text stream clustering models, such as DTM, DCT, and MStream, are considered word-independent evaluation approaches, which means that they largely ignore the relations between words while sampling clusters/topics. This leads to a decrease in overall model accuracy, especially for short text documents such as comments and microblogs in social networks. To tackle these problems, in this paper we propose a novel approach to text stream clustering based on graphs-of-words (GOWs), called GOW-Stream. Applying the common GOWs generated from each document batch while sampling clusters/topics helps to overcome the word-independent evaluation challenge. Our proposed GOW-Stream promises significantly better text stream clustering performance than recent state-of-the-art baselines. Extensive experiments on multiple benchmark real-world datasets demonstrate the effectiveness of our proposed model in terms of both accuracy and running time.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

6

ADAW: Age decay accuracy weighted ensemble method for drifting data stream mining (2021)

Srivastava, Ritesh ; Mittal, Veena

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1131-1152. Published 2021 Sep 15. doi: 10.3233/ida-205249.

Details

Publication Date: 2021-09-15

Description: Data generators that operate in dynamic environments and produce data streams are very common in the real world. A data source in a dynamic environment generates data streams in which the underlying data distribution changes very frequently over time, which results in concept drifts. Compared with a stationary environment, learning in a dynamic environment is very difficult due to the presence of concept drifts. Learning in a dynamic environment requires evolutionary and adaptive approaches to be incorporated into the learning algorithms. Ensemble methods are commonly used to build classifiers for learning in a dynamic environment. Ensemble learning methods are generally characterized by three crucial aspects, namely, the learning and testing method employed, the result integration method, and the forgetting mechanism for old concepts. In this paper, we propose a novel approach called the Age Decay Accuracy Weighted (ADAW) ensemble architecture for learning in concept drifting data streams. The ADAW method assigns weights to the component classifiers based on their accuracy and their remaining lifetime in the ensemble, in such a way as to ensure maximum accuracy. We empirically evaluated ADAW on benchmark artificial drifting data stream generators and real datasets and compared its performance with ten well-known state-of-the-art methods. The experimental results show that ADAW outperforms the existing methods.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

7

Biogc: A novel framework for biological network classification via machine learning (2021)

Li, Bentian ; Pi, Dechang ; Lin, Yunxia ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1153-1168. Published 2021 Sep 15. doi: 10.3233/ida-205240.

Details

Publication Date: 2021-09-15

Description: Biological network classification is an eminently challenging task in the domain of data mining, since the networks contain complex structural information. Conventional biochemical experimental methods and the existing intelligent algorithms still suffer from limitations such as immense experimental cost and inferior accuracy. To solve these problems, in this paper we propose a novel framework for biological graph classification named Biogc, which is specifically developed to predict the labels of both small-scale and large-scale biological network data flexibly and efficiently. Our framework first presents a simplified graph kernel method to capture the structural information of each graph. Then, the obtained informative features are used to train classifiers oriented to biological network data of different scales, which together form the prediction model. Extensive experiments on five benchmark biological network datasets for the graph classification task show that the proposed model Biogc outperforms the state-of-the-art methods, with an accuracy rate of 98.90% on a larger dataset and 99.32% on a smaller dataset.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

8

Location prediction for facility placement by incorporating multi-characteristic information (2021)

Wang, Pu ; Chen, Wei ; Huang, Jinjing ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1187-1210. Published 2021 Sep 15. doi: 10.3233/ida-205420.

Details

Publication Date: 2021-09-15

Description: When recommending locations for new facilities in urban planning or commercial programming, location prediction offers the optimal candidates, which maximize the number of served customers or minimize customer inconvenience and therefore bring the maximum profit. In most existing studies, only spatial-temporal features are used to evaluate location popularity, while the social relationships of customers, which are significant factors in assessing popularity, have been ignored. Additionally, current research also fails to take the capacities and categories of the facilities into consideration. To overcome these drawbacks, we introduce a novel model, Multi-characteristic Information based Top-k Location Prediction (MITLP), which captures the spatio-temporal behaviors of customers based on historical trajectories, exploits the social relevancy from their friend relationships, and thoroughly examines the category competitiveness of specific facilities. Subsequently, drawing on the feature evaluation and popularity quantization, MITLP is implemented within a hybrid B-tree-like recommendation framework, the Constrained Location and Social-Trajectory Clustered forest (CLSTC-forest), which can not only produce better performance in practice but also address the facility service constraints. Finally, extensive experiments conducted on real-world datasets demonstrate the higher efficiency and effectiveness of the proposed model.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

9

A two-stage clustering-based cold-start method for active learning (2021)

He, Deniu ; Yu, Hong ; Wang, Guoyin ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1169-1185. Published 2021 Sep 15. doi: 10.3233/ida-205393.

Details

Publication Date: 2021-09-15

Description: This paper considers the problem of initializing active learning. In particular, it studies the problem in an imbalanced data scenario, which is called class-imbalance active learning cold-start. The proposed method is a two-stage clustering-based active learning cold-start (ALCS) method. In the first stage, to separate the instances of the minority class from those of the majority class, a multi-center clustering is constructed based on a new inter-cluster tightness measure, so that the data are grouped into multiple clusters. Then, in the second stage, the initial training instances are selected from each cluster based on an adaptive mechanism for determining candidate representative instances and a clusters-cyclic instance query mechanism. Comprehensive experiments demonstrate the effectiveness of the proposed method in terms of class coverage, classification performance, and impact on active learning.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press

10

A non-binary hierarchical tree overlapping community detection based on multi-dimensional similarity (2021)

Chen, Jie ; Wang, Huijun ; Zhao, Shu ; [et al.]

IOS Press

In: Intelligent Data Analysis. 2021; 25(5): 1099-1113. Published 2021 Sep 15. doi: 10.3233/ida-205418.

Details

Publication Date: 2021-09-15

Description: Overlapping communities exist in real networks, where the communities represent hierarchical community structures, such as schools and government departments. A non-binary tree allows a vertex to belong to multiple communities, which yields a more realistic overlapping community structure. It is challenging to select appropriate leaf vertices and construct a hierarchical tree that takes a large amount of structural information into account. In this paper, we propose a non-binary hierarchical tree overlapping community detection method based on multi-dimensional similarity. The multi-dimensional similarity fully considers the local structural characteristics of vertices when calculating the similarity between them. First, we construct a similarity matrix based on the first- and second-order neighbor vertices and select a leaf vertex. Second, we expand the leaf vertex based on the principle of maximum community density and construct a non-binary tree. Finally, we choose the layer with the largest overlapping modularity as the result of community division. Experiments on real-world networks demonstrate that our proposed algorithm is superior to other representative algorithms in terms of the quality of overlapping community detection.

Print ISSN: 1088-467X

Electronic ISSN: 1571-4128

Topics: Computer Science

Published by IOS Press
