ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Missing value estimation for microarray data through cluster analysis (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-02-14

Description: Microarray datasets with missing values need to impute accurately before analyzing diseases. The proposed method first discretizes the samples and temporarily assigns a value in missing position of a gene by the mean value of all samples in the same class. The frequencies of each gene value in both types of samples for all genes are calculated separately and if the maximum frequency occurs for same expression value in both types, then the whole gene is entered into a subset; otherwise, each portion of the gene of respective sample type (i.e., normal or disease) is entered into two separate subsets. Thus, for each gene expression value, maximum three different clusters of genes are formed. Each gene subset is further partitioned into a stable number of clusters using proposed splitting and merging clustering algorithm that overcomes the weakness of Euclidian distance metric used in high-dimensional space. Finally, similarity between a gene with missing values and centroids of the clusters are measured and the missing values are estimated by corresponding expression values of a centroid having maximum similarity. The method is compared with various statistical, cluster-based and regression-based methods with respect to statistical and biological metrics using microarray datasets to measure its effectiveness.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Monitoring stealthy diffusion (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-02-14

Description: A broad variety of problems, such as targeted marketing and the spread of viruses and malware, have been modeled as maximizing the reach of diffusion through a network. In cyber-security applications, however, a key consideration largely ignored in this literature is stealth. In particular, an attacker who has a specific target in mind succeeds only if the target is reached before the malicious payload is detected and corresponding countermeasures deployed. The dual side of this problem is deployment of a limited number of monitoring units, such as cyber-forensics specialists, to limit the success of such targeted and stealthy diffusion processes. We investigate the problem of optimal monitoring of targeted stealthy diffusion processes. While natural variants of this problem are NP-hard, we show that if stealthy diffusion starts from randomly selected nodes, the defender’s objective is submodular and can be approximately optimized. In addition, we present approximation algorithms for the setting where the choice of the starting point is adversarial. We further extend our results to settings where the diffusion starts at multiple-seed nodes simultaneously, and where there is an inherent delay in detecting the infection. Our experimental results show that the proposed algorithms are highly effective and scalable.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

The cascading neural network: building the Internet of Smart Things (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-02-18

Description: Most of the research on deep neural networks so far has been focused on obtaining higher accuracy levels by building increasingly large and deep architectures. Training and evaluating these models is only feasible when large amounts of resources such as processing power and memory are available. Typical applications that could benefit from these models are, however, executed on resource-constrained devices. Mobile devices such as smartphones already use deep learning techniques, but they often have to perform all processing on a remote cloud. We propose a new architecture called a cascading network that is capable of distributing a deep neural network between a local device and the cloud while keeping the required communication network traffic to a minimum. The network begins processing on the constrained device, and only relies on the remote part when the local part does not provide an accurate enough result. The cascading network allows for an early-stopping mechanism during the recall phase of the network. We evaluated our approach in an Internet of Things context where a deep neural network adds intelligence to a large amount of heterogeneous connected devices. This technique enables a whole variety of autonomous systems where sensors, actuators and computing nodes can work together. We show that the cascading architecture allows for a substantial improvement in evaluation speed on constrained devices while the loss in accuracy is kept to a minimum.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

DeepAM: a heterogeneous deep learning framework for intelligent malware detection (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-10

Description: With computers and the Internet being essential in everyday life, malware poses serious and evolving threats to their security, making the detection of malware of utmost concern. Accordingly, there have been many researches on intelligent malware detection by applying data mining and machine learning techniques. Though great results have been achieved with these methods, most of them are built on shallow learning architectures. Due to its superior ability in feature learning through multilayer deep architecture, deep learning is starting to be leveraged in industrial and academic research for different applications. In this paper, based on the Windows application programming interface calls extracted from the portable executable files, we study how a deep learning architecture can be designed for intelligent malware detection. We propose a heterogeneous deep learning framework composed of an AutoEncoder stacked up with multilayer restricted Boltzmann machines and a layer of associative memory to detect newly unknown malware. The proposed deep learning model performs as a greedy layer-wise training operation for unsupervised feature learning, followed by supervised parameter fine-tuning. Different from the existing works which only made use of the files with class labels (either malicious or benign) during the training phase, we utilize both labeled and unlabeled file samples to pre-train multiple layers in the heterogeneous deep learning framework from bottom to up for feature learning. A comprehensive experimental study on a real and large file collection from Comodo Cloud Security Center is performed to compare various malware detection approaches. Promising experimental results demonstrate that our proposed deep learning framework can further improve the overall performance in malware detection compared with traditional shallow learning methods, deep learning methods with homogeneous framework, and other existing anti-malware scanners. The proposed heterogeneous deep learning framework can also be readily applied to other malware detection tasks.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

STEM: a suffix tree-based method for web data records extraction (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-10

Description: To automatically extract data records from Web pages, the data record extraction algorithm is required to be robust and efficient. However, most of existing algorithms are not robust enough to cope with rich information or noisy data. In this paper, we propose a novel suffix tree-based extraction method (STEM) for this challenging task. First, we extract a sequence of identifiers from the tag paths of Web pages. Then, a suffix tree is built on top of this sequence and four refining filters are proposed to screen out data regions that might not contain data records. To evaluate model performance, we define an evaluation metric called pattern similarity and perform rigorous experiments on five real data sets. The promising experimental results have demonstrated that the proposed STEM is superior to the state-of-the-art algorithms like MDR, TPC and CTVS with respect to precision, recall and pattern similarity. Moreover, the time complexity of STEM is linear to the total number of HTML tags contained in Web pages, which indicates the potential applicability of STEM in a wide range of Web-scale data record extraction applications.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

Event stream-based process discovery using abstract representations (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-17

Description: The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as an input. An event log is a static source of historical data capturing the execution of a business process. In this paper, we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e. we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining toolkit ProM ( http://promtools.org ). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

Document-level sentiment classification using hybrid machine learning approach (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-11

Description: It is a practice that users or customers intend to share their comments or reviews about any product in different social networking sites. An analyst usually processes to reviews properly to obtain any meaningful information from it. Classification of sentiments associated with reviews is one of these processing steps. The reviews framed are often made in text format. While processing the text reviews, each word of the review is considered as a feature. Thus, selection of right kind of features needs to be carried out to select the best feature from the set of all features. In this paper, the machine learning algorithm, i.e., support vector machine, is used to select the best features from the training data. These features are then given input to artificial neural network method, to process further. Different performance evaluation parameters such as precision, recall, f-measure, accuracy have been considered to evaluate the performance of the proposed approach on two different datasets, i.e., IMDb dataset and polarity dataset.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

A novel classifier ensemble approach for financial distress prediction (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-13

Description: Financial distress prediction is very important to financial institutions who must be able to make critical decisions regarding customer loans. Bankruptcy prediction and credit scoring are the two main aspects considered in financial distress prediction. To assist in this determination, thereby lowering the risk borne by the financial institution, it is necessary to develop effective prediction models for prediction of the likelihood of bankruptcy and estimation of credit risk. A number of financial distress prediction models have been constructed, which utilize various machine learning techniques, such as single classifiers and classifier ensembles, but improving the prediction accuracy is the major research issue. In addition, aside from improving the prediction accuracy, there have been very few studies that specifically consider lowering the Type I error. In practice, Type I errors need to receive careful consideration during model construction because they can affect the cost to the financial institution. In this study, we introduce a classifier ensemble approach designed to reduce the misclassification cost. The outputs produced by multiple classifiers are combined by utilizing the unanimous voting (UV) method to find the final prediction result. Experimental results obtained based on four relevant datasets show that our UV ensemble approach outperforms the baseline single classifiers and classifier ensembles. Specifically, the UV ensemble not only provides relatively good prediction accuracy and minimizes Type I/II errors, but also produces the smallest misclassification cost.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

Community-preserving anonymization of graphs (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-18

Description: In this paper, we propose a novel edge modification technique that better preserves the communities of a graph while anonymizing it. By maintaining the core number sequence of a graph, its coreness , we retain most of the information contained in the network while allowing changes in the degree sequence, i. e. obfuscating the visible data an attacker has access to. We reach a better trade-off between data privacy and data utility than with existing methods by capitalizing on the slack between apparent degree (node degree) and true degree (node core number). Our extensive experiments on six diverse standard network datasets support this claim. Our framework compares our method to other that are used as proxies for privacy protection in the relevant literature. We demonstrate that our method leads to higher data utility preservation, especially in clustering, for the same levels of randomization and k -anonymity.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

Recent advances in feature selection and its applications (2017)

Springer

In: Knowledge and Information Systems

add to mindlist on the mindlist

Details

Publication Date: 2017-05-07

Description: Feature selection is one of the key problems for machine learning and data mining. In this review paper, a brief historical background of the field is given, followed by a selection of challenges which are of particular current interests, such as feature selection for high-dimensional small sample size data, large-scale data, and secure feature selection. Along with these challenges, some hot topics for feature selection have emerged, e.g., stable feature selection, multi-view feature selection, distributed feature selection, multi-label feature selection, online feature selection, and adversarial feature selection. Then, the recent advances of these topics are surveyed in this paper. For each topic, the existing problems are analyzed, and then, current solutions to these problems are presented and discussed. Besides the topics, some representative applications of feature selection are also introduced, such as applications in bioinformatics, social media, and multimedia retrieval.

Print ISSN: 0219-1377

Electronic ISSN: 0219-3116

Topics: Computer Science

Published by Springer

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext