ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-11-10

Description: The generalized Dirichlet distribution has been shown to be a more appropriate prior than the Dirichlet distribution for naïve Bayesian classifiers. When the dimension of a generalized Dirichlet random vector is large, the computational effort for calculating the expected value of a random variable can be high. In document classification, the number of distinct words, which is the dimension of a prior for naïve Bayesian classifiers, is generally more than ten thousand. Generalized Dirichlet priors can therefore be inapplicable for document classification from the viewpoint of computational efficiency. In this paper, some properties of the generalized Dirichlet distribution are established to accelerate the calculation of the expected values of random variables. Those properties are then used to construct noninformative generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models. Our experimental results on two document sets show that generalized Dirichlet priors can achieve a significantly higher prediction accuracy and that the computational efficiency of naïve Bayesian classifiers is preserved.

Content Type: Journal Article

Pages: 1-22

DOI: 10.1007/s10618-012-0296-4

Authors: Tzu-Tsung Wong, Institute of Information Management, National Cheng Kung University, 1, Ta-Sheuh Road, Tainan, 701 Taiwan, ROC

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

2

The effect of homogeneity on the computational complexity of combinatorial data anonymization (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-10-16

Description: A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity (the number t_in of different input rows and the number t_out of different output rows), both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t_in, and that it is NP-hard even for t_out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t_in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ℓ-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and the usage of domain generalization hierarchies, where the entries are replaced by more general data instead of being completely suppressed.

Content Type: Journal Article

Pages: 1-27

DOI: 10.1007/s10618-012-0293-7

Authors: Robert Bredereck, Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany; André Nichterlein, Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany; Rolf Niedermeier, Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Berlin, Germany; Geevarghese Philip, Max-Planck-Institut für Informatik, Saarbrücken, Germany
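
Illustration (not part of the catalog record or the paper): the k-anonymity definition in the description above can be checked directly by counting identical rows. The sketch below is a minimal Python example; the function names `is_k_anonymous` and `suppress`, the `*` blank symbol, and the sample matrix are assumptions made for illustration only. It verifies a given suppression and does not attempt to solve the NP-hard problem of finding one.

```python
from collections import Counter

def is_k_anonymous(matrix, k):
    """Return True if every row of `matrix` occurs at least k times."""
    counts = Counter(tuple(row) for row in matrix)
    return all(c >= k for c in counts.values())

def suppress(matrix, cells, blank="*"):
    """Return a copy of `matrix` with the given (row, column) entries blanked out."""
    out = [list(row) for row in matrix]
    for r, c in cells:
        out[r][c] = blank
    return out

# Hypothetical 4 x 2 input with three distinct rows.
M = [["a", "x"],
     ["a", "y"],
     ["b", "z"],
     ["b", "z"]]

print(is_k_anonymous(M, 2))                              # rows 0 and 1 are unique
print(is_k_anonymous(suppress(M, [(0, 1), (1, 1)]), 2))  # after blanking two entries
```

Here suppressing s = 2 entries (the second column of the first two rows) makes those rows identical, so the matrix becomes 2-anonymous.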

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

3

A regularized graph layout framework for dynamic network visualization (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-09-03

Description: Many real-world networks, including social and information networks, are dynamic structures that evolve over time. Such dynamic networks are typically visualized using a sequence of static graph layouts. In addition to providing a visual representation of the network structure at each time step, the sequence should preserve the mental map between layouts of consecutive time steps to allow a human to interpret the temporal evolution of the network. In this paper, we propose a framework for dynamic network visualization in the on-line setting where only present and past graph snapshots are available to create the present layout. The proposed framework creates regularized graph layouts by augmenting the cost function of a static graph layout algorithm with a grouping penalty, which discourages nodes from deviating too far from other nodes belonging to the same group, and a temporal penalty, which discourages large node movements between consecutive time steps. The penalties increase the stability of the layout sequence, thus preserving the mental map. We introduce two dynamic layout algorithms within the proposed framework, namely dynamic multidimensional scaling and dynamic graph Laplacian layout. We apply these algorithms on several data sets to illustrate the importance of both grouping and temporal regularization for producing interpretable visualizations of dynamic networks.

Content Type: Journal Article

Pages: 1-33

DOI: 10.1007/s10618-012-0286-6

Authors: Kevin S. Xu, EECS Department, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122, USA; Mark Kliger, Omek Interactive, Ltd., Beit Shemesh, Israel; Alfred O. Hero III, EECS Department, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122, USA

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

4

Discovery of extreme events-related communities in contrasting groups of physical system networks (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-09-03

Description: The latent behavior of a physical system that can exhibit extreme events such as hurricanes or rainfalls is complex. Recently, a very promising means for studying complex systems has emerged through the concept of complex networks. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem, the detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different system phases, such as high or low hurricane activity, discover phase-related system components as seeds to help bound the search space of community generation in each network, and use the proposed contrast-based technique to identify the changing communities across different groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill of the system’s states when used collectively in the ensemble of predictive models. When tested on two important extreme event problems (identification of tropical cyclone-related and of African Sahel rainfall-related climate indices), our algorithm demonstrated superior performance in terms of various skill and robustness metrics, including an 8–16% accuracy increase, as well as physical interpretability of the detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.

Content Type: Journal Article

Pages: 1-34

DOI: 10.1007/s10618-012-0289-3

Authors: Zhengzhang Chen, North Carolina State University, Raleigh, NC 27695, USA; William Hendrix, North Carolina State University, Raleigh, NC 27695, USA; Hang Guan, Zhejiang University, Hangzhou, 31000 Zhejiang, China; Isaac K. Tetteh, North Carolina State University, Raleigh, NC 27695, USA; Alok Choudhary, Northwestern University, Evanston, IL 60201, USA; Fredrick Semazzi, North Carolina State University, Raleigh, NC 27695, USA; Nagiza F. Samatova, North Carolina State University, Raleigh, NC 27695, USA

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

5

How to “alternatize” a clustering algorithm (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-09-03

Description: Given a clustering algorithm, how can we adapt it to find multiple, nonredundant, high-quality clusterings? We focus on algorithms based on vector quantization and describe a framework for automatic ‘alternatization’ of such algorithms. Our framework works in both simultaneous and sequential learning formulations and can mine an arbitrary number of alternative clusterings. We demonstrate its applicability to various clustering algorithms (k-means, spectral clustering, constrained clustering, and co-clustering) and its effectiveness in mining a variety of datasets.

Content Type: Journal Article

Pages: 1-32

DOI: 10.1007/s10618-012-0288-4

Authors: M. Shahriar Hossain, Department of Mathematics and Computer Science, Virginia State University, 1 Hayden Drive, Petersburg, VA 23806, USA; Naren Ramakrishnan, Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA; Ian Davidson, Department of Computer Science, University of California, Davis, CA 95616, USA; Layne T. Watson, Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

6

A visual analytics framework for spatio-temporal analysis and modelling (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-08-27

Description: To support analysis and modelling of large amounts of spatio-temporal data having the form of spatially referenced time series (TS) of numeric values, we combine interactive visual techniques with computational methods from machine learning and statistics. Clustering methods and interactive techniques are used to group TS by similarity. Statistical methods for TS modelling are then applied to representative TS derived from the groups of similar TS. The framework includes interactive visual interfaces to a library of modelling methods supporting the selection of a suitable method, adjustment of model parameters, and evaluation of the models obtained. The models can be externally stored, communicated, and used for prediction and in further computational analyses. From the visual analytics perspective, the framework suggests a way to externalize spatio-temporal patterns emerging in the mind of the analyst as a result of interactive visual analysis: the patterns are represented in the form of computer-processable and reusable models. From the statistical analysis perspective, the framework demonstrates how TS analysis and modelling can be supported by interactive visual interfaces, particularly, in a case of numerous TS that are hard to analyse individually. From the application perspective, the framework suggests a way to analyse large numbers of spatial TS with the use of well-established statistical methods for TS analysis.

Content Type: Journal Article

Pages: 1-29

DOI: 10.1007/s10618-012-0285-7

Authors: Natalia Andrienko, Fraunhofer Institute IAIS (Intelligent Analysis and Information Systems), Sankt Augustin, Germany; Gennady Andrienko, Fraunhofer Institute IAIS (Intelligent Analysis and Information Systems), Sankt Augustin, Germany

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

7

On studying a 3D user interface for OLAP (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-07-09

Description: In this paper, a new visual and interactive user interface for OLAP is presented, and its strengths and weaknesses examined. A survey on 3D interfaces for OLAP is detailed, which shows that only one interface that uses Virtual Reality has been proposed. Then we present our approach: it consists of a 3D representation of OLAP cubes where many OLAP operators have been integrated and where several measures can be visualized. A 3D stereoscopic screen can be used in conjunction with a 3D mouse. Finally a user study is reported that compares standard dynamic cross-tables with our interface on different tasks. We conclude that 3D with stereoscopy is not as promising as expected even with recent 3D devices.

Content Type: Journal Article

Pages: 1-18

DOI: 10.1007/s10618-012-0279-5

Authors: Sébastien Lafon, Computer Science Laboratory, University François-Rabelais of Tours, 64 Avenue Jean Portalis, 37200 Tours, France; Fatma Bouali, IUT, University of Lille 2, 25–27 Rue du Maréchal Foch, 59100 Roubaix, France; Christiane Guinot, CE.R.I.E.S., 20 Rue Victor Noir, 92521 Neuilly-sur-Seine, France; Gilles Venturini, Computer Science Laboratory, University François-Rabelais of Tours, 64 Avenue Jean Portalis, 37200 Tours, France

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

8

Fast projections onto mixed-norm balls with applications (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-07-09

Description: Joint sparsity offers powerful structural cues for feature selection, especially for variables that are expected to demonstrate a “grouped” behavior. Such behavior is commonly modeled via group-lasso, multitask lasso, and related methods where feature selection is effected via mixed-norms. Several mixed-norm based sparse models have received substantial attention, and for some cases efficient algorithms are also available. Surprisingly, several constrained sparse models seem to be lacking scalable algorithms. We address this deficiency by presenting batch and online (stochastic-gradient) optimization methods, both of which rely on efficient projections onto mixed-norm balls. We illustrate our methods by applying them to the multitask lasso. We conclude by mentioning some open problems.

Content Type: Journal Article

Pages: 1-20

DOI: 10.1007/s10618-012-0277-7

Authors: Suvrit Sra, Max Planck Institute for Intelligent Systems, Tübingen, Germany

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

9

Actively learning to infer social ties (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-07-09

Description: We study the extent to which social ties between people can be inferred in large social networks, in particular via active user interactions. In most online social networks, relationships lack meaningful labels (e.g., “colleague” and “intimate friends”) for various reasons. Understanding the formation of different types of social relationships can provide insights into the micro-level dynamics of the social network. In this work, we precisely define the problem of inferring social ties and propose a Partially-Labeled Pairwise Factor Graph Model (PLP-FGM) for learning to infer the type of social relationships. The model formalizes the problem of inferring social ties into a flexible semi-supervised framework. We test the model on three different genres of data sets and demonstrate its effectiveness. We further study how to leverage user interactions to help improve the inference accuracy. Two active learning algorithms are proposed to actively select relationships to query users for their labels. Experimental results show that with only a few user corrections, the accuracy of inferring social ties can be significantly improved. Finally, to scale the model to handle real large networks, a distributed learning algorithm has been developed.

Content Type: Journal Article

Pages: 1-28

DOI: 10.1007/s10618-012-0274-x

Authors: Honglei Zhuang, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Jie Tang, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Wenbin Tang, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Tiancheng Lou, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; Alvin Chin, Nokia Research Center, Beijing, China; Xia Wang, Nokia Research Center, Beijing, China

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer

10

Simultaneous classification and community detection on heterogeneous network data (2012)

Springer

In: Data Mining and Knowledge Discovery

Publication Date: 2012-07-09

Description: Previous studies on network mining have focused primarily on learning a single task (such as classification or community detection) on a given network. This paper considers the problem of multi-task learning on heterogeneous network data. Specifically, we present a novel framework that enables one to perform classification on one network and community detection in another related network. Multi-task learning is accomplished by introducing a joint objective function that must be optimized to ensure the classes in one network are consistent with the link structure, nodal attributes, as well as the communities detected in another network. We provide both theoretical and empirical analysis of the framework. We also show that the framework can be extended to incorporate prior information about the correspondences between the clusters and classes in different networks. Experiments performed on both real-world and synthetic data sets demonstrate the effectiveness of the joint framework compared to applying classification and community detection algorithms on each network separately.

Content Type: Journal Article

Pages: 1-30

DOI: 10.1007/s10618-012-0260-3

Authors: Prakash Mandayam Comar, Department of Computer Science & Engineering, Michigan State University, East Lansing, MI, USA; Pang-Ning Tan, Department of Computer Science & Engineering, Michigan State University, East Lansing, MI, USA; Anil K. Jain, Department of Computer Science & Engineering, Michigan State University, East Lansing, MI, USA

Print ISSN: 1384-5810

Electronic ISSN: 1573-756X

Topics: Computer Science

Published by Springer
