ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles (5,184)
  • IEEE Transactions on Knowledge and Data Engineering (1,300)
  • Algorithmica  (744)
  • 1274
  • 825
  • Computer Science (5,184)
  • 1
    Publication date: 2020-07-02
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 2
    Publication date: 2020-07-15
    Description: It is generally accepted that populations are useful for the global exploration of multi-modal optimisation problems. Indeed, several theoretical results are available showing such advantages over single-trajectory search heuristics. In this paper we provide evidence that evolving populations via crossover and mutation may also benefit the optimisation time for hillclimbing unimodal functions. In particular, we prove bounds on the expected runtime of the standard \((\mu+1)\) GA for OneMax that are lower than its unary black box complexity and decrease in the leading constant with the population size up to \(\mu = o(\sqrt{\log n})\). Our analysis suggests that the optimal mutation strategy is to flip two bits most of the time. To achieve the results we provide two interesting contributions to the theory of randomised search heuristics: (1) A novel application of drift analysis which compares absorption times of different Markov chains without defining an explicit potential function. (2) The inversion of fundamental matrices to calculate the absorption times of the Markov chains. The latter strategy was previously proposed in the literature, but to the best of our knowledge this is the first time it has been used to show non-trivial bounds on expected runtimes.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
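    Illustrative sketch (not from the article): a minimal Python implementation of a standard (μ+1) GA with uniform crossover and standard bit mutation on OneMax, the setting the abstract analyses. The operator details and parameter values below are generic assumptions, not the exact variant or mutation strategy studied in the paper.
        import random

        def onemax(x):
            return sum(x)  # fitness: number of ones in the bit string

        def mu_plus_one_ga(n=100, mu=8, crossover_rate=0.9, seed=0):
            """Run a (mu+1) GA on OneMax; return the number of generations until the optimum."""
            rng = random.Random(seed)
            pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
            generations = 0
            while max(onemax(x) for x in pop) < n:
                generations += 1
                if rng.random() < crossover_rate:
                    p1, p2 = rng.sample(pop, 2)
                    child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n)]  # uniform crossover
                else:
                    child = list(rng.choice(pop))
                child = [b ^ 1 if rng.random() < 1.0 / n else b for b in child]  # bit mutation with rate 1/n
                worst = min(range(mu), key=lambda i: onemax(pop[i]))
                if onemax(child) >= onemax(pop[worst]):
                    pop[worst] = child  # replace a worst individual if the child is at least as good
            return generations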
  • 3
    Publication date: 2020-07-15
    Description: Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an \(\mathcal{O}(n)\)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in \(\tilde{\mathcal{O}}(n^{2/3})\) time, after \(\tilde{\mathcal{O}}(n)\)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an \(\tilde{\mathcal{O}}(n)\)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in \(\tilde{\mathcal{O}}(1)\) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
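    Illustrative sketch (not from the article): the classic static dynamic program for the longest common substring, which the dynamic data structures described above are designed to outperform under edits. It runs in O(|S|*|T|) time per call, i.e., it would have to be recomputed from scratch after every edit.
        def longest_common_substring(s, t):
            """Return one longest substring common to s and t (classic DP baseline)."""
            best_len, best_end = 0, 0
            prev = [0] * (len(t) + 1)
            for i in range(1, len(s) + 1):
                cur = [0] * (len(t) + 1)
                for j in range(1, len(t) + 1):
                    if s[i - 1] == t[j - 1]:
                        cur[j] = prev[j - 1] + 1  # extend the common suffix ending at s[i-1], t[j-1]
                        if cur[j] > best_len:
                            best_len, best_end = cur[j], i
                prev = cur
            return s[best_end - best_len:best_end]

        # longest_common_substring("algorithmica", "logarithm") == "rithm"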
  • 4
    Publication date: 2020-07-10
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 5
    Material type: Unknown
    Springer
    Publication date: 2015-08-13
    Beschreibung: In the classic k -center problem, we are given a metric graph, and the objective is to select k nodes as centers such that the maximum distance from any vertex to its closest center is minimized. In this paper, we consider two important generalizations of k -center, the matroid center problem and the knapsack center problem. Both problems are motivated by recent content distribution network applications. Our contributions can be summarized as follows: (1) We consider the matroid center problem in which the centers are required to form an independent set of a given matroid. We show this problem is NP-hard even on a line. We present a 3-approximation algorithm for the problem on general metrics. We also consider the outlier version of the problem where a given number of vertices can be excluded as outliers from the solution. We present a 7-approximation for the outlier version. (2) We consider the (multi-)knapsack center problem in which the centers are required to satisfy one (or more) knapsack constraint(s). It is known that the knapsack center problem with a single knapsack constraint admits a 3-approximation. However, when there are at least two knapsack constraints, we show this problem is not approximable at all. To complement the hardness result, we present a polynomial time algorithm that gives a 3-approximate solution such that one knapsack constraint is satisfied and the others may be violated by at most a factor of \(1+\epsilon \) . We also obtain a 3-approximation for the outlier version that may violate the knapsack constraint by \(1+\epsilon \) .
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
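    Illustrative sketch (not from the article): the classic farthest-first greedy 2-approximation for the unconstrained metric k-center problem that the abstract generalizes; it does not handle the matroid, knapsack, or outlier constraints studied in the paper.
        def k_center_greedy(points, k, dist):
            """Pick k centers by farthest-first traversal; return (centers, covering radius)."""
            centers = [points[0]]
            d = [dist(p, centers[0]) for p in points]  # distance to the closest chosen center
            while len(centers) < k:
                i = max(range(len(points)), key=lambda j: d[j])  # farthest point becomes the next center
                centers.append(points[i])
                d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
            return centers, max(d)

        # k_center_greedy([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b))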
  • 6
    Material type: Unknown
    Springer
    Publication date: 2015-08-13
    Beschreibung: We design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an n element poset, the data structure occupies \(n^2/4 + o(n^2)\) bits in the worst case. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using no more than a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time. We also consider the operation of listing all the successors or predecessors of a given element, and show how to do this in constant time per element reported using a slightly modified version of our succinct data structure.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
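    Illustrative sketch (not from the article): precomputing the transitive closure of a DAG as one bit mask per vertex, so that precedence/reachability queries take constant time. This uses n^2 bits rather than the succinct n^2/4 + o(n^2) bits achieved in the paper; it only conveys the query model.
        from collections import deque

        def build_reachability(n, edges):
            """reach[v] is an int bit mask of all vertices reachable from v (DAG input assumed)."""
            adj = [[] for _ in range(n)]
            indeg = [0] * n
            for u, v in edges:
                adj[u].append(v)
                indeg[v] += 1
            order, q = [], deque(i for i in range(n) if indeg[i] == 0)  # Kahn topological sort
            while q:
                u = q.popleft()
                order.append(u)
                for v in adj[u]:
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        q.append(v)
            reach = [1 << v for v in range(n)]
            for u in reversed(order):  # accumulate closures bottom-up
                for v in adj[u]:
                    reach[u] |= reach[v]
            return reach

        def precedes(reach, u, v):
            return bool(reach[u] >> v & 1)  # constant-time query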
  • 7
    Material type: Unknown
    Springer
    Publication date: 2015-08-14
    Beschreibung: We consider a graph observability problem: how many edge colors are needed for an unlabeled graph so that an agent, walking from node to node, can uniquely determine its location from just the observed color sequence of the walk? Specifically, let G ( n ,  d ) be an edge-colored subgraph of d -dimensional (directed or undirected) lattice of size \(n^d = n \times n \times \cdots \times n\) . We say that G ( n ,  d ) is t -observable if an agent can uniquely determine its current position in the graph from the color sequence of any t -dimensional walk, where the dimension is the number of different directions spanned by the edges of the walk. A walk in an undirected lattice G ( n ,  d ) has dimension between 1 and d , but a directed walk can have dimension between 1 and 2 d because of two different orientations for each axis. We derive bounds on the number of colors needed for t -observability. Our main result is that \(\varTheta (n^{d/t})\) colors are both necessary and sufficient for t -observability of G ( n ,  d ), where d is considered a constant. This shows an interesting dependence of graph observability on the ratio between the dimension of the lattice and that of the walk. In particular, the number of colors for full-dimensional walks is \(\varTheta (n^{1/2})\) in the directed case, and \(\varTheta (n)\) in the undirected case, independent of the lattice dimension. All of our results extend easily to non-square lattices: given a lattice graph of size \(N = n_1 \times n_2 \times \cdots \times n_d\) , the number of colors for t -observability is \(\varTheta (\root t \of {N})\) .
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 8
    Material type: Unknown
    Springer
    Publication date: 2015-08-04
    Beschreibung: We revisit the matrix problems sparse null space and matrix sparsification , and show that they are equivalent. We then proceed to seek algorithms for these problems: we prove the hardness of approximation of these problems, and also give a powerful tool to extend algorithms and heuristics for sparse approximation theory to these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 9
    Material type: Unknown
    Springer
    Publication date: 2015-08-04
    Beschreibung: We introduce Planar Disjoint Paths Completion , a completion counterpart of the Disjoint Paths problem, and study its parameterized complexity. The problem can be stated as follows: given a, not necessarily connected, plane graph G ,  k pairs of terminals, and a face F of G ,  find a minimum-size set of edges, if one exists, to be added inside F so that the embedding remains planar and the pairs become connected by k disjoint paths in the augmented network. Our results are twofold: first, we give an upper bound on the number of necessary additional edges when a solution exists. This bound is a function of k , independent of the size of G . Second, we show that the problem is fixed-parameter tractable, in particular, it can be solved in time \(f(k)\cdot n^{2}\) .
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 10
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: Given a database table with records that can be ranked, an interesting problem is to identify selection conditions for the table, which are qualified by an input record and render its ranking as high as possible among the qualifying tuples. In this paper, we study this standing maximization problem, which finds application in object promotion and characterization. After showing the hardness of the problem, we propose greedy methods, which are experimentally shown to achieve high accuracy compared to exhaustive enumeration, while scaling very well to the problem input size. Our contributions include a linear-time algorithm for determining the optimal selection range for an ordinal attribute and techniques for choosing and prioritizing the most promising selection predicates to apply. Experiments on real datasets confirm the effectiveness and efficiency of our techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 11
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Some fairly recent research has focused on providing XACML-based solutions for dynamic privacy policy management. In this regard, a number of works have provided enhancements to the performance of the XACML policy enforcement point (PEP) component, but very few have focused on enhancing the accuracy of that component. This paper improves the accuracy of an XACML PEP by filling some gaps in the existing works, in particular by dynamically incorporating user access context into the privacy policy decision and its enforcement. We provide an XACML-based implementation of a dynamic privacy policy management framework and an evaluation of the applicability of our system in comparison to some of the existing approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 12
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: This paper first introduces pattern aided regression (PXR) models, a new type of regression models designed to represent accurate and interpretable prediction models. This was motivated by two observations: (1) Regression modeling applications often involve complex diverse predictor-response relationships , which occur when the optimal regression models (of given regression model type) fitting two or more distinct logical groups of data are highly different. (2) State-of-the-art regression methods are often unable to adequately model such relationships. This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships. The paper also introduces a contrast pattern aided regression (CPXR) method, to build accurate PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by big margins. Usually using (a) around seven simple patterns and (b) linear local regression models, those PXR models are easy to interpret; in fact, their complexity is just a bit higher than that of (piecewise) linear regression models and is significantly lower than that of traditional ensemble based regression models. CPXR is especially effective for high-dimensional data. The paper also discusses how to use CPXR methodology for analyzing prediction models and correcting their prediction errors.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 13
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: We analyze models for predicting the probability of a strikeout for a batter/pitcher matchup in baseball using player descriptors that can be estimated accurately from small samples. We start with the log5 model which has been used extensively for describing matchups in sports. Log5 is a special case of a logit model and we use constrained logistic regression over nearly one million matchup observations to assess the use of the log5 explanatory variables for this application. We also show that a batter/pitcher ground ball rate interaction variable is significant for the prediction of strikeout probability and we provide physical justification for the inclusion of this variable in the model. We quantify the differences among the models and show that batters control the majority of the variance in predicted strikeout rate.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
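    Illustrative sketch (not from the article): the standard log5 matchup formula that the abstract takes as its starting point, combining a batter rate, a pitcher rate, and the league-average rate into a single matchup probability. The paper's contribution is to extend and constrain models like this one, e.g., with a ground ball rate interaction, which is not shown here.
        def log5(batter_rate, pitcher_rate, league_rate):
            """Log5 estimate of the strikeout probability for a batter/pitcher matchup."""
            num = batter_rate * pitcher_rate / league_rate
            den = num + (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
            return num / den

        # A 30% strikeout batter against a 25% strikeout pitcher in a 20% league:
        # round(log5(0.30, 0.25, 0.20), 3) == 0.364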
  • 14
    Material type: Unknown
    Springer
    Publication date: 2015-08-22
    Description: We prove that in a graph with n vertices, induced chordal and interval subgraphs with the maximum number of vertices can be found in time \(\mathcal{O}(2^{\lambda n})\) for some \(\lambda < 1\). These are the first algorithms breaking the trivial \(2^n n^{\mathcal{O}(1)}\) bound of the brute-force search for these problems.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 15
    Material type: Unknown
    Springer
    Publication date: 2015-08-25
    Beschreibung: We investigate the effect of limiting the number of reserve prices on the revenue in a probabilistic single item auction. In the model considered, bidders compete for an impression drawn from a known distribution of possible types. The auction mechanism sets up to \(\ell \) reserve prices, and each impression type is assigned the highest reserve price lower than the valuation of some bidder for it. The bidder proposing the highest bid for an arriving impression gets it provided his bid is at least the corresponding reserve price, and pays the maximum between the reserve price and the second highest bid. Since the number of impression types may be huge, we consider the revenue \(R_{\ell }\) that can be ensured using only \(\ell \) reserve prices. Our main results are tight lower bounds on \(R_{\ell }\) for the cases where the impressions are drawn from the uniform or a general probability distribution.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 16
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Over the past decade or so, several research groups have addressed the problem of multi-label classification where each example can belong to more than one class at the same time. A common approach, called Binary Relevance (BR), addresses this problem by inducing a separate classifier for each class. Research has shown that this framework can be improved if mutual class dependence is exploited: an example that belongs to class $X$ is likely to belong also to class $Y$; conversely, belonging to $X$ can make an example less likely to belong to $Z$. Several works sought to model this information by using the vector of class labels as additional example attributes. To fill the unknown values of these attributes during prediction, existing methods resort to using outputs of other classifiers, and this makes them prone to errors. This is where our paper wants to contribute. We identified two potential ways to prune unnecessary dependencies and to reduce error-propagation in our new classifier-stacking technique, which is named PruDent. Experimental results indicate that the classification performance of PruDent compares favorably with that of other state-of-the-art approaches over a broad range of testbeds. Moreover, its computational costs grow only linearly in the number of classes.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 17
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: This work deals with the problem of producing a fast and accurate data classification, learning it from a possibly small set of records that are already classified. The proposed approach is based on the framework of the so-called Logical Analysis of Data (LAD), but enriched with information obtained from statistical considerations on the data. A number of discrete optimization problems are solved in the different steps of the procedure, but their computational demand can be controlled. The accuracy of the proposed approach is compared to that of the standard LAD algorithm, of support vector machines and of label propagation algorithm on publicly available datasets of the UCI repository. Encouraging results are obtained and discussed.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 18
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: A new graph based constrained semi-supervised learning (G-CSSL) framework is proposed. Pairwise constraints (PC) are used to specify the types (intra- or inter-class) of points with labels. Since the number of labeled data is typically small in SSL setting, the core idea of this framework is to create and enrich the PC sets using the propagated soft labels from both labeled and unlabeled data by special label propagation (SLP), and hence obtaining more supervised information for delivering enhanced performance. We also propose a Two-stage Sparse Coding, termed TSC, for achieving adaptive neighborhood for SLP. The first stage aims at correcting the possible corruptions in data and training an informative dictionary, and the second stage focuses on sparse coding. To deliver enhanced inter-class separation and intra-class compactness, we also present a mixed soft-similarity measure to evaluate the similarity/dissimilarity of constrained pairs using the sparse codes and outputted probabilistic values by SLP. Simulations on the synthetic and real datasets demonstrated the validity of our algorithms for data representation and image recognition, compared with other related state-of-the-art graph based semi-supervised techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 19
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: In large databases, the amount and the complexity of the data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is—starting with an empty bucket configuration. We show that this is not the case. Self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both accuracy and robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 20
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures. For the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 21
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: In real-world graphs such as social networks, Semantic Web and biological networks, each vertex usually contains rich information, which can be modeled by a set of tokens or elements. In this paper, we study a subgraph matching with set similarity (SMS $^2$ ) query over a large graph database, which retrieves subgraphs that are structurally isomorphic to the query graph, and meanwhile satisfy the condition of vertex pair matching with the (dynamic) weighted set similarity. To efficiently process the SMS $^2$ query, this paper designs a novel lattice-based index for data graph, and lightweight signatures for both query vertices and data vertices. Based on the index and signatures, we propose an efficient two-phase pruning strategy including set similarity pruning and structure-based pruning, which exploits the unique features of both (dynamic) weighted set similarity and graph topology. We also propose an efficient dominating-set-based subgraph matching algorithm guided by a dominating set selection algorithm to achieve better query performance. Extensive experiments on both real and synthetic datasets demonstrate that our method outperforms state-of-the-art methods by an order of magnitude.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 22
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web, which tend to reach a much higher imputation recall, but inevitably bring a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an inTeractive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in missing attribute values in a data set. To ensure the high recall at the minimum cost, TRIP faces a challenge of selecting the least number of missing values for retrieving to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also analyze with an example that the optimal scheme is not feasible to achieve in $\tau$-constrained stochastic data imputation ($\tau$-SDI), but still, our proposed solution identifies an expected-optimal scheme in $\tau$-SDI. Extensive experiments on four data collections show that TRIP retrieves on average 20 percent of the missing values and achieves the same high recall that was reached by the retrieving-based approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 23
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Visual classification has attracted considerable research interest in the past decades. In this paper, a novel $\ell_1$-hypergraph model for visual classification is proposed. Hypergraph learning, as a natural extension of the graph model, has been widely used in many machine learning tasks. In previous work, a hypergraph is usually constructed by attribute-based or neighborhood-based methods. That is, a hyperedge is generated by connecting a set of samples sharing the same feature attribute or lying in a neighborhood. However, these methods are unable to explore the feature space globally or are sensitive to noise. To address these problems, we propose a novel hypergraph construction approach that leverages sparse representation to generate hyperedges and learns the relationship among hyperedges and their vertices. First, for each sample, a hyperedge is generated by regarding it as the centroid and linking it as well as its nearest neighbors. Then, the sparse representation method is applied to represent the centroid vertex by the other vertices within the same hyperedge. The vertices with zero coefficients are removed from the hyperedge. Finally, the representation coefficients are used to define the incidence relation between the hyperedge and the vertices. In our approach, we also optimize the hyperedge weights to modulate the effects of different hyperedges. We leverage the prior knowledge on the hyperedges so that the hyperedges sharing more vertices can have closer weights, where a graph Laplacian is used to regularize the optimization of the weights. Our approach is named $\ell_1$-hypergraph since the $\ell_1$ sparse representation is employed in the hypergraph construction process. The method is evaluated on various visual classification tasks, and it demonstrates promising performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 24
    Publication date: 2015-08-08
    Description: The new dual-pivot Quicksort by Vladimir Yaroslavskiy—used in Oracle's Java runtime library since version 7—features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
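    Illustrative sketch (not from the article): a simplified Yaroslavskiy-style dual-pivot Quicksort in Python. The pivots here are just the two (swapped-if-needed) endpoints; the article's analysis concerns choosing them as order statistics of a random sample and counting comparisons, swaps, Bytecodes, and scanned elements, which this sketch does not do.
        def dual_pivot_quicksort(a, lo=0, hi=None):
            """Sort the list a in place using two pivots per partitioning step."""
            if hi is None:
                hi = len(a) - 1
            if lo >= hi:
                return
            if a[lo] > a[hi]:
                a[lo], a[hi] = a[hi], a[lo]
            p, q = a[lo], a[hi]  # p <= q
            lt, gt, i = lo + 1, hi - 1, lo + 1
            while i <= gt:
                if a[i] < p:  # element belongs to the left part
                    a[i], a[lt] = a[lt], a[i]
                    lt += 1
                elif a[i] > q:  # element belongs to the right part
                    while a[gt] > q and i < gt:
                        gt -= 1
                    a[i], a[gt] = a[gt], a[i]
                    gt -= 1
                    if a[i] < p:
                        a[i], a[lt] = a[lt], a[i]
                        lt += 1
                i += 1
            lt -= 1
            gt += 1
            a[lo], a[lt] = a[lt], a[lo]  # move the pivots to their final positions
            a[hi], a[gt] = a[gt], a[hi]
            dual_pivot_quicksort(a, lo, lt - 1)
            dual_pivot_quicksort(a, lt + 1, gt - 1)
            dual_pivot_quicksort(a, gt + 1, hi)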
  • 25
    Publication date: 2015-06-05
    Description: In this paper, we consider the Unsplittable (hard) Capacitated Facility Location Problem (UCFLP) with uniform capacities and present new approximation algorithms for it. This problem is a generalization of the classical facility location problem where each facility can serve at most u units of demand and each client must be served by exactly one facility. This problem is motivated by its applications in many practical problems including supply chain problems of indivisible goods (Verter in Foundations of location analysis, chapter 2. International series in operations research and management science. Springer, Berlin, 2011) and the assignment problem in content distribution networks (Bateni and Hajiaghayi in Proceedings of the nineteenth annual ACM-SIAM symposium on discrete algorithms, pp 805–814, 2009). While there are several approximation algorithms for the soft capacitated version of this problem (in which one can open multiple copies of each facility) or the splittable version (in which the demand of each client can be divided to be served by multiple open facilities), there are very few results for the UCFLP. It is known that it is NP-hard to approximate this problem within any factor without violating the capacities. So we consider bicriteria \((\alpha,\beta)\)-approximations where the algorithm returns a solution whose cost is within factor \(\alpha\) of the optimum and violates the capacity constraints within factor \(\beta\). Shmoys et al. (Proceedings of the twenty-ninth annual ACM symposium on theory of computing, pp 265–274, 1997) were the first to consider this problem and gave a (9, 4)-approximation. Later results imply \((O(1), 2)\)-approximations; however, no constant factor approximation is known with capacity violation of less than 2. We present a framework for designing bicriteria approximation algorithms for this problem and show two new approximation algorithms with factors (9, 3/2) and (29.315, 4/3). These are the first algorithms with constant approximation in which the violation of capacities is below 2. The heart of our algorithm is a reduction from the UCFLP to a restricted version of the problem. One feature of this reduction is that any \((O(1),1+\epsilon)\)-approximation for the restricted version implies an \((O(1),1+\epsilon)\)-approximation for the UCFLP, and we believe our techniques might be useful towards finding such approximations or perhaps an \((f(\epsilon),1+\epsilon)\)-approximation for the UCFLP for some function f. In addition, we present a quasi-polynomial time \((1+\epsilon,1+\epsilon)\)-approximation for the (uniform) UCFLP in Euclidean metrics, for any constant \(\epsilon > 0\).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 26
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Beschreibung: We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian process dynamic congestion model that can effectively characterize both the dynamics and the uncertainty of congestion conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop efficient algorithms for non-myopic adaptive routing to minimize the collective travel time of all vehicles in the system. A key property of our approach is the ability to efficiently reason about the long-term value of exploration, which enables collectively balancing the exploration/exploitation trade-off for entire fleets of vehicles. Our approach is validated by traffic data from two large Asian cities. Our congestion model is shown to be effective in modeling dynamic congestion conditions. Our routing algorithms also generate significantly faster routes compared to standard baselines, and achieve near-optimal performance compared to an omniscient routing algorithm. We also present the results from a preliminary field study, which showcases the efficacy of our approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 27
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in $\mathcal{O}(nm)$ time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper, we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than previous studies. Our method is able to keep the betweenness centrality measures up to date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
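    Illustrative sketch (not from the article): Brandes' static algorithm for vertex betweenness centrality on an unweighted graph, the O(nm) baseline mentioned in the abstract. The article's contribution is keeping these values up to date as edges arrive and are removed in a stream, which this sketch does not attempt.
        from collections import deque

        def betweenness(adj):
            """adj: dict vertex -> iterable of neighbours; returns vertex betweenness scores."""
            bc = {v: 0.0 for v in adj}
            for s in adj:
                sigma = {v: 0 for v in adj}; sigma[s] = 1   # number of shortest s-v paths
                dist = {v: -1 for v in adj}; dist[s] = 0
                preds = {v: [] for v in adj}
                order, q = [], deque([s])
                while q:                                    # BFS from s
                    v = q.popleft()
                    order.append(v)
                    for w in adj[v]:
                        if dist[w] < 0:
                            dist[w] = dist[v] + 1
                            q.append(w)
                        if dist[w] == dist[v] + 1:
                            sigma[w] += sigma[v]
                            preds[w].append(v)
                delta = {v: 0.0 for v in adj}
                for w in reversed(order):                   # back-propagate dependencies
                    for v in preds[w]:
                        delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
                    if w != s:
                        bc[w] += delta[w]
            return bc  # halve the values for an undirected graph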
  • 28
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-08-07
    Description: Phase change memory (PCM) is non-volatile memory that is byte-addressable. It is two to four times denser than DRAM, orders of magnitude better than NAND Flash memory in read latency, and 10 times better than NAND Flash memory in write endurance. However, it still limits the number of write operations to at most $10^6$ times per PCM cell. To extend its lifetime, it is necessary to evenly distribute write operations over all the memory cells. Up to now, the $\mathrm{B^{+}}$-Tree index structure has been used to quickly locate a search key in a relational database management system (RDBMS). All the record keys in each node are sorted and packed upon insertion in, and deletion from, the $\mathrm{B^{+}}$-Tree. In addition, a counter keeps track of the number of valid keys in the $\mathrm{B^{+}}$-Tree. Consequently, a $\mathrm{B^{+}}$-Tree algorithm results in a large number of write operations, which deteriorates the endurance of PCM. This restricts the usage of PCM on a database server and deteriorates the performance of database servers. In this paper, we propose a novel PCM-aware $\mathrm{B^{+}}$-Tree index structure, called $\mathrm{PB^{+}}$-Tree, to provide wear-leveling in PCM. According to our experiment results, $\mathrm{PB^{+}}$-Tree is much faster than the existing $\mathrm{B^{+}}$-Tree algorithms for PCM and NAND Flash memory with versatile workloads. More importantly, our scheme also greatly reduces the number of write operations compared to other $\mathrm{B^{+}}$-Tree algorithms. All of these results suggest that $\mathrm{PB^{+}}$-Tree is the $\mathrm{B^{+}}$-Tree algorithm best fitted to PCM.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 29
    Material type: Unknown
    Springer
    Publication date: 2015-08-11
    Beschreibung: A commonly studied means of parameterizing graph problems is the deletion distance from triviality (Guo et al., Parameterized and exact computation, Springer, Berlin, pp. 162–173, 2004 ), which counts vertices that need to be deleted from a graph to place it in some class for which efficient algorithms are known. In the context of graph isomorphism, we define triviality to mean a graph with maximum degree bounded by a constant, as such graph classes admit polynomial-time isomorphism tests. We generalise deletion distance to a measure we call elimination distance to triviality, based on elimination trees or tree-depth decompositions. We establish that graph canonisation, and thus graph isomorphism, is \(\mathsf {FPT}\) when parameterized by elimination distance to bounded degree, extending results of Bouland et al. (Parameterized and exact computation, Springer, Berlin, pp. 218–230, 2012 ).
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 30
    Material type: Unknown
    Springer
    Publication date: 2015-08-11
    Beschreibung: Recent advances in drift analysis have given us better and better tools for understanding random processes, including the run time of randomized search heuristics. In the setting of multiplicative drift we do not only have excellent bounds on the expected run time, but also more general results showing the strong concentration of the run time. In this paper we investigate the setting of additive drift under the assumption of strong concentration of the “step size” of the process. Under sufficiently strong drift towards the goal we show a strong concentration of the hitting time. In contrast to this, we show that in the presence of small drift a Gambler’s-Ruin-like behavior of the process overrides the influence of the drift, leading to a maximal movement of about \(\sqrt{t}\) steps within t iterations. Finally, in the presence of sufficiently strong negative drift the hitting time is superpolynomial with high probability; this corresponds to the well-known Negative Drift Theorem.
    Print ISSN: 0178-4617
    Electronic ISSN: 1432-0541
    Subject: Computer Science, Mathematics
    Published by Springer
  • 31
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Description: The $k$ nearest neighbor ($k$NN) search on road networks is an important function in web mapping services. These services are now dealing with rapidly arriving queries that are issued by a massive number of users. While overlay graph-based indices can answer shortest path queries efficiently, there have been no studies on utilizing such indices to answer $k$NN queries efficiently. In this paper, we fill this research gap and present two efficient $k$NN search solutions on overlay graph-based indices. Experimental results show that our solutions offer very low query latency (0.1 ms) and require only small index sizes, even for 10-million-node networks.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 32
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Beschreibung: Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches for this task, namely the knowledge based and the corpus based approaches. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Contrary to these existing techniques, we propose an efficient and effective approach for semantic similarity using a large scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space, and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms of MWEs and with ambiguity, and significantly outperforms 12 competing methods under Pearson Correlation Coefficient. Meanwhile, our approach is much more efficient than all competing algorithms, and can be used to compute semantic similarity in a large scale.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 33
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Beschreibung: Given a spatio-temporal network, a source, a destination, and a desired departure time interval, the All-departure-time Lagrangian Shortest Paths (ALSP) problem determines a set which includes the shortest path for every departure time in the given interval. ALSP is important for critical societal applications such as eco-routing. However, ALSP is computationally challenging due to the non-stationary ranking of the candidate paths across distinct departure-times. Current related work for reducing the redundant work, across consecutive departure-times sharing a common solution, exploits only partial information e.g., the earliest feasible arrival time of a path. In contrast, our approach uses all available information, e.g., the entire time series of arrival times for all departure-times. This allows elimination of all knowable redundant computation based on complete information available at hand. We operationalize this idea through the concept of critical-time-points (CTP), i.e., departure-times before which ranking among candidate paths cannot change. In our preliminary work, we proposed a CTP based forward search strategy. In this paper, we propose a CTP based temporal bi-directional search for the ALSP problem via a novel impromptu rendezvous termination condition. Theoretical and experimental analysis show that the proposed approach outperforms the related work approaches particularly when there are few critical-time-points.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 34
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Description: Computing connected components is a core operation on graph data. Since billion-scale graphs cannot be resident in the memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are Hash-To-Min and PowerGraph. Hash-To-Min is the state-of-the-art disk-based distributed method, which minimizes the number of MapReduce rounds. PowerGraph is the state-of-the-art in-memory distributed system, which is typically faster than the disk-based distributed one but requires a lot of machines for handling billion-scale graphs. In this paper, we propose an I/O-efficient parallel algorithm for billion-scale graphs on a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing (DSP) model that exploits sequential disk access in terms of disk I/Os and parallel processing in terms of computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, DSP-CC, which largely improves the performance through sequential disk scans and page-level cache-conscious parallel processing. Extensive experimental results show that DSP-CC 1) computes connected components in billion-scale graphs using a limited memory size, whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and 2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
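    Illustrative sketch (not from the article): a single-machine union-find pass over an edge stream, the simplest sequential-access way to compute connected components. It reads the edges in one sequential scan, which is the disk access pattern the DSP model above is built around, but it does not implement the paper's parallel, cache-conscious DSP-CC algorithm.
        def connected_components(n, edge_stream):
            """Return a component label for each of the n vertices."""
            parent = list(range(n))

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]  # path halving
                    x = parent[x]
                return x

            for u, v in edge_stream:  # edges can be streamed sequentially from disk
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
            return [find(v) for v in range(n)]

        # connected_components(5, [(0, 1), (1, 2), (3, 4)])  # two components: {0,1,2} and {3,4}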
  • 35
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Beschreibung: Answering why-not questions in databases is promised to have wide application prospect in many areas and thereby, has attracted recent attention in the database research community. This paper addresses the problem of answering these so-called why-not questions in similar graph matching for graph databases. Given a set of answer graphs of an initial query graph $q$ and a set of missing ( why-not ) graphs, we aim to modify $q$ into a new query graph $q^*$ such that the missing graphs are included in the new answer set of $q^*$ . We present an approximate solution to address the above as the optimal solution is NP-hard to compute. In our approach, we first compute the bounded search space and the distance to be minimized for $q^*$ . Then, we present a two-phase algorithm to find the new query $q^*$ . In the first phase, we generate a set of candidate edges to be added/deleted into/from the initial query $q$ within the bounded search space and in the second phase, we select a subset of candidate edges generated in the first phase to minimize the distance for $q^*$ . We also demonstrate the effectiveness and efficiency of our approach by conducting extensive experiments on two real datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 36
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Beschreibung: How has the interdisciplinary data mining field been practiced in Network and Systems Management (NSM)? In Science and Technology, there is a wide use of data mining in areas like bioinformatics, genetics, Web, and, more recently, astroinformatics. However, the application in NSM has been limited and inconsiderable. In this article, we provide an account of how data mining has been applied in managing networks and systems for the past four decades, presumably since its birth. We look into the field’s applications in the key NSM activities—discovery, monitoring, analysis, reporting, and domain knowledge acquisition. In the end, we discuss our perspective on the issues that are considered critical for the effective application of data mining in the modern systems which are characterized by heterogeneity and high dynamism.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 37
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Beschreibung: With the rapid development of location-aware mobile devices, ubiquitous Internet access and social computing technologies, lots of users’ personal information, such as location data and social data, has been readily accessible from various mobile platforms and online social networks. The convergence of these two types of data, known as geo-social data , has enabled collaborative spatial computing that explicitly combines both location and social factors to answer useful geo-social queries for either business or social good. In this paper, we study a new type of Geo-Social K-Cover Group (GSKCG) queries that, given a set of query points and a social network, retrieves a minimum user group in which each user is socially related to at least $k$ other users and the users’ associated regions (e.g., familiar regions or service regions) can jointly cover all the query points. Albeit its practical usefulness, the GSKCG query problem is NP-complete. We consequently explore a set of effective pruning strategies to derive an efficient algorithm for finding the optimal solution. Moreover, we design a novel index structure tailored to our problem to further accelerate query processing. Extensive experiments demonstrate that our algorithm achieves desirable performance on real-life datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 38
    Material type: Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2015-09-11
    Description: High utility sequential pattern mining has been considered an important research problem and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility sequential pattern mining is that the search space is large and the efficiency of the solutions is directly affected by the degree to which they can eliminate candidate patterns. Therefore, the efficiency of any high utility sequential pattern mining solution depends on its ability to reduce this big search space, and as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and a pruning technique based on a Cumulated Rest of Match (CRoM) upper bound. CRoM, by defining a tighter upper bound on the utility of the candidates, allows more conservative pruning before candidate pattern generation in comparison to the existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of the child patterns based on those of their parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility sequential patterns from large scale datasets with different data characteristics, under low utility thresholds.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 39
    facet.materialart.
    Unbekannt
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: Ordinal classification with a monotonicity constraint is a kind of classification tasks, in which the objects with better attribute values should not be assigned to a worse decision class. Several learning algorithms have been proposed to handle this kind of tasks in recent years. The rank entropy-based monotonic decision tree is very representative thanks to its better robustness and generalization. Ensemble learning is an effective strategy to significantly improve the generalization ability of machine learning systems. The objective of this work is to develop a method of fusing monotonic decision trees. In order to achieve this goal, we take two factors into account: attribute reduction and fusing principle. Through introducing variable dominance rough sets, we firstly propose an attribute reduction approach with rank-preservation for learning base classifiers, which can effectively avoid overfitting and improve classification performance. Then, we establish a fusing principe based on maximal probability through combining the base classifiers, which is used to further improve generalization ability of the learning system. The experimental analysis shows that the proposed fusing method can significantly improve classification performance of the learning system constructed by monotonic decision trees.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 40
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: Influence maximization, defined as finding a small subset of nodes that maximizes the spread of influence in social networks, is NP-hard under both the Independent Cascade (IC) and Linear Threshold (LT) models, and many greedy algorithms with the best-known approximation guarantee have been proposed for it. However, existing greedy algorithms are inefficient on large networks, as they demand heavy Monte-Carlo simulation of the spread function for each node at the initial step [7] . In this paper, we establish new upper bounds to significantly reduce the number of Monte-Carlo simulations in greedy algorithms, especially at the initial step. We theoretically prove that the bound is tight and convergent when the summation of weights towards (or from) each node is less than 1. Based on the bound, we propose a new Upper Bound based Lazy Forward algorithm (UBLF for short) for discovering the top-k influential nodes in social networks. We test and compare UBLF with prior greedy algorithms, especially CELF [30] . Experimental results show that UBLF reduces the Monte-Carlo simulations of CELF by more than 95 percent and achieves a speedup of about 2-10 times when the seed set is small.
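    For context on how lazy-forward selection avoids re-simulating every node in every round, here is a minimal CELF-style sketch in Python; UBLF additionally replaces the expensive initial round with closed-form upper bounds, which is not reproduced here. The adjacency-dict graph format, the edge propagation probability p, and the Monte-Carlo budget are illustrative assumptions of this sketch.

    ```python
    import heapq
    import random

    def estimate_spread(graph, seeds, p=0.1, runs=200):
        """Monte-Carlo estimate of expected spread under the Independent Cascade model."""
        total = 0
        for _ in range(runs):
            active, frontier = set(seeds), list(seeds)
            while frontier:
                u = frontier.pop()
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        frontier.append(v)
            total += len(active)
        return total / runs

    def lazy_greedy(graph, k):
        """CELF-style lazy forward selection: re-evaluate only the top of the queue."""
        seeds = []
        # max-heap (via negated gains) of (-marginal_gain, node, round_when_gain_was_computed)
        heap = [(-estimate_spread(graph, [v]), v, 0) for v in graph]
        heapq.heapify(heap)
        base = 0.0
        while len(seeds) < k and heap:
            gain, v, stamp = heapq.heappop(heap)
            if stamp == len(seeds):            # gain is up to date for the current seed set
                seeds.append(v)
                base = estimate_spread(graph, seeds)
            else:                              # recompute the stale marginal gain, push back
                new_gain = estimate_spread(graph, seeds + [v]) - base
                heapq.heappush(heap, (-new_gain, v, len(seeds)))
        return seeds

    random.seed(0)
    graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}
    print(lazy_greedy(graph, k=2))
    ```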
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 41
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: Feature selection has been an important research topic in data mining, because real data sets often have high-dimensional features, as in bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criteria, such that correlated features often have similar rankings. These correlated features are redundant and provide little additional mutual information to help data mining. Thus, when we select a limited number of features, we hope to select the top non-redundant features such that the useful mutual information is maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance Feature Selection (mRMR) model to minimize the redundancy between sequentially selected features. However, this method uses a greedy search, so global feature redundancy is not considered and the results are not optimal. In this paper, we propose a new feature selection framework to globally minimize feature redundancy while maximizing the given feature ranking scores, which can come from any supervised or unsupervised method. Our new model has no parameters, which makes it especially suitable for practical data mining applications. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. In addition, we introduce a new unsupervised global and local discriminative feature selection method which can be unified with the global feature redundancy minimization framework and shows superior performance.
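    For reference, the greedy mRMR-style baseline that this framework improves upon can be sketched in a few lines. The version below is a hedged illustration only: it uses absolute Pearson correlation as a cheap stand-in for mutual information (an assumption of this sketch, not of the paper), and the paper's global, non-greedy model is not reproduced.

    ```python
    import numpy as np

    def mrmr_greedy(X, y, num_features):
        """Greedy minimum-redundancy / maximum-relevance selection.
        Relevance and redundancy are approximated with absolute Pearson correlation."""
        n, d = X.shape
        relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
        selected = [int(np.argmax(relevance))]
        while len(selected) < num_features:
            best_j, best_score = None, -np.inf
            for j in range(d):
                if j in selected:
                    continue
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
                score = relevance[j] - redundancy      # reward relevance, penalize redundancy
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
        return selected

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # feature 1 is redundant with feature 0
    y = X[:, 0] + X[:, 2]
    print(mrmr_greedy(X, y, num_features=3))           # should avoid picking both 0 and 1 early
    ```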
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: Multimedia information retrieval usually involves two key modules including effective feature representation and ranking model construction. Most existing approaches are incapable of well modeling the inherent correlations and interactions between them, resulting in the loss of the latent consensus structure information. To alleviate this problem, we propose a learning to rank approach that simultaneously obtains a set of deep linear features and constructs structure-aware ranking models in a joint learning framework. Specifically, the deep linear feature learning corresponds to a series of matrix factorization tasks in a hierarchical manner, while the learning-to-rank part concentrates on building a ranking model that effectively encodes the intrinsic ranking information by structural SVM learning. Through a joint learning mechanism, the two parts are mutually reinforced in our approach, and meanwhile their underlying interaction relationships are implicitly reflected by solving an alternating optimization problem. Due to the intrinsic correlations among different queries (i.e., similar queries for similar ranking lists), we further formulate the learning-to-rank problem as a multi-task problem, which is associated with a set of mutually related query-specific learning-to-rank subproblems. For computational efficiency and scalability, we design a MapReduce-based parallelization approach to speed up the learning processes. Experimental results demonstrate the efficiency, effectiveness, and scalability of the proposed approach in multimedia information retrieval.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 43
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: In a peer-to-peer system, a node should estimate the reputation of other peers not only on the basis of its own interactions, but also on the basis of the opinions expressed by other nodes. A reputation aggregation mechanism implements a strategy for achieving this. Reputation aggregation in peer-to-peer networks is generally a very time- and resource-consuming process. Moreover, most methods assume that a node will have the same reputation at all nodes in the network after aggregation, which is not true. This paper proposes a reputation aggregation algorithm that uses a variant of the gossip algorithm called differential gossip. The reputation estimate is considered to have two parts: a common component that is the same at every node, and the information received from immediate neighbours based on the neighbours’ direct interaction with the node. Differential gossip is fast and requires fewer resources. This mechanism allows every node to compute an independent reputation value for every other node in the network. Differential gossip trust has been investigated for a power-law network formed using the preferential attachment (PA) model. The reputation computed using differential gossip trust shows good immunity to collusion. We have verified the performance of the algorithm on power-law networks with sizes ranging from 100 to 50,000 nodes.
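    The sketch below shows plain pairwise gossip averaging, the primitive that differential gossip refines; it is not the paper's algorithm (there is no common/neighbour split and no collusion resistance here), just a minimal illustration of how repeated local exchanges drive every node's estimate toward a network-wide aggregate. The toy topology and observation values are made up.

    ```python
    import random

    def gossip_average(neighbors, local_estimate, rounds=50, seed=1):
        """Pairwise gossip averaging: each round, every node averages its current
        estimate with one randomly chosen neighbour."""
        random.seed(seed)
        est = dict(local_estimate)
        for _ in range(rounds):
            for u in est:
                v = random.choice(neighbors[u])
                avg = (est[u] + est[v]) / 2.0
                est[u] = est[v] = avg
        return est

    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    direct_observations = {0: 0.9, 1: 0.5, 2: 0.7, 3: 0.1}  # each node's own opinion of a target peer
    print(gossip_average(neighbors, direct_observations))    # estimates drift toward the global mean
    ```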
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: This paper addresses a limitation that is prominent in most existing change-point detection methods by proposing a nonparametric, computationally efficient method. The limitation is that most works assume each data point observed at each time step is a single multi-dimensional vector, which does not hold in many situations. Therefore, we consider a setting where each observation is a collection of random variables, which we call a bag of data. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, the change-point score is derived by evaluating how the sequence of distributions fluctuates in the metric space using a distance-based information estimator. In addition, a procedure that adaptively determines when to raise alerts is incorporated by calculating the confidence interval of the change-point score at each time step. This avoids raising false alarms in highly noisy situations and enables detecting changes of various magnitudes. A number of experimental studies and numerical examples are provided to demonstrate the generality and the effectiveness of our approach with both synthetic and real datasets.
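    As a toy illustration of scoring distributional change between consecutive bags of data, the sketch below uses an energy-distance-style statistic; the paper's actual estimator, metric-space embedding, and adaptive confidence-interval thresholding are not reproduced, and the synthetic data and bag sizes are assumptions of the sketch.

    ```python
    import numpy as np

    def mean_pairwise_distance(a, b):
        """Average Euclidean distance between all pairs drawn from bags a and b."""
        return np.mean(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2))

    def change_scores(bags):
        """Energy-distance-style change score between consecutive bags of data:
        large values indicate that the underlying distribution has shifted."""
        scores = []
        for x, y in zip(bags, bags[1:]):
            score = (2 * mean_pairwise_distance(x, y)
                     - mean_pairwise_distance(x, x) - mean_pairwise_distance(y, y))
            scores.append(score)
        return scores

    rng = np.random.default_rng(0)
    bags = [rng.normal(0, 1, size=(40, 2)) for _ in range(5)]    # stable regime
    bags += [rng.normal(3, 1, size=(40, 2)) for _ in range(5)]   # shifted regime
    print(int(np.argmax(change_scores(bags))))   # expected: 4, the boundary between the regimes
    ```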
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 45
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: In many applications, the top-k query is an important operation for returning a set of interesting points from a potentially huge data space. This paper shows that existing algorithms cannot process top-k queries on massive data efficiently, and proposes a novel table-scan-based algorithm, T2S, to efficiently compute top-k results on massive data. T2S first constructs a presorted table whose tuples are arranged in the order of round-robin retrieval on the sorted lists. T2S maintains only a fixed number of tuples to compute results. Early termination checking for T2S is presented, along with an analysis of the scan depth. Selective retrieval is devised to skip tuples in the presorted table which cannot be top-k results. The theoretical analysis proves that selective retrieval can significantly reduce the number of retrieved tuples. Construction and incremental-update/batch-processing methods for the structures used are also proposed. Extensive experimental results on synthetic and real-life data sets show that T2S has a significant advantage over existing algorithms.
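    T2S itself is table-scan based and avoids per-tuple random access, which is not shown here; the hedged sketch below instead illustrates the classical threshold-style early termination over per-attribute sorted lists that underlies the round-robin presorting order and the stop condition. The aggregate scoring function (a plain sum) and the toy data are assumptions of this sketch.

    ```python
    import heapq

    def threshold_topk(sorted_lists, attributes, k):
        """Fagin-style Threshold Algorithm: round-robin sorted access over per-attribute
        lists plus random access to full tuples; stop once the k-th best aggregate score
        reaches the threshold (sum of the scores last seen in each list)."""
        seen, topk = set(), []            # topk: min-heap of (score, object)
        depth = 0
        while depth < max(len(lst) for lst in sorted_lists):
            last_seen = []
            for lst in sorted_lists:
                if depth < len(lst):
                    obj, score = lst[depth]
                    last_seen.append(score)
                    if obj not in seen:
                        seen.add(obj)
                        total = sum(attributes[obj])            # random access to the tuple
                        heapq.heappush(topk, (total, obj))
                        if len(topk) > k:
                            heapq.heappop(topk)
            threshold = sum(last_seen)
            if len(topk) == k and topk[0][0] >= threshold:      # early termination check
                break
            depth += 1
        return sorted(topk, reverse=True)

    attributes = {"t1": (0.9, 0.8), "t2": (0.6, 0.9), "t3": (0.4, 0.2), "t4": (0.1, 0.7)}
    lists = [sorted(((o, v[a]) for o, v in attributes.items()), key=lambda x: -x[1])
             for a in range(2)]
    print(threshold_topk(lists, attributes, k=2))   # stops after scanning only two rows per list
    ```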
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example (GQBE), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. In the most common two-source case, it is often desirable for the final matching to be one-to-one; the database and statistical record linkage communities accomplish this by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world entity stores—that of being nearly deduped—and are known to provide significant improvements to precision and recall. Unfortunately, unlike the bipartite case, exact max-weight matching on multi-partite graphs is known to be NP-hard. Our two-fold algorithmic contributions approximate multi-partite max-weight matching: our first algorithm borrows optimization techniques common to Bayesian probabilistic inference; our second is a greedy approximation algorithm. In addition to a theoretical guarantee on the latter, we present comparisons on a real-world entity resolution problem from Bing significantly larger than typically found in the literature, on publication data, and on a series of synthetic problems. Our results quantify significant improvements due to exploiting multiple sources, which are made possible by global one-to-one constraints linking otherwise independent matching sub-problems. We also discover that our algorithms are complementary: one being much more robust under noise, and the other being simple to implement and very fast to run.
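    In the spirit of the paper's second, greedy approximation algorithm (but simplified: the sketch enforces only a global "each record appears in at most one pair" constraint rather than full multi-source consistency), a minimal Python version looks like this. The edge-list format and the example weights are hypothetical.

    ```python
    def greedy_multipartite_matching(edges):
        """Greedy approximation for max-weight matching across several sources:
        scan candidate pairs by decreasing similarity and keep a pair only if both
        records are still unmatched, enforcing a one-to-one constraint.

        edges: iterable of (weight, (source_a, record_a), (source_b, record_b))"""
        matched = set()          # (source, record) pairs already used
        matching = []
        for weight, a, b in sorted(edges, reverse=True):
            if a not in matched and b not in matched:
                matching.append((weight, a, b))
                matched.update((a, b))
        return matching

    edges = [
        (0.9, ("crm", 1), ("web", 7)),
        (0.8, ("crm", 1), ("web", 8)),   # conflicts with the first pair on ("crm", 1)
        (0.7, ("crm", 2), ("web", 8)),
    ]
    print(greedy_multipartite_matching(edges))   # keeps the 0.9 and 0.7 pairs
    ```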
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: Internet users can be classified into two types: (a) active users, who actively contribute to blogs, publish opinions, write comments on Youtube, tweet messages, etc., and (b) passive consumers, who only consume Internet information without contributing to it. While the majority of current social media research deals with active user analysis, there is very little work on understanding the dynamics of passive consumers and their influence. Our global-scale Internet measurement of user access patterns across a diverse set of Internet media services indicates conclusively that the majority of consumers are passive. In this paper, we develop a spatio-temporal mathematical model and the corresponding stochastic analysis to understand passive consumer dynamics. Both discrete- and continuous-time analyses are presented. We also show how the analysis can be used to identify spatial points of influence, i.e., spatial locations that have maximum expertise or influence on a topic. The analysis takes into account the initial level of consumption at each spatial location and the influence of passive consumers at different geographic locations on each other. The effect of information noise is taken into account to derive fundamental limits of passive information consumption. Theoretical results are verified using real Internet measurement data. The large-scale data have been made available for other researchers.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: The objective of the paper is a contribution to data mining within the framework of the observational calculus, through introducing generalized quantifiers related to copulas. Fitting copulas to multidimensional data is an increasingly important method for analyzing dependencies, and the proposed quantifiers of observational calculus assess the results of estimating the structure of joint distributions of continuous variables by means of hierarchical Archimedean copulas. To this end, the existing theory of hierarchical Archimedean copulas has been slightly extended in the paper: It has been proven that sufficient conditions for the function defining a hierarchical Archimedean copula to be indeed a copula, which have so far been rigorously established only for the special case of fully nested Archimedean copulas, hold in general. These conditions allow us to define three new generalized quantifiers, which are then thoroughly validated on four benchmark data sets and one data set from a real-world application. The paper concludes by comparing the proposed quantifiers to a more traditional approach—maximum weight spanning trees.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2015-09-11
    Beschreibung: In this work, we develop an approach for the safe distribution and parallel execution of data-centric workflows over the publish/subscribe abstraction. In essence, we design a unique representation of data-centric workflows, specifically designed to exploit the loosely coupled and distributed nature of publish/subscribe systems. Furthermore, we argue for the practicality and expressiveness of our approach by mapping a standard and industry-strength data-centric workflow model, namely, IBM Business Artifacts with Guard-Stage-Milestone (GSM), into the publish/subscribe abstraction. In short, the contributions of this work are three-fold: (1) mapping of data-centric workflows into publish/subscribe to achieve distributed and parallel execution; (2) detailed theoretical analysis of the mapping; and (3) formulation of the complexity of the optimal workflow distribution over the publish/subscribe abstraction as an NP-hard problem.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 51
    Springer
    Publikationsdatum: 2015-09-15
    Beschreibung: Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set  \(\mathbb {T}\) of binary binets or trinets over a taxon set  X , and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an  \(O(3^{|X|} poly(|X|))\) time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 52
    Springer
    Publikationsdatum: 2015-09-17
    Beschreibung: We consider stochastic versions of OneMax and LeadingOnes and analyze the performance of evolutionary algorithms with and without populations on these problems. It is known that the ( \(1+1\) ) EA on OneMax performs well in the presence of very small noise, but poorly for higher noise levels. We extend these results to LeadingOnes and to many different noise models, showing how the application of drift theory can significantly simplify and generalize previous analyses. Most surprisingly, even small populations (of size \(\varTheta (\log n)\) ) can make evolutionary algorithms perform well for high noise levels, well outside the abilities of the ( \(1+1\) ) EA. Larger population sizes are even more beneficial; we consider both parent and offspring populations. In this sense, populations are robust in these stochastic settings.
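    To make the setting concrete, here is a small, hedged sketch of a (mu+1) EA on OneMax with prior noise, one of several noise models analyzed in this line of work; the population size, noise probability, evaluation budget, and the convention of re-evaluating noisy fitness at selection time are arbitrary choices of this sketch rather than the paper's exact setup.

    ```python
    import random

    def noisy_onemax(x, p_noise=0.3):
        """OneMax with prior noise: with probability p_noise a uniformly random bit
        is flipped before the fitness (number of ones) is evaluated."""
        y = list(x)
        if random.random() < p_noise:
            i = random.randrange(len(y))
            y[i] ^= 1
        return sum(y)

    def mu_plus_one_ea(n=40, mu=8, p_noise=0.3, budget=15000):
        """(mu+1) EA: pick a random parent, flip each bit with prob 1/n, then drop the
        individual with the worst (noisily re-evaluated) fitness from the population."""
        pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(mu)]
        for _ in range(budget):
            parent = random.choice(pop)
            child = [b ^ (random.random() < 1.0 / n) for b in parent]
            pop.append(child)
            pop.remove(min(pop, key=lambda x: noisy_onemax(x, p_noise)))
            if any(sum(x) == n for x in pop):     # true optimum reached
                break
        return max(sum(x) for x in pop)

    random.seed(3)
    print(mu_plus_one_ea())   # best number of ones found within the budget
    ```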
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 53
    Springer
    Publikationsdatum: 2015-09-19
    Beschreibung: Since Tinhofer proposed the MinGreedy algorithm for maximum cardinality matching in 1984, several experimental studies found the randomized algorithm to perform excellently for various classes of random graphs and benchmark instances. In contrast, only few analytical results are known. We show that MinGreedy cannot improve on the trivial approximation ratio of  \(\frac{1}{2}\) whp., even for bipartite graphs. Our hard inputs seem to require a small number of high-degree nodes. This motivates an investigation of greedy algorithms on graphs with maximum degree  \(\varDelta \) : we show that  MinGreedy achieves a  \({\frac{{\varDelta }-1}{2{\varDelta }-3}} \) -approximation for graphs with  \({\varDelta } {=} 3\) and for \(\varDelta \) -regular graphs, and a guarantee of  \({\frac{{\varDelta }-1/2}{2{\varDelta }-2}} \) for graphs with maximum degree  \({\varDelta } \) . Interestingly, our bounds even hold for the deterministic MinGreedy that breaks all ties arbitrarily. Moreover, we investigate the limitations of the greedy paradigm, using the model of priority algorithms introduced by Borodin, Nielsen, and Rackoff. We study deterministic priority algorithms and prove a  \({\frac{{\varDelta }-1}{2{\varDelta }-3}}\) -inapproximability result for graphs with maximum degree  \({\varDelta } \) ; thus, these greedy algorithms do not achieve a  \(\frac{1}{2} {+} \varepsilon \) -approximation and in particular the  \(\frac{2}{3}\) -approximation obtained by the deterministic MinGreedy for  \({\varDelta } {=} 3\) is optimal in this class. For  k -uniform hypergraphs we show a tight  \(\frac{1}{k}\) -inapproximability bound. We also study fully randomized priority algorithms and give a  \(\frac{5}{6}\) -inapproximability bound. Thus, they cannot compete with matching algorithms of other paradigms.
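    For readers unfamiliar with the algorithm being analyzed, a compact sketch of MinGreedy follows: repeatedly pick a vertex of currently minimum degree, match it to a random neighbour, and delete both endpoints. The adjacency-dict format and the fixed random seed are conveniences of the sketch, not part of the analysis above.

    ```python
    import random

    def min_greedy_matching(adj, seed=0):
        """MinGreedy: repeatedly take a vertex of minimum current degree, match it to a
        random neighbour, remove both vertices, and continue until no edges remain."""
        random.seed(seed)
        adj = {u: set(vs) for u, vs in adj.items()}   # work on a mutable copy
        matching = []
        while True:
            candidates = [u for u, vs in adj.items() if vs]
            if not candidates:
                break
            u = min(candidates, key=lambda x: len(adj[x]))   # minimum-degree vertex
            v = random.choice(sorted(adj[u]))
            matching.append((u, v))
            for w in (u, v):                                  # delete u and v from the graph
                for x in adj.pop(w):
                    adj.get(x, set()).discard(w)
        return matching

    # a path on 4 vertices: starting at a degree-1 endpoint yields the maximum matching
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(min_greedy_matching(path))
    ```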
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 54
    Springer
    Publikationsdatum: 2015-09-19
    Beschreibung: We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. As base-case sorter for LCP-aware string sorting we describe sequential LCP-insertion sort which calculates the LCP array and accelerates its insertions using it. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our parallel string sorting implementations scale very well on real-world inputs and modern machines.
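    One of the sequential base algorithms that this work parallelizes is multikey (three-way radix) quicksort; the short, purely sequential Python sketch below conveys the character-at-depth partitioning idea. The parallel string sample sort, LCP-merge, and NUMA-aware variants described above are not reproduced here.

    ```python
    def multikey_quicksort(strings, depth=0):
        """Sequential multikey (three-way radix) quicksort: partition by the character
        at position `depth`, recurse into <, =, > buckets; the = bucket advances depth."""
        if len(strings) <= 1:
            return strings

        def char(s):
            return s[depth] if depth < len(s) else ""   # "" sorts before any character

        pivot = char(strings[len(strings) // 2])
        lt = [s for s in strings if char(s) < pivot]
        eq = [s for s in strings if char(s) == pivot]
        gt = [s for s in strings if char(s) > pivot]
        eq_sorted = eq if pivot == "" else multikey_quicksort(eq, depth + 1)
        return multikey_quicksort(lt, depth) + eq_sorted + multikey_quicksort(gt, depth)

    print(multikey_quicksort(["banana", "band", "bandana", "ban", "apple"]))
    ```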
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 55
    Springer
    Publikationsdatum: 2015-11-21
    Beschreibung: Stochastic boolean function evaluation (SBFE) is the problem of determining the value of a given boolean function f on an unknown input x , when each bit \(x_i\) of x can only be determined by paying a given associated cost \(c_i\) . Further, x is drawn from a given product distribution: for each \(x_i\) , \(\mathbf{Pr}[x_i=1] = p_i\) and the bits are independent. The goal is to minimize the expected cost of evaluation. In this paper, we study the complexity of the SBFE problem for classes of DNF formulas. We consider both exact and approximate versions of the problem for subclasses of DNF, for arbitrary costs and product distributions, and for unit costs and/or the uniform distribution.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 56
    Springer
    Publikationsdatum: 2016-07-12
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 57
    Springer
    Publikationsdatum: 2016-07-23
    Beschreibung: Interval graphs are intersection graphs of closed intervals of the real line. The well-known computational problem, called recognition , asks whether an input graph G can be represented by closed intervals, i.e., whether G is an interval graph. There are several linear-time algorithms known for recognizing interval graphs, the oldest one being by Booth and Lueker (J Comput Syst Sci 13:335–379, 1976 ) based on PQ-trees. In this paper, we study a generalization of recognition, called partial representation extension . The input of this problem consists of a graph G with a partial representation \({{{\mathcal {R}}}}'\) fixing the positions of some intervals. The problem asks whether it is possible to place the remaining intervals and create an interval representation \({{{\mathcal {R}}}}\) of the entire graph G extending \({{{\mathcal {R}}}}'\) . We generalize the characterization of interval graphs by Fulkerson and Gross (Pac J Math 15:835–855, 1965 ) to extendible partial representations. Using it, we give a linear-time algorithm for partial representation extension based on a reordering problem of PQ-trees.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 58
    Springer
    Publikationsdatum: 2016-08-05
    Beschreibung: Modern data management systems often need to deal with massive, dynamic and inherently distributed data sources. We collect the data using a distributed network, and at the same time try to maintain a global view of the data at a central coordinator using a minimal amount of communication. Such applications have been captured by the distributed monitoring model which has attracted a lot of attention in recent years. In this paper we investigate the monitoring of the entropy functions, which are very useful in network monitoring applications such as detecting distributed denial-of-service attacks. Our results improve the previous best results by Arackaparambil et al. in ICLP 1: 95–106 ( 2009 ). Our technical contribution also includes implementing the celebrated AMS sampling method (by Alon et al. in J Comput Syst Sci 58(1): 137–147 1999 ) in the distributed monitoring model, which could be of independent interest.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 59
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Since Jeff Howe introduced the term Crowdsourcing in 2006, this human-powered problem-solving paradigm has gained a lot of attention and has been a hot research topic in the field of computer science. Even though a lot of work has been conducted on this topic, so far we do not have a comprehensive survey of the most relevant work done in the crowdsourcing field. In this paper, we aim to offer an overall picture of the current state-of-the-art techniques in general-purpose crowdsourcing. According to their focus, we divide this work into three parts: incentive design, task assignment, and quality control. For each part, we start with the different problems faced in that area, followed by a brief description of existing work and a discussion of pros and cons. In addition, we also present a real scenario on how the different techniques are used in implementing a location-based crowdsourcing platform, gMission. Finally, we highlight the limitations of the current general-purpose crowdsourcing techniques and present some open problems in this area.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Crowdsourcing platforms are an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results, so effective techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: Human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis, we then outline key factors that need to be considered to improve crowdsourced data management.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 61
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Principal component analysis combined with the residual error is an effective anomaly detection technique. In an environment where anomalies are present in the training set, the derived principal components can be skewed by the anomalies. A further aspect of anomaly detection is that data might be distributed across different nodes in a network, and communicating them to a centralized processing unit may be prohibited due to communication cost. Current solutions to distributed anomaly detection rely on a hierarchical network infrastructure to aggregate data or models; however, in this environment, links close to the root of the tree become critical and congested. In this paper, an algorithm is proposed that is more robust in its derivation of the principal components of a training set containing anomalies. A distributed form of the algorithm is then derived where each node in a network can iterate towards the centralized solution by exchanging small matrices with neighboring nodes. Experimental evaluations on both synthetic and real-world data sets demonstrate the superior performance of the proposed approach in comparison to principal component analysis and alternative anomaly detection techniques. In addition, it is shown that in a variety of network infrastructures, the distributed form of the anomaly detection model is able to derive a close approximation of the centralized model.
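    As a baseline for what "PCA plus residual error" means in practice, here is a small centralized sketch that scores points by their reconstruction error outside the top principal subspace. The paper's robust derivation of the components and its distributed, matrix-exchange variant are not reproduced, and the synthetic data are purely illustrative.

    ```python
    import numpy as np

    def pca_residual_scores(X, num_components=2):
        """Centralized PCA-based anomaly scoring: project onto the top principal
        components and score each point by its squared residual (reconstruction error)."""
        Xc = X - X.mean(axis=0)
        # principal components = top right singular vectors of the centred data
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        components = vt[:num_components]
        projected = Xc @ components.T @ components
        return np.sum((Xc - projected) ** 2, axis=1)

    rng = np.random.default_rng(0)
    normal = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated "normal" data
    anomaly = rng.normal(loc=8.0, size=(1, 5))                      # one far-off point
    scores = pca_residual_scores(np.vstack([normal, anomaly]), num_components=2)
    print(int(np.argmax(scores)))   # the injected point (index 200) should get the largest score
    ```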
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Many applications deal with moving object datasets, e.g., mobile phone social networking, scientific simulations, and ride-sharing services. These applications need to handle a tremendous number of spatial objects that continuously move and execute spatial queries to explore their surroundings. To manage such update-heavy workloads, several throwaway index structures have recently been proposed, where a static index is rebuilt periodically from scratch rather than updated incrementally. It has been shown that throwaway indices outperform specialized moving-object indices that maintain location updates incrementally. However, throwaway indices suffer from limited scalability due to their single-server design, and the only distributed throwaway index (D-MOVIES), an extension of a centralized approach, does not scale out as the number of servers increases, especially during the query processing phase.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Collaborative filtering (CF) is without question the most widely adopted and successful recommendation approach. A typical CF-based recommender system associates a user with a group of like-minded users based on their individual preferences over all the items, either explicit or implicit, and then recommends to the user some unobserved items enjoyed by the group. However, we find that two users with similar tastes on one item subset may have totally different tastes on another set. In other words, there exist many user-item subgroups, each consisting of a subset of items and a group of like-minded users on these items. It is more reasonable to predict preferences through a user's correlated subgroups rather than the entire user-item matrix. In this paper, to find meaningful subgroups, we formulate a new Multiclass Co-Clustering (MCoC) model, which captures relations of user-to-item, user-to-user, and item-to-item simultaneously. Then, we combine traditional CF algorithms with subgroups to improve their top-N recommendation performance. Our approach can be seen as a new extension of traditional clustering CF models. Systematic experiments on several real data sets have demonstrated the effectiveness of our proposed approach.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: With the rapid development of Web 2.0 and the Online To Offline (O2O) marketing model, various online event-based social networks (EBSNs) are getting popular. An important task of EBSNs is to facilitate the most satisfactory event-participant arrangement for both sides, i.e., events enroll more participants and participants are arranged with personally interesting events. Existing approaches usually focus on the arrangement of each single event to a set of potential users, or ignore the conflicts between different events, which leads to infeasible or redundant arrangements. In this paper, to address the shortcomings of existing approaches, we first identify a more general and useful event-participant arrangement problem, called the Global Event-participant Arrangement with Conflict and Capacity (GEACC) problem, focusing on the conflicts of different events and making event-participant arrangements in a global view. We find that the GEACC problem is NP-hard due to the conflicts among events. Thus, we design two approximation algorithms with provable approximation ratios and an exact algorithm with a pruning technique to address this problem. In addition, we propose an online setting of GEACC, called OnlineGEACC, which is also practical in real-world scenarios. We further design an online algorithm with a provable performance guarantee. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a large range of applications such as knowledge discovery or data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for distributed large scale data processing. Although these works provide different solutions to the same problem, each one has particular constraints and properties. In this paper, we compare the different existing approaches for computing kNN on MapReduce, first theoretically, and then by performing an extensive experimental evaluation. To be able to compare solutions, we identify three generic steps for kNN computation on MapReduce: data pre-processing, data partitioning, and computation. We then analyze each step from the load balancing, accuracy, and complexity aspects. Experiments in this paper use a variety of datasets, and analyze the impact of data volume, data dimension, and the value of k from many perspectives like time and space complexity, and accuracy. The experimental part brings out advantages and shortcomings that are discussed for each algorithm. To the best of our knowledge, this is the first paper that compares kNN computing methods on MapReduce both theoretically and experimentally with the same setting. Overall, this paper can be used as a guide to tackle kNN-based practical problems in the context of big data.
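    The generic map (local candidates per partition) / reduce (global merge) decomposition shared by the compared approaches can be sketched in plain Python as below; this is a didactic stand-in rather than any of the surveyed systems, and the partitioning scheme, distance function, and toy data are assumptions of the sketch.

    ```python
    import heapq
    from collections import defaultdict

    def map_phase(block, queries, k):
        """Map: inside one data partition, compute the k local nearest neighbours of each query."""
        out = []
        for qid, q in queries.items():
            dists = [(sum((a - b) ** 2 for a, b in zip(p, q)), pid) for pid, p in block]
            out.append((qid, heapq.nsmallest(k, dists)))
        return out

    def reduce_phase(mapped, k):
        """Reduce: merge the per-partition candidate lists and keep the global k nearest."""
        merged = defaultdict(list)
        for qid, candidates in mapped:
            merged[qid].extend(candidates)
        return {qid: heapq.nsmallest(k, cands) for qid, cands in merged.items()}

    # toy run: two "partitions" of the data set, one query
    data_partitions = {0: [("p1", (0.0, 0.0)), ("p2", (3.0, 3.0))],
                       1: [("p3", (0.5, 0.5)), ("p4", (9.0, 9.0))]}
    queries = {"q1": (0.2, 0.1)}
    mapped = [rec for block in data_partitions.values()
              for rec in map_phase(block, queries, k=2)]
    print(reduce_phase(mapped, k=2))   # q1's two nearest points across all partitions
    ```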
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Mining communities or clusters in networks is valuable for analyzing, designing, and optimizing many natural and engineering complex systems, e.g., protein networks, power grids, and transportation systems. Most of the existing techniques view the community mining problem as an optimization problem based on a given quality function (e.g., modularity); however, none of them is grounded in a systematic theory for identifying the central nodes in the network. Moreover, how to reconcile mining efficiency and community quality remains an open problem. In this paper, we attempt to address these challenges by introducing a novel algorithm. First, a kernel function with a tunable influence factor is proposed to measure the leadership of each node; the nodes with the highest local leadership can be viewed as candidate central nodes. Then, we use a discrete-time dynamical system to describe the dynamical assignment of community membership, and formulate several conditions to guarantee the convergence of each node's dynamic trajectory, by which the hierarchical community structure of the network can be revealed. The proposed dynamical system is independent of the quality function used, so it could also be applied in other community mining models. Our algorithm is highly efficient: the computational complexity analysis shows that the execution time is nearly linear in the number of nodes in sparse networks. We finally give demonstrative applications of the algorithm to a set of synthetic benchmark networks and also real-world networks to verify the algorithmic performance.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: When a microblogging user adopts some content propagated to her, we can attribute that to three behavioral factors, namely, topic virality , user virality , and user susceptibility . Topic virality measures the degree to which a topic attracts propagations by users. User virality and susceptibility refer to the ability of a user to propagate content to other users, and the propensity of a user to adopt content propagated to her, respectively. In this paper, we study the problem of mining these behavioral factors specific to topics from microblogging content propagation data. We first construct a three-dimensional tensor for representing the propagation instances. We then propose a tensor factorization framework to simultaneously derive the three sets of behavioral factors. Based on this framework, we develop a numerical factorization model and another probabilistic factorization variant. We also develop an efficient algorithm for learning the models’ parameters. Our experiments on a large Twitter dataset and synthetic datasets show that the proposed models can effectively mine the topic-specific behavioral factors of users and tweet topics. We further demonstrate that the proposed models consistently outperform other state-of-the-art content-based models in retweet prediction over time.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups ( clusters ) of anomalies; i.e., sets of points which collectively exhibit abnormal patterns. In many applications, this can lead to a better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/ATD .
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Multilabel classification is prevalent in many real-world applications where data instances may be associated with multiple labels simultaneously. In multilabel classification, exploiting label correlations is an essential but nontrivial task. Most of the existing multilabel learning algorithms are either ineffective or computationally demanding and less scalable in exploiting label correlations. In this paper, we propose a co-evolutionary multilabel hypernetwork (Co-MLHN) as an attempt to exploit label correlations in an effective and efficient way. To this end, we firstly convert the traditional hypernetwork into a multilabel hypernetwork (MLHN) where label correlations are explicitly represented. We then propose a co-evolutionary learning algorithm to learn an integrated classification model for all labels. The proposed Co-MLHN exploits arbitrary order label correlations and has linear computational complexity with respect to the number of labels. Empirical studies on a broad range of multilabel data sets demonstrate that Co-MLHN achieves competitive results against state-of-the-art multilabel learning algorithms, in terms of both classification performance and scalability with respect to the number of labels.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Query auto completion (QAC) methods recommend queries to search engine users when they start entering a query. Current QAC methods mostly rank query completions based on their past popularity, i.e., on the number of times they have previously been submitted as a query. However, query popularity changes over time and may vary drastically across users. Accordingly, the ranking of query completions should be adjusted. Previous time-sensitive and user-specific QAC methods have been developed separately, yielding significant improvements over methods that are neither time-sensitive nor personalized. We propose a hybrid QAC method that is both time-sensitive and personalized. We extend it to handle long-tail prefixes, which we achieve by assigning optimal weights to the contribution from time-sensitivity and personalization. Using real-world search log datasets, we return top $N$ query suggestions ranked by predicted popularity as estimated from popularity trends and cyclic popularity behavior; we rerank them by integrating similarities to a user's previous queries (both in the current session and in previous sessions). Our method outperforms state-of-the-art time-sensitive QAC baselines, achieving total improvements of between 3 and 7 percent in terms of mean reciprocal rank (MRR). After optimizing the weights, our extended model achieves MRR improvements of between 4 and 8 percent.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Wrappers are pieces of software used to extract data from websites and structure them for further application processing. Unfortunately, websites are continuously evolving and structural changes happen with no forewarning, which usually results in wrappers working incorrectly. Thus, wrapper maintenance is necessary to detect whether a wrapper is extracting erroneous data. The solution consists of using verification models to detect whether the wrapper output is statistically similar to the output produced by the wrapper itself when it was successfully invoked in the past. Current proposals present some weaknesses, as the data used to build these models are assumed to be homogeneous, independent, or representative enough, or to follow a single predefined mathematical model. In this paper, we present MAVE, a novel multilevel wrapper verification system that is based on one-class classification techniques to overcome these weaknesses. The experimental results show that our proposal outperforms current solutions in accuracy.
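    MAVE's multilevel design is not reproduced here; the following is a single-level, hedged sketch of the general idea of one-class verification of wrapper output, using scikit-learn's OneClassSVM (assumed to be installed) on a tiny, made-up feature set over batches of extracted records. The feature choices and toy data are assumptions of the sketch.

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    def featurize(records):
        """Tiny feature set for a batch of extracted records: mean title length,
        mean price length, and the fraction of empty field values."""
        titles = [len(r.get("title", "")) for r in records]
        prices = [len(r.get("price", "")) for r in records]
        empties = [1.0 if not v else 0.0 for r in records for v in r.values()]
        return [np.mean(titles), np.mean(prices), np.mean(empties)]

    # batches of records extracted during past successful wrapper runs (toy data)
    rng = np.random.default_rng(0)
    good_batches = [[{"title": "a" * int(rng.integers(15, 25)), "price": "9.99"}
                     for _ in range(50)] for _ in range(30)]
    X_train = np.array([featurize(b) for b in good_batches])

    model = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", gamma="scale", nu=0.1))
    model.fit(X_train)

    # a new run where the site changed and prices come back empty
    broken_batch = [{"title": "a" * 20, "price": ""} for _ in range(50)]
    print(model.predict([featurize(broken_batch)]))   # -1 flags output unlike past runs
    ```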
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 72
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: In the Boundary Labeling problem, we are given a set of  n points, referred to as sites , inside an axis-parallel rectangle  R , and a set of  n pairwise disjoint rectangular labels that are attached to  R from the outside. The task is to connect the sites to the labels by non-intersecting rectilinear paths, so-called leaders , with at most one bend. In this paper, we study the Multi-Sided Boundary Labeling problem, with labels lying on at least two sides of the enclosing rectangle. We present a polynomial-time algorithm that computes a crossing-free leader layout if one exists. So far, such an algorithm has only been known for the cases in which labels lie on one side or on two opposite sides of  R (here a crossing-free solution always exists). The case where labels may lie on adjacent sides is more difficult. We present efficient algorithms for testing the existence of a crossing-free leader layout that labels all sites and also for maximizing the number of labeled sites in a crossing-free leader layout. For two-sided boundary labeling with adjacent sides, we further show how to minimize the total leader length in a crossing-free layout.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 73
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: In this paper we consider the problem of the strict self-assembly of infinite fractals within tile self-assembly. In particular, we provide tile assembly algorithms for the assembly of a Sierpinski triangle and the discrete Sierpinski carpet within a class of models we term the h - handed assembly model ( h -HAM), which generalizes the 2-HAM to allow up to h assemblies to combine in a single assembly step. Despite substantial consideration, no purely growth self-assembly model has yet been shown to strictly assemble an infinite fractal without significant modification to the fractal shape. In this paper we not only achieve this, but in the case of the Sierpinski carpet are able to achieve it within the 2-HAM, one of the most well studied tile assembly models in the literature. Our specific results are as follows: We provide a 6-HAM construction for a Sierpinski triangle that works at scale factor 1, 30 tile types, and assembles the fractal in a near perfect way in which all intermediate assemblies are finite-sized iterations of the recursive fractal. We further assemble a Sierpinski triangle within the 3-HAM at scale factor 3 and 990 tile types. For the Sierpinski carpet, we present a 2-HAM result that works at scale factor 3 and uses 1216 tile types. We further include analysis showing that the aTAM is incapable of strictly assembling the Sierpinski triangle considered in this paper, and that multiple hands are needed for the near-perfect assembly of a Sierpinski triangle and the Sierpinski carpet.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 74
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We consider k -Facility Location games, where n strategic agents report their locations on the real line and a mechanism maps them to k facilities. Each agent seeks to minimize his connection cost, given by a nonnegative increasing function of his distance to the nearest facility. Departing from previous work, that mostly considers the identity cost function, we are interested in mechanisms without payments that are (group) strategyproof for any given cost function, and achieve a good approximation ratio for the social cost and/or the maximum cost of the agents. We present a randomized mechanism, called Equal Cost , which is group strategyproof and achieves a bounded approximation ratio for all k and n , for any given concave cost function. The approximation ratio is at most 2 for Max Cost and at most n for Social Cost . To the best of our knowledge, this is the first mechanism with a bounded approximation ratio for instances with \(k \ge 3\) facilities and any number of agents. Our result implies an interesting separation between deterministic mechanisms, whose approximation ratio for Max Cost jumps from 2 to unbounded when k increases from 2 to 3, and randomized mechanisms, whose approximation ratio remains at most 2 for all k . On the negative side, we exclude the possibility of a mechanism with the properties of Equal Cost for strictly convex cost functions. We also present a randomized mechanism, called Pick the Loser , which applies to instances with k facilities and only \(n = k+1\) agents. For any given concave cost function, Pick the Loser is strongly group strategyproof and achieves an approximation ratio of 2 for Social Cost .
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 75
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We give a deterministic #SAT algorithm for de Morgan formulas of size up to \(n^{2.63}\) , which runs in time \(2^{n-n^{{\varOmega }(1)}}\) . This improves upon the deterministic #SAT algorithm of Chen et al. (Proceedings of the twenty-ninth annual IEEE conference on computational complexity, 2014 ), which has similar running time but works only for formulas of size less than \(n^{2.5}\) . Our new algorithm is based on the shrinkage of de Morgan formulas under random restrictions, shown by Paterson and Zwick (Random Struct Algorithms 4(2):135–150, 1993 ). We prove a concentrated and constructive version of their shrinkage result. Namely, we give a deterministic polynomial-time algorithm that selects variables in a given de Morgan formula so that, with high probability over the random assignments to the chosen variables, the original formula shrinks in size, when simplified using a given deterministic polynomial-time formula-simplification algorithm.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 76
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We consider Conditional random fields ( CRFs ) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) \(x_1\ldots x_n\) is the sum of terms over intervals [ i ,  j ] where each term is non-zero only if the substring \(x_i\ldots x_j\) equals a prespecified pattern w . Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) computing the MAP. Their complexities are respectively \(O(\textit{nL})\) , \(O(\textit{nL} \ell _{\max })\) and \(O(\textit{nL} \min \{|D|,\log (\ell _{\max }\!+\!1)\})\) where L is the combined length of input patterns, \(\ell _{\max }\) is the maximum length of a pattern, and D is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009 ) whose complexities are respectively \(O(\textit{nL} |D|)\) , \(O\left( n |\varGamma | L^2 \ell _{\max }^2\right) \) and \(O(\textit{nL} |D|)\) , where \(|\varGamma |\) is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 77
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.
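    A naive Python sketch in the spirit of vertex-triple sampling: maintain the edge set under insertions and deletions, sample random vertex triples, and scale the hit rate by C(n,3). The paper's algorithm combines such sampling with sparsification to obtain its space and time bounds; this sketch ignores those ingredients and all names are illustrative.

# Naive triangle estimator for a dynamic graph: sample vertex triples and
# scale the fraction that form triangles by C(n, 3).  Unbiased, but without
# the sparsification the paper uses to control variance and space.
import random
from math import comb

class DynamicGraph:
    def __init__(self, vertices):
        self.vertices = list(vertices)
        self.edges = set()

    def update(self, u, v, insert=True):
        e = (min(u, v), max(u, v))
        self.edges.add(e) if insert else self.edges.discard(e)

    def estimate_triangles(self, samples=20000):
        n = len(self.vertices)
        hits = 0
        for _ in range(samples):
            a, b, c = random.sample(self.vertices, 3)
            tri = {(min(a, b), max(a, b)), (min(b, c), max(b, c)),
                   (min(a, c), max(a, c))}
            if tri <= self.edges:
                hits += 1
        return hits / samples * comb(n, 3)

if __name__ == "__main__":
    random.seed(1)
    g = DynamicGraph(range(60))
    for _ in range(800):                                  # insertions
        g.update(*random.sample(range(60), 2))
    for _ in range(200):                                  # deletions
        g.update(*random.sample(range(60), 2), insert=False)
    print("estimated triangles:", round(g.estimate_triangles()))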
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 78
    Springer
    Publikationsdatum: 2016-07-15
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 79
    Publikationsdatum: 2016-07-15
    Beschreibung: We consider the problems of finding optimal identifying codes, (open) locating-dominating sets and resolving sets (denoted Identifying Code , (Open) Open Locating-Dominating Set and Metric Dimension ) of an interval or a permutation graph. In these problems, one asks to distinguish all vertices of a graph by a subset of the vertices, using either the neighbourhood within the solution set or the distances to the solution vertices. Using a general reduction for this class of problems, we prove that the decision problems associated to these four notions are NP-complete, even for interval graphs of diameter 2 and permutation graphs of diameter 2. While Identifying Code and (Open) Locating-Dominating Set are trivially fixed-parameter-tractable when parameterized by solution size, it is known that in the same setting Metric Dimension is W [2]-hard. We show that for interval graphs, this parameterization of Metric Dimension is fixed-parameter-tractable.
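    A brute-force Python check of the Metric Dimension notion used above, assuming the graph is given as an adjacency dictionary: a set S resolves the graph if every vertex has a distinct vector of distances to S. The exponential search is only for intuition about the definition; it is unrelated to the paper's fixed-parameter algorithm for interval graphs.

# Brute-force metric dimension: find the smallest vertex set S such that all
# vertices get pairwise distinct distance vectors to S (a "resolving set").
from itertools import combinations
from collections import deque

def bfs_dist(adj, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def resolves(adj, S):
    tables = [bfs_dist(adj, s) for s in S]
    seen = set()
    for v in adj:
        vec = tuple(t.get(v, float('inf')) for t in tables)
        if vec in seen:
            return False
        seen.add(vec)
    return True

def metric_dimension(adj):
    verts = list(adj)
    for k in range(1, len(verts) + 1):
        for S in combinations(verts, k):
            if resolves(adj, S):
                return k, S

if __name__ == "__main__":
    # a 5-cycle: its metric dimension is 2
    adj = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
    print(metric_dimension(adj))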
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 80
    Springer
    Publikationsdatum: 2016-07-28
    Beschreibung: Given a partition of an n element set into equivalence classes, we study the problem of assigning unique labels to these elements in order to support the query that asks whether the elements corresponding to two given labels belong to the same equivalence class. This has various applications including for testing whether two vertices are in the same connected component in an undirected graph or in the same strongly connected component in a directed graph. We consider the problem in several models. Concerning labeling schemes where we assign labels to elements and the query is to be answered just by examining the labels of the queried elements (without any extra space): if each vertex is required to have a unique label, then we show that a label space of \(\sum _{i=1}^n \lfloor {n \over i} \rfloor \) is necessary and sufficient. In other words, \(\lg n + \lg \lg n + O(1)\) bits of space are necessary and sufficient for representing each of the labels. This slightly strengthens the known lower bound and is in contrast to the known necessary and sufficient bound of \(\lceil \lg n \rceil \) for the label length, if each vertex need not get a unique label. Concerning succinct data structures for the problem when the n elements are to be uniquely assigned labels from label set \(\{1,\ldots , n\}\) , we first show that \(\varTheta (\sqrt{n})\) bits are necessary and sufficient to represent the equivalence class information. This space includes the space for implicitly encoding the vertex labels. We can support the query in such a structure in O (1) time in the standard word RAM model. We also develop a dynamic structure that uses \(O(\sqrt{n} \lg n)\) bits to support equivalence queries and unions in \(O(\lg n/\lg \lg n)\) worst case time or \(O(\alpha (n))\) expected amortized time where \(\alpha (n)\) is the inverse Ackermann function. Concerning succinct data structures for the problem when the n elements are to be uniquely assigned labels from label set \(\{1,\ldots , cn\}\) for any constant \(c 〉 1\) , we show that \(\varTheta (\lg n)\) bits are necessary and sufficient to represent the equivalence class information. Moreover, we can support the query in such a structure in O (1) time in the standard word RAM model. We believe that our work can trigger further work on tradeoffs between label space and auxiliary data structure space for other labeling problems.
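    For intuition about the query model, here is the classical union-find structure with path compression and union by rank, which answers equivalence queries and supports unions in O(α(n)) amortized time. It deliberately ignores the paper's actual contribution, the succinct space and label-space bounds, and is included only to make the operations concrete.

# Classical union-find: "are u and v in the same equivalence class?" in
# near-constant amortized time, with unions supported as well.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

    def same_class(self, x, y):
        return self.find(x) == self.find(y)

if __name__ == "__main__":
    uf = UnionFind(6)
    uf.union(0, 1); uf.union(1, 2); uf.union(4, 5)
    print(uf.same_class(0, 2), uf.same_class(2, 4))   # True False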
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 81
    Springer
    Publikationsdatum: 2016-08-02
    Beschreibung: We perform a thorough study of various characteristics of the asynchronous push–pull protocol for spreading a rumor on Erdős–Rényi random graphs \(G_{n,p}\) , for any \(p > c\ln (n)/n\) with \(c > 1\) . In particular, we provide a simple strategy for analyzing the asynchronous push–pull protocol on arbitrary graph topologies and apply this strategy to \(G_{n,p}\) . We prove tight bounds of logarithmic order for the total time that is needed until the information has spread to all nodes. Surprisingly, the time required by the asynchronous push–pull protocol is asymptotically almost unaffected by the average degree of the graph. Similarly tight bounds for Erdős–Rényi random graphs have previously only been obtained for the synchronous push protocol, where it has been observed that the total running time increases significantly for sparse random graphs. Finally, we quantify the robustness of the protocol with respect to transmission and node failures. Our analysis suggests that the asynchronous protocols are particularly robust with respect to these failures compared to their synchronous counterparts.
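    A small Python simulator of the asynchronous push–pull protocol, assuming the standard model in which every node carries a rate-1 Poisson clock and, when it rings, exchanges the rumor with a uniformly random neighbour. It is only meant to give intuition for the logarithmic spreading times analysed above; the edge probability and seed are illustrative.

# Asynchronous push-pull on G(n, p): the next clock ring anywhere in the
# graph arrives after an Exp(n) waiting time and belongs to a uniform node,
# which then contacts a random neighbour; the rumor moves in both directions.
import math
import random

def gnp(n, p, rng):
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def reachable_from(adj, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

def async_push_pull(adj, rng, start=0):
    n = len(adj)
    target = reachable_from(adj, start)   # only this component can be informed
    informed = [False] * n
    informed[start] = True
    count, t = 1, 0.0
    while count < target:
        t += rng.expovariate(n)           # time to the next clock ring
        u = rng.randrange(n)              # the node whose clock rang
        if not adj[u]:
            continue
        v = rng.choice(adj[u])            # contacted neighbour
        if informed[u] != informed[v]:    # push or pull transmits the rumor
            informed[u] = informed[v] = True
            count += 1
    return t

if __name__ == "__main__":
    rng = random.Random(7)
    n = 500
    adj = gnp(n, 3 * math.log(n) / n, rng)
    print("time to inform the component of node 0:", round(async_push_pull(adj, rng), 2))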
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 82
    Springer
    Publikationsdatum: 2016-08-04
    Beschreibung: We study connectivity relations among points, where the precise location of each input point lies in a region of uncertainty. We distinguish two fundamental scenarios under which uncertainty arises. In the favorable Best-Case Uncertainty , each input point can be chosen from a given set to yield the best possible objective value. In the unfavorable Worst-Case Uncertainty , the input set has worst possible objective value among all possible point locations, which are uncertain due, for example, to imprecise data. We consider these notions of uncertainty for the bottleneck spanning tree problem, giving rise to the following Best-Case Connectivity with Uncertainty problem: given a family of geometric regions, choose one point per region, such that the longest edge length of an associated geometric spanning tree is minimized. We show that this problem is NP-hard even for very simple scenarios in which the regions are line segments or squares. On the other hand, we give an exact solution for the case in which there are \(n+k\) regions, where k of the regions are line segments and n of the regions are fixed points. We then give approximation algorithms for cases where the regions are either all line segments or all unit discs. We also provide approximation methods for the corresponding Worst-Case Connectivity with Uncertainty problem: Given a set of uncertainty regions, find the minimal distance r such that for any choice of points, one per region, there is a spanning tree among the points with edge length at most r .
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 83
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Online portfolio selection has attracted increasing attention from data mining and machine learning communities in recent years. An important theory in financial markets is mean reversion, which plays a critical role in some state-of-the-art portfolio selection strategies. Although existing mean reversion strategies have been shown to achieve good empirical performance on certain datasets, they seldom carefully deal with noise and outliers in the data, leading to suboptimal portfolios, and consequently yielding poor performance in practice. In this paper, we propose to exploit the reversion phenomenon by using robust $L_1$ -median estimators, and design a novel online portfolio selection strategy named “Robust Median Reversion” (RMR), which constructs optimal portfolios based on the improved reversion estimator. We examine the performance of the proposed algorithms on various real markets with extensive experiments. Empirical results show that RMR can overcome the drawbacks of existing mean reversion algorithms and achieve significantly better results. Finally, RMR runs in linear time, and thus is suitable for large-scale real-time algorithmic trading applications.
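    A generic Python sketch of the robust estimator RMR builds on: the L1-median (geometric median) of recent price vectors, computed with the classical Weiszfeld iteration. This is not the full RMR portfolio update; the data and tolerances are illustrative.

# Weiszfeld iteration for the L1-median: the point minimizing the sum of
# Euclidean distances to the data, far less sensitive to outliers than the
# mean used by plain mean-reversion strategies.
import numpy as np

def l1_median(points, iters=200, eps=1e-9):
    """points: (m, d) array of recent price vectors; returns their geometric median."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        if np.any(d < eps):                 # median coincides with a data point
            return points[np.argmin(d)]
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y

if __name__ == "__main__":
    prices = np.array([[10.0, 5.0], [10.2, 5.1], [9.9, 4.9], [30.0, 50.0]])  # one outlier
    print("mean     :", prices.mean(axis=0))
    print("L1-median:", l1_median(prices))   # barely moved by the outlier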
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: The development of a topic in a set of topic documents is constituted by a series of person interactions at a specific time and place. Knowing the interactions of the persons mentioned in these documents is helpful for readers to better comprehend the documents. In this paper, we propose a topic person interaction detection method called SPIRIT, which classifies the text segments in a set of topic documents that convey person interactions. We design the rich interactive tree structure to represent syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify interactive segments. Experiment results based on real world topics demonstrate that the proposed rich interactive tree structure effectively detects the topic person interactions and that our method outperforms many well-known relation extraction and protein-protein interaction methods.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Social recommendation systems have attracted a lot of attention recently in the research communities of information retrieval, machine learning, and data mining. Traditional social recommendation algorithms are often based on batch machine learning methods, which suffer from several critical limitations, e.g., the extremely expensive cost of retraining the model whenever new user ratings arrive and the inability to capture the change of user preferences over time. It is therefore important to make social recommendation systems suitable for real-world online applications, where data often arrives sequentially and user preferences may change dynamically and rapidly. In this paper, we present a new framework of online social recommendation from the viewpoint of online graph regularized user preference learning (OGRPL), which incorporates both the collaborative user-item relationship and item content features into a unified preference learning process. We further develop an efficient iterative procedure, OGRPL-FW, which utilizes the Frank-Wolfe algorithm to solve the proposed online optimization problem. We conduct extensive experiments on several large-scale datasets; the encouraging results demonstrate that the proposed algorithms obtain significantly lower errors (in terms of both RMSE and MAE) than state-of-the-art online recommendation methods when receiving the same amount of training data in the online learning process.
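    A generic Python sketch of the Frank–Wolfe (conditional gradient) template that OGRPL-FW instantiates, shown on a toy least-squares problem over the probability simplex rather than on the paper's preference-learning objective. The cheap linear-minimization step is the point of the illustration; all names and parameters are assumptions of this sketch.

# Frank-Wolfe on the probability simplex: each iteration moves toward the
# vertex that minimizes the linearized objective, so no projection is needed.
import numpy as np

def frank_wolfe_simplex(A, b, iters=200):
    n = A.shape[1]
    x = np.full(n, 1.0 / n)                   # start at the simplex centre
    for t in range(iters):
        grad = 2 * A.T @ (A @ x - b)          # gradient of ||Ax - b||^2
        s = np.zeros(n)
        s[np.argmin(grad)] = 1.0              # linear minimization oracle
        gamma = 2.0 / (t + 2)                 # standard step size
        x = (1 - gamma) * x + gamma * s
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 8))
    b = A @ rng.dirichlet(np.ones(8))         # a reachable target
    x = frank_wolfe_simplex(A, b)
    print("residual:", np.linalg.norm(A @ x - b))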
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 86
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: In this paper, we revisit the private over-threshold data aggregation problem. We formally define the problem's security requirements as both data and user privacy goals. To achieve both goals, and to strike a balance between efficiency and functionality, we devise an efficient cryptographic construction and its proxy-based variant. Both schemes are provably secure in the semi-honest model. Our key idea for the constructions and their malicious variants is to compose two encryption functions tightly coupled in a way that the two functions are commutative and one public-key encryption has an additive homomorphism. We call that double encryption. We analyze the computational and communication complexities of our construction, and show that it is much more efficient than the existing protocols in the literature. Specifically, our protocol has linear complexity in computation and communication with respect to the number of users. Its round complexity is also linear in the number of users. Finally, we show that our basic protocol is efficiently transformed into a stronger protocol secure in the presence of malicious adversaries, and provide the resulting protocol's performance and security analysis.
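    A toy Python illustration of the additive homomorphism mentioned above, using textbook Paillier encryption with tiny, insecure parameters: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is not the paper's double-encryption construction (the commutative layer is omitted entirely); the primes and names are illustrative only.

# Textbook Paillier with toy parameters (insecure, illustration only):
# Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n.
import math
import random

p, q = 1789, 2003                  # toy primes
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)               # valid because the generator is g = n + 1

def encrypt(m, rng=random):
    r = rng.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

if __name__ == "__main__":
    c1, c2 = encrypt(123456), encrypt(424242)
    print(decrypt((c1 * c2) % n2))   # 547698 = 123456 + 424242 (mod n)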
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 87
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: General health examination is an integral part of healthcare in many countries. Identifying the participants at risk is important for early warning and preventive intervention. The fundamental challenge of learning a classification model for risk prediction lies in the unlabeled data that constitutes the majority of the collected dataset. Particularly, the unlabeled data describes the participants in health examinations whose health conditions can vary greatly from healthy to very-ill. There is no ground truth for differentiating their states of health. In this paper, we propose a graph-based, semi-supervised learning algorithm called SHG-Health (Semi-supervised Heterogeneous Graph on Health) for risk predictions to classify a progressively developing situation with the majority of the data unlabeled. An efficient iterative algorithm is designed and the proof of convergence is given. Extensive experiments based on both real health examination datasets and synthetic datasets are performed to show the effectiveness and efficiency of our method.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 88
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-08-05
    Beschreibung: Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank the features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, the first termed maximum discrimination ( $MD$ ), for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
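    A short Python sketch of the two classical measures the framework starts from: the Kullback–Leibler divergence and the symmetric Jeffreys divergence between discrete distributions. The paper's JMH-divergence, which generalizes Jeffreys divergence to more than two class-conditional distributions, is not reproduced; the toy distributions are illustrative.

# KL and Jeffreys divergences between discrete distributions, the building
# blocks the JMH-divergence generalizes to the multi-class setting.
import numpy as np

def kl(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys(p, q):
    return kl(p, q) + kl(q, p)

if __name__ == "__main__":
    # toy term distributions of one feature in two classes
    p = [0.70, 0.20, 0.10]
    q = [0.10, 0.30, 0.60]
    print("KL(p||q) :", kl(p, q))
    print("Jeffreys :", jeffreys(p, q))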
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 89
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We define the modular treewidth of a graph as its treewidth after contraction of modules. This parameter properly generalizes treewidth and is itself properly generalized by clique-width. We show that the number of satisfying assignments can be computed in polynomial time for CNF formulas whose incidence graphs have bounded modular treewidth. Our result generalizes known results for the treewidth of incidence graphs and is incomparable with known results for clique-width (or rank-width) of signed incidence graphs. The contraction of modules is an effective data reduction procedure. Our algorithm is the first one to harness this technique for #SAT. The order of the polynomial bounding the runtime of our algorithm depends on the modular treewidth of the input formula. We show that it is unlikely that this dependency can be avoided by proving that SAT is W[1]-hard when parameterized by the modular incidence treewidth of the given CNF formula.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 90
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: In a classical online network design problem, traffic requirements are gradually revealed to an algorithm. Each time a new request arrives, the algorithm has to satisfy it by augmenting the network under construction in a proper way (with no possibility of recovery). In this paper we study a natural generalization of online network design problems, where a fraction of the requests (the outliers ) can be disregarded. Now, each time a request arrives, the algorithm first decides whether to satisfy it or not, and only in the first case it acts accordingly. We cast three classical network design problems into this framework: (i) Online Steiner tree with outliers In this case a set of t terminals that belong to an n -node graph is presented, one at a time, to an algorithm. Each time a new terminal arrives, the algorithm can either discard or select it. In the latter case, the algorithm connects it to the Steiner tree under construction (initially consisting of a given root node). At the end of the process, at least k terminals must be selected. (ii) Online TSP with outliers This is the same problem as above, but with the Steiner tree replaced by a TSP tour. (iii) Online facility location with outliers In this case, we are also given a set of facility nodes, each one with an opening cost. Each time a terminal is selected, we have to connect it to some facility (and open that facility, if it is not already open). We focus on the known distribution model, where terminals are independently sampled from a given distribution. For all the above problems, we present bicriteria online algorithms that, for any constant \(\epsilon 〉0\) , select at least \((1-\epsilon )k\) terminals with high probability and pay in expectation \(O(\log ^2n)\) times more than the expected cost of the optimal offline solution (selecting k terminals). These upper bounds are complemented by inapproximability results for the case that one insists on selecting exactly k terminals, and by lower bounds including an \(\varOmega (\log n/\log \log n)\) lower bound for the case that the online algorithm is allowed to select \(\alpha \,k\) terminals only, for a sufficiently large constant \(\alpha \in (0,1)\) .
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 91
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: A well known result in graph algorithms, due to Edmonds, states that given a digraph D and a positive integer \(\ell \) , we can test whether D contains \(\ell \) arc-disjoint out-branchings in polynomial time. However, if we ask whether there exists an out-branching and an in-branching which are arc-disjoint, then the problem becomes NP -complete. In fact, even deciding whether a digraph D contains an out-branching which is arc-disjoint from some spanning tree in the underlying undirected graph remains NP -complete. In this paper we formulate some natural optimization questions around these problems and initiate its study in the realm of parameterized complexity. More precisely, the problems we study are the following: Arc - Disjoint Branchings and Non - Disconnecting Out - Branching . In Arc - Disjoint Branchings ( Non - Disconnecting Out - Branching ), a digraph D and a positive integer k are given as input and the goal is to test whether there exist an out-branching and in-branching (respectively, a spanning tree in the underlying undirected graph) that differ on at least k arcs. We obtain the following results for these problems. Non - Disconnecting Out - Branching is fixed parameter tractable (FPT) and admits a linear vertex kernel. Arc - Disjoint Branchings is FPT on strong digraphs. The algorithm for Non - Disconnecting Out - Branching runs in time \(2^{\mathcal {O}(k)}n^{\mathcal {O}(1)}\) and the approach we use to obtain this algorithms seems useful in designing other moderately exponential time algorithms for edge/arc partitioning problems.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 92
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We study the stable roommates problem in networks where players are embedded in a social context and may incorporate positive externalities into their decisions. Each player is a node in a social network and strives to form a good match with a neighboring player. We consider the existence, computation, and inefficiency of stable matchings from which no pair of players wants to deviate. We characterize prices of anarchy and stability, which capture the ratio of the total profit in the optimum matching over the total profit of the worst and best stable matching, respectively. When the benefit from a match (which we model by associating a reward with each edge) is the same for both players, we show that externalities can significantly improve the price of stability, while the price of anarchy remains unaffected. Furthermore, a good stable matching achieving the bound on the price of stability can be obtained in polynomial time. We extend these results to more general matching rewards, when players matched to each other may receive different benefits from the match. For this more general case, we show that network externalities (i.e., “caring about your friends”) can make an even larger difference and greatly reduce the price of anarchy. We show a variety of existence results and present upper and lower bounds on the prices of anarchy and stability for various structures of matching benefits. All our results on stable matchings immediately extend to the more general case of fractional stable matchings.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 93
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: We present the first general bounds on the mixing time of the Markov chain associated to the logit dynamics for wide classes of strategic games. The logit dynamics with inverse noise \(\beta \) describes the behavior of a complex system whose individual components act selfishly according to some partial (“noisy”) knowledge of the system, where the capacity of the agent to know the system and compute her best move is measured by parameter \(\beta \) . In particular, we prove nearly tight bounds for potential games and games with dominant strategies. Our results show that for potential games the mixing time is bounded by an exponential in \(\beta \) and in the maximum potential difference. Instead, for games with dominant strategies the mixing time cannot grow arbitrarily with \(\beta \) . Finally, we refine our analysis for a subclass of potential games called graphical coordination games, often used for modeling the diffusion of new technologies. We prove that the mixing time of the logit dynamics for these games can be upper bounded by a function that is exponential in the cutwidth of the underlying graph and in \(\beta \) . Moreover, we consider two specific and popular network topologies, the clique and the ring. For the clique, we prove an almost matching lower bound on the mixing time of the logit dynamics that is exponential in \(\beta \) and in the maximum potential difference, while for the ring we prove that the time of convergence of the logit dynamics to its stationary distribution is significantly shorter.
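    A simulation sketch of logit dynamics for a graphical coordination game on a ring, one of the topologies analysed above: at each step a uniformly random player revises her strategy, choosing strategy s with probability proportional to exp(β · utility(s)). The payoffs and parameters are illustrative assumptions; the mixing-time bounds themselves are, of course, not obtained by simulation.

# Logit dynamics on a ring coordination game: a player's utility for a
# strategy is the number of neighbours playing it; beta controls the noise
# (beta = 0 is uniform, large beta is nearly best response).
import math
import random

def logit_step(state, beta, rng):
    n = len(state)
    i = rng.randrange(n)
    neigh = (state[(i - 1) % n], state[(i + 1) % n])
    weights = [math.exp(beta * sum(1 for x in neigh if x == s)) for s in (0, 1)]
    state[i] = rng.choices((0, 1), weights=weights)[0]

def simulate(n=50, beta=3.0, steps=20000, seed=0):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        logit_step(state, beta, rng)
    return sum(state) / n                  # fraction playing strategy 1

if __name__ == "__main__":
    for beta in (0.0, 1.0, 4.0):
        print(f"beta={beta}: fraction of 1s = {simulate(beta=beta):.2f}")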
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 94
    Springer
    Publikationsdatum: 2016-08-06
    Beschreibung: Given a plane graph G (i.e., a planar graph with a fixed planar embedding and outer face) and a biconnected subgraph \(G^{\prime }\) with a fixed planar straight-line convex drawing, we consider the question whether this drawing can be extended to a planar straight-line drawing of G . We characterize when this is possible in terms of simple necessary conditions, which we prove to be sufficient. This also leads to a linear-time testing algorithm. If a drawing extension exists, one can be computed in the same running time.
    Print ISSN: 0178-4617
    Digitale ISSN: 1432-0541
    Thema: Informatik , Mathematik
    Publiziert von Springer
  • 95
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: The proliferation of location-based social networks, such as Foursquare and Facebook Places, offers a variety of ways to record human mobility, including user generated geo-tagged contents, check-in services, and mobile apps. Although trajectory data is of great value to many applications, it is challenging to analyze and mine trajectory data due to the complex characteristics reflected in human mobility, which is affected by multiple contextual information. In this paper, we propose a Multi-Context Trajectory Embedding Model, called MC-TEM, to explore contexts in a systematic way. MC-TEM is developed in the distributed representation learning framework, and it is flexible to characterize various kinds of useful contexts for different applications. To the best of our knowledge, it is the first time that the distributed representation learning methods apply to trajectory data. We formally incorporate multiple context information of trajectory data into the proposed model, including user-level, trajectory-level, location-level, and temporal contexts. All the context information is represented in the same embedding space. We apply MC-TEM to two challenging tasks, namely location recommendation and social link prediction. We conduct extensive experiments on three real-world datasets. Extensive experiment results have demonstrated the superiority of our MC-TEM model over several state-of-the-art methods.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 96
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: The study of urban networks reveals that the accessibility of important city objects for vehicle traffic and pedestrians is significantly correlated with the popularity, micro-criminality, micro-economic vitality, and social liveability of the city, and is always the chief factor in regulating its growth and expansion. The accessibility between different components of an urban structure is frequently measured along the streets and routes, considered as edges of a planar graph, while the ultimate traffic destination points and street junctions are treated as vertices. To estimate the accessibility of destination vertex $j$ from vertex $i$ through urban networks, random walks are used to calculate the expected distance a random walker starting from $i$ travels before $j$ is visited (known as the access time ). State-of-the-art access time computation is costly in large planar graphs since it involves matrix operations over the entire graph; the time complexity is $O(n^{2.376})$ , where $n$ is the number of vertices in the planar graph. To enable efficient access time query answering in large planar graphs, this work proposes the first access time oracle, which is based on the proposed access time decomposition and reconstruction scheme. The oracle is a hierarchical data structure with a deliberate design of the relationships between different hierarchical levels. The storage requirement of the proposed oracle is $O(n^{\frac{4}{3}}\log \log n)$ and the access time query response time is $O(n^{\frac{2}{3}})$ . Extensive tests on a number of large real-world road networks (with up to about 2 million vertices) have verified the superiority of the proposed oracle.
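    A dense Python baseline for the quantity being queried: the access (hitting) time of a simple random walk, obtained by solving the linear system (I − Q)h = 1 over the non-target vertices. This whole-graph matrix computation is exactly the kind of cost the proposed oracle is designed to avoid; the example graph is illustrative.

# Access (hitting) times to a target vertex by solving (I - Q) h = 1, where
# Q is the walk's transition matrix restricted to the non-target vertices.
import numpy as np

def access_times_to(adj, target):
    """adj: list of neighbour lists; returns h[i] = E[steps from i to target]."""
    n = len(adj)
    others = [v for v in range(n) if v != target]
    idx = {v: k for k, v in enumerate(others)}
    Q = np.zeros((len(others), len(others)))
    for v in others:
        for w in adj[v]:
            if w != target:
                Q[idx[v], idx[w]] += 1.0 / len(adj[v])
    h = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    out = np.zeros(n)
    out[others] = h
    return out

if __name__ == "__main__":
    # a 4-cycle: hitting time between opposite vertices is 4
    adj = [[1, 3], [0, 2], [1, 3], [2, 0]]
    print(access_times_to(adj, 2))   # [4., 3., 0., 3.]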
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 97
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: Clustering is one of the research hotspots in the field of data mining and has extensive applications in practice. Recently, Rodriguez and Laio [1] published in Science a clustering algorithm that identifies the clustering centers in an intuitive way and clusters objects efficiently and effectively. However, the algorithm is sensitive to a preassigned parameter and struggles to identify the “ideal” number of clusters. To overcome these shortcomings, this paper proposes a new clustering algorithm that can detect the clustering centers automatically via statistical testing. Specifically, the proposed algorithm first defines a new metric to measure the density of an object that is more robust to the preassigned parameter, and it further derives a metric to evaluate the centrality of each object. Afterwards, it identifies the objects with extremely large centrality metrics as the clustering centers via an outward statistical testing method. Finally, it groups each remaining object into the cluster of its nearest neighbor with higher density. Extensive experiments are conducted over different kinds of clustering data sets to evaluate the performance of the proposed algorithm and to compare it with the algorithm published in Science. The results show the effectiveness and robustness of the proposed algorithm.
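    A compact Python sketch of the density-peaks idea this paper refines, assuming points in Euclidean space: compute a local density ρ and, for each point, the distance δ to the nearest denser point; points with large ρ·δ become centres and every other point follows its nearest denser neighbour. The paper's contribution, selecting centres by an outward statistical test instead of by inspection, is not reproduced; the cutoff and data are illustrative.

# Density-peaks clustering sketch: density rho within a cutoff dc, distance
# delta to the nearest denser point, centres chosen by the product rho*delta,
# remaining points assigned in decreasing order of density.
import numpy as np

def density_peaks(X, dc, n_centers):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (D < dc).sum(axis=1) - 1                    # neighbours within the cutoff
    order = np.argsort(-rho)                          # indices by decreasing density
    delta = np.empty(len(X))
    nearest_denser = np.zeros(len(X), dtype=int)
    delta[order[0]] = D.max()                         # convention: densest point gets the max
    nearest_denser[order[0]] = order[0]
    for k in range(1, len(X)):
        i = order[k]
        denser = order[:k]
        j = denser[np.argmin(D[i, denser])]
        delta[i], nearest_denser[i] = D[i, j], j
    centers = np.argsort(-(rho * delta))[:n_centers]  # densest point is always selected
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_centers)
    for i in order:                                   # assign in density order
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
                   rng.normal([4, 4], 0.3, (50, 2))])
    labels = density_peaks(X, dc=0.5, n_centers=2)
    print("cluster sizes:", np.bincount(labels))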
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 98
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing measuring criteria for $k$ representative skylines are specifically designed for static data and cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion for choosing $k$ skyline points as the $k$ representative skyline in data stream environments, termed the $k$ largest dominance skyline ( $k$ -LDS), which is representative of the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called the Prefix-based Algorithm (PBA), to solve the $k$ -LDS problem in a 2-dimensional space. The time complexity of PBA is only $\mathcal{O}((M-k)\times k)$ , where $M$ is the size of the full skyline set. Third, the $k$ -LDS problem for a $d$ -dimensional ( $d \ge 3$ ) space turns out to be very complex, so a greedy algorithm is designed to answer $k$ -LDS queries. To further accelerate the calculation, we propose an $\epsilon$ -greedy algorithm which achieves an approximation factor of $\frac{1}{1+\epsilon}\left(1-\frac{1}{\sqrt{e}}\right)$ .
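    A static Python illustration of the objects involved: the 2-dimensional skyline (points not dominated in both coordinates, smaller taken as better) and a naive dominance-count heuristic for picking k representatives. It is not PBA and does not implement the k-LDS criterion or the streaming setting; the point set is illustrative.

# 2-D skyline (Pareto front, minimization in both coordinates) plus a naive
# choice of k representatives by how many points each skyline point dominates.
def skyline_2d(points):
    """Smaller is better in both dimensions."""
    pts = sorted(points)                      # by x, then y
    front, best_y = [], float('inf')
    for x, y in pts:
        if y < best_y:                        # not dominated by any earlier point
            front.append((x, y))
            best_y = y
    return front

def dominance_count(p, points):
    return sum(1 for q in points if p[0] <= q[0] and p[1] <= q[1] and p != q)

def k_representatives(points, k):
    front = skyline_2d(points)
    return sorted(front, key=lambda p: -dominance_count(p, points))[:k]

if __name__ == "__main__":
    pts = [(1, 9), (2, 4), (3, 7), (4, 3), (5, 5), (6, 1), (7, 8)]
    print("skyline          :", skyline_2d(pts))
    print("2 representatives:", k_representatives(pts, 2))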
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 99
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: Domain adaptation generalizes a learning model across source domain and target domain that are sampled from different distributions. It is widely applied to cross-domain data mining for reusing labeled information and mitigating labeling consumption. Recent studies reveal that deep neural networks can learn abstract feature representation, which can reduce, but not remove, the cross-domain discrepancy. To enhance the invariance of deep representation and make it more transferable across domains, we propose a unified deep adaptation framework for jointly learning transferable representation and classifier to enable scalable domain adaptation, by taking the advantages of both deep learning and optimal two-sample matching. The framework constitutes two inter-dependent paradigms, unsupervised pre-training for effective training of deep models using deep denoising autoencoders, and supervised fine-tuning for effective exploitation of discriminative information using deep neural networks, both learned by embedding the deep representations to reproducing kernel Hilbert spaces (RKHSs) and optimally matching different domain distributions. To enable scalable learning, we develop a linear-time algorithm using unbiased estimate that scales linearly to large samples. Extensive empirical results show that the proposed framework significantly outperforms state of the art methods on diverse adaptation tasks: sentiment polarity prediction, email spam filtering, newsgroup content categorization, and visual object recognition.
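    A generic Python sketch of the two-sample matching ingredient mentioned above: an unbiased MMD² estimate with a Gaussian kernel between a source and a target sample, the kind of distribution discrepancy such frameworks add to the training objective. The paper's linear-time estimator and full deep architecture are not reproduced; the kernel width and data are illustrative.

# Unbiased MMD^2 between two samples with a Gaussian (RBF) kernel; large
# values indicate a domain shift between the two distributions.
import numpy as np

def mmd2_unbiased(X, Y, gamma=0.5):
    """Unbiased estimate of MMD^2 between samples X (m, d) and Y (n, d)."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(0.0, 1.0, (200, 5))
    shifted = rng.normal(0.7, 1.0, (200, 5))   # a simulated domain shift
    print("MMD^2 (same dist.):", mmd2_unbiased(source, rng.normal(0.0, 1.0, (200, 5))))
    print("MMD^2 (shifted)   :", mmd2_unbiased(source, shifted))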
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 100
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2016-07-08
    Beschreibung: With the rapid growth of various applications on the Internet, recommender systems become fundamental for helping users alleviate the problem of information overload. Since contextual information is a significant factor in modeling the user behavior, various context-aware recommendation methods have been proposed recently. The state-of-the-art context modeling methods usually treat contexts as certain dimensions similar to those of users and items, and capture relevances between contexts and users/items. However, such kind of relevance has much difficulty in explanation. Some works on multi-domain relation prediction can also be used for the context-aware recommendation, but they have limitations in generating recommendations under a large amount of contextual information. Motivated by recent works in natural language processing, we represent each context value with a latent vector, and model the contextual information as a semantic operation on the user and item. Besides, we use the contextual operating tensor to capture the common semantic effects of contexts. Experimental results show that the proposed Context Operating Tensor (COT) model yields significant improvements over the competitive compared methods on three typical datasets. From the experimental results of COT, we also obtain some interesting observations which follow our intuition.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik