ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Articles  (2,880)
  • 2020-2022  (496)
  • 2015-2019  (2,384)
  • 1945-1949
  • Algorithms  (852)
  • IEEE Transactions on Knowledge and Data Engineering  (728)
  • 110151
  • 1274
  • Computer Science  (2,880)
  • 1
    Publication Date: 2020-08-29
    Description: Healthcare facilities are constantly deteriorating due to tight budgets allocated to the upkeep of building assets. This entails the need for improved deterioration modeling of such buildings in order to enforce a predictive maintenance approach that decreases the unexpected occurrence of failures and the corresponding downtime elapsed to repair or replace the faulty asset components. Currently, hospitals utilize subjective deterioration prediction methodologies that mostly rely on age as the sole indicator of degradation to forecast the useful lives of the building components. Thus, this paper aims at formulating a more efficient stochastic deterioration prediction model that integrates the latest observed condition into the forecasting procedure to overcome the subjectivity and uncertainties associated with the currently employed methods. This is achieved by means of developing a hybrid genetic algorithm-based fuzzy Markovian model that simulates the deterioration process given the scarcity of available data demonstrating the condition assessment and evaluation for such critical facilities. A nonhomogeneous transition probability matrix (TPM) based on fuzzy membership functions representing the condition, age and relative deterioration rate of the hospital systems is utilized to address the inherited uncertainties. The TPM is further calibrated by means of a genetic algorithm to circumvent the drawbacks of the expert-based models. A sensitivity analysis was carried out to analyze the possible changes in the output resulting from predefined modifications to the input parameters in order to ensure the robustness of the model. The performance of the deterioration prediction model developed is then validated through a comparison with a state-of-art stochastic model in contrast to real hospital datasets, and the results obtained from the developed model significantly outperformed the long-established Weibull distribution-based deterioration prediction methodology with mean absolute errors of 1.405 and 9.852, respectively. Therefore, the developed model is expected to assist decision-makers in creating more efficient maintenance programs as well as more data-driven capital renewal plans.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    Publication Date: 2020-08-29
    Description: The harmonic closeness centrality measure associates, to each node of a graph, the average of the inverse of its distances from all the other nodes (by assuming that unreachable nodes are at infinite distance). This notion has been adapted to temporal graphs (that is, graphs in which edges can appear and disappear during time) and in this paper we address the question of finding the top-k nodes for this metric. Computing the temporal closeness for one node can be done in O(m) time, where m is the number of temporal edges. Therefore computing exactly the closeness for all nodes, in order to find the ones with top closeness, would require O(nm) time, where n is the number of nodes. This time complexity is intractable for large temporal graphs. Instead, we show how this measure can be efficiently approximated by using a “backward” temporal breadth-first search algorithm and a classical sampling technique. Our experimental results show that the approximation is excellent for nodes with high closeness, allowing us to detect them in practice in a fraction of the time needed for computing the exact closeness of all nodes. We validate our approach with an extensive set of experiments.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Publication Date: 2020-07-16
    Description: High order convective Cahn-Hilliard type equations describe the faceting of a growing surface, or the dynamics of phase transitions in ternary oil-water-surfactant systems. In this paper, we prove the well-posedness of the classical solutions for the Cauchy problem, associated with this equation.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2020-07-08
    Description: We consider a rather general problem of nonparametric estimation of an uncountable set of probability density functions (p.d.f.’s) of the form: f ( x ; r ) , where r is a non-random real variable and ranges from R 1 to R 2 . We put emphasis on the algorithmic aspects of this problem, since they are crucial for exploratory analysis of big data that are needed for the estimation. A specialized learning algorithm, based on the 2D FFT, is proposed and tested on observations that allow for estimate p.d.f.’s of a jet engine temperatures as a function of its rotation speed. We also derive theoretical results concerning the convergence of the estimation procedure that contains hints on selecting parameters of the estimation algorithm.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2020-07-09
    Description: We report the design of a Spiking Neural Network (SNN) edge detector with biologically inspired neurons that has a conceptual similarity with both Hodgkin-Huxley (HH) model neurons and Leaky Integrate-and-Fire (LIF) neurons. The computation of the membrane potential, which is used to determine the occurrence or absence of spike events, at each time step, is carried out by using the analytical solution to a simplified version of the HH neuron model. We find that the SNN based edge detector detects more edge pixels in images than those obtained by a Sobel edge detector. We designed a pipeline for image classification with a low-exposure frame simulation layer, SNN edge detection layers as pre-processing layers and a Convolutional Neural Network (CNN) as a classification module. We tested this pipeline for the task of classification with the Digits dataset, which is available in MATLAB. We find that the SNN based edge detection layer increases the image classification accuracy at lower exposure times, that is, for 1 〈 t 〈 T /4, where t is the number of milliseconds in a simulated exposure frame and T is the total exposure time, with reference to a Sobel edge or Canny edge detection layer in the pipeline. These results pave the way for developing novel cognitive neuromorphic computing architectures for millisecond timescale detection and object classification applications using event or spike cameras.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2020-07-05
    Description: Microscopic crowd simulation can help to enhance the safety of pedestrians in situations that range from museum visits to music festivals. To obtain a useful prediction, the input parameters must be chosen carefully. In many cases, a lack of knowledge or limited measurement accuracy add uncertainty to the input. In addition, for meaningful parameter studies, we first need to identify the most influential parameters of our parametric computer models. The field of uncertainty quantification offers standardized and fully automatized methods that we believe to be beneficial for pedestrian dynamics. In addition, many methods come at a comparatively low cost, even for computationally expensive problems. This allows for their application to larger scenarios. We aim to identify and adapt fitting methods to microscopic crowd simulation in order to explore their potential in pedestrian dynamics. In this work, we first perform a variance-based sensitivity analysis using Sobol’ indices and then crosscheck the results by a derivative-based measure, the activity scores. We apply both methods to a typical scenario in crowd simulation, a bottleneck. Because constrictions can lead to high crowd densities and delays in evacuations, several experiments and simulation studies have been conducted for this setting. We show qualitative agreement between the results of both methods. Additionally, we identify a one-dimensional subspace in the input parameter space and discuss its impact on the simulation. Moreover, we analyze and interpret the sensitivity indices with respect to the bottleneck scenario.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2020-06-30
    Description: Standard (Lomb-Scargle, likelihood, etc.) procedures for power-spectrum analysis provide convenient estimates of the significance of any peak in a power spectrum, based—typically—on the assumption that the measurements being analyzed have a normal (i.e., Gaussian) distribution. However, the measurement sequence provided by a real experiment or a real observational program may not meet this requirement. The RONO (rank-order normalization) procedure generates a proxy distribution that retains the rank-order of the original measurements but has a strictly normal distribution. The proxy distribution may then be analyzed by standard power-spectrum analysis. We show by an example that the resulting power spectrum may prove to be quite close to the power spectrum obtained from the original data by a standard procedure, even if the distribution of the original measurements is far from normal. Such a comparison would tend to validate the original analysis.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2020-06-30
    Description: Toward strong demand for very high-speed I/O for processors, physical performance growth of hardware I/O speed was drastically increased in this decade. However, the recent Big Data applications still demand the larger I/O bandwidth and the lower latency for the speed. Because the current I/O performance does not improve so drastically, it is the time to consider another way to increase it. To overcome this challenge, we focus on lossless data compression technology to decrease the amount of data itself in the data communication path. The recent Big Data applications treat data stream that flows continuously and never allow stalling processing due to the high speed. Therefore, an elegant hardware-based data compression technology is demanded. This paper proposes a novel lossless data compression, called ASE coding. It encodes streaming data by applying the entropy coding approach. ASE coding instantly assigns the fewest bits to the corresponding compressed data according to the number of occupied entries in a look-up table. This paper describes the detailed mechanism of ASE coding. Furthermore, the paper demonstrates performance evaluations to promise that ASE coding adaptively shrinks streaming data and also works on a small amount of hardware resources without stalling or buffering any part of data stream.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2020-07-01
    Description: Text annotation is the process of identifying the sense of a textual segment within a given context to a corresponding entity on a concept ontology. As the bag of words paradigm’s limitations become increasingly discernible in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations for addressing the inherent natural language polysemy and homonymy challenges. With extensive application in a broad range of scientific fields, such as digital marketing, bioinformatics, chemical engineering, neuroscience, and social sciences, community detection has attracted great scientific interest. Focusing on linguistics, by aiming to identify groups of densely interconnected subgroups of semantic ontologies, community detection application has proven beneficial in terms of disambiguation improvement and ontology enhancement. In this paper we introduce a novel distributed supervised knowledge-based methodology employing community detection algorithms for text annotation with Wikipedia Entities, establishing the unprecedented concept of community Coherence as a metric for local contextual coherence compatibility. Our experimental evaluation revealed that deeper inference of relatedness and local entity community coherence in the Wikipedia graph bears substantial improvements overall via a focus on accuracy amelioration of less common annotations. The proposed methodology is propitious for wider adoption, attaining robust disambiguation performance.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    Publication Date: 2020-06-30
    Description: Geomechanical modelling of the processes associated to the exploitation of subsurface resources, such as land subsidence or triggered/induced seismicity, is a common practice of major interest. The prediction reliability depends on different sources of uncertainty, such as the parameterization of the constitutive model characterizing the deep rock behaviour. In this study, we focus on a Sobol’-based sensitivity analysis and uncertainty reduction via assimilation of land deformations. A synthetic test case application on a deep hydrocarbon reservoir is considered, where land settlements are predicted with the aid of a 3-D Finite Element (FE) model. Data assimilation is performed via the Ensemble Smoother (ES) technique and its variation in the form of Multiple Data Assimilation (ES-MDA). However, the ES convergence is guaranteed with a large number of Monte Carlo (MC) simulations, that may be computationally infeasible in large scale and complex systems. For this reason, a surrogate model based on the generalized Polynomial Chaos Expansion (gPCE) is proposed as an approximation of the forward problem. This approach allows to efficiently compute the Sobol’ indices for the sensitivity analysis and greatly reduce the computational cost of the original ES and MDA formulations, also enhancing the accuracy of the overall prediction process.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 11
    Publication Date: 2020-06-30
    Description: Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means clustering (FCM) is also a common clustering technique. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency. These improvements focus on adjusting the membership representation of elements in the clusters, or on fuzzifying and defuzzifying techniques, as well as the distance function between elements. This study proposes a novel fuzzy clustering algorithm using multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed fuzzy clustering method has similar calculation steps to FCM with some modifications. The formulas are derived to ensure convergence. The main contribution of this approach is the utilization of multiple fuzzification coefficients as opposed to only one coefficient in the original FCM algorithm. The new algorithm is then evaluated with experiments on several common datasets and the results show that the proposed algorithm is more efficient compared to the original FCM as well as other clustering methods.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 12
    Publication Date: 2020-07-03
    Description: Business processes evolve over time to adapt to changing business environments. This requires continuous monitoring of business processes to gain insights into whether they conform to the intended design or deviate from it. The situation when a business process changes while being analysed is denoted as Concept Drift. Its analysis is concerned with studying how a business process changes, in terms of detecting and localising changes and studying the effects of the latter. Concept drift analysis is crucial to enable early detection and management of changes, that is, whether to promote a change to become part of an improved process, or to reject the change and make decisions to mitigate its effects. Despite its importance, there exists no comprehensive framework for analysing concept drift types, affected process perspectives, and granularity levels of a business process. This article proposes the CONcept Drift Analysis in Process Mining (CONDA-PM) framework describing phases and requirements of a concept drift analysis approach. CONDA-PM was derived from a Systematic Literature Review (SLR) of current approaches analysing concept drift. We apply the CONDA-PM framework on current approaches to concept drift analysis and evaluate their maturity. Applying CONDA-PM framework highlights areas where research is needed to complement existing efforts.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 13
    Publication Date: 2020-04-14
    Description: Let P be a set of n points in R d , k ≥ 1 be an integer and ε ∈ ( 0 , 1 ) be a constant. An ε-coreset is a subset C ⊆ P with appropriate non-negative weights (scalars), that approximates any given set Q ⊆ R d of k centers. That is, the sum of squared distances over every point in P to its closest point in Q is the same, up to a factor of 1 ± ε to the weighted sum of C to the same k centers. If the coreset is small, we can solve problems such as k-means clustering or its variants (e.g., discrete k-means, where the centers are restricted to be in P, or other restricted zones) on the small coreset to get faster provable approximations. Moreover, it is known that such coreset support streaming, dynamic and distributed data using the classic merge-reduce trees. The fact that the coreset is a subset implies that it preserves the sparsity of the data. However, existing such coresets are randomized and their size has at least linear dependency on the dimension d. We suggest the first such coreset of size independent of d. This is also the first deterministic coreset construction whose resulting size is not exponential in d. Extensive experimental results and benchmarks are provided on public datasets, including the first coreset of the English Wikipedia using Amazon’s cloud.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 14
    Publication Date: 2015-08-12
    Description: We examine a distributed detection problem in a wireless sensor network, where sensor nodes collaborate to detect a Gaussian signal with an unknown change of power, i.e., a scale parameter. Due to power/bandwidth constraints, we consider the case where each sensor quantizes its observation into a binary digit. The binary data are then transmitted through error-prone wireless links to a fusion center, where a generalized likelihood ratio test (GLRT) detector is employed to perform a global decision. We study the design of a binary quantizer based on an asymptotic analysis of the GLRT. Interestingly, the quantization threshold of the quantizer is independent of the unknown scale parameter. Numerical results are included to illustrate the performance of the proposed quantizer and GLRT in binary symmetric channels (BSCs).
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 15
    Publication Date: 2015-08-13
    Description: More and more hybrid electric vehicles are driven since they offer such advantages as energy savings and better active safety performance. Hybrid vehicles have two or more power driving systems and frequently switch working condition, so controlling stability is very important. In this work, a two-stage Kalman algorithm method is used to fuse data in hybrid vehicle stability testing. First, the RT3102 navigation system and Dewetron system are introduced. Second, a modeling of data fusion is proposed based on the Kalman filter. Then, this modeling is simulated and tested on a sample vehicle, using Carsim and Simulink software to test the results. The results showed the merits of this modeling.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 16
    Publication Date: 2015-08-05
    Description: Currently deep learning has made great breakthroughs in visual and speech processing, mainly because it draws lessons from the hierarchical mode that brain deals with images and speech. In the field of NLP, a topic model is one of the important ways for modeling documents. Topic models are built on a generative model that clearly does not match the way humans write. In this paper, we propose Event Model, which is unsupervised and based on the language processing mechanism of neurolinguistics, to model documents. In Event Model, documents are descriptions of concrete or abstract events seen, heard, or sensed by people and words are objects in the events. Event Model has two stages: word learning and dimensionality reduction. Word learning is to learn semantics of words based on deep learning. Dimensionality reduction is the process that representing a document as a low dimensional vector by a linear mode that is completely different from topic models. Event Model achieves state-of-the-art results on document retrieval tasks.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 17
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Given a database table with records that can be ranked, an interesting problem is to identify selection conditions for the table, which are qualified by an input record and render its ranking as high as possible among the qualifying tuples. In this paper, we study this standing maximization problem, which finds application in object promotion and characterization. After showing the hardness of the problem, we propose greedy methods, which are experimentally shown to achieve high accuracy compared to exhaustive enumeration, while scaling very well to the problem input size. Our contributions include a linear-time algorithm for determining the optimal selection range for an ordinal attribute and techniques for choosing and prioritizing the most promising selection predicates to apply. Experiments on real datasets confirm the effectiveness and efficiency of our techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 18
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Some fairly recent research has focused on providing XACML-based solutions for dynamic privacy policy management. In this regard, a number of works have provided enhancements to the performance of XACML policy enforcement point (PEP) component, but very few have focused on enhancing the accuracy of that component. This paper improves the accuracy of an XACML PEP by filling some gaps in the existing works. In particular, dynamically incorporating user access context into the privacy policy decision, and its enforcement. We provide an XACML-based implementation of a dynamic privacy policy management framework and an evaluation of the applicability of our system in comparison to some of the existing approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 19
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: This paper first introduces pattern aided regression (PXR) models, a new type of regression models designed to represent accurate and interpretable prediction models. This was motivated by two observations: (1) Regression modeling applications often involve complex diverse predictor-response relationships , which occur when the optimal regression models (of given regression model type) fitting two or more distinct logical groups of data are highly different. (2) State-of-the-art regression methods are often unable to adequately model such relationships. This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships. The paper also introduces a contrast pattern aided regression (CPXR) method, to build accurate PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by big margins. Usually using (a) around seven simple patterns and (b) linear local regression models, those PXR models are easy to interpret; in fact, their complexity is just a bit higher than that of (piecewise) linear regression models and is significantly lower than that of traditional ensemble based regression models. CPXR is especially effective for high-dimensional data. The paper also discusses how to use CPXR methodology for analyzing prediction models and correcting their prediction errors.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 20
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We analyze models for predicting the probability of a strikeout for a batter/pitcher matchup in baseball using player descriptors that can be estimated accurately from small samples. We start with the log5 model which has been used extensively for describing matchups in sports. Log5 is a special case of a logit model and we use constrained logistic regression over nearly one million matchup observations to assess the use of the log5 explanatory variables for this application. We also show that a batter/pitcher ground ball rate interaction variable is significant for the prediction of strikeout probability and we provide physical justification for the inclusion of this variable in the model. We quantify the differences among the models and show that batters control the majority of the variance in predicted strikeout rate.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 21
    Publication Date: 2015-08-22
    Description: Community detection in a complex network is an important problem of much interest in recent years. In general, a community detection algorithm chooses an objective function and captures the communities of the network by optimizing the objective function, and then, one uses various heuristics to solve the optimization problem to extract the interesting communities for the user. In this article, we demonstrate the procedure to transform a graph into points of a metric space and develop the methods of community detection with the help of a metric defined for a pair of points. We have also studied and analyzed the community structure of the network therein. The results obtained with our approach are very competitive with most of the well-known algorithms in the literature, and this is justified over the large collection of datasets. On the other hand, it can be observed that time taken by our algorithm is quite less compared to other methods and justifies the theoretical findings.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 22
    Publication Date: 2015-08-21
    Description: A three-step iterative method with fifth-order convergence as a new modification of Newton’s method was presented. This method is for finding multiple roots of nonlinear equation with unknown multiplicity m whose multiplicity m is the highest multiplicity. Its order of convergence is analyzed and proved. Results for some numerical examples show the efficiency of the new method.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Over the past decade or so, several research groups have addressed the problem of multi-label classification where each example can belong to more than one class at the same time. A common approach, called  Binary Relevance (BR) , addresses this problem by inducing a separate classifier for each class. Research has shown that this framework can be improved if mutual class dependence is exploited: an example that belongs to class $X$ is likely to belong also to class $Y$ ; conversely, belonging to $X$ can make an example less likely to belong to $Z$ . Several works sought to model this information by using the vector of class labels as additional example attributes. To fill the unknown values of these attributes during prediction, existing methods resort to using outputs of other classifiers, and this makes them prone to errors. This is where our paper wants to contribute. We identified two potential ways to prune unnecessary dependencies and to reduce error-propagation in our new classifier-stacking technique, which is named PruDent . Experimental results indicate that the classification performance of PruDent compares favorably with that of other state-of-the-art approaches over a broad range of testbeds. Mor- over, its computational costs grow only linearly in the number of classes.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
    Publication Date: 2015-08-07
    Description: This work deals with the problem of producing a fast and accurate data classification, learning it from a possibly small set of records that are already classified. The proposed approach is based on the framework of the so-called Logical Analysis of Data (LAD), but enriched with information obtained from statistical considerations on the data. A number of discrete optimization problems are solved in the different steps of the procedure, but their computational demand can be controlled. The accuracy of the proposed approach is compared to that of the standard LAD algorithm, of support vector machines and of label propagation algorithm on publicly available datasets of the UCI repository. Encouraging results are obtained and discussed.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 25
    Publication Date: 2015-08-07
    Description: A new graph based constrained semi-supervised learning (G-CSSL) framework is proposed. Pairwise constraints (PC) are used to specify the types (intra- or inter-class) of points with labels. Since the number of labeled data is typically small in SSL setting, the core idea of this framework is to create and enrich the PC sets using the propagated soft labels from both labeled and unlabeled data by special label propagation (SLP), and hence obtaining more supervised information for delivering enhanced performance. We also propose a Two-stage Sparse Coding, termed TSC, for achieving adaptive neighborhood for SLP. The first stage aims at correcting the possible corruptions in data and training an informative dictionary, and the second stage focuses on sparse coding. To deliver enhanced inter-class separation and intra-class compactness, we also present a mixed soft-similarity measure to evaluate the similarity/dissimilarity of constrained pairs using the sparse codes and outputted probabilistic values by SLP. Simulations on the synthetic and real datasets demonstrated the validity of our algorithms for data representation and image recognition, compared with other related state-of-the-art graph based semi-supervised techniques.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In large databases, the amount and the complexity of the data calls for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is—starting with an empty bucket configuration. We show that this is not the case. Self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both accuracy and robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures. For the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In real-world graphs such as social networks, Semantic Web and biological networks, each vertex usually contains rich information, which can be modeled by a set of tokens or elements. In this paper, we study a subgraph matching with set similarity (SMS $^2$ ) query over a large graph database, which retrieves subgraphs that are structurally isomorphic to the query graph, and meanwhile satisfy the condition of vertex pair matching with the (dynamic) weighted set similarity. To efficiently process the SMS $^2$ query, this paper designs a novel lattice-based index for data graph, and lightweight signatures for both query vertices and data vertices. Based on the index and signatures, we propose an efficient two-phase pruning strategy including set similarity pruning and structure-based pruning, which exploits the unique features of both (dynamic) weighted set similarity and graph topology. We also propose an efficient dominating-set-based subgraph matching algorithm guided by a dominating set selection algorithm to achieve better query performance. Extensive experiments on both real and synthetic datasets demonstrate that our method outperforms state-of-the-art methods by an order of magnitude.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 29
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods to string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods are proposed to retrieve missing values from external resources such as the World Wide Web, which tend to reach a much higher imputation recall, but inevitably bring a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an inTeractive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in missing attribute values in a data set. To ensure the high recall at the minimum cost, TRIP faces a challenge of selecting the least number of missing values for retrieving to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also analyze with an example that the optimal scheme is not feasible to be achieved in $tau$ -constrained stochastic data imputation ( $tau$ -SDI), but still, our proposed solution identifies an expected-optimal scheme in $tau$ -SDI. Extensive experiments on four data collections show that TRIP retrieves on average 20 percent missing values and achieves the same high recall that was reached by the retrieving-based approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Visual classification has attracted considerable research interests in the past decades. In this paper, a novel $ell _1$ -hypergraph model for visual classification is proposed. Hypergraph learning, as a natural extension of graph model, has been widely used in many machine learning tasks. In previous work, hypergraph is usually constructed by attribute-based or neighborhood-based methods. That is, a hyperedge is generated by connecting a set of samples sharing a same feature attribute or in a neighborhood. However, these methods are unable to explore feature space globally or sensitive to noises. To address these problems, we propose a novel hypergraph construction approach that leverages sparse representation to generate hyperedges and learns the relationship among hyperedges and their vertices. First, for each sample, a hyperedge is generated by regarding it as the centroid and linking it as well as its nearest neighbors. Then, the sparse representation method is applied to represent the centroid vertex by other vertices within the same hyperedge. The vertices with zero coefficients are removed from the hyperedge. Finally, the representation coefficients are used to define the incidence relation between the hyperedge and the vertices. In our approach, we also optimize the hyperedge weights to modulate the effects of different hyperedges. We leverage the prior knowledge on the hyperedges so that the hyperedges sharing more vertices can have closer weights, where a graph Laplacian is used to regularize the optimization of the weights. Our approach is named $ell _1$ -hypergraph since the $ell _1$ sparse representation is employed in the hypergraph construction process. The method is evaluated on various visual classification tasks, and it demonstrates promising performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    Publication Date: 2015-07-30
    Description: In this paper, we present three improvements to a three-point third order variant of Newton’s method derived from the Simpson rule. The first one is a fifth order method using the same number of functional evaluations as the third order method, the second one is a four-point 10th order method and the last one is a five-point 20th order method. In terms of computational point of view, our methods require four evaluations (one function and three first derivatives) to get fifth order, five evaluations (two functions and three derivatives) to get 10th order and six evaluations (three functions and three derivatives) to get 20th order. Hence, these methods have efficiency indexes of 1.495, 1.585 and 1.648, respectively which are better than the efficiency index of 1.316 of the third order method. We test the methods through some numerical experiments which show that the 20th order method is very efficient.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 32
    Publication Date: 2015-07-30
    Description: Robust small target detection of low signal-to-noise ratio (SNR) is very important in infrared search and track applications for self-defense or attacks. Due to the complex background, current algorithms have some unsolved issues with false alarm rate. In order to reduce the false alarm rate, an infrared small target detection algorithm based on saliency detection and support vector machine was proposed. Firstly, we detect salient regions that may contain targets with phase spectrum Fourier transform (PFT) approach. Then, target recognition was performed in the salient regions. Experimental results show the proposed algorithm has ideal robustness and efficiency for real infrared small target detection applications.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    Publication Date: 2015-08-06
    Description: In dynamic propagation environments, beamforming algorithms may suffer from strong interference, steering vector mismatches, a low convergence speed and a high computational complexity. Reduced-rank signal processing techniques provide a way to address the problems mentioned above. This paper presents a low-complexity robust data-dependent dimensionality reduction based on an iterative optimization with steering vector perturbation (IOVP) algorithm for reduced-rank beamforming and steering vector estimation. The proposed robust optimization procedure jointly adjusts the parameters of a rank reduction matrix and an adaptive beamformer. The optimized rank reduction matrix projects the received signal vector onto a subspace with lower dimension. The beamformer/steering vector optimization is then performed in a reduced dimension subspace. We devise efficient stochastic gradient and recursive least-squares algorithms for implementing the proposed robust IOVP design. The proposed robust IOVP beamforming algorithms result in a faster convergence speed and an improved performance. Simulation results show that the proposed IOVP algorithms outperform some existing full-rank and reduced-rank algorithms with a comparable complexity.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 34
    Publication Date: 2015-08-07
    Description: Recently, wireless sensor networks (WSNs) have drawn great interest due to their outstanding monitoring and management potential in medical, environmental and industrial applications. Most of the applications that employ WSNs demand all of the sensor nodes to run on a common time scale, a requirement that highlights the importance of clock synchronization. The clock synchronization problem in WSNs is inherently related to parameter estimation. The accuracy of clock synchronization algorithms depends essentially on the statistical properties of the parameter estimation algorithms. Recently, studies dedicated to the estimation of synchronization parameters, such as clock offset and skew, have begun to emerge in the literature. The aim of this article is to provide an overview of the state-of-the-art clock synchronization algorithms for WSNs from a statistical signal processing point of view. This article focuses on describing the key features of the class of clock synchronization algorithms that exploit the traditional two-way message (signal) exchange mechanism. Upon introducing the two-way message exchange mechanism, the main clock offset estimation algorithms for pairwise synchronization of sensor nodes are first reviewed, and their performance is compared. The class of fully-distributed clock offset estimation algorithms for network-wide synchronization is then surveyed. The paper concludes with a list of open research problems pertaining to clock synchronization of WSNs.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We consider the problem of adaptively routing a fleet of cooperative vehicles within a road network in the presence of uncertain and dynamic congestion conditions. To tackle this problem, we first propose a Gaussian process dynamic congestion model that can effectively characterize both the dynamics and the uncertainty of congestion conditions. Our model is efficient and thus facilitates real-time adaptive routing in the face of uncertainty. Using this congestion model, we develop efficient algorithms for non-myopic adaptive routing to minimize the collective travel time of all vehicles in the system. A key property of our approach is the ability to efficiently reason about the long-term value of exploration, which enables collectively balancing the exploration/exploitation trade-off for entire fleets of vehicles. Our approach is validated by traffic data from two large Asian cities. Our congestion model is shown to be effective in modeling dynamic congestion conditions. Our routing algorithms also generate significantly faster routes compared to standard baselines, and achieve near-optimal performance compared to an omniscient routing algorithm. We also present the results from a preliminary field study, which showcases the efficacy of our approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Betweenness centrality is a classic measure that quantifies the importance of a graph element (vertex or edge) according to the fraction of shortest paths passing through it. This measure is notoriously expensive to compute, and the best known algorithm runs in $mathcal {O}(nm)$ time. The problems of efficiency and scalability are exacerbated in a dynamic setting, where the input is an evolving graph seen edge by edge, and the goal is to keep the betweenness centrality up to date. In this paper, we propose the first truly scalable algorithm for online computation of betweenness centrality of both vertices and edges in an evolving graph where new edges are added and existing edges are removed. Our algorithm is carefully engineered with out-of-core techniques and tailored for modern parallel stream processing engines that run on clusters of shared-nothing commodity hardware. Hence, it is amenable to real-world deployment. We experiment on graphs that are two orders of magnitude larger than previous studies. Our method is able to keep the betweenness centrality measures up-to-date online, i.e., the time to update the measures is smaller than the inter-arrival time between two consecutive updates.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Phase change memory (PCM) is non-volatile memory that is byte-addressable. It is two to four times denser than DRAM, orders of magnitude better than NAND Flash memory in read latency, and 10 times better than NAND Flash memory in write endurance. However, it still limits the number of write operations to at most $10^6$ times per PCM cell. To extend its lifetime, it is necessary to evenly distribute write operations over all the memory cells. Up to now, the $mathrm{B^{+}}$ -Tree index structure has been used to quickly locate a search key in a relational database management system (RDBMS). All the record keys in each node are sorted and packed upon insertion in, and deletion from, the $mathrm{B^{+}}$ -Tree. In addition, a counter keeps track of the number of valid keys in the $mathrm{B^{+}}$ -Tree. Consequently, a $mathrm{B^{+}}$ -Tree algorithm results in a large number of write operations, which deteriorates the endurance of PCM. This restricts the usage of PCM on a database server and deteriorates performance of database servers. In this paper, we propose a novel PCM-aware $math- m{B^{+}}$ -Tree index structure, called $mathrm{PB^{+}}$ -Tree, to provide wear-leveling in PCM. According to our experiment results, $mathrm{PB^{+}}$ -Tree is much faster than the existing $mathrm{B^{+}}$ -Tree algorithms for PCM and NAND Flash memory with versatile workloads. More importantly, our scheme also greatly reduces the number of write operations compared to other $mathrm{B^{+}}$ -Tree algorithms. All of these results suggest that $mathrm{PB^{+}}$ -Tree is the $mathrm{B^{+}}$ -Tree algorithm best fitted to PCM.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: The $k$ nearest neighbor ( $k$ NN) search on road networks is an important function in web mapping services. These services are now dealing with rapidly arriving queries, that are issued by a massive amount of users. While overlay graph-based indices can answer shortest path queries efficiently, there have been no studies on utilizing such indices to answer $k$ NN queries efficiently. In this paper, we fill this research gap and present two efficient $k$ NN search solutions on overlay graph-based indices. Experimental results show that our solutions offer very low query latency (0.1 ms) and require only small index sizes, even for 10-million-node networks.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches for this task, namely the knowledge based and the corpus based approaches. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Contrary to these existing techniques, we propose an efficient and effective approach for semantic similarity using a large scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space, and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms of MWEs and with ambiguity, and significantly outperforms 12 competing methods under Pearson Correlation Coefficient. Meanwhile, our approach is much more efficient than all competing algorithms, and can be used to compute semantic similarity in a large scale.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Given a spatio-temporal network, a source, a destination, and a desired departure time interval, the All-departure-time Lagrangian Shortest Paths (ALSP) problem determines a set which includes the shortest path for every departure time in the given interval. ALSP is important for critical societal applications such as eco-routing. However, ALSP is computationally challenging due to the non-stationary ranking of the candidate paths across distinct departure-times. Current related work for reducing the redundant work, across consecutive departure-times sharing a common solution, exploits only partial information e.g., the earliest feasible arrival time of a path. In contrast, our approach uses all available information, e.g., the entire time series of arrival times for all departure-times. This allows elimination of all knowable redundant computation based on complete information available at hand. We operationalize this idea through the concept of critical-time-points (CTP), i.e., departure-times before which ranking among candidate paths cannot change. In our preliminary work, we proposed a CTP based forward search strategy. In this paper, we propose a CTP based temporal bi-directional search for the ALSP problem via a novel impromptu rendezvous termination condition. Theoretical and experimental analysis show that the proposed approach outperforms the related work approaches particularly when there are few critical-time-points.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Computing connected components is a core operation on graph data. Since billion-scale graphs cannot be resident in memory of a single server, several approaches based on distributed machines have recently been proposed. The representative methods are $mathsf{Hashhbox{-}Tohbox{-}Min}$ and $mathsf{PowerGraph}$ . $mathsf{Hashhbox{-}Tohbox{-}Min}$ is the state-of-the art disk-based distributed method which minimizes the number of MapReduce rounds. $mathsf{PowerGraph}$ is the-state-of-the-art in-memory distributed system, which is typically faster than the disk-based distributed one, however, requires a lot of machines for handling billion-scale graphs. In this paper, we propose an I/O efficient parallel algorithm for billion-scale graphs in a single PC. We first propose the Disk-based Sequential access-oriented Parallel processing  (DSP) model that exploits sequential disk access in terms of disk I/Os and parallel processing in terms of computation. We then propose an ultra-fast disk-based parallel algorithm for computing connected components, $mathsf{DSPhbox{-}CC}$ , which largely improves the performance through sequential disk scan and page-level cache-conscious parallel processing . Extensive experimental results show that $mathsf{DSPhbox{-}CC}$ 1) computes connected components in billion-scale graphs using the limited memory size whereas in-memory algorithms can only support medium-sized graphs with the same memory size, and 2) significantly outperforms all distributed competitors as well as a representative disk-based parallel method.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Answering why-not questions in databases is promised to have wide application prospect in many areas and thereby, has attracted recent attention in the database research community. This paper addresses the problem of answering these so-called why-not questions in similar graph matching for graph databases. Given a set of answer graphs of an initial query graph $q$ and a set of missing ( why-not ) graphs, we aim to modify $q$ into a new query graph $q^*$ such that the missing graphs are included in the new answer set of $q^*$ . We present an approximate solution to address the above as the optimal solution is NP-hard to compute. In our approach, we first compute the bounded search space and the distance to be minimized for $q^*$ . Then, we present a two-phase algorithm to find the new query $q^*$ . In the first phase, we generate a set of candidate edges to be added/deleted into/from the initial query $q$ within the bounded search space and in the second phase, we select a subset of candidate edges generated in the first phase to minimize the distance for $q^*$ . We also demonstrate the effectiveness and efficiency of our approach by conducting extensive experiments on two real datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: How has the interdisciplinary data mining field been practiced in Network and Systems Management (NSM)? In Science and Technology, there is a wide use of data mining in areas like bioinformatics, genetics, Web, and, more recently, astroinformatics. However, the application in NSM has been limited and inconsiderable. In this article, we provide an account of how data mining has been applied in managing networks and systems for the past four decades, presumably since its birth. We look into the field’s applications in the key NSM activities—discovery, monitoring, analysis, reporting, and domain knowledge acquisition. In the end, we discuss our perspective on the issues that are considered critical for the effective application of data mining in the modern systems which are characterized by heterogeneity and high dynamism.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: With the rapid development of location-aware mobile devices, ubiquitous Internet access and social computing technologies, lots of users’ personal information, such as location data and social data, has been readily accessible from various mobile platforms and online social networks. The convergence of these two types of data, known as geo-social data , has enabled collaborative spatial computing that explicitly combines both location and social factors to answer useful geo-social queries for either business or social good. In this paper, we study a new type of Geo-Social K-Cover Group (GSKCG) queries that, given a set of query points and a social network, retrieves a minimum user group in which each user is socially related to at least $k$ other users and the users’ associated regions (e.g., familiar regions or service regions) can jointly cover all the query points. Albeit its practical usefulness, the GSKCG query problem is NP-complete. We consequently explore a set of effective pruning strategies to derive an efficient algorithm for finding the optimal solution. Moreover, we design a novel index structure tailored to our problem to further accelerate query processing. Extensive experiments demonstrate that our algorithm achieves desirable performance on real-life datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: High utility sequential pattern mining has been considered as an important research problem and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility sequential pattern mining is that, the search space is large and the efficiency of the solutions is directly affected by the degree at which they can eliminate the candidate patterns. Therefore, the efficiency of any high utility sequential pattern mining solution depends on its ability to reduce this big search space, and as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and pruning technique which is based on Cumulated Rest of Match (CRoM) based upper bound. CRoM, by defining a tighter upper bound on the utility of the candidates, allows more conservative pruning before candidate pattern generation in comparison to the existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of the child patterns based on that of the parents’. Substantial experiments on both synthetic and real datasets from different domains show that, the proposed solution efficiently discovers high utility sequential patterns from large scale datasets with different data characteristics, under low utility thresholds.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Ordinal classification with a monotonicity constraint is a kind of classification tasks, in which the objects with better attribute values should not be assigned to a worse decision class. Several learning algorithms have been proposed to handle this kind of tasks in recent years. The rank entropy-based monotonic decision tree is very representative thanks to its better robustness and generalization. Ensemble learning is an effective strategy to significantly improve the generalization ability of machine learning systems. The objective of this work is to develop a method of fusing monotonic decision trees. In order to achieve this goal, we take two factors into account: attribute reduction and fusing principle. Through introducing variable dominance rough sets, we firstly propose an attribute reduction approach with rank-preservation for learning base classifiers, which can effectively avoid overfitting and improve classification performance. Then, we establish a fusing principe based on maximal probability through combining the base classifiers, which is used to further improve generalization ability of the learning system. The experimental analysis shows that the proposed fusing method can significantly improve classification performance of the learning system constructed by monotonic decision trees.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Influence maximization, defined as finding a small subset of nodes that maximizes spread of influence in social networks, is NP-hard under both Independent Cascade (IC) and Linear Threshold (LT) models, where many greedy-based algorithms have been proposed with the best approximation guarantee. However, existing greedy-based algorithms are inefficient on large networks, as it demands heavy Monte-Carlo simulations of the spread functions for each node at the initial step [7] . In this paper, we establish new upper bounds to significantly reduce the number of Monte-Carlo simulations in greedy-based algorithms, especially at the initial step. We theoretically prove that the bound is tight and convergent when the summation of weights towards (or from) each node is less than 1. Based on the bound, we propose a new Upper Bound based Lazy Forward algorithm ( UBLF in short) for discovering the top-k influential nodes in social networks. We test and compare UBLF with prior greedy algorithms, especially CELF [30] . Experimental results show that UBLF reduces more than 95 percent Monte-Carlo simulations of CELF and achieves about $2hbox{-}10$ times speedup when the seed set is small.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Feature selection has been an important research topic in data mining, because the real data sets often have high-dimensional features, such as the bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criterions, such that correlated features often have similar rankings. These correlated features are redundant and don’t provide large mutual information to help data mining. Thus, when we select a limited number of features, we hope to select the top non-redundant features such that the useful mutual information can be maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance Feature Selection (mRMR) model to minimize the redundancy between sequentially selected features. However, this method used the greedy search, thus the global feature redundancy wasn’t considered and the results are not optimal. In this paper, we propose a new feature selection framework to globally minimize the feature redundancy with maximizing the given feature ranking scores, which can come from any supervised or unsupervised methods. Our new model has no parameter so that it is especially suitable for practical data mining application. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. Meanwhile, we introduce a new unsupervised global and local discriminative feature selection method which can be unified with the global feature redundancy minimization framework and shows superior performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Multimedia information retrieval usually involves two key modules including effective feature representation and ranking model construction. Most existing approaches are incapable of well modeling the inherent correlations and interactions between them, resulting in the loss of the latent consensus structure information. To alleviate this problem, we propose a learning to rank approach that simultaneously obtains a set of deep linear features and constructs structure-aware ranking models in a joint learning framework. Specifically, the deep linear feature learning corresponds to a series of matrix factorization tasks in a hierarchical manner, while the learning-to-rank part concentrates on building a ranking model that effectively encodes the intrinsic ranking information by structural SVM learning. Through a joint learning mechanism, the two parts are mutually reinforced in our approach, and meanwhile their underlying interaction relationships are implicitly reflected by solving an alternating optimization problem. Due to the intrinsic correlations among different queries (i.e., similar queries for similar ranking lists), we further formulate the learning-to-rank problem as a multi-task problem, which is associated with a set of mutually related query-specific learning-to-rank subproblems. For computational efficiency and scalability, we design a MapReduce-based parallelization approach to speed up the learning processes. Experimental results demonstrate the efficiency, effectiveness, and scalability of the proposed approach in multimedia information retrieval.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: In a peer-to-peer system, a node should estimate reputation of other peers not only on the basis of its own interaction, but also on the basis of expression of other nodes. Reputation aggregation mechanism implements strategy for achieving this. Reputation aggregation in peer to peer networks is generally a very time and resource consuming process. Moreover, most of the methods consider that a node will have the same reputation after aggregation with all the nodes in the network, which is not true. This paper proposes a reputation aggregation algorithm that uses a variant of gossip algorithm called differential gossip. In this paper, estimate of reputation is considered to be having two parts, one common component which is same with every node, and the other one is the information received from immediate neighbours based on the neighbours’ direct interaction with the node. The differential gossip is fast and requires a lesser amount of resources. This mechanism allows computation of independent reputation value by every node, of every other node in the network. The differential gossip trust has been investigated for a power law network formed using preferential attachment (PA) Model. The reputation computed using differential gossip trust shows good amount of immunity to the collusion. We have verified the performance of the algorithm on the power law networks with sizes ranging from 100 nodes to 50,000 nodes.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: In this paper, the limitation that is prominent in most existing works of change-point detection methods is addressed by proposing a nonparametric, computationally efficient method. The limitation is that most works assume that each data point observed at each time step is a single multi-dimensional vector. However, there are many situations where this does not hold. Therefore, a setting where each observation is a collection of random variables, which we call a bag of data, is considered. After estimating the underlying distribution behind each bag of data and embedding those distributions in a metric space, the change-point score is derived by evaluating how the sequence of distributions is fluctuating in the metric space using a distance-based information estimator. Also, a procedure that adaptively determines when to raise alerts is incorporated by calculating the confidence interval of the change-point score at each time step. This avoids raising false alarms in highly noisy situations and enables detecting changes of various magnitudes. A number of experimental studies and numerical examples are provided to demonstrate the generality and the effectiveness of our approach with both synthetic and real datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: In many applications, top- k query is an important operation to return a set of interesting points in a potentially huge data space. It is analyzed in this paper that the existing algorithms cannot process top- k query on massive data efficiently. This paper proposes a novel table-scan-based T2S algorithm to efficiently compute top- k results on massive data. T2S first constructs the presorted table, whose tuples are arranged in the order of the round-robin retrieval on the sorted lists. T2S maintains only fixed number of tuples to compute results. The early termination checking for T2S is presented in this paper, along with the analysis of scan depth. The selective retrieval is devised to skip the tuples in the presorted table which are not top- k results. The theoretical analysis proves that selective retrieval can reduce the number of the retrieved tuples significantly. The construction and incremental-update/batch-processing methods for the used structures are proposed in this paper. The extensive experimental results, conducted on synthetic and real-life data sets, show that T2S has a significant advantage over the existing algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: We witness an unprecedented proliferation of knowledge graphs that record millions of entities and their relationships. While knowledge graphs are structure-flexible and content-rich, they are difficult to use. The challenge lies in the gap between their overwhelming complexity and the limited database knowledge of non-professional users. If writing structured queries over “simple” tables is difficult, complex graphs are only harder to query. As an initial step toward improving the usability of knowledge graphs, we propose to query such data by example entity tuples, without requiring users to form complex graph queries. Our system, Graph Query By Example ( $mathsf {GQBE}$ ), automatically discovers a weighted hidden maximum query graph based on input query tuples, to capture a user’s query intent. It then efficiently finds and ranks the top approximate matching answer graphs and answer tuples. We conducted experiments and user studies on the large Freebase and DBpedia datasets and observed appealing accuracy and efficiency. Our system provides a complementary approach to the existing keyword-based methods, facilitating user-friendly graph querying. To the best of our knowledge, there was no such proposal in the past in the context of graphs.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: This paper explores combinatorial optimization for problems of max-weight graph matching on multi-partite graphs, which arise in integrating multiple data sources. In the most common two-source case, it is often desirable for the final matching to be one-to-one; the database and statistical record linkage communities accomplish this by weighted bipartite graph matching on similarity scores. Such matchings are intuitively appealing: they leverage a natural global property of many real-world entity stores—that of being nearly deduped—and are known to provide significant improvements to precision and recall. Unfortunately, unlike the bipartite case, exact max-weight matching on multi-partite graphs is known to be NP-hard. Our two-fold algorithmic contributions approximate multi-partite max-weight matching: our first algorithm borrows optimization techniques common to Bayesian probabilistic inference; our second is a greedy approximation algorithm. In addition to a theoretical guarantee on the latter, we present comparisons on a real-world entity resolution problem from Bing significantly larger than typically found in the literature, on publication data, and on a series of synthetic problems. Our results quantify significant improvements due to exploiting multiple sources, which are made possible by global one-to-one constraints linking otherwise independent matching sub-problems. We also discover that our algorithms are complementary: one being much more robust under noise, and the other being simple to implement and very fast to run.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: Internet users can be classified as two types: (a) active users —actively contribute to blogs, publish opinions, write comments in Youtube, tweet messages, etc. and (b) passive consumers who only consume Internet information without contributing to it. While the majority of current social media research deals with active user analysis, there is very little work in understanding the dynamics of passive consumers and their influence. Our global scale Internet measurement of user access patterns of a diverse set of Internet media services indicates conclusively that majority of consumers are passive. In this paper, we develop a spatio-temporal mathematical model and the corresponding stochastic analysis to understand the passive consumer dynamics. Both discrete and continuous time analysis are presented. We also show how the analysis can be used to identify spatial points of influence, i.e., spatial locations that have maximum expertise or influence on a topic. The analysis takes into account, the initial level of consumption at each spatial location and the influence of different passive consumers at different geographic locations on each other. The effect of information noise is taken into account to derive fundamental limits of passive information consumption. Theoretical results are verified using real Internet measurement data. The large scale data have been made available for other researchers.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-09-11
    Description: The objective of the paper is a contribution to data mining within the framework of the observational calculus, through introducing ǵeneralized quantifiers related to copulas. Fitting copulas to multidimensional data is an increasingly important method for analyzing dependencies, and the proposed quantifiers of observational calculus assess the results of estimating the structure of joint distributions of continuous variables by means of hierarchical Archimedean copulas. To this end, the existing theory of hierarchical Archimedean copulas has been slightly extended in the paper: It has been proven that sufficient conditions for the function defining a hierarchical Archimedean copula to be indeed a copula, which have so far been rigorously established only for the special case of fully nested Archimedean copulas, hold in general. These conditions allow us to define three new generalized quantifiers, which are then thoroughly validated on four benchmark data sets and one data set from a real-world application. The paper concludes by comparing the proposed quantifiers to a more traditional approach—maximum weight spanning trees.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2015-09-11
    Description: In this work, we develop an approach for the safe distribution and parallel execution of data-centric workflows over the publish/subscribe abstraction. In essence, we design a unique representation of data-centric workflows, specifically designed to exploit the loosely coupled and distributed nature of publish/subscribe systems. Furthermore, we argue for the practicality and expressiveness of our approach by mapping a standard and industry-strength data-centric workflow model, namely, IBM Business Artifacts with Guard-Stage-Milestone (GSM), into the publish/subscribe abstraction. In short, the contributions of this work are three-fold: (1) mapping of data-centric workflows into publish/subscribe to achieve distributed and parallel execution; (2) detailed theoretical analysis of the mapping; and (3) formulation of the complexity of the optimal workflow distribution over the publish/subscribe abstraction as an NP-hard problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2015-09-16
    Description: In this paper we investigate some parallel variants of Broyden’s method and, for the basic variant, we present its convergence properties. The main result is that the behavior of the considered parallel Broyden’s variants is comparable with the classical parallel Newton method, and significantly better than the parallel Cimmino method, both for linear and nonlinear cases. The considered variants are also compared with two more recently proposed parallel Broyden’s method. Some numerical experiments are presented to illustrate the advantages and limits of the proposed algorithms.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    Publication Date: 2015-09-26
    Description: The sign least mean square with reweighted L1-norm constraint (SLMS-RL1) algorithm is an attractive sparse channel estimation method among Gaussian mixture model (GMM) based algorithms for use in impulsive noise environments. The channel sparsity can be exploited by SLMS-RL1 algorithm based on appropriate reweighted factor, which is one of key parameters to adjust the sparse constraint for SLMS-RL1 algorithm. However, to the best of the authors’ knowledge, a reweighted factor selection scheme has not been developed. This paper proposes a Monte-Carlo (MC) based reweighted factor selection method to further strengthen the performance of SLMS-RL1 algorithm. To validate the performance of SLMS-RL1 using the proposed reweighted factor, simulations results are provided to demonstrate that convergence speed can be reduced by increasing the channel sparsity, while the steady-state MSE performance only slightly changes with different GMM impulsive-noise strengths.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2015-11-21
    Description: We present a local convergence analysis of an eighth order three step methodin order to approximate a locally unique solution of nonlinear equation in a Banach spacesetting. In an earlier study by Sharma and Arora (2015), the order of convergence wasshown using Taylor series expansions and hypotheses up to the fourth order derivative oreven higher of the function involved which restrict the applicability of the proposed scheme.However, only first order derivative appears in the proposed scheme. In order to overcomethis problem, we proposed the hypotheses up to only the first order derivative. In this way,we not only expand the applicability of the methods but also propose convergence domain.Finally, where earlier studies cannot be applied, a variety of concrete numerical examplesare proposed to obtain the solutions of nonlinear equations. Our study does not exhibit thistype of problem/restriction.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2015-11-21
    Description: Lung cancer continues to rank as the leading cause of cancer deaths worldwide. One of the most promising techniques for early detection of cancerous cells relies on sputum cell analysis. This was the motivation behind the design and the development of a new computer aided diagnosis (CAD) system for early detection of lung cancer based on the analysis of sputum color images. The proposed CAD system encompasses four main processing steps. First is the preprocessing step which utilizes a Bayesian classification method using histogram analysis. Then, in the second step, mean shift segmentation is applied to segment the nuclei from the cytoplasm. The third step is the feature analysis. In this step, geometric and chromatic features are extracted from the nucleus region. These features are used in the diagnostic process of the sputum images. Finally, the diagnosis is completed using an artificial neural network and support vector machine (SVM) for classifying the cells into benign or malignant. The performance of the system was analyzed based on different criteria such as sensitivity, specificity and accuracy. The evaluation was carried out using Receiver Operating Characteristic (ROC) curve. The experimental results demonstrate the efficiency of the SVM classifier over other classifiers, with 97% sensitivity and accuracy as well as a significant reduction in the number of false positive and false negative rates.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2015-11-20
    Description: Near-infrared spectroscopy (NIRS) enables the non-invasive measurement of changes in hemodynamics and oxygenation in tissue. Changes in light-coupling due to movement of the subject can cause movement artifacts (MAs) in the recorded signals. Several methods have been developed so far that facilitate the detection and reduction of MAs in the data. However, due to fixed parameter values (e.g., global threshold) none of these methods are perfectly suitable for long-term (i.e., hours) recordings or were not time-effective when applied to large datasets. We aimed to overcome these limitations by automation, i.e., data adaptive thresholding specifically designed for long-term measurements, and by introducing a stable long-term signal reconstruction. Our new technique (“acceleration-based movement artifact reduction algorithm”, AMARA) is based on combining two methods: the “movement artifact reduction algorithm” (MARA, Scholkmann et al. Phys. Meas. 2010, 31, 649–662), and the “accelerometer-based motion artifact removal” (ABAMAR, Virtanen et al. J. Biomed. Opt. 2011, 16, 087005). We describe AMARA in detail and report about successful validation of the algorithm using empirical NIRS data, measured over the prefrontal cortex in adolescents during sleep. In addition, we compared the performance of AMARA to that of MARA and ABAMAR based on validation data.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2015-08-27
    Description: This paper focuses on the parameter identification problem for Wiener nonlinear dynamic systems with moving average noises. In order to improve the convergence rate, the gradient-based iterative algorithm is presented by replacing the unmeasurable variables with their corresponding iterative estimates, and to compute iteratively the noise estimates based on the obtained parameter estimates. The simulation results show that the proposed algorithm can effectively estimate the parameters of Wiener systems with moving average noises.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2015-06-02
    Description: In this paper, the dynamical behavior of different optimal iterative schemes for solving nonlinear equations with increasing order, is studied. The tendency of the complexity of the Julia set is analyzed and referred to the fractal dimension. In fact, this fractal dimension can be shown to be a powerful tool to compare iterative schemes that estimate the solution of a nonlinear equation. Based on the box-counting algorithm, several iterative derivative-free methods of different convergence orders are compared.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2016-07-22
    Description: Clustering is a fundamental task in data mining. Affinity propagation clustering (APC) is an effective and efficient clustering technique that has been applied in various domains. APC iteratively propagates information between affinity samples, updates the responsibility matrix and availability matrix, and employs these matrices to choose cluster centers (or exemplars) of respective clusters. However, since it mainly uses negative Euclidean distance between exemplars and samples as the similarity between them, it is difficult to identify clusters with complex structure. Therefore, the performance of APC deteriorates on samples distributed with complex structure. To mitigate this problem, we propose an improved APC based on a path-based similarity (APC-PS). APC-PS firstly utilizes negative Euclidean distance to find exemplars of clusters. Then, it employs the path-based similarity to measure the similarity between exemplars and samples, and to explore the underlying structure of clusters. Next, it assigns non-exemplar samples to their respective clusters via that similarity. Our empirical study on synthetic and UCI datasets shows that the proposed APC-PS significantly outperforms original APC and other related approaches.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Publication Date: 2016-07-23
    Description: Graph-based semi-supervised classification uses a graph to capture the relationship between samples and exploits label propagation techniques on the graph to predict the labels of unlabeled samples. However, it is difficult to construct a graph that faithfully describes the relationship between high-dimensional samples. Recently, low-rank representation has been introduced to construct a graph, which can preserve the global structure of high-dimensional samples and help to train accurate transductive classifiers. In this paper, we take advantage of low-rank representation for graph construction and propose an inductive semi-supervised classifier called Semi-Supervised Classification based on Low-Rank Representation (SSC-LRR). SSC-LRR first utilizes a linearized alternating direction method with adaptive penalty to compute the coefficient matrix of low-rank representation of samples. Then, the coefficient matrix is adopted to define a graph. Finally, SSC-LRR incorporates this graph into a graph-based semi-supervised linear classifier to classify unlabeled samples. Experiments are conducted on four widely used facial datasets to validate the effectiveness of the proposed SSC-LRR and the results demonstrate that SSC-LRR achieves higher accuracy than other related methods.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2016-07-23
    Description: This research proposes a two-stage user-based collaborative filtering process using an artificial immune system for the prediction of student grades, along with a filter for professor ratings in the course recommendation for college students. We test for cosine similarity and Karl Pearson (KP) correlation in affinity calculations for clustering and prediction. This research uses student information and professor information datasets of Yuan Ze University from the years 2005–2009 for the purpose of testing and training. The mean average error and confusion matrix analysis form the testing parameters. A minimum professor rating was tested to check the results, and observed that the recommendation systems herein provide highly accurate results for students with higher mean grades.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Publication Date: 2016-07-31
    Description: This paper is concerned with the application of computational intelligence techniques to the conceptual design and development of a large-scale floating settlement. The settlement in question is a design for the area of Urla, which is a rural touristic region located on the west coast of Turkey, near the metropolis of Izmir. The problem at hand includes both engineering and architectural aspects that need to be addressed in a comprehensive manner. We thus adapt the view as a multi-objective constrained real-parameter optimization problem. Specifically, we consider three objectives, which are conflicting. The first one aims at maximizing accessibility of urban functions such as housing and public spaces, as well as special functions, such as a marina for yachts and a yacht club. The second one aims at ensuring the wind protection of the general areas of the settlement, by adequately placing them in between neighboring land masses. The third one aims at maximizing visibility of the settlement from external observation points, so as to maximize the exposure of the settlement. To address this complex multi-objective optimization problem and identify lucrative alternative design solutions, a multi-objective harmony search algorithm (MOHS) is developed and applied in this paper. When compared to the Differential Evolution algorithm developed for the problem in the literature, we demonstrate that MOHS achieves competitive or slightly better performance in terms of hyper volume calculation, and gives promising results when the Pareto front approximation is examined.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Since Jeff Howe introduced the term Crowdsourcing in 2006, this human-powered problem-solving paradigm has gained a lot of attention and has been a hot research topic in the field of computer science. Even though a lot of work has been conducted on this topic, so far we do not have a comprehensive survey on most relevant work done in the crowdsourcing field. In this paper, we aim to offer an overall picture of the current state of the art techniques in general-purpose crowdsourcing. According to their focus, we divide this work into three parts, which are: incentive design, task assignment, and quality control. For each part, we start with different problems faced in that area followed by a brief description of existing work and a discussion of pros and cons. In addition, we also present a real scenario on how the different techniques are used in implementing a location-based crowdsourcing platform, gMission. Finally, we highlight the limitations of the current general-purpose crowdsourcing techniques and present some open problems in this area.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Any important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition can be enhanced through the use of human cognitive ability. Crowdsouring platforms are an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results so effective techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: The human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis we then outline key factors that need to be considered to improve crowdsourced data management.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Principal component analysis and the residual error is an effective anomaly detection technique. In an environment where anomalies are present in the training set, the derived principal components can be skewed by the anomalies. A further aspect of anomaly detection is that data might be distributed across different nodes in a network and their communication to a centralized processing unit is prohibited due to communication cost. Current solutions to distributed anomaly detection rely on a hierarchical network infrastructure to aggregate data or models; however, in this environment, links close to the root of the tree become critical and congested. In this paper, an algorithm is proposed that is more robust in its derivation of the principal components of a training set containing anomalies. A distributed form of the algorithm is then derived where each node in a network can iterate towards the centralized solution by exchanging small matrices with neighboring nodes. Experimental evaluations on both synthetic and real-world data sets demonstrate the superior performance of the proposed approach in comparison to principal component analysis and alternative anomaly detection techniques. In addition, it is shown that in a variety of network infrastructures, the distributed form of the anomaly detection model is able to derive a close approximation of the centralized model.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Many applications deal with moving object datasets, e.g., mobile phone social networking, scientific simulations, and ride-sharing services. These applications need to handle a tremendous number of spatial objects that continuously move and execute spatial queries to explore their surroundings. To manage such update-heavy workloads, several throwaway index structures have recently been proposed, where a static index is rebuilt periodically from scratch rather than updated incrementally. It has been shown that throwaway indices outperform specialized moving-object indices that maintain location updates incrementally. However, throwaway indices suffer from scalability due to their single-server design and the only distributed throwaway index (D-MOVIES), extension of a centralized approach, does not scale out as the number of servers increases, especially during query processing phase.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Collaborative filtering (CF) is out of question the most widely adopted and successful recommendation approach. A typical CF-based recommender system associates a user with a group of like-minded users based on their individual preferences over all the items, either explicit or implicit, and then recommends to the user some unobserved items enjoyed by the group. However, we find that two users with similar tastes on one item subset may have totally different tastes on another set. In other words, there exist many user-item subgroups each consisting of a subset of items and a group of like-minded users on these items. It is more reasonable to predict preferences through one user's correlated subgroups, but not the entire user-item matrix. In this paper, to find meaningful subgroups, we formulate a new Multiclass Co-Clustering (MCoC) model, which captures relations of user-to-item, user-to-user, and item-to-item simultaneously. Then, we combine traditional CF algorithms with subgroups for improving their top- $N$ recommendation performance. Our approach can be seen as a new extension of traditional clustering CF models. Systematic experiments on several real data sets have demonstrated the effectiveness of our proposed approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: With the rapid development of Web 2.0 and Online To Offline (O2O) marketing model, various online e vent- b ased s ocial n etwork s (EBSNs) are getting popular. An important task of EBSNs is to facilitate the most satisfactory event-participant arrangement for both sides, i.e., events enroll more participants and participants are arranged with personally interesting events. Existing approaches usually focus on the arrangement of each single event to a set of potential users, or ignore the conflicts between different events, which leads to infeasible or redundant arrangements. In this paper, to address the shortcomings of existing approaches, we first identify a more general and useful event-participant arrangement problem, called G lobal E vent-participant A rrangement with C onflict and C apacity ( $GEACC$ ) problem, focusing on the conflicts of different events and making event-participant arrangements in a global view. We find that the GEACC problem is NP-hard due to the conflicts among events. Thus, we design two approximation algorithms with provable approximation ratios and an exact algorithm with pruning technique to address this problem. In addition, we propose an online setting of GEACC, called OnlineGEACC, which is also practical in real-world scenarios. We further design an online algorithm with provable performance guarantee. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Given a point $p$ and a set of points $S$ , the kNN operation finds the $k$ closest points to in $S$ . It is a computational intensive task with a large range of applications such as knowledge discovery or data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model because it is suitable for distributed large scale data processing. Although these works provide different solutions to the same problem, each one has particular constraints and properties. In this paper, we compare the different existing approaches for computing kNN on MapReduce, first theoretically, and then by performing an extensive experimental evaluation. To be able to compare solutions, we identify three generic steps for kNN computation on MapReduce: data pre-processing, data partitioning, and computation. We then analyze each step from load balancing, accuracy, and complexity aspects. Experiments in this paper use a variety of datasets, and analyze the impact of data volume, data dimension, and the value of k from many perspectives like time and space complexity, and accuracy. The experimental part brings new advantages and shortcomings that are discussed for each algorithm. To the best of our knowledge, this is the first pape- that compares kNN computing methods on MapReduce both theoretically and experimentally with the same setting. Overall, this paper can be used as a guide to tackle kNN-based practical problems in the context of big data.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Publication Date: 2016-08-05
    Description: Mining communities or clusters in networks is valuable in analyzing, designing, and optimizing many natural and engineering complex systems, e.g., protein networks, power grid, and transportation systems. Most of the existing techniques view the community mining problem as an optimization problem based on a given quality function(e.g., modularity), however none of them are grounded with a systematic theory to identify the central nodes in the network. Moreover, how to reconcile the mining efficiency and the community quality still remains an open problem. In this paper, we attempt to address the above challenges by introducing a novel algorithm. First, a kernel function with a tunable influence factor is proposed to measure the leadership of each node, those nodes with highest local leadership can be viewed as the candidate central nodes. Then, we use a discrete-time dynamical system to describe the dynamical assignment of community membership; and formulate the serval conditions to guarantee the convergence of each node's dynamic trajectory, by which the hierarchical community structure of the network can be revealed. The proposed dynamical system is independent of the quality function used, so could also be applied in other community mining models. Our algorithm is highly efficient: the computational complexity analysis shows that the execution time is nearly linearly dependent on the number of nodes in sparse networks. We finally give demonstrative applications of the algorithm to a set of synthetic benchmark networks and also real-world networks to verify the algorithmic performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: When a microblogging user adopts some content propagated to her, we can attribute that to three behavioral factors, namely, topic virality , user virality , and user susceptibility . Topic virality measures the degree to which a topic attracts propagations by users. User virality and susceptibility refer to the ability of a user to propagate content to other users, and the propensity of a user adopting content propagated to her, respectively. In this paper, we study the problem of mining these behavioral factors specific to topics from microblogging content propagation data. We first construct a three dimensional tensor for representing the propagation instances. We then propose a tensor factorization framework to simultaneously derive the three sets of behavioral factors. Based on this framework, we develop a numerical factorization model and another probabilistic factorization variant. We also develop an efficient algorithm for the models’ parameters learning. Our experiments on a large Twitter dataset and synthetic datasets show that the proposed models can effectively mine the topic-specific behavioral factors of users and tweet topics. We further demonstrate that the proposed models consistently outperforms the other state-of-the-art content based models in retweet prediction over time.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups ( clusters ) of anomalies; i.e., sets of points which collectively exhibit abnormal patterns. In many applications, this can lead to a better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns exhibit on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/ATD .
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Multilabel classification is prevalent in many real-world applications where data instances may be associated with multiple labels simultaneously. In multilabel classification, exploiting label correlations is an essential but nontrivial task. Most of the existing multilabel learning algorithms are either ineffective or computationally demanding and less scalable in exploiting label correlations. In this paper, we propose a co-evolutionary multilabel hypernetwork (Co-MLHN) as an attempt to exploit label correlations in an effective and efficient way. To this end, we firstly convert the traditional hypernetwork into a multilabel hypernetwork (MLHN) where label correlations are explicitly represented. We then propose a co-evolutionary learning algorithm to learn an integrated classification model for all labels. The proposed Co-MLHN exploits arbitrary order label correlations and has linear computational complexity with respect to the number of labels. Empirical studies on a broad range of multilabel data sets demonstrate that Co-MLHN achieves competitive results against state-of-the-art multilabel learning algorithms, in terms of both classification performance and scalability with respect to the number of labels.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Query auto completion (QAC) methods recommend queries to search engine users when they start entering a query. Current QAC methods mostly rank query completions based on their past popularity, i.e., on the number of times they have previously been submitted as a query. However, query popularity changes over time and may vary drastically across users. Accordingly, the ranking of query completions should be adjusted. Previous time-sensitive and user-specific QAC methods have been developed separately, yielding significant improvements over methods that are neither time-sensitive nor personalized. We propose a hybrid QAC method that is both time-sensitive and personalized. We extend it to handle long-tail prefixes, which we achieve by assigning optimal weights to the contribution from time-sensitivity and personalization. Using real-world search log datasets, we return top $N$ query suggestions ranked by predicted popularity as estimated from popularity trends and cyclic popularity behavior; we rerank them by integrating similarities to a user's previous queries (both in the current session and in previous sessions). Our method outperforms state-of-the-art time-sensitive QAC baselines, achieving total improvements of between 3 and 7 percent in terms of mean reciprocal rank (MRR). After optimizing the weights, our extended model achieves MRR improvements of between 4 and 8 percent.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Wrappers are pieces of software used to extract data from websites and structure them for further application processing. Unfortunately, websites are continuously evolving and structural changes happen with no forewarning, which usually results in wrappers working incorrectly. Thus, wrappers maintenance is necessary for detecting whether wrapper is extracting erroneous data. The solution consists of using verification models to detect whether wrapper output is statistically similar to the output produced by the wrapper itself when it was successfully invoked in the past. Current proposals present some weaknesses, as the data used to build these models are supposed to be homogeneous, independent, or representative enough, or following a single predefined mathematical model. In this paper, we present MAVE, a novel multilevel wrapper verification system that is based on one-class classification techniques to overcome previous weaknesses. The experimental results show that our proposal outperforms accuracy of current solutions.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2016-07-19
    Description: During a construction project life cycle, project costs and time estimations contribute greatly to baseline scheduling. Besides, schedule risk analysis and project control are also influenced by the above factors. Although many papers have offered estimation techniques, little attempt has been made to generate project time series data as daily progressive estimations in different project environments that could help researchers in generating general and customized formulae in further studies. This paper, however, is an attempt to introduce a new simulation approach to reflect the data regarding time series progress of the project, considering the specifications and the complexity of the project and the environment where the project is performed. Moreover, this simulator can equip project managers with estimated information, which reassures them of the execution stages of the project although they lack historical data. A case study is presented to show the usefulness of the model and its applicability in practice. In this study, singular spectrum analysis has been employed to analyze the simulated outputs, and the results are separated based on their signal and noise trends. The signal trend is used as a point-of-reference to compare the outputs of a simulation employing S-curve technique results and the formulae corresponding to earned value management, as well as the life of a given project.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    Publication Date: 2016-07-27
    Description: This paper discusses the parameter estimation problems of multi-input output-error autoregressive (OEAR) systems. By combining the auxiliary model identification idea and the data filtering technique, a data filtering based recursive generalized least squares (F-RGLS) identification algorithm and a data filtering based iterative least squares (F-LSI) identification algorithm are derived. Compared with the F-RGLS algorithm, the proposed F-LSI algorithm is more effective and can generate more accurate parameter estimates. The simulation results confirm this conclusion.
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2016-08-05
    Description: The force-directed paradigm is one of the few generic approaches to drawing graphs. Since force-directed algorithms can be extended easily, they are used frequently. Most of these algorithms are, however, quite slow on large graphs, as they compute a quadratic number of forces in each iteration. We give a new algorithm that takes only O ( m + n log n ) time per iteration when laying out a graph with n vertices and m edges. Our algorithm approximates the true forces using the so-called well-separated pair decomposition. We perform experiments on a large number of graphs and show that we can strongly reduce the runtime, even on graphs with less than a hundred vertices, without a significant influence on the quality of the drawings (in terms of the number of crossings and deviation in edge lengths).
    Electronic ISSN: 1999-4893
    Topics: Computer Science
    Published by MDPI Publishing
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Online portfolio selection has attracted increasing attention from data mining and machine learning communities in recent years. An important theory in financial markets is mean reversion, which plays a critical role in some state-of-the-art portfolio selection strategies. Although existing mean reversion strategies have been shown to achieve good empirical performance on certain datasets, they seldom carefully deal with noise and outliers in the data, leading to suboptimal portfolios, and consequently yielding poor performance in practice. In this paper, we propose to exploit the reversion phenomenon by using robust $L_1$ -median estimators, and design a novel online portfolio selection strategy named “Robust Median Reversion” (RMR), which constructs optimal portfolios based on the improved reversion estimator. We examine the performance of the proposed algorithms on various real markets with extensive experiments. Empirical results show that RMR can overcome the drawbacks of existing mean reversion algorithms and achieve significantly better results. Finally, RMR runs in linear time, and thus is suitable for large-scale real-time algorithmic trading applications.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: The development of a topic in a set of topic documents is constituted by a series of person interactions at a specific time and place. Knowing the interactions of the persons mentioned in these documents is helpful for readers to better comprehend the documents. In this paper, we propose a topic person interaction detection method called SPIRIT, which classifies the text segments in a set of topic documents that convey person interactions. We design the rich interactive tree structure to represent syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify interactive segments. Experiment results based on real world topics demonstrate that the proposed rich interactive tree structure effectively detects the topic person interactions and that our method outperforms many well-known relation extraction and protein-protein interaction methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: A social recommendation system has attracted a lot of attention recently in the research communities of information retrieval, machine learning, and data mining. Traditional social recommendation algorithms are often based on batch machine learning methods which suffer from several critical limitations, e.g., extremely expensive model retraining cost whenever new user ratings arrive, unable to capture the change of user preferences over time. Therefore, it is important to make social recommendation system suitable for real-world online applications where data often arrives sequentially and user preferences may change dynamically and rapidly. In this paper, we present a new framework of online social recommendation from the viewpoint of online graph regularized user preference learning (OGRPL), which incorporates both collaborative user-item relationship as well as item content features into an unified preference learning process. We further develop an efficient iterative procedure, OGRPL-FW which utilizes the Frank-Wolfe algorithm, to solve the proposed online optimization problem. We conduct extensive experiments on several large-scale datasets, in which the encouraging results demonstrate that the proposed algorithms obtain significantly lower errors (in terms of both RMSE and MAE) than the state-of-the-art online recommendation methods when receiving the same amount of training data in the online learning process.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: In this paper, we revisit the private over-threshold data aggregation problem. We formally define the problem's security requirements as both data and user privacy goals. To achieve both goals, and to strike a balance between efficiency and functionality, we devise an efficient cryptographic construction and its proxy-based variant. Both schemes are provably secure in the semi-honest model. Our key idea for the constructions and their malicious variants is to compose two encryption functions tightly coupled in a way that the two functions are commutative and one public-key encryption has an additive homomorphism. We call that double encryption. We analyze the computational and communication complexities of our construction, and show that it is much more efficient than the existing protocols in the literature. Specifically, our protocol has linear complexity in computation and communication with respect to the number of users. Its round complexity is also linear in the number of users. Finally, we show that our basic protocol is efficiently transformed into a stronger protocol secure in the presence of malicious adversaries, and provide the resulting protocol's performance and security analysis.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: General health examination is an integral part of healthcare in many countries. Identifying the participants at risk is important for early warning and preventive intervention. The fundamental challenge of learning a classification model for risk prediction lies in the unlabeled data that constitutes the majority of the collected dataset. Particularly, the unlabeled data describes the participants in health examinations whose health conditions can vary greatly from healthy to very-ill. There is no ground truth for differentiating their states of health. In this paper, we propose a graph-based, semi-supervised learning algorithm called SHG-Health (Semi-supervised Heterogeneous Graph on Health) for risk predictions to classify a progressively developing situation with the majority of the data unlabeled. An efficient iterative algorithm is designed and the proof of convergence is given. Extensive experiments based on both real health examination datasets and synthetic datasets are performed to show the effectiveness and efficiency of our method.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Automated feature selection is important for text categorization to reduce feature size and to speed up learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the Information Theory, which aims to rank the features with their discriminative capacity for classification. We first revisit two information measures: Kullback-Leibler divergence and Jeffreys divergence for binary hypothesis testing, and analyze their asymptotic properties relating to type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination ( $MD$ ) and methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The proliferation of location-based social networks, such as Foursquare and Facebook Places, offers a variety of ways to record human mobility, including user generated geo-tagged contents, check-in services, and mobile apps. Although trajectory data is of great value to many applications, it is challenging to analyze and mine trajectory data due to the complex characteristics reflected in human mobility, which is affected by multiple contextual information. In this paper, we propose a Multi-Context Trajectory Embedding Model, called MC-TEM, to explore contexts in a systematic way. MC-TEM is developed in the distributed representation learning framework, and it is flexible to characterize various kinds of useful contexts for different applications. To the best of our knowledge, it is the first time that the distributed representation learning methods apply to trajectory data. We formally incorporate multiple context information of trajectory data into the proposed model, including user-level, trajectory-level, location-level, and temporal contexts. All the context information is represented in the same embedding space. We apply MC-TEM to two challenging tasks, namely location recommendation and social link prediction. We conduct extensive experiments on three real-world datasets. Extensive experiment results have demonstrated the superiority of our MC-TEM model over several state-of-the-art methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 92
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The study of urban networks reveals that the accessibility of important city objects for the vehicle traffic and pedestrians is significantly correlated to the popularity, micro-criminality, micro-economic vitality, and social liveability of the city, and is always the chief factor in regulating the growth and expansion of the city. The accessibility between different components of an urban structure are frequently measured along the streets and routes considered as edges of a planar graph, while the traffic ultimate destination points and street junctions are treated as vertices. For estimation of the accessibility of destination vertex $j$ from vertex $i$ through urban networks, in particular, the random walks are used to calculate the expected distance a random walker starting from $i$ makes before $j$ is visited (known as access time ). The state-of-the-art of access time computation is costly in large planar graphs since it involves matrix operation over entire graph. The time complexity is $O(n^{2.376})$ where $n$ is the number of vertices in the planar graph. To enable efficient access time query answering in large planar graphs, this work proposes the first access time oracle which is based on the proposed access time decomposition and reconstruction scheme. The oracle is a hierarchical data structure with deliberate design on the relationships between different hierarchical levels. The storage requirement of the proposed oracle is $O(n^{frac{4}{3}}log log n)$ and the access time query response time is $O(n^{frac{2}{3}})$ . The extensive tests on a number of large real-world road networks (with up to about 2 million vertices) have verified the superiority of the proposed oracle.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 93
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Clustering is one of the research hotspots in the field of data mining and has extensive applications in practice. Recently, Rodriguez and Laio [1] published a clustering algorithm on Science that identifies the clustering centers in an intuitive way and clusters objects efficiently and effectively. However, the algorithm is sensitive to a preassigned parameter and suffers from the identification of the “ideal” number of clusters. To overcome these shortages, this paper proposes a new clustering algorithm that can detect the clustering centers automatically via statistical testing. Specifically, the proposed algorithm first defines a new metric to measure the density of an object that is more robust to the preassigned parameter, further generates a metric to evaluate the centrality of each object. Afterwards, it identifies the objects with extremely large centrality metrics as the clustering centers via an outward statistical testing method. Finally, it groups the remaining objects into clusters containing their nearest neighbors with higher density. Extensive experiments are conducted over different kinds of clustering data sets to evaluate the performance of the proposed algorithm and compare with the algorithm in Science. The results show the effectiveness and robustness of the proposed algorithm.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A representative skyline contains $k$ skyline points that can represent its corresponding full skyline. The existing measuring criteria of $k$ representative skylines are specifically designed for static data, and they cannot effectively handle streaming data. In this paper, we focus on the problem of calculating the $k$ representative skyline over data streams. First, we propose a new criterion to choose $k$ skyline points as the $k$ representative skyline for data stream environments, termed the $k$ largest dominance skyline ( $k$ -LDS), which is representative to the entire data set and is highly stable over the streaming data. Second, we propose an efficient exact algorithm, called Prefix-based Algorithm (PBA), to solve the $k$ -LDS problem in a 2-dimensional space. The time complexity of PBA is only $mathcal {O}((M-k)times k)$ where $M$ is the size of the full skyline set. Third, the $k$ -LDS problem for a $d$ -dimensional ( $dge 3$ ) space turns out to be very complex. Therefore, a greedy algorithm is designed to answer $k$ -LDS queries. To further accelerate the calculation, we propose a $epsilon$ -greedy algorithm which can achieve an approximate factor of $frac{1}{(1+epsilon)}(1-frac{1}{sqrt{e}})$
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Domain adaptation generalizes a learning model across source domain and target domain that are sampled from different distributions. It is widely applied to cross-domain data mining for reusing labeled information and mitigating labeling consumption. Recent studies reveal that deep neural networks can learn abstract feature representation, which can reduce, but not remove, the cross-domain discrepancy. To enhance the invariance of deep representation and make it more transferable across domains, we propose a unified deep adaptation framework for jointly learning transferable representation and classifier to enable scalable domain adaptation, by taking the advantages of both deep learning and optimal two-sample matching. The framework constitutes two inter-dependent paradigms, unsupervised pre-training for effective training of deep models using deep denoising autoencoders, and supervised fine-tuning for effective exploitation of discriminative information using deep neural networks, both learned by embedding the deep representations to reproducing kernel Hilbert spaces (RKHSs) and optimally matching different domain distributions. To enable scalable learning, we develop a linear-time algorithm using unbiased estimate that scales linearly to large samples. Extensive empirical results show that the proposed framework significantly outperforms state of the art methods on diverse adaptation tasks: sentiment polarity prediction, email spam filtering, newsgroup content categorization, and visual object recognition.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 96
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: With the rapid growth of various applications on the Internet, recommender systems become fundamental for helping users alleviate the problem of information overload. Since contextual information is a significant factor in modeling the user behavior, various context-aware recommendation methods have been proposed recently. The state-of-the-art context modeling methods usually treat contexts as certain dimensions similar to those of users and items, and capture relevances between contexts and users/items. However, such kind of relevance has much difficulty in explanation. Some works on multi-domain relation prediction can also be used for the context-aware recommendation, but they have limitations in generating recommendations under a large amount of contextual information. Motivated by recent works in natural language processing, we represent each context value with a latent vector, and model the contextual information as a semantic operation on the user and item. Besides, we use the contextual operating tensor to capture the common semantic effects of contexts. Experimental results show that the proposed Context Operating Tensor (COT) model yields significant improvements over the competitive compared methods on three typical datasets. From the experimental results of COT, we also obtain some interesting observations which follow our intuition.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 97
    Publication Date: 2016-07-08
    Description: In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths ) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown which one is more reliable a priori . Moreover, each source possesses a variety of properties with different data types. An accurate estimation of source reliability has to be made by modeling multiple properties in a unified model. Existing conflict resolution work either does not conduct source reliability estimation, or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework where truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to recognize the characteristics of various data types, and efficient computation approaches are developed. The proposed framework is further adapted to deal with streaming data in an incremental fashion and large-scale data in MapReduce model. Experiments on real-world weather, stock, and flight data as well as simulated multi-source data demonstrate the advantage of jointly modeling different data types in the proposed framework.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Detection of non-overlapping and overlapping communities are essentially the same problem. However, current algorithms focus either on finding overlapping or non-overlapping communities. We present a generalized framework that can identify both non-overlapping and overlapping communities, without any prior input about the network or its community distribution. To do so, we introduce a vertex-based metric, GenPerm , that quantifies by how much a vertex belongs to each of its constituent communities. Our community detection algorithm is based on maximizing the GenPerm over all the vertices in the network. We demonstrate, through experiments over synthetic and real-world networks, that GenPerm is more effective than other metrics in evaluating community structure. Further, we show that due to its vertex-centric property, GenPerm can be used to unfold several inferences beyond community detection, such as core-periphery analysis and message spreading. Our algorithm for maximizing GenPerm outperforms six state-of-the-art algorithms in accurately predicting the ground-truth labels. Finally, we discuss the problem of resolution limit in overlapping communities and demonstrate that maximizing GenPerm can mitigate this problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Social hierarchy (i.e., pyramid structure of societies) is a fundamental concept in sociology and social network analysis. The importance of social hierarchy in a social network is that the topological structure of the social hierarchy is essential in both shaping the nature of social interactions between individuals and unfolding the structure of the social networks. The social hierarchy found in a social network can be utilized to improve the accuracy of link prediction, provide better query results, rank web pages, and study information flow and spread in complex networks. In this paper, we model a social network as a directed graph $G$ , and consider the social hierarchy as DAG (directed acyclic graph) of $G$ , denoted as $G_D$ . By DAG, all the vertices in $G$ can be partitioned into different levels, the vertices at the same level represent a disjoint group in the social hierarchy, and all the edges in DAG follow one direction. The main issue we study in this paper is how to find DAG $G_D$ in $G$ . The approach we take is to find $G_D$ by removing all possible cycles from $G$ such that $G = {cal U}(G) cup G_D$ , where ${cal U}(G)$ is a maximum Eulerian subgraph which contains all possible cycles. We give the reasons for doing so, investigate the properties of $G_D$ found, and discuss the applications. In addition, we develop a novel two-phase algorithm, called Greedy-&-Refine, which greedily computes an Eulerian subgraph and then refines this greedy solution to find the maximum Eulerian subgraph. We give a bound between the greedy solution and the optimal. The quality of our greedy approach is high. We conduct comprehensive experimental studies over 14 real-world datasets. The results show that our algorithms are at least two orders of magnitude faster than the baseline algorithm.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Visualization provides a powerful means for data analysis. But to be practical, visual analytics tools must support smooth and flexible use of visualizations at a fast rate. This becomes increasingly onerous with the ever-increasing size of real-world datasets. First, large databases make interaction more difficult once query response time exceeds several seconds. Second, any attempt to show all data points will overload the visualization, resulting in chaos that will only confuse the user. Over the last few years, substantial effort has been put into addressing both of these issues and many innovative solutions have been proposed. Indeed, data visualization is a topic that is too large to be addressed in a single survey paper. Thus, we restrict our attention here to interactive visualization of large data sets. Our focus then is skewed in a natural way towards query processing problem—provided by an underlying database system—rather than to the actual data visualization problem.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...