ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filters
  • Articles  (201)
  • 2015-2019  (201)
  • 2010-2014
  • 2017  (201)
  • 2010
  • IEEE Transactions on Knowledge and Data Engineering  (201)
  • 1274
  • Computer science  (201)
  • 1
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Social recommender systems, which use social relation networks as additional input to improve the accuracy of traditional recommender systems, have become an important research topic. However, most existing methods utilize the entire user relationship network with no consideration of its huge size, sparsity, imbalance, and noise issues. This may degrade the efficiency and accuracy of social recommender systems. This study proposes a new approach to manage the complexity of adding social relation networks to recommender systems. Our method first generates an individual relationship network (IRN) for each user and item by developing a novel fitting algorithm of relationship networks to control the relationship propagation and contracting. We then fuse matrix factorization with social regularization and the neighborhood model using IRNs to generate recommendations. Our approach is quite general, and can also be applied to the item-item relationship network by switching the roles of users and items. Experiments on four datasets with different sizes, sparsity levels, and relationship types show that our approach can improve predictive accuracy and achieve better scalability compared with state-of-the-art social recommendation methods.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: The heterogeneous graph is a popular data model for representing real-world relations with abundant semantics. To analyze heterogeneous graphs, an important step is extracting homogeneous graphs from the heterogeneous graphs, called homogeneous graph extraction. In an extracted homogeneous graph, the relation is defined by a line pattern on the heterogeneous graph and the new attribute values of the relation are calculated by user-defined aggregate functions. The key challenges of the extraction problem are how to efficiently enumerate paths matched by the line pattern and aggregate values for each pair of vertices from the matched paths. To address these two challenges, we propose a parallel graph extraction framework, where we use a vertex-centric model to enumerate paths and compute aggregate functions in parallel. The framework compiles the line pattern into a path concatenation plan, which determines the order of concatenating paths and generates the final paths in a divide-and-conquer manner. We introduce a cost model to estimate the cost of a plan and discuss three plan selection strategies, among which the best plan can enumerate paths in $\mathcal{O}(\log l)$ iterations, where $l$ is the length of a pattern. Furthermore, to improve the performance of evaluating aggregate functions, we classify the aggregate functions into three categories, i.e., distributive aggregation, algebraic aggregation, and holistic aggregation. Since the distributive and algebraic aggregations can be computed from the partial paths, we speed up the aggregation by computing partial aggregate values during the path enumeration.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
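The three aggregate categories named in the abstract above can be illustrated with a minimal Python sketch. It follows the usual data-cube meaning of the terms and is not the paper's framework: the batches, `partial_state`, and `merge` are hypothetical names introduced only for this illustration. Distributive aggregates (sum, max) and algebraic aggregates (average, derived from sum and count) can be merged from fixed-size partial summaries, whereas a holistic aggregate such as the median needs all underlying values.

```python
from statistics import median

# Hypothetical attribute values collected from two batches of matched paths
# between the same pair of vertices (illustrative data only).
batch_a = [2.0, 4.0]
batch_b = [6.0, 8.0, 10.0]

def partial_state(values):
    """Fixed-size summary of one batch: (sum, count, max)."""
    return (sum(values), len(values), max(values))

def merge(s, t):
    """Combine two summaries without revisiting the underlying values."""
    return (s[0] + t[0], s[1] + t[1], max(s[2], t[2]))

# Distributive: sum and max merge directly from the partial summaries.
total, count, largest = merge(partial_state(batch_a), partial_state(batch_b))

# Algebraic: the average is derived from a bounded state (sum, count).
average = total / count

# Holistic: the median has no bounded partial state, so all values are needed.
all_values = batch_a + batch_b
holistic_median = median(all_values)
```

Because the summaries have constant size, they could in principle be updated while partial paths are still being concatenated, which is the property the abstract relies on for its speed-up.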
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Recently, social networks have witnessed a massive surge in popularity. A key issue in social network research is network evolution analysis, which assumes that all the autonomous nodes in a social network follow uniform evolution mechanisms. However, different nodes in a social network should have different evolution mechanisms to generate different edges. This is proposed as the underlying idea to ensure the nodes’ evolution diversity in this paper. Our approach involves identifying the micro-level node evolution that generates different edges by introducing the existing link prediction methods from the perspectives of nodes. We also propose the edge generation coefficient to evaluate the extent to which an edge's generation can be explained by a link prediction method. To quantify the nodes’ evolution diversity, we define the diverse evolution distance. Furthermore, a diverse node adaption algorithm is proposed to indirectly analyze the evolution of the entire network based on the nodes’ evolution diversity. Extensive experiments on disparate real-world networks demonstrate that the introduction of the nodes’ evolution diversity is important and beneficial for analyzing the network evolution. The diverse node adaption algorithm outperforms other state-of-the-art link prediction algorithms in terms of both accuracy and universality. The greater the nodes’ evolution diversity, the more obvious its advantages.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Probabilistic top-$k$ ranking is an important and well-studied query operator in uncertain databases. However, the quality of top-$k$ results might be heavily affected by the ambiguity and uncertainty of the underlying data. Uncertainty reduction techniques have been proposed to improve the quality of top-$k$ results by cleaning the original data. Unfortunately, most data cleaning models aim to probe the exact values of the objects individually and therefore do not work well for subjective data types, such as user ratings, which are inherently probabilistic. In this paper, we propose a novel pairwise crowdsourcing model to reduce the uncertainty of top-$k$ ranking using a crowd of domain experts. Given a crowdsourcing task of limited budget, we propose efficient algorithms to select the best object pairs for crowdsourcing that will bring in the highest quality improvement. Extensive experiments show that our proposed solutions outperform a random selection method by up to 30 times in terms of quality improvement of probabilistic top-$k$ ranking queries. In terms of efficiency, our proposed solutions can reduce the elapsed time of a brute-force algorithm from several days to one minute.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 5
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-12
    Description: The increase of interest in using social media as a source for research has motivated tackling the challenge of automatically geolocating tweets, given the lack of explicit location information in the majority of tweets. In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyze the extent to which a tweet’s country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyze the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone (the most widely used feature in previous work), leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20 and 50 percent. We observe that tweet content, the user’s self-reported location and the user’s real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English and Spanish speaking countries.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 6
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-12
    Description: The query logs from an on-line map query system provide rich cues to understand the behaviors of human crowds. With the growing ability to collect large-scale query logs, query suggestion has been a topic of recent interest. In general, query suggestion aims at recommending a list of relevant queries w.r.t. users’ inputs via an appropriate learning of crowds’ query logs. In this paper, we are particularly interested in map query suggestions (e.g., the predictions of location-related queries) and propose a novel model, Hierarchical Contextual Attention Recurrent Neural Network (HCAR-NN), for map query suggestion in an encoding-decoding manner. Given crowds’ map query logs, our proposed HCAR-NN not only learns the local temporal correlation among map queries in a query session (e.g., queries in a short-term interval are relevant to accomplish a search mission), but also captures the global longer range contextual dependencies among map query sessions in query logs (e.g., how a sequence of queries within a short-term interval has an influence on another sequence of queries). We evaluate our approach over millions of queries from a commercial search engine (i.e., Baidu Map). Experimental results show that the proposed approach provides significant performance improvements over the competitive existing methods in terms of classical metrics (i.e., Recall@K and MRR) as well as the prediction of crowds’ search missions.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
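The two evaluation metrics named in the abstract above, Recall@K and MRR, have standard definitions that can be sketched in a few lines of Python. This is a generic illustration and not the paper's evaluation code: the function names and the toy query suggestions are assumptions made for this example.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of one ranked list."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item; a list with no hit contributes 0."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists)

# Hypothetical suggestion lists for two queries, with the query actually issued next.
suggestions = [["museum", "park", "zoo"], ["cafe", "hotel", "airport"]]
truths = [{"park"}, {"airport"}]

recall_1 = recall_at_k(suggestions[0], truths[0], k=1)
recall_2 = recall_at_k(suggestions[0], truths[0], k=2)
mrr = mean_reciprocal_rank(suggestions, truths)
```

In the toy data the first relevant suggestion sits at rank 2 for one query and rank 3 for the other, so the MRR is the mean of 1/2 and 1/3.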
  • 7
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 8
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Many feature extraction methods reduce the dimensionality of data based on the input graph matrix. The graph construction, which reflects relationships among raw data points, is crucial to the quality of the resulting low-dimensional representations. To improve the quality of the graph and make it more suitable for feature extraction tasks, we incorporate a new graph learning mechanism into feature extraction and add an interaction between the learned graph and the low-dimensional representations. Based on this learning mechanism, we propose a novel framework, termed unsupervised single view feature extraction with structured graph (FESG), which learns both a transformation matrix and an ideal structured graph containing the clustering information. Moreover, we propose a novel way to extend the FESG framework to multi-view learning tasks. The extension is named unsupervised multiple views feature extraction with structured graph (MFESG), which learns an optimal weight for each view automatically without requiring an additional parameter. To show the effectiveness of the framework, we design two concrete formulations within FESG and MFESG, together with two efficient solving algorithms. Promising experimental results on many real-world datasets have validated the effectiveness of our proposed algorithms.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: We propose a parametric network generation model, which we call the network reconstruction model (NRM), for structural reconstruction of scale-free real networks with a power-law exponent greater than 2 in the tail of their degree distribution. The reconstruction method for a real network is concerned with finding the optimal values of the model parameters by utilizing the power-law exponents of the model network and the real network. The method is validated for certain real-world networks. The usefulness of NRM for solving the structural reconstruction problem is demonstrated by comparing its performance with some existing popular network generative models. We show that NRM can generate networks which follow edge-densification and densification power-law when the model parameters satisfy an inequality. Computable expressions of the expected number of triangles and expected diameter are obtained for model networks generated by NRM. Finally, we numerically establish that NRM can generate networks with shrinking diameter and modular structure when specific model parameters are chosen.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Censoring is a common phenomenon that arises in many longitudinal studies where an event of interest could not be recorded within the given time frame. Censoring causes missing time-to-event labels, and this effect is compounded when dealing with datasets which have high amounts of censored instances. In addition, dependent censoring in the data, where censoring is dependent on the covariates in the data, leads to bias in standard survival estimators. This motivates us to develop an approach for pre-processing censored data which calibrates the right-censored (RC) times in an attempt to reduce the bias in the survival estimators. This calibration is done using an imputation method which estimates the sparse inverse covariance matrix over the dataset in an iterative convergence framework. During estimation, we apply row and column-based regularization to account for both row and column-wise correlations between different instances while imputing them. This is followed by comparing these imputed censored times with the original RC times to obtain the final calibrated RC times. These calibrated RC times can now be used in the survival dataset in place of the original RC times for more effective prediction. One of the major benefits of our calibration approach is that it is a pre-processing method for censored data which can be used in conjunction with any survival prediction algorithm and improve its performance. We evaluate the goodness of our approach using a wide array of survival prediction algorithms which are applied over crowdfunding data, electronic health records (EHRs), and synthetic censored datasets. Experimental results indicate that our calibration method improves the AUC values of survival prediction algorithms, compared to applying them directly on the original survival data.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Business processes are prone to unexpected changes, as process workers may suddenly or gradually start executing a process differently in order to adjust to changes in workload, season, or other external factors. Early detection of business process changes enables managers to identify and act upon changes that may otherwise affect process performance. Business process drift detection refers to a family of methods to detect changes in a business process by analyzing event logs extracted from the systems that support the execution of the process. Existing methods for business process drift detection are based on an explorative analysis of a potentially large feature space and in some cases they require users to manually identify specific features that characterize the drift. Depending on the explored feature space, these methods miss various types of changes. Moreover, they are either designed to detect sudden drifts or gradual drifts but not both. This paper proposes an automated and statistically grounded method for detecting sudden and gradual business process drifts under a unified framework. An empirical evaluation shows that the method detects typical change patterns with significantly higher accuracy and lower detection delay than existing methods, while accurately distinguishing between sudden and gradual drifts.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 12
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: With the soaring development of large-scale online social networks, online information sharing is becoming ubiquitous every day. Various information propagates through online social networks, both positive and negative. In this paper, we focus on negative information problems such as online rumors. Rumor blocking is a serious problem in large-scale social networks. Malicious rumors could cause chaos in society and hence need to be blocked as soon as possible after being detected. In this paper, we propose a model of dynamic rumor influence minimization with user experience (DRIMUX). Our goal is to minimize the influence of the rumor (i.e., the number of users that have accepted and sent the rumor) by blocking a certain subset of nodes. A dynamic Ising propagation model considering both the global popularity and individual attraction of the rumor is presented based on a realistic scenario. In addition, different from existing problems of influence minimization, we take into account the constraint of user experience utility. Specifically, each node is assigned a tolerance time threshold. If the blocking time of each user exceeds that threshold, the utility of the network will decrease. Under this constraint, we then formulate the problem as a network inference problem with survival theory, and propose solutions based on the maximum likelihood principle. Experiments are implemented based on large-scale real-world networks and validate the effectiveness of our method.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 13
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Monotonic classification is a kind of classification task in which a monotonicity constraint exists between features and class, i.e., if sample $x_i$ has a higher value in each feature than sample $x_j$, it should be assigned to a class with a higher level than the level of $x_j$'s class. Several methods have been proposed, but they have limitations, such as handling only limited kinds of data or achieving limited classification accuracy. In our former work, the classification accuracy on monotonic classification has been improved by fusing monotonic decision trees, but it always has a complex classification model. This work aims to find a monotonic classifier to process both nominal and numeric data by fusing complete monotonic decision trees. Through finding the completed feature subsets based on the discernibility matrix on an ordinal dataset, a set of monotonic decision trees can be obtained directly and automatically, on which the rank is still preserved. Fewer decision trees are needed, which will serve as base classifiers to construct a decision forest fusing complete monotonic decision trees. The experimental results on 10 datasets demonstrate that the proposed method can effectively reduce the number of base classifiers and thus simplify the classification model, while obtaining good classification performance.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
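The monotonicity constraint stated in the abstract above can be checked on a labeled dataset with a short Python sketch. It assumes the common non-strict reading of the constraint (a sample that is at least as large in every feature must not receive a lower class), which is slightly weaker than the abstract's wording; `dominates` and `monotonicity_violations` are hypothetical helper names, and the sketch is unrelated to the paper's decision-tree fusion method.

```python
def dominates(a, b):
    """True if sample a is at least as large as sample b in every feature."""
    return all(x >= y for x, y in zip(a, b))

def monotonicity_violations(samples, labels):
    """Return index pairs (i, j) where sample i dominates sample j
    but is assigned a strictly lower class label."""
    violations = []
    for i, (xi, yi) in enumerate(zip(samples, labels)):
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if i != j and dominates(xi, xj) and yi < yj:
                violations.append((i, j))
    return violations

# Illustrative ordinal data: two features, class labels 0 < 1.
samples = [(1, 1), (2, 2), (3, 1)]
consistent = monotonicity_violations(samples, [0, 1, 0])
inconsistent = monotonicity_violations(samples, [1, 0, 0])
```

The pairwise check is quadratic in the number of samples, which is acceptable for a sanity check but would need a smarter ordering for large datasets.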
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: In the fields of pattern recognition, data analysis, and machine learning, data points are usually modeled as high-dimensional vectors. Due to the curse of dimensionality, it is non-trivial to efficiently process the original data directly. Given the unique properties of nonlinear dimensionality reduction techniques, nonlinear learning methods are widely adopted to reduce the dimension of data. However, existing nonlinear learning methods fail in many real applications because of the too-strict requirements (for real data) or the difficulty in parameter tuning. Therefore, in this paper, we investigate the manifold learning methods which belong to the family of nonlinear dimensionality reduction methods. Specifically, we propose a new manifold learning principle for dimensionality reduction named Curved Cosine Mapping (CCM). Based on the law of cosines in Euclidean space, CCM applies a brand new mapping pattern to manifold learning. In CCM, the nonlinear geometric relationships are obtained by utilizing the law of cosines, and then quantified as the dimensionality-reduced features. Compared with the existing approaches, the model has weaker theoretical assumptions over the input data. Moreover, to further reduce the computation cost, an optimized version of CCM is developed. Finally, we conduct extensive experiments over both artificial and real-world datasets to demonstrate the performance of the proposed techniques.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
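The abstract above builds on the law of cosines in Euclidean space. As background only, the following Python sketch shows how that law recovers an angle from three pairwise distances. It does not reproduce the CCM mapping, whose details are not given in this record; `angle_from_distances` and `angle_at` are hypothetical helper names.

```python
import math

def angle_from_distances(a, b, c):
    """Angle in radians opposite side c of a triangle with side lengths a, b, c,
    from the law of cosines: c^2 = a^2 + b^2 - 2ab*cos(gamma).
    Assumes a > 0 and b > 0."""
    cos_gamma = (a * a + b * b - c * c) / (2.0 * a * b)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_gamma = max(-1.0, min(1.0, cos_gamma))
    return math.acos(cos_gamma)

def angle_at(p, q, r):
    """Angle at point p between the segments p-q and p-r, computed only from
    pairwise Euclidean distances. Assumes p differs from both q and r."""
    return angle_from_distances(math.dist(p, q), math.dist(p, r), math.dist(q, r))

right_angle = angle_from_distances(3.0, 4.0, 5.0)
corner = angle_at((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Because only distances enter the computation, the same function works unchanged for points of any dimension.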
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Database design is critical for high performance in relational databases and a myriad of tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate in that setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema. We implemented a prototype of this approach for the Cassandra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE), is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: During the last decade, community-based question answering (CQA) sites have accumulated a vast amount of questions and their crowdsourced answers over time. How to efficiently identify the quality of answers that are relevant to a given question has become an active line of research in CQA. The major challenge of CQA is the accurate selection of high-quality answers w.r.t. given questions. Previous approaches tend to model the semantic matching between an individual pair of one question and its corresponding answer (how fitting an answer is to a posted question). However, these works ignore the temporal interactions between answers (how previous answers influence the later posted answers). For example, a rational user likely adapts others’ opinions, revises his inclinations, and posts a more appropriate answer after understanding the given question and previously posted answers. As a result, this paper devises an architecture named Temporal Interaction and Causal Influence LSTM (TC-LSTM) to effectively leverage not only the causal influence between question-answer (how appropriate an answer is for a given question) but also the temporal interactions between answers-answer (how a high-quality answer gradually forms). In particular, long short-term memory (LSTM) is used to capture the explicit question-answer influence and the implicit answers-answer interactions. Experiments are conducted on the SemEval 2015 CQA dataset for the answer classification task and the Baidu Zhidao dataset for the answer ranking task. The experimental results show the advantage of our model compared with other state-of-the-art methods.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: The recent big data and IoT era has presented a number of applications that generate objects in a streaming fashion. It is well known that real-time mining of important patterns from data streams supports many domains. In retail markets and social network services, for example, such patterns are itemsets and words that frequently appear in many user accounts, i.e., co-occurrence patterns. To efficiently monitor co-occurrence patterns, we address the novel problem of mining top-k closed co-occurrence patterns across multiple streams. We employ a sliding window setting in this problem, and each pattern is ranked based on count, which is the number of streams that have generated the pattern. Since objects are consecutively generated and deleted, the count of a given pattern is dynamic, which may change the rank of the pattern. This renders a challenge to monitoring the top-k answer in real time. We propose an index-based algorithm that addresses the challenge and provides the exact answer. Specifically, we propose the CP-Graph, a hybrid index of graph and inverted file structures. The CP-Graph can efficiently compute the count of a given pattern and update the answer while pruning unnecessary patterns. Our experimental study on real datasets demonstrates the efficiency and scalability of our solution.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
  • 18
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Networks are prevalent in many high impact domains. Moreover, cross-domain interactions are frequently observed in many applications, which naturally form the dependencies between different networks. Such highly coupled network systems are referred to as multi-layered networks, and have been used to characterize various complex systems, including critical infrastructure networks, cyber-physical systems, collaboration platforms, biological systems, and many more. Different from single-layered networks where the functionality of their nodes is mainly affected by within-layer connections, multi-layered networks are more vulnerable to disturbance as the impact can be amplified through cross-layer dependencies, leading to the cascade failure of the entire system. To manipulate the connectivity in multi-layered networks, some recent methods have been proposed based on two-layered networks with specific types of connectivity measures. In this paper, we address the above challenges in multiple dimensions. First, we propose a family of connectivity measures (SubLine) that unifies a wide range of classic network connectivity measures. Third, we reveal that the connectivity measures in the SubLine family enjoy the diminishing returns property, which guarantees a near-optimal solution with linear complexity for the connectivity optimization problem. Finally, we evaluate our proposed algorithm on real data sets to demonstrate its effectiveness and efficiency.
    Print ISSN: 1041-4347
    Online ISSN: 1558-2191
    Subject: Computer science
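The link between a diminishing returns property and near-optimal greedy selection, mentioned in the abstract above, can be sketched generically in Python. The objective here is a simple set-coverage function standing in for any monotone function with diminishing returns; it is not a SubLine connectivity measure, and `greedy_select` and the candidate sets are hypothetical. For such functions under a cardinality budget, the classic greedy rule is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
def greedy_select(candidates, budget):
    """Pick up to `budget` candidates, each time taking the one whose set adds
    the most not-yet-covered elements (largest marginal gain)."""
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)  # marginal gain shrinks as coverage grows
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining candidate adds anything new
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Illustrative candidates: each name maps to the set of elements it covers.
candidates = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}
chosen, covered = greedy_select(candidates, budget=2)
```

In the toy data the greedy rule first takes "c" (gain 4) and then "a" (gain 3), since "b" would add only one new element after "c" is chosen.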
  • 19
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: The increasing popularity of social media has encouraged health consumers to share, explore, and validate health and wellness information on social networks, which provide a rich repository of Patient Generated Wellness Data (PGWD). While data-driven healthcare has attracted a lot of attention from academia and industry for improving care delivery through personalized healthcare, limited research has been done on harvesting and utilizing PGWD available on social networks. Recently, representation learning has been widely used in many applications to learn low-dimensional embeddings of users. However, existing approaches for representation learning are not directly applicable to PGWD because of its domain nature, which is characterized by longitudinality, incompleteness, and sparsity of observed data as well as heterogeneity of the patient population. To tackle these problems, we propose an approach which directly learns the embedding from longitudinal data of users, instead of a vector-based representation. In particular, we simultaneously learn a low-dimensional latent space as well as the temporal evolution of users in the wellness space. The proposed method takes into account two types of wellness prior knowledge: (1) temporal progression of wellness attributes; and (2) heterogeneity of wellness attributes in the patient population. Our approach scales well to large datasets using parallel stochastic gradient descent. We conduct extensive experiments to evaluate our framework on three major tasks in the wellness domain: attribute prediction, success prediction, and community detection. Experimental results on two real-world datasets demonstrate the ability of our approach to learn effective user representations.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 20
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Widespread location-aware applications produce a vast amount of spatio-textual data that contains both spatial and textual attributes. To make use of this enriched information for users to describe their preferences for travel routes, we propose a Bounded-Cost Informative Route (BCIR) query to retrieve the routes that are the most textually relevant to the user-specified query keywords subject to a travel cost constraint. The BCIR query is particularly helpful for tourists and city explorers planning their travel routes. We show that answering the BCIR query is an NP-hard problem. To answer the BCIR query efficiently, we propose an exact solution with effective pruning techniques and two approximate solutions with performance guarantees. Extensive experiments over real data sets demonstrate that the proposed solutions achieve the expected performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: In partial label learning, each training example is associated with a set of candidate labels among which only one is the ground-truth label. The common strategy for inducing a predictive model is to disambiguate the candidate label set, i.e., to differentiate the modeling outputs of individual candidate labels. Specifically, disambiguation by differentiation can be conducted either by identifying the ground-truth label iteratively or by treating each candidate label equally. Nonetheless, the disambiguation strategy is prone to being misled by the false positive labels co-occurring with the ground-truth label. In this paper, a new partial label learning strategy is studied which refrains from conducting disambiguation. Specifically, by adapting error-correcting output codes (ECOC), a simple yet effective approach named Pl-ecoc is proposed that utilizes the candidate label set as an entirety. During the training phase, to build the binary classifier for each column coding, a partially labeled example is regarded as a positive or negative training example only if its candidate label set falls entirely into one side of the coding dichotomy. During the testing phase, the class label for an unseen instance is determined via loss-based decoding, which considers the binary classifiers' empirical performance and predictive margin. Extensive experiments show that Pl-ecoc performs favorably against state-of-the-art partial label learning approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
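The training rule in this abstract (an example is used for a coding column only when its whole candidate label set lies on one side of the dichotomy) can be illustrated with a small sketch. This is not the authors' implementation: the function name `assign_binary_label`, the class names, and the representation of a coding column as two sets of classes are assumptions made for the example.

```python
from typing import Optional, Set


def assign_binary_label(candidates: Set[str],
                        positive_side: Set[str],
                        negative_side: Set[str]) -> Optional[int]:
    """Return +1 or -1 if the candidate set falls entirely on one side of the
    column's dichotomy, or None if the example is skipped for this column."""
    if not candidates:
        return None
    if candidates <= positive_side:
        return +1
    if candidates <= negative_side:
        return -1
    return None  # the candidate set straddles the dichotomy


# Hypothetical coding column that splits four classes into two halves.
pos, neg = {"cat", "dog"}, {"bird", "fish"}
examples = [{"cat"}, {"cat", "dog"}, {"bird"}, {"dog", "fish"}]
labels = [assign_binary_label(c, pos, neg) for c in examples]
```

The last example has candidates on both sides of the split, so it contributes nothing to this column's binary classifier; no attempt is made to guess which candidate is the true label.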
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Convolutional Neural Networks (CNNs) have gained traction in image analytics and speech recognition in recent years. However, employing CNNs for the classification of graphs remains challenging. This paper presents the Ngram graph-block based convolutional neural network model for classification of graphs. Our Ngram deep learning framework consists of three novel components. First, we introduce the concept of an $n$-gram block to transform each raw graph object into a sequence of $n$-gram blocks connected through overlapping regions. Second, we introduce a diagonal convolution step to extract local patterns and connectivity features hidden in these $n$-gram blocks by performing $n$-gram normalization. Finally, we develop deeper global patterns based on the local patterns and the ways that they respond to overlapping regions by building an $n$-gram deep learning model using a convolutional neural network. We evaluate the effectiveness of our approach by comparing it with existing state-of-the-art methods using five real graph repositories from the bioinformatics and social network domains. Our results show that the Ngram approach outperforms existing methods with high accuracy and comparable performance.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Linear discriminant analysis (LDA) is one of the most important supervised linear dimensionality reduction techniques. It seeks to learn a low-dimensional representation from the original high-dimensional feature space through a transformation matrix, while preserving the discriminative information by maximizing the between-class scatter matrix and minimizing the within-class scatter matrix. However, conventional LDA is formulated to maximize the arithmetic mean of trace ratios, which suffers from the domination of the largest objectives and might deteriorate the recognition accuracy in practical applications with a large number of classes. In this paper, we propose a new criterion to maximize the weighted harmonic mean of trace ratios, which effectively avoids the domination problem without raising any difficulties in the formulation. An efficient algorithm is exploited to solve the proposed challenging problems with fast convergence, which might always find the globally optimal solution just using eigenvalue decomposition in each iteration. Finally, we conduct extensive experiments to illustrate the effectiveness and superiority of our method on both synthetic and real-life datasets for various tasks, including face recognition, human motion recognition, and head pose recognition. The experimental results indicate that our algorithm consistently outperforms the other compared methods on all of the datasets.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
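The domination problem mentioned in this abstract can be seen in a toy numeric comparison of the two means. The sketch below only contrasts arithmetic and weighted harmonic means on made-up numbers; the ratio values, the uniform weights, and the function names are assumptions for illustration and do not reproduce the paper's objective or algorithm.

```python
def arithmetic_mean(values):
    return sum(values) / len(values)


def weighted_harmonic_mean(values, weights):
    # Weighted harmonic mean: sum(w) / sum(w / v); all values must be positive.
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


# Hypothetical objective values: one is already very large, two are small.
ratios = [100.0, 1.0, 1.0]
weights = [1.0, 1.0, 1.0]  # uniform weights, chosen only for the example

am = arithmetic_mean(ratios)                  # dominated by the value 100.0
hm = weighted_harmonic_mean(ratios, weights)  # stays close to the small values

# Doubling one of the small objectives barely moves the arithmetic mean,
# but changes the harmonic mean noticeably in relative terms.
improved = [100.0, 2.0, 1.0]
am_gain = arithmetic_mean(improved) / am
hm_gain = weighted_harmonic_mean(improved, weights) / hm
```

Because the harmonic mean is driven by its smallest terms, an optimizer that maximizes it is rewarded for improving the weakest objectives rather than inflating the largest one.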
  • 24
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-09-13
    Description: Social media plays a major role in helping people affected by natural calamities. These people use social media to request information and help in situations where time is a critical commodity. However, generic social media platforms like Twitter and Facebook are not conducive to obtaining answers promptly. Algorithms that aim to find prompt responders for questions in social media have to understand and model the factors affecting their response time. In this paper, we draw from sociological studies on information seeking and organizational behavior to identify users who can provide timely and relevant responses to questions posted on social media. We first draw from these theories to model the future availability and past response behavior of the candidate responders and integrate these criteria with user relevance. We propose an algorithm that learns from these criteria to derive optimal rankings of responders for a given question. We present questions posted on Twitter as a form of information-seeking activity in social media and use them to evaluate our framework. Our experiments demonstrate that the proposed framework is useful in identifying timely and relevant responders for questions in social media.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Over the last decades, several studies have demonstrated the importance of co-clustering to simultaneously produce groups of objects and features. Even to obtain object clusters only, using co-clustering is often more effective than one-way clustering, especially when considering sparse high-dimensional data. In this paper, we present a novel generative mixture model for co-clustering such data. This model, the Sparse Poisson Latent Block Model (SPLBM), is based on the Poisson distribution, which arises naturally for contingency tables, such as document-term matrices. The advantages of SPLBM are two-fold. First, it is a rigorous statistical model which is also very parsimonious. Second, it has been designed from the ground up to deal with data sparsity problems. As a consequence, in addition to seeking homogeneous blocks, like other available algorithms, it also filters out homogeneous but noisy ones that arise from the sparsity of the data. Experiments on various datasets of different size and structure show that an algorithm based on SPLBM clearly outperforms state-of-the-art algorithms. Most notably, the SPLBM-based algorithm presented here succeeds in retrieving the natural cluster structure of difficult, unbalanced datasets which other known algorithms are unable to handle effectively.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 26
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: With the increasing availability of moving-object tracking data, trajectory search is increasingly important. We propose and investigate a novel query type named trajectory search by regions of interest (TSR query). Given an argument set of trajectories, a TSR query takes a set of regions of interest as a parameter and returns the trajectory in the argument set with the highest spatial-density correlation to the query regions. This type of query is useful in many popular applications such as trip planning and recommendation, and location-based services in general. TSR query processing faces three challenges: how to model the spatial-density correlation between query regions and data trajectories, how to effectively prune the search space, and how to effectively schedule multiple so-called query sources. To tackle these challenges, a series of new metrics is defined to model spatial-density correlations. An efficient trajectory search algorithm is developed that exploits upper and lower bounds to prune the search space, adopts a query-source selection strategy, and integrates a heuristic search strategy based on priority ranking to schedule multiple query sources. The performance of TSR query processing is studied in extensive experiments based on real and synthetic spatial data.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 27
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Traditional relational topic models provide a successful way to discover the hidden topics from a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, could benefit from this revealed knowledge. However, existing relational topic models are based on the assumption that the number of hidden topics is known a priori, which is impractical in many real-world applications. Therefore, in order to relax this assumption, we propose a nonparametric relational topic model that uses stochastic processes instead of fixed-dimensional probability distributions. Specifically, each document is assigned a Gamma process, which represents the topic interest of this document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics are shared by all the documents. In order to resolve these challenges, we use a subsampling strategy to assign each document a different Gamma process from the global Gamma process, and the subsampling probabilities of documents are assigned with a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the capability of learning the hidden topics and, more importantly, the number of topics.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: We present a new two-level composition model for crowdsourced Sensor-Cloud services based on dynamic features such as spatio-temporal aspects. The proposed approach is defined based on a formal Sensor-Cloud service model that abstracts the functional and non-functional aspects of sensor data on the cloud in terms of spatio-temporal features. A spatio-temporal indexing technique based on the 3D R-tree is proposed to enable fast identification of appropriate Sensor-Cloud services. A novel quality model is introduced that considers dynamic features of sensors to select and compose Sensor-Cloud services. The quality model defines Coverage as a Service, which is formulated as a composition of crowdsourced Sensor-Cloud services. We present two new QoS-aware spatio-temporal composition algorithms to select the optimal composition plan. Experimental results validate the performance of the proposed algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 29
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Web search engines are composed of thousands of query processing nodes, i.e., servers dedicated to processing user queries. So many servers consume a significant amount of energy, mostly attributable to their CPUs, but they are necessary to ensure low latencies, since users expect sub-second response times (e.g., 500 ms). However, users can hardly notice response times that are faster than their expectations. Hence, we propose the Predictive Energy Saving Online Scheduling Algorithm ($\sf{PESOS}$) to select the most appropriate CPU frequency to process a query on a per-core basis. $\sf{PESOS}$ aims to process queries by their deadlines and leverages high-level scheduling information to reduce the CPU energy consumption of a query processing node. $\sf{PESOS}$ bases its decision on query efficiency predictors, which estimate the processing volume and processing time of a query. We experimentally evaluate $\sf{PESOS}$ on the TREC ClueWeb09B collection and the MSN2006 query log. Results show that $\sf{PESOS}$ can reduce the CPU energy consumption of a query processing node by up to ${\sim}$48 percent compared to a system running at maximum CPU core frequency. $\sf{PESOS}$ also outperforms the best state-of-the-art competitor with a ${\sim}$20 percent energy saving, while the competitor requires fine parameter tuning and may incur uncontrollable latency violations.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
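The general idea of deadline-driven frequency selection can be sketched as follows. This is a deliberately simplified illustration and not the PESOS algorithm: it assumes a single query, a hypothetical predictor output expressed in CPU cycles, and processing time that scales inversely with frequency. The name `pick_frequency` and all numbers are made up for the example.

```python
def pick_frequency(predicted_cycles, deadline_s, frequencies_hz):
    """Return the lowest available frequency whose predicted processing time
    meets the deadline; fall back to the highest frequency if none does."""
    for freq in sorted(frequencies_hz):
        # Simplifying assumption: time = cycles / frequency.
        if predicted_cycles / freq <= deadline_s:
            return freq
    return max(frequencies_hz)


available = [1.0e9, 2.0e9, 3.0e9]  # hypothetical per-core frequency steps (Hz)

# A query predicted to need 1e9 cycles with a 500 ms deadline.
relaxed = pick_frequency(1.0e9, 0.5, available)

# The same query with a 100 ms deadline cannot be met at any step.
tight = pick_frequency(1.0e9, 0.1, available)
```

Running slower than the maximum frequency whenever the deadline still holds is what saves energy; the fallback branch reflects that a late query should at least finish as fast as the hardware allows.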
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: To harness the rich amount of information available on the web today, many organizations aggregate public (and private) data to derive knowledge repositories for real-world entities. This paper aims to build historical profiles of real-world entities by integrating temporal records collected from different sources. This problem is challenging not only because entities may change their attribute values over time, but also because information provided by the sources could be unreliable. In this paper, we present a new solution for profiling entities over time. To understand the evolution of entities, we describe a novel transition model which gives the probability that an entity will change to a particular attribute value after some time period. Next, a set of quality metrics is defined for the data sources to capture the exactness and timeliness of their provided values. The transition model and the quality metrics are then built into a source-aware temporal matching algorithm that can link temporal records to entities at the right time and augment entity profiles with correct values. Our suite of experiments demonstrates that the proposed approach is able to outperform state-of-the-art techniques by constructing more complete and accurate profiles for entities.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Query expansion has been widely adopted in Web search as a way of tackling the ambiguity of queries. Personalized search utilizing folksonomy data has demonstrated an extreme vocabulary mismatch problem that requires even more effective query expansion methods. Co-occurrence statistics, tag-tag relationships, and semantic matching approaches are among those favored by previous research. However, user profiles which only contain a user's past annotation information may not be enough to support the selection of expansion terms, especially for users with limited previous activity with the system. We propose a novel model to construct enriched user profiles with the help of an external corpus for personalized query expansion. Our model integrates the current state-of-the-art text representation learning framework, known as word embeddings, with topic models in two groups of pseudo-aligned documents. Based on user profiles, we build two novel query expansion techniques. These two techniques are based on topical weights-enhanced word embeddings, and the topical relevance between the query and the terms inside a user profile, respectively. The results of an in-depth experimental evaluation, performed on two real-world datasets using different external corpora, show that our approach outperforms traditional techniques, including existing non-personalized and personalized query expansion methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: YouTube, with millions of content creators, has become the preferred destination for viewing videos online. Through the Partner program, YouTube allows content creators to monetize their popular videos. Of significant importance for content creators is which meta-level features (title, tag, thumbnail, and description) are most sensitive for promoting video popularity. The popularity of videos also depends on the social dynamics, i.e., the interaction of the content creators (or channels) with YouTube users. Using real-world data consisting of about 6 million videos spread over 25 thousand channels, we empirically examine the sensitivity of YouTube meta-level features and social dynamics. The key meta-level features that impact the view count of a video include first-day view count, number of subscribers, contrast of the video thumbnail, Google hits, number of keywords, video category, title length, and number of upper-case letters in the title. We illustrate that these meta-level features can be used to estimate the popularity of a video. In addition, optimizing the meta-level features after a video is posted increases the popularity of videos. In the context of social dynamics, we discover that there is a causal relationship between views to a channel and the associated number of subscribers. Additionally, insights into the effects of scheduling and video playthrough in a channel are also provided. Our findings provide a useful understanding of user engagement in YouTube.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 33
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Text classification is the process of classifying documents into predefined categories through different classifiers learned from labelled or unlabelled training samples. Many researchers who work on binary text classification attempt to find a more effective way to separate relevant texts from a large data set. However, current text classifiers cannot unambiguously describe the decision boundary between positive and negative objects because of uncertainties caused by text feature selection and the knowledge learning process. This paper proposes a three-way decision model for dealing with the uncertain boundary to improve binary text classification performance based on rough set techniques and a centroid solution. It aims to understand the uncertain boundary by partitioning the training samples into three regions (the positive, boundary, and negative regions) using two main boundary vectors C~P and C~N, created from the labeled positive and negative training subsets, respectively, and to further resolve the objects in the boundary region using two derived boundary vectors B~P and B~N, produced according to the structure of the boundary region. It involves an indirect strategy composed of two successive steps in the whole classification process: ‘two-way to three-way’ and ‘three-way to two-way’. Four decision rules are derived from the training process and applied to incoming documents for more precise classification. A large number of experiments have been conducted on the standard data sets RCV1 and Reuters-21578. The experimental results show that the use of boundary vectors is very effective and efficient for dealing with uncertainties of the decision boundary, and that the proposed model significantly improves the performance of binary text classification in terms of F1 measure and AUC compared with six other popular baseline models.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Uncertain graph models are widely used in real-world applications such as knowledge graphs and social networks. To capture the uncertainty, each edge in an uncertain graph is associated with an existential probability that signifies the likelihood of the existence of the edge. One notable issue of querying uncertain graphs is that the results are sometimes uninformative because of the edge uncertainty. In this paper, we consider probabilistic reachability queries, which are one of the fundamental classes of graph queries. To make the results more informative, we adopt a crowdsourcing-based approach to clean the uncertain edges. However, considering the time and monetary cost of crowdsourcing, it is challenging to efficiently select a limited set of edges for cleaning that maximizes the quality improvement. We prove that the edge selection problem is #P-hard. In light of the hardness of the problem, we propose a series of edge selection algorithms, followed by a number of optimization techniques and pruning heuristics for reducing the computation time. Our experimental results demonstrate that our proposed techniques outperform a random selection by up to 27 times in terms of the result quality improvement and the brute-force solution by up to 60 times in terms of the elapsed time.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
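A probabilistic reachability query on a tiny uncertain graph can be evaluated exactly by enumerating every possible world, which also shows why the general problem is expensive: the number of worlds doubles with each edge. The sketch below is an illustration only, not one of the paper's algorithms; it assumes edges exist independently, and the function name, graph, and probabilities are hypothetical.

```python
from itertools import product


def reachability_probability(edges, source, target):
    """Exact probability that `target` is reachable from `source`.

    `edges` maps a directed edge (u, v) to its existential probability.
    Every subset of edges (possible world) is enumerated, so this is
    feasible only for very small graphs."""
    edge_list = list(edges.items())
    total = 0.0
    for present in product([False, True], repeat=len(edge_list)):
        # Probability of this world, assuming independent edges.
        world_prob = 1.0
        adjacency = {}
        for ((u, v), p), keep in zip(edge_list, present):
            world_prob *= p if keep else (1.0 - p)
            if keep:
                adjacency.setdefault(u, []).append(v)
        # Depth-first search in the resulting deterministic graph.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if target in seen:
            total += world_prob
    return total


# Hypothetical graph: a direct edge s->t and a two-hop path through a.
graph = {("s", "t"): 0.5, ("s", "a"): 0.5, ("a", "t"): 0.5}
before = reachability_probability(graph, "s", "t")

# Cleaning an edge replaces its probability with a certain answer
# (here the crowd is assumed to confirm that s->a exists).
cleaned = dict(graph)
cleaned[("s", "a")] = 1.0
after = reachability_probability(cleaned, "s", "t")
```

Comparing `before` and `after` for each candidate edge is the brute-force way to rank edges for cleaning; the abstract's selection algorithms and pruning heuristics exist precisely to avoid this exhaustive evaluation.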
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: We propose a collaborative multi-domain sentiment classification approach to train sentiment classifiers for multiple domains simultaneously. In our approach, the sentiment information in different domains is shared to train more accurate and robust sentiment classifiers for each domain when labeled data is scarce. Specifically, we decompose the sentiment classifier of each domain into two components, a global one and a domain-specific one. The global model can capture the general sentiment knowledge and is shared by various domains. The domain-specific model can capture the specific sentiment expressions in each domain. In addition, we extract domain-specific sentiment knowledge from both labeled and unlabeled samples in each domain and use it to enhance the learning of domain-specific sentiment classifiers. We also incorporate the similarities between domains into our approach as regularization over the domain-specific sentiment classifiers to encourage the sharing of sentiment information between similar domains. Two kinds of domain similarity measures are explored, one based on textual content and the other based on sentiment expressions. Moreover, we introduce two efficient algorithms to solve the model of our approach. Experimental results on benchmark datasets show that our approach can effectively improve the performance of multi-domain sentiment classification and significantly outperform baseline methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Transfer learning techniques have been broadly applied in applications where labeled data in a target domain are difficult to obtain while a lot of labeled data are available in related source domains. In practice, there can be multiple source domains that are related to the target domain, and how to combine them is still an open problem. In this paper, we seek to leverage labeled data from multiple source domains to enhance classification performance in a target domain where the target data are received in an online fashion. This problem is known as the online transfer learning problem. To achieve this, we propose novel online transfer learning paradigms in which the source and target domains are leveraged adaptively. We consider two different problem settings: homogeneous transfer learning and heterogeneous transfer learning. The proposed methods work in an online manner, where the weights of the source domains are adjusted dynamically. We provide the mistake bounds of the proposed methods and perform comprehensive experiments on real-world data sets to demonstrate the effectiveness of the proposed algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 37
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: We study the problem of preserving user privacy in the publication of location sequences. Consider a database of trajectories, corresponding to movements of people, captured by their transactions when they use credit cards, RFID debit cards, or NFC (http://en.wikipedia.org/wiki/Near_field_communication) compliant devices. We show that, if such trajectories are published exactly (by only hiding the identities of the persons that followed them), one can use partial trajectory knowledge as a quasi-identifier for the remaining locations in the sequence. We devise four intuitive techniques, based on combinations of location suppression and trajectory splitting, and we show that they can prevent privacy breaches while keeping published data accurate for aggregate query answering and frequent subset mining.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 38
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-06-07
    Description: Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel $k$-Sketch query that aims to find $k$ striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study $k$-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the $k$ most representative streaks with theoretical bounds. We conduct experiments on four real datasets, and the results demonstrate the efficiency and effectiveness of our proposed algorithms: the running time achieves up to a 500-fold speedup, and the quality of the generated summaries is endorsed by anonymous users from Amazon Mechanical Turk.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 39
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-06-07
    Beschreibung: Getting back to previously viewed web pages is a common yet difficult task for users due to the large volume of personally accessed information on the web. This paper leverages humans' natural recall process of using episodic and semantic memory cues to facilitate recall, and presents a personal web revisitation technique called WebPagePrev through context and content keywords. Underlying techniques for the acquisition, storage, decay, and utilization of context and content memories for page re-finding are discussed. A relevance feedback mechanism is also included to adapt to an individual's memory strength and revisitation habits. Our 6-month user study shows that: (1) Compared with the existing web revisitation tool Memento, the History List Searching method, and the Search Engine method, the proposed WebPagePrev delivers the best re-finding quality in finding rate (92.10 percent), average F1-measure (0.4318), and average rank error (0.3145). (2) Our dynamic management of context and content memories, including the decay and reinforcement strategy, can mimic users' retrieval and recall mechanism. With relevance feedback, the finding rate of WebPagePrev increases by 9.82 percent, average F1-measure increases by 47.09 percent, and average rank error decreases by 19.44 percent compared to a stable memory management strategy. Among the time, location, and activity context factors in WebPagePrev, activity is the best recall cue, and context+content based re-finding delivers the best performance compared to context based re-finding and content based re-finding.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 40
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-06-07
    Beschreibung: This paper presents a spectral analysis of signed networks from both theoretical and practical aspects. On the theoretical side, we use results from matrix perturbation theory to analyze community structures of complex signed networks and show how negative edges affect the distributions and patterns of node spectral coordinates in the spectral space. We prove and demonstrate that node spectral coordinates form orthogonal clusters for two types of signed networks: graphs with dense inter-community mixed sign edges and $k$-dispute graphs where inner-community connections are absent or very sparse but inter-community connections are dense with negative edges. The cluster orthogonality pattern is different from the line orthogonality pattern (i.e., node spectral coordinates form orthogonal lines) observed in networks with $k$-block structure. We show why the line orthogonality pattern does not hold in the spectral space for these two types of networks. On the practical side, we have developed a clustering method to study signed networks and $k$-dispute networks. Empirical evaluations on both synthetic networks (with up to one million nodes) and real networks show that our algorithm outperforms existing clustering methods on signed networks in terms of accuracy and efficiency.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 41
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: Conventional semi-supervised clustering approaches have several shortcomings, such as (1) not fully utilizing all useful must-link and cannot-link constraints, (2) not considering how to deal with high dimensional data with noise, and (3) not fully addressing the need to use an adaptive process to further improve the performance of the algorithm. In this paper, we first propose the transitive closure based constraint propagation approach, which makes use of the transitive closure operator and affinity propagation to address the first limitation. Then, the random subspace based semi-supervised clustering ensemble framework with a set of proposed confidence factors is designed to address the second limitation and provide more stable, robust, and accurate results. Next, the adaptive semi-supervised clustering ensemble framework is proposed to address the third limitation; it adopts a newly designed adaptive process to search for the optimal subspace set. Finally, we adopt a set of nonparametric tests to compare different semi-supervised clustering ensemble approaches over multiple datasets. The experimental results on 20 real high dimensional cancer datasets with noisy genes and 10 datasets from the UCI and KEEL repositories show that the proposed approaches (1) work well on most of the real-world datasets and (2) outperform other state-of-the-art approaches on 12 out of 20 cancer datasets and 8 out of 10 UCI machine learning datasets.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
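The transitive closure step mentioned in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the closure of must-link constraints and the lifting of cannot-link constraints to the resulting groups, and it leaves out the affinity propagation part. The function names `must_link_closure` and `propagate_constraints` are hypothetical.

```python
def must_link_closure(n, must_link):
    """Return a component label for each of n points, where points joined by
    a chain of must-link pairs share a label (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    return [find(i) for i in range(n)]


def propagate_constraints(n, must_link, cannot_link):
    """Expand pairwise constraints to every pair they imply.

    Assumption for this sketch: a cannot-link between two points applies to
    all members of their must-link components.
    """
    comp = must_link_closure(n, must_link)
    blocked = set()
    for a, b in cannot_link:
        ca, cb = comp[a], comp[b]
        if ca == cb:
            raise ValueError("must-link and cannot-link constraints conflict")
        blocked.add((min(ca, cb), max(ca, cb)))

    implied_ml, implied_cl = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            if comp[i] == comp[j]:
                implied_ml.add((i, j))
            elif (min(comp[i], comp[j]), max(comp[i], comp[j])) in blocked:
                implied_cl.add((i, j))
    return implied_ml, implied_cl
```

With must-link pairs (0,1), (1,2), (3,4) and a single cannot-link pair (2,3), the sketch derives the extra must-link (0,2) and six cannot-link pairs between the two groups.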
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: An ongoing challenge in the rapidly evolving app market ecosystem is to maintain the integrity of app categories. At the time of registration, app developers have to select what they believe is the most appropriate category for their apps. Besides the inherent ambiguity of selecting the right category, the approach leaves open the possibility of misuse and potential gaming by the registrant. Periodically, the app store will refine the list of categories available and potentially reassign the apps. However, it has been observed that the mismatch between the description of an app and the category it belongs to continues to persist. Although some common mechanisms (e.g., complaint-driven or manual checking) exist, they limit the response time to detect miscategorized apps and still leave the categorization challenge open. We introduce FRAC+: (FR)amework for (A)pp (C)ategorization. FRAC+ has the following salient features: (i) it is based on a data-driven topic model and automatically suggests the categories appropriate for the app store, and (ii) it can detect miscategorized apps. Extensive experiments attest to the performance of FRAC+. Experiments on Google Play show that FRAC+'s topics are more aligned with Google's new categories and that 0.35-1.10 percent of game apps are detected to be miscategorized.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 43
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The SENC problem can be decomposed into three subproblems: detecting emerging new classes, classifying known classes, and updating models to integrate each new class as part of known classes. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach by using unsupervised learning as the basis to solve this problem. The proposed method employs completely-random trees which have been shown to work well in unsupervised learning and supervised learning independently in the literature. The completely-random trees are used as a single common core to solve all three subproblems: unsupervised learning, supervised learning, and model update on data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: Differential privacy is an essential and prevalent privacy model that has been widely explored in recent decades. This survey provides a comprehensive and structured overview of two research directions: differentially private data publishing and differentially private data analysis. We compare the diverse release mechanisms of differentially private data publishing given a variety of input data in terms of query type, the maximum number of queries, efficiency, and accuracy. We identify two basic frameworks for differentially private data analysis and list the typical algorithms used within each framework. The results are compared and discussed based on output accuracy and efficiency. Further, we propose several possible directions for future research and possible applications.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 45
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: With the popularity of social media (e.g., Facebook and Flickr), users can easily share their check-in records and photos during their trips. In view of the huge number of user historical mobility records in social media, we aim to discover travel experiences to facilitate trip planning. When planning a trip, users always have specific preferences regarding their trips. Instead of restricting users to limited query options such as locations, activities, or time periods, we consider arbitrary text descriptions as keywords about personalized requirements. Moreover, a diverse and representative set of recommended travel routes is needed. Prior works have elaborated on mining and ranking existing routes from check-in data. To meet the need for automatic trip organization, we claim that more features of Places of Interest (POIs) should be extracted. Therefore, in this paper, we propose an efficient Keyword-aware Representative Travel Route framework that uses knowledge extraction from users’ historical mobility records and social interactions. Explicitly, we have designed a keyword extraction module to classify the POI-related tags, for effective matching with query keywords. We have further designed a route reconstruction algorithm to construct route candidates that fulfill the requirements. To provide befitting query results, we explore Representative Skyline concepts, that is, the Skyline routes which best describe the trade-offs among different POI features. To evaluate the effectiveness and efficiency of the proposed algorithms, we have conducted extensive experiments on real location-based social network datasets, and the experimental results show that our methods do indeed demonstrate good performance compared to state-of-the-art works.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: Archiving graph data over history is demanded in many applications, such as social network studies, collaborative projects, scientific graph databases, and bibliographies. Typically people are interested in querying temporal graphs. Existing keyword search approaches for graph-structured data are insufficient for querying temporal graphs. This paper initiates the study of supporting keyword-based queries on temporal graphs. We propose a search syntax that is a moderate extension of keyword search, which allows casual users to easily search temporal graphs with optional predicates and ranking functions related to timestamps. To generate results efficiently, we first propose a best path iterator, which finds the paths between two data nodes in each snapshot that are the “best” with respect to three ranking factors. It prunes invalid or inferior paths and maximizes shared processing among different snapshots. Then, we develop algorithms that efficiently generate top-$k$ query results. Extensive experiments verified the efficiency and effectiveness of our approach.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: Influence maximization is a recent but well-studied problem which helps identify a small set of users that are most likely to “influence” the maximum number of users in a social network. The problem has attracted a lot of attention as it provides a way to improve marketing, branding, and product adoption. However, existing studies rarely consider the physical locations of the users, although location is an important factor in targeted marketing. In this paper, we propose and investigate the problem of influence maximization in location-aware social networks, or, more generally, Geo-social Influence Spanning Maximization. Given a query $q$ composed of a region $R$, a regional acceptance rate $\rho$, and an integer $k$ as a seed selection budget, our aim is to find the maximum geographic spanning regions (MGSR). We refer to this as the MGSR problem. Our approach differs from previous work as we focus more on identifying the maximum spanning geographical regions within a region $R$, rather than just the number of activated users in the given network as in the traditional influence maximization problem [14]. Our research approach can be effectively used for online marketing campaigns that depend on the physical location of social users. To address the MGSR problem, we first prove NP-hardness. Next, we present a greedy algorithm with a $1-1/e$ approximation ratio to solve the problem, and further improve the efficiency by developing an upper bounded pruning approach. Then, we propose the OIR*-Tree index, which is a hybrid index combining ordered influential node lists with an R*-tree. We show that our index based approach is significantly more efficient than the greedy algorithm and the upper bounded pruning algorithm, especially when $k$ is large. Finally, we evaluate the performance of all of the proposed approaches using three real datasets.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
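The greedy strategy named in the description above follows a general pattern for monotone submodular objectives, for which picking the element with the largest marginal gain at each step yields the $1-1/e$ guarantee. The sketch below shows that pattern only. As a simplifying assumption, each candidate seed comes with a fixed set of regions it would cover, whereas the paper's objective depends on influence spread; the name `greedy_seed_selection` and the input layout are hypothetical.

```python
def greedy_seed_selection(coverage, k):
    """Pick up to k seeds, each time taking the one that adds the most
    not-yet-covered regions.

    coverage: dict mapping a candidate seed to the set of regions it covers
              (a stand-in for an influence-spread estimate).
    """
    selected, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for node in sorted(coverage):  # sorted for deterministic tie-breaking
            if node in selected:
                continue
            gain = len(coverage[node] - covered)  # marginal gain of this seed
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no remaining seed adds anything new
            break
        selected.append(best)
        covered |= coverage[best]
    return selected, covered
```

Each round rescans all candidates, which is the cost that pruning bounds and index structures such as those mentioned in the description aim to reduce.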
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: Automatic social circle detection in ego-networks is a fundamentally important task for social network analysis. So far, most studies focused on how to detect overlapping circles or how to detect based on both network structure and node profiles. This paper asks an orthogonal research question: how to detect circles by leveraging multiple views of the network structure? As a first step, we crawl ego networks from Twitter and model them by six views, including user relationships, user interactions, and user content. We then apply both standard and our modified multi-view spectral clustering techniques to detect circles on these ego-networks. By extensive automatic and manual evaluations, we deliver two major findings: first, multi-view clustering techniques detect better circles than single-view clustering methods; second, our modified clustering technique which presumes sparse networks are incomplete detects better circles than the standard clustering technique which ignores such potential incompleteness. In particular, the second finding makes us conjecture a direct application of standard clustering on potentially incomplete networks may yield biased results. We lightly investigate this issue by deriving a bias upper bound that integrates theories of spectral clustering and matrix perturbation, and discussing how the bound may be affected by several network characteristics.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-07-08
    Beschreibung: A large amount of heterogeneous event data is increasingly generated, e.g., in online systems for Web services or operational systems in enterprises. Owing to the difference between event data and traditional relational data, the matching of heterogeneous events is highly non-trivial. While event names are often opaque (e.g., merely with obscure IDs), the existing structure-based matching techniques for relational data also fail to perform owing to the poor discriminative power of dependency relationships between events. We note that interesting patterns exist in the occurrence of events, which may serve as discriminative features in event matching. In this paper, we formalize the problem of matching events with patterns. A generic pattern based matching framework is proposed, which is compatible with the existing structure based techniques. To improve the matching efficiency, we devise several bounds of matching scores for pruning. Recognizing the NP-hardness of the optimal event matching problem with patterns, we propose an efficient heuristic. Finally, extensive experiments demonstrate the effectiveness of our pattern based matching compared with approaches adapted from existing techniques, and the efficiency improved by the bounding, pruning, and heuristic methods.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: Community Question Answering (CQA) has increasingly become an important service for people asking questions and providing answers online, which enables people to help each other by sharing knowledge. Recently, with accumulation of users and contents, much concern has arisen over the efficiency and answer quality of CQA services. To address this problem, question routing has been proposed which aims at routing new questions to suitable answerers, who have both high possibility and high ability to answer the questions. In this paper, we formulate question routing as a multi-objective ranking problem, and present a multi-objective learning-to-rank approach for question routing (MLQR), which can simultaneously optimize the answering possibility and answer quality of routed users. In MLQR, realizing that questions are relatively short and usually attached with tags, we first propose a tagword topic model (TTM) to derive topical representations of questions. Based on TTM, we then develop features for each question-user pair, which are captured at both platform level and thread level. In particular, the platform-level features summarize the information of a user from his/her history posts in the CQA platform, while the thread-level features model the pairwise competitions of a user with others in his/her answered threads. Finally, we extend a state-of-the-art learning-to-rank algorithm for training a multi-objective ranking model. Extensive experimental results on real-world datasets show that our MLQR can outperform state-of-the-art methods in terms of both answering possibility and answer quality.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 51
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: There have been many attempts to classify imbalanced data, since this classification is critical in a wide variety of applications related to the detection of anomalies, failures, and risks. Many conventional methods, which can be categorized into sampling, cost-sensitive, or ensemble, include heuristic and task dependent processes. In order to achieve a better classification performance by formulation without heuristics and task dependence, we propose confusion-matrix-based kernel logistic regression (CM-KLOGR). Its objective function is the harmonic mean of various evaluation criteria derived from a confusion matrix, such criteria as sensitivity, positive predictive value, and others for negatives. This objective function and its optimization are consistently formulated on the framework of KLOGR, based on minimum classification error and generalized probabilistic descent (MCE/GPD) learning. Due to the merits of the harmonic mean, KLOGR, and MCE/GPD, CM-KLOGR improves the multifaceted performances in a well-balanced way. This paper presents the formulation of CM-KLOGR and its effectiveness through experiments that comparatively evaluated CM-KLOGR using benchmark imbalanced datasets.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
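The objective described above, a harmonic mean of criteria derived from a confusion matrix, can be worked through on a small example. This is a sketch of the scoring idea only, not of CM-KLOGR training. It assumes that the "others for negatives" are specificity and negative predictive value, and the function names are hypothetical.

```python
def ratio(num, den):
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def confusion_criteria(tp, fp, fn, tn):
    """Four criteria read off a binary confusion matrix."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),  # assumed criterion for negatives
        "npv": ratio(tn, tn + fn),          # assumed criterion for negatives
    }


def harmonic_mean(values):
    """Harmonic mean; it is 0 as soon as any single value is 0."""
    values = list(values)
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def cm_score(tp, fp, fn, tn):
    return harmonic_mean(confusion_criteria(tp, fp, fn, tn).values())
```

On 10 positives and 90 negatives, a classifier that labels everything negative reaches 90 percent accuracy but scores 0 here, because its sensitivity is 0. A classifier with tp=8, fp=2, fn=2, tn=88 scores 0.88. This illustrates why a harmonic mean pushes toward balanced performance across the criteria.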
  • 52
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: Psychological stress is threatening people’s health. It is non-trivial to detect stress in a timely manner for proactive care. With the popularity of social media, people are used to sharing their daily activities and interacting with friends on social media platforms, making it feasible to leverage online social network data for stress detection. In this paper, we find that a user's stress state is closely related to that of his/her friends in social media, and we employ a large-scale dataset from real-world social platforms to systematically study the correlation of users’ stress states and social interactions. We first define a set of stress-related textual, visual, and social attributes from various aspects, and then propose a novel hybrid model, a factor graph model combined with a Convolutional Neural Network, to leverage tweet content and social interaction information for stress detection. Experimental results show that the proposed model can improve the detection performance by 6-9 percent in F1-score. By further analyzing the social interaction data, we also discover several intriguing phenomena, e.g., the number of social structures of sparse connections (i.e., with no delta connections) of stressed users is around 14 percent higher than that of non-stressed users, indicating that the social structure of stressed users’ friends tends to be less connected and less complicated than that of non-stressed users.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 53
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: With the advances in geo-positioning technologies and location-based services, it is nowadays quite common for road networks to have textual contents on the vertices. Identifying an optimal route that covers a sequence of query keywords has been studied in recent years. However, in many practical scenarios, an optimal route might not always be desirable. For example, a personalized route query is issued by providing some clues that describe the spatial context between PoIs along the route, where the result can be far from the optimal one. Therefore, in this paper, we investigate the problem of clue-based route search (CRS), which allows a user to provide clues on keywords and spatial relationships. First, we propose a greedy algorithm and a dynamic programming algorithm as baselines. To improve efficiency, we develop a branch-and-bound algorithm that prunes unnecessary vertices in query processing. In order to quickly locate candidates, we propose an AB-tree that stores both the distance and keyword information in a tree structure. To further reduce the index size, we construct a PB-tree by utilizing the virtue of the 2-hop label index to pinpoint the candidates. Extensive experiments are conducted and verify the superiority of our algorithms and index structures.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 54
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: We study the problem of finding related forum posts to a post at hand. In contrast to traditional approaches for finding related documents that perform content comparisons across the content of the posts as a whole, we consider each post as a set of segments, each written with a different goal in mind. We advocate that the relatedness between two posts should be based on the similarity of their respective segments that are intended for the same goal, i.e., are conveying the same intention. This means that it is possible for the same terms to weigh differently in the relatedness score depending on the intention of the segment in which they are found. We have developed a segmentation method that by monitoring a number of text features can identify the parts of a post where significant jumps occur indicating a point where a segmentation should take place. The generated segments of all the posts are clustered to form intention clusters and then similarities across the posts are calculated through similarities across segments with the same intention. We experimentally illustrate the effectiveness and efficiency of our segmentation method and our overall approach of finding related forum posts.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 55
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: Location information of Web pages plays an important role in location-sensitive tasks such as Web search ranking for location-sensitive queries. However, such information is usually ambiguous, incomplete, or even missing, which raises the problem of location prediction for Web pages. Meanwhile, Web pages are massive and often noisy, which poses challenges to the majority of existing algorithms for location prediction. In this paper, we propose a novel and scalable location prediction framework for Web pages based on the query-URL click graph. In particular, we introduce a concept of term location vectors to capture location distributions for all terms and develop an automatic approach to learn the importance of each term location vector for location prediction. Empirical results on a large URL set demonstrate that the proposed framework significantly improves the location prediction accuracy compared with various representative baselines. We further provide a principled way to incorporate the proposed framework into the search ranking task, and experimental results on a commercial search engine show that the proposed method remarkably boosts the ranking performance for location-sensitive queries.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 56
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: We study content-based learning to rank from the perspective of learning distance functions. Conventionally, the two key issues of learning to rank, feature mappings and score functions, are modeled separately, and the learning is usually restricted to modeling a linear distance function such as the Mahalanobis distance. However, feature mappings and score functions interact with each other, and the patterns underlying the data are probably complicated and nonlinear. Thus, as a general nonlinear distance family, the Bregman distance is a suitable distance function for learning to rank, due to its strong generalization ability for distance functions and its nonlinearity for exploring the general patterns of data distributions. In this paper, we study learning to rank as a structural learning problem, and devise a Bregman distance function to build the ranking model based on structural SVM. To improve the model's robustness to outliers, we develop a robust structural learning framework for the ranking model. The proposed model, Robust Structural Bregman distance functions Learning to Rank (RSBLR), is a general and unified framework for learning distance functions to rank. The experiments of data ranking on real-world datasets show the superiority of this method over state-of-the-art approaches, as well as its robustness to noisily labeled outliers.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
  • 57
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-08-09
    Beschreibung: String similarity join, as an essential operation in applications including data integration and data cleaning, has attracted significant attention in the research community. Previous studies focus on global similarity join. In this paper, we study local similarity join with edit distance constraints, which finds string pairs from two string collections that have similar substrings. We study two kinds of local similarity join problems: checking local similar pairs and locating local similar pairs. We first consider the case where if two strings are locally similar to each other, they must share a common gram of a certain length. We show how to do efficient local similarity verification based on a matching gram pair. We propose two pruning techniques and an incremental method to further improve the efficiency of finding matching gram pairs. Then, we devise a method to locate the longest similar substring pair for two local similar strings. We conducted a comprehensive experimental study to evaluate the efficiency of these techniques.
    Print ISSN: 1041-4347
    Digitale ISSN: 1558-2191
    Thema: Informatik
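The notion of a local similarity check under an edit distance constraint, as described above, can be made concrete with a brute-force sketch. The description does not give the paper's exact definition, so the one used here is an assumption: two strings count as locally similar if some length-`length` window of one is within edit distance `tau` of some length-`length` window of the other. The gram-based filtering and pruning techniques from the description are not implemented, and the function names are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def locally_similar(s, t, length, tau):
    """Brute-force check under the assumed definition: compare every
    length-`length` window of s with every such window of t."""
    windows_s = {s[i:i + length] for i in range(len(s) - length + 1)}
    windows_t = {t[j:j + length] for j in range(len(t) - length + 1)}
    return any(edit_distance(a, b) <= tau
               for a in windows_s for b in windows_t)
```

The quadratic number of window comparisons is what makes filtering attractive: a verification like `edit_distance` is only worth running on window pairs that first pass a cheap test such as sharing a common gram.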
  • 58
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: With the rapid development of big data analytics, mobile computing, the Internet of Things, cloud computing, and social networking, cyberspace has expanded to a cross-fused and ubiquitous space made up of human beings, things, and information. Internet applications have evolved from Web 1.0 to Web 2.0 and Web 3.0, and web information has seen explosive growth, which is strongly promoting the advent of a global era of big data. In this ubiquitous cyberspace, traditional search engines can no longer fully satisfy the evolving needs of various types of users. Therefore, search engines must make completely innovative, revolutionary changes for the next generation of search, which is referred to as “big search”. This paper first studies the development needs of big search. Then, big search is defined, and the 5S properties (Sourcing, Sensing, Synthesizing, Solution, and Security) of big search, which are different from those of traditional search engines, are elaborated. The paper also provides a system architecture for big search, explores the key technologies that support the 5S properties, and describes potential application fields of big search technology. Finally, the research opportunities of big search are discussed.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 59
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: With the rapid development of mobile networks and the widespread usage of mobile devices, spatial crowdsourcing, which refers to assigning location-based tasks to moving workers, has drawn increasing attention. One of the important issues in spatial crowdsourcing is task assignment, which allocates tasks to appropriate workers. However, existing works generally assume that no rejection happens after the task assignment is completed by the server. Ignoring such rejections can lead to low system throughput. Thus, in this paper, we take workers’ rejection into consideration and try to maximize workers’ acceptance in order to improve the system throughput. Specifically, we first formally define the problem of maximizing workers’ acceptance in rejection-aware spatial crowdsourcing. Unfortunately, the problem is NP-hard. We propose two exact solutions to obtain the optimal assignment, but they are not efficient enough and do not scale to large inputs. Then, we present four approximation approaches for improving the efficiency. Finally, we show the effectiveness of the proposed pruning strategy for the exact solutions and the superiority of the proposed Greedy algorithm over other approximation methods through extensive experiments.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: In any competitive business, success is based on the ability to make an item more appealing to customers than the competition. A number of questions arise in the context of this task: how do we formalize and quantify the competitiveness between two items? Who are the main competitors of a given item? What are the features of an item that most affect its competitiveness? Despite the impact and relevance of this problem to many domains, only a limited amount of work has been devoted to an effective solution. In this paper, we present a formal definition of the competitiveness between two items, based on the market segments that they can both cover. Our evaluation of competitiveness utilizes customer reviews, an abundant source of information that is available in a wide range of domains. We present efficient methods for evaluating competitiveness in large review datasets and address the natural problem of finding the top-k competitors of a given item. Finally, we evaluate the quality of our results and the scalability of our approach using multiple datasets from different domains.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 61
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Twitter, together with other online social networks such as Facebook and Gowalla, has begun to collect hundreds of millions of check-ins. Check-in data captures the spatial and temporal information of user movements and interests. To model and analyze the spatio-temporal aspect of check-in data and discover temporal topics and regions, we first propose a spatio-temporal topic model, i.e., the Upstream Spatio-Temporal Topic Model (USTTM). USTTM can discover temporal topics and regions, i.e., a user’s choice of region and topic is affected by time in this model. We use continuous time to model check-in data, rather than discretized time, avoiding the loss of information through discretization. In addition, USTTM captures the property that users’ interests and activity space change over time, so users have different region and topic distributions at different times in USTTM. However, both USTTM and other related models capture “microscopic” patterns within a single city, where users share POIs, and cannot discover “macroscopic” patterns in a global area, where users check in to different POIs. Therefore, we also propose a macroscopic spatio-temporal topic model, MSTTM, employing words of tweets that are shared between cities to learn the topics of user interests. We perform an experimental evaluation on Twitter and Gowalla data sets from New York City and on a Twitter US data set. In our qualitative analysis, we perform experiments with USTTM to discover temporal topics, e.g., how the topic “tourist destinations” changes over time, and to demonstrate that MSTTM indeed discovers macroscopic, generic topics. In our quantitative analysis, we evaluate the effectiveness of USTTM in terms of perplexity, accuracy of POI recommendation, and accuracy of user and time prediction. Our results show that the proposed USTTM achieves better performance than the state-of-the-art models, confirming that it is more natural to model time as an upstream variable affecting the other variables. Finally, the performance of the macroscopic model MSTTM is evaluated on a Twitter US dataset, demonstrating a substantial improvement of POI recommendation accuracy compared to the microscopic models.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones), which may cause unpredictable deviation of the resultant cluster labels from the genuine ones, thereby leading to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as reliably generalize to unseen data. We propose a novel spectral clustering scheme which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, and also learns robust out-of-sample prediction functions. Specifically, we explicitly enforce a discrete transformation on the intermediate continuous labels, which leads to a tractable optimization problem with a discrete solution. Besides, we preserve the natural nonnegative characteristic of the clustering labels to enhance the interpretability of the results. Moreover, to further compensate for the unreliability of the learned clustering labels, we integrate an adaptive robust module with $\ell_{2,p}$ loss to learn a prediction function for grouping unseen data. We also show that the out-of-sample component can inject discriminative knowledge into the learning of cluster labels under certain conditions. Extensive experiments conducted on various data sets have demonstrated the superiority of our proposal as compared to several existing clustering approaches.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Location prediction is widely used to forecast a user’s next place to visit based on his or her mobility logs. It is an essential problem in location data processing, invaluable for surveillance, business, and personal applications. It is very challenging due to the sparsity of check-in data. An often ignored problem in recent studies is the variety across different check-in scenarios, which is becoming more urgent as more location check-in applications become available. In this paper, we propose a new feature-fusion-based prediction approach, GALLOP, i.e., GlobAL feature fused LOcation Prediction for different check-in scenarios. Based on carefully designed feature extraction methods, we utilize a novel combined prediction framework. Specifically, we utilize a density estimation model to profile geographical features, i.e., context information, a factorization method to extract collaborative information, and a graph structure to extract location transition patterns from users’ temporal check-in sequences, i.e., content information. An empirical study on three different check-in datasets demonstrates impressive robustness and improvement of the proposed approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Signed networks with positive and negative links attract considerable research interest since they contain more information than unsigned networks. Community detection and sign (or attitude) prediction, the fundamental problems of signed network analysis, are still primary challenges. For this, a generative Bayesian approach is presented wherein 1) a signed stochastic blockmodel is proposed to characterize the community structure in the context of signed networks, by explicitly formulating the distributions of the density and frustration of signed links from a stochastic perspective, and 2) a model learning algorithm is advanced by theoretically deriving a variational Bayes EM for the parameter estimation and a variation-based approximate evidence for the model selection. The comparison of the above approach with the state-of-the-art methods on synthetic and real-world networks shows its advantage in community detection and sign prediction for the exploratory networks.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Online display advertising has become a billion-dollar industry, and it keeps growing. Advertisers attempt to send marketing messages to attract potential customers via graphic banner ads on publishers’ webpages. Advertisers are charged for each view of a page that delivers their display ads. However, recent studies have discovered that more than half of the ads are never shown on users’ screens due to insufficient scrolling. Thus, advertisers waste a great amount of money on ads that do not bring any return on investment. Given this situation, the Interactive Advertising Bureau calls for a shift toward charging by viewable impression, i.e., charging for ads that are viewed by users. With this new pricing model, it is helpful to predict the viewability of an ad. This paper proposes two probabilistic latent class models (PLC) that predict the viewability of any given scroll depth for a user-page pair. Using a real-life dataset from a large publisher, the experiments demonstrate that our models outperform comparison systems.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: This paper focuses on a new type of taxonomy called supervised taxonomy (ST). Supervised taxonomies are generated considering background information concerning class labels in addition to distance metrics, and are capable of capturing class-uniform regions in a dataset. A hierarchical, agglomerative clustering algorithm called STAXAC, which generates STs, is proposed and its properties are analyzed. Experimental results are presented that show that STAXAC produces purer taxonomies than the neighbor-joining (NJ) algorithm, a very popular taxonomy generation algorithm. We introduce novel measures and algorithms that assess classification complexity and class modality, and show that STs can be used as the main input of an effective data-editing tool to enhance the accuracy of k-nearest neighbor classifiers. We demonstrate in our experimental evaluation that assessing the classification complexity of an ST provides a good estimate of the difficulty of the classification problem at hand. Moreover, a class modality discovery tool (CMD) is provided that, based on a domain expert's notion of what constitutes a “note-worthy” subclass, determines whether specific classes in the dataset are zero-modal, unimodal, or multi-modal.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Modeling the process of information diffusion is a challenging problem. Although numerous attempts have been made to solve this problem, very few studies are actually able to simulate and predict the temporal dynamics of the diffusion process. In this paper, we propose a novel information diffusion model, namely the GT model, which treats the nodes of a network as intelligent and rational agents and then calculates their corresponding payoffs, given different choices, to make strategic decisions. By introducing time-related payoffs based on the diffusion data, the proposed GT model can be used to predict whether or not a user’s behaviors will occur in a specific time interval. The user’s payoff can be divided into two parts: social payoff from the user’s social contacts and preference payoff from the user’s idiosyncratic preference. We exploit the global influence of the user and the social influence between any two users to accurately calculate the social payoff. In addition, we develop a new method of representing social influence that can fully capture its temporal dynamics. Experimental results from two different datasets, Sina Weibo and Flickr, demonstrate the rationality and effectiveness of the proposed prediction method with different evaluation metrics.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: With the advent of multi-view data, multi-view learning has become an important research direction in both machine learning and data mining. Considering the difficulty of obtaining labeled data in many real applications, we focus on the multi-view unsupervised feature selection problem. Traditional approaches all characterize the similarity by a fixed, pre-defined graph Laplacian in each view separately and ignore the underlying common structures across different views. In this paper, we propose an algorithm named Multi-view Unsupervised Feature Selection with Adaptive Similarity and View Weight (ASVW) to overcome the above-mentioned problems. Specifically, by leveraging the learning mechanism to characterize the common structures adaptively, we formulate the objective function by a common graph Laplacian across different views, together with the sparse $\ell_{2,p}$-norm constraint designed for feature selection. We develop an efficient algorithm to address the non-smooth minimization problem and prove that the algorithm will converge. To validate the effectiveness of ASVW, comparisons are made with some benchmark methods on real-world datasets. We also evaluate our method on a real sports action recognition task. The experimental results demonstrate the effectiveness of our proposed algorithm.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-08-09
    Description: Representing and managing temporal knowledge, in the form of temporal constraints, is a crucial task in many areas, including knowledge representation, planning, and scheduling. The current literature in the area is moving from the treatment of “crisp” temporal constraints to fuzzy or probabilistic constraints, to account for preferences and/or uncertainty. Given a set of temporal constraints, the evaluation of the tightest implied constraints is a fundamental task, which is also essential to provide reliable query-answering facilities. However, while such tasks have been widely addressed for “crisp” temporal constraints, they have not yet attracted enough attention in the “non-crisp” context. We overcome this limitation by (i) extending quantitative temporal constraints to cope with preferences, (ii) defining a temporal reasoning algorithm which evaluates the tightest temporal constraints, and (iii) providing suitable query-answering facilities based on it.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-11
    Description: In real-world applications, objects of multiple types are interconnected, forming Heterogeneous Information Networks. In such heterogeneous information networks, we make the key observation that many interactions happen due to some event and that the objects in each event form a complete semantic unit. By taking advantage of this property, we propose a generic framework called HyperEdge-Based Embedding (Hebe) to learn object embeddings with events in heterogeneous information networks, where a hyperedge encompasses the objects participating in one event. The Hebe framework models the proximity among objects in each event with two methods: (1) predicting a target object given other participating objects in the event, and (2) predicting if the event can be observed given all the participating objects. Since each hyperedge encapsulates more information about a given event, Hebe is robust to data sparseness and noise. In addition, Hebe is scalable as the data size grows. Extensive experiments on large-scale real-world datasets show the efficacy and robustness of the proposed framework.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: A standard procedure for evaluating the performance of classification algorithms is k-fold cross validation. Since the training sets for any pair of iterations in k-fold cross validation overlap when the number of folds is larger than two, the resulting accuracy estimates are considered to be dependent. In this paper, the overlapping of training sets is shown to be irrelevant in determining whether two fold accuracies are dependent or not. Then a statistical method is proposed to test the appropriateness of assuming independence for the accuracy estimates in k-fold cross validation. This method is applied to 20 data sets, and the experimental results suggest that it is generally appropriate to assume that the fold accuracies are independent. The cross validation of non-overlapping training sets can make fold accuracies dependent. However, this dependence has almost no impact on estimating the sample variance of fold accuracies, and hence they can generally be assumed to be independent.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
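    The overlap of training sets mentioned in this abstract can be made concrete with a small Python sketch. It only illustrates the plain k-fold split, not the statistical test proposed in the paper; the helper names `k_fold_indices` and `train_overlap` and the contiguous, unshuffled fold layout are assumptions for illustration.

```python
def k_fold_indices(n: int, k: int):
    """Split range(n) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def train_overlap(splits, a: int, b: int) -> int:
    """Number of instances shared by the training sets of iterations a and b."""
    return len(set(splits[a][0]) & set(splits[b][0]))


five_fold = k_fold_indices(10, 5)
two_fold = k_fold_indices(10, 2)
print(train_overlap(five_fold, 0, 1))  # 6 of the 8 training instances are shared
print(train_overlap(two_fold, 0, 1))   # 0: with two folds the training sets are disjoint
```

    With equal fold sizes, any two training sets share n(k-2)/k instances, which is why the overlap vanishes only for k = 2.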
  • 72
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: In a social network, even for the same piece of information, the excitement of different users differs. If we want to spread a piece of new information and maximize the expected total amount of excitement, which seed users should we choose? This problem is substantially different from the renowned influence maximization problem and cannot be tackled using the existing approaches. In this paper, motivated by the demand in a few interesting applications, we model the novel problem of activity maximization and tackle it systematically. We first analyze the complexity and the approximability of the problem. We develop an upper bound and a lower bound that are submodular so that the Sandwich framework can be applied. We then devise a polling-based randomized algorithm that guarantees a data-dependent approximation factor. Our experiments on four real data sets clearly verify the effectiveness and scalability of our method, as well as its advantage over other heuristic methods.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: With advances in geo-positioning technologies and mobile internet, location-based services have attracted much attention, and spatial keyword queries are catching on fast. However, as far as we are aware, no prior work considers the temporal information of geo-tagged objects. Temporal information is important in the spatial keyword query because many objects are not always valid. For example, visitors may plan their trips according to the opening times of attractions. In this paper, we identify and solve a novel problem, i.e., the time-aware Boolean spatial keyword query (TABSKQ), which returns the $k$ objects that satisfy users’ spatio-temporal description and textual constraint. We first present pruning strategies and an algorithm based on the CIR$^{+}$-tree (i.e., the CIR-tree with temporal information). Then, we propose an efficient index structure, called the TA-tree, and its corresponding algorithms, which can prune the search space using both spatio-temporal and textual information. Furthermore, we study an interesting TABSKQ variant, i.e., Joint TABSKQ (JTABSKQ), which aims to process a set of TABSKQs jointly, and extend our techniques to tackle it. Extensive experiments with real datasets offer insight into the performance of our proposed indices and algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: In this paper, we tackle a challenging problem inherent in a series of applications: tracking the influential nodes in dynamic networks. Specifically, we model a dynamic network as a stream of edge weight updates. This general model embraces many practical scenarios as special cases, such as edge and node insertions and deletions, as well as evolving weighted graphs. Under the widely adopted linear threshold model and independent cascade model, we consider two essential versions of the problem: finding the nodes whose influence passes a user-specified threshold and finding the top-$k$ most influential nodes. Our key idea is to use polling-based methods and maintain a sample of random RR sets so that we can approximate the influence of nodes with provable quality guarantees. We develop an efficient algorithm that incrementally updates the sample of random RR sets against network changes. We also design methods to determine the proper sample sizes for the two versions of the problem so that we can provide strong quality guarantees and, at the same time, be efficient in both space and time. In addition to the thorough theoretical results, our experimental results on five real network data sets clearly demonstrate the effectiveness and efficiency of our algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: A network with $n$ nodes contains $O(n^2)$ possible links. Even for networks of modest size, it is often difficult to evaluate all pairwise possibilities for links in a meaningful way. Further, even though link prediction is closely related to missing value estimation problems, it is often difficult to use sophisticated models such as latent factor methods because of their computational complexity on large networks. Hence, most known link prediction methods are designed for evaluating the link propensity on a specified subset of links, rather than on the entire network. In practice, however, it is essential to perform an exhaustive search over the entire network. In this article, we propose an ensemble-enabled approach to scaling up link prediction, by decomposing traditional link prediction problems into subproblems of smaller size. These subproblems are each solved with latent factor models, which can be effectively implemented on networks of modest size. By incorporating the characteristics of link prediction, the ensemble approach further reduces the sizes of the subproblems without sacrificing prediction accuracy. The ensemble-enabled approach has several advantages in terms of performance, and our experimental results demonstrate the effectiveness and scalability of our approach.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
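    As a worked illustration of the $O(n^2)$ claim in this abstract, the following Python sketch counts the candidate links that an exhaustive search would have to score. The function name `possible_links` and the exclusion of self-loops are assumptions for illustration; the abstract does not specify them.

```python
def possible_links(n: int, directed: bool = False) -> int:
    """Number of candidate links among n nodes, excluding self-loops."""
    ordered_pairs = n * (n - 1)
    # An undirected link {u, v} is counted once, so halve the ordered pairs.
    return ordered_pairs if directed else ordered_pairs // 2


print(possible_links(1_000))                 # 499500
print(possible_links(1_000, directed=True))  # 999000
print(possible_links(1_000_000))             # 499999500000
```

    Going from a thousand to a million nodes multiplies the number of candidate pairs by roughly a million, which is the scaling pressure the abstract describes.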
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Multi-Task Learning (MTL) can enhance a classifier’s generalization performance by learning multiple related tasks simultaneously. Conventional MTL works under the offline or batch setting, and suffers from expensive training cost and poor scalability. To address such inefficiency issues, online learning techniques have been applied to solve MTL problems. However, most existing algorithms of online MTL constrain task relatedness into a presumed structure via a single weight matrix, which is a strict restriction that does not always hold in practice. In this paper, we propose a robust online MTL framework that overcomes this restriction by decomposing the weight matrix into two components: the first captures the low-rank common structure among tasks via a nuclear norm, and the second identifies the personalized patterns of outlier tasks via a group lasso. Theoretical analysis shows that the proposed algorithm can achieve a sub-linear regret with respect to the best linear model in hindsight. Even though the above framework achieves good performance, the nuclear norm, which simply adds all nonzero singular values together, may not be a good low-rank approximation. To improve the results, we use a log-determinant function as a non-convex rank approximation. A gradient scheme is applied to optimize the log-determinant function and can obtain a closed-form solution for this refined problem. Experimental results on a number of real-world applications verify the efficacy of our method.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: SimRank is a similarity measure between vertices in a graph. Recently, many algorithms have been proposed to efficiently evaluate SimRank similarities. However, the existing algorithms either overlook uncertainty in graph structures or depend on an unreasonable assumption. In this paper, we study SimRank on uncertain graphs. Following the random-walk-based formulation of SimRank on deterministic graphs and the possible world model of uncertain graphs, we first define random walks on uncertain graphs and show that our definition of random walks satisfies the Markov property. We formulate our SimRank measure based on random walks on uncertain graphs. We discover a critical difference between random walks on uncertain graphs and random walks on deterministic graphs, which makes all existing SimRank computation algorithms on deterministic graphs inapplicable to uncertain graphs. For SimRank computation, we consider computing both single-pair SimRank and single-source top-$K$ SimRank. We propose three algorithms, namely the sampling algorithm with high efficiency, the two-phase algorithm with comparable efficiency and higher accuracy, and a speeding-up algorithm with much higher efficiency. Meanwhile, we present an optimized algorithm for efficiently computing the single-source top-$K$ SimRank. The experimental results verify the effectiveness of our SimRank measure and the efficiency of the proposed SimRank computation algorithms.
    Print ISSN: 1041-4347
    Electronic ISSN: 1558-2191
    Subject: Computer Science
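    For orientation, the following Python sketch shows the standard iterative SimRank recurrence on a deterministic graph, which is the baseline this abstract builds on. It does not implement the paper's uncertain-graph measure or any of its algorithms; the function name, the decay factor `c = 0.8`, and the in-neighbor dictionary format are assumptions for illustration.

```python
def simrank(in_neighbors, c: float = 0.8, iterations: int = 10):
    """Iterative SimRank on a deterministic directed graph.

    in_neighbors maps every node to the list of nodes pointing to it
    (every node must appear as a key, possibly with an empty list).
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar to no other node.
                    new[a][b] = 0.0
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: u points to both a and b.
scores = simrank({"u": [], "a": ["u"], "b": ["u"]})
print(scores["a"]["b"])  # 0.8, since a and b share their only in-neighbor
```

    The all-pairs double loop is quadratic in the number of nodes per iteration, which is one reason single-pair and top-$K$ variants are studied separately.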
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publikationsdatum: 2017-10-07
    Beschreibung: Satisfaction prediction is one of the prime concerns in search performance evaluation. It is a non-trivial task for three major reasons: (1) The definition of satisfaction is subjective and different users may have different opinions in the process of satisfaction judgment. (2) Most existing studies on satisfaction prediction mainly rely on users’ click-through or query reformulation behaviors but there are many sessions without such interactions. (3) Most existing works primarily rely on the hypothesis that all results on search result pages (SERPs) are homogeneous, but a variety of heterogeneous search results have been aggregated into SERPs to improve the diversity and quality of search results recently. To shed light on these research questions, we construct an experimental search engine that could collect users’ satisfaction feedback as well as mouse click-through/movement data. Inspired by recent studies in predicting search result relevance based on mouse movement patterns (namely, motifs), we propose to estimate search satisfaction with motifs extracted from mouse movement data on SERPs. Besides the existing frequency-based motif selection method, two novel selection strategies (distance-based and distribution-based) are also adopted to extract high-quality motifs for satisfaction prediction. Experimental results show that the proposed strategies outperform existing methods and have promising generalization capability for unseen users and queries in both a homogeneous and heterogeneous search environment.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Point-of-interest (POI) recommendation has become an important way to help people discover attractive and interesting places, especially when they travel out of town. However, the extreme sparsity of the user-POI matrix and cold-start issues severely hinder the performance of collaborative filtering-based methods. Moreover, user preferences may vary dramatically across geographical regions due to different urban compositions and cultures. To address these challenges, we build on recent advances in deep learning and propose a Spatial-Aware Hierarchical Collaborative Deep Learning model (SH-CDL). The model jointly performs deep representation learning for POIs from heterogeneous features and hierarchically additive representation learning for spatial-aware personal preferences. To combat data sparsity in spatial-aware user preference modeling, both the collective preferences of the public in a given target region and the personal preferences of the user in adjacent regions are exploited in the form of social regularization and spatial smoothing. To deal with the multimodal heterogeneous features of the POIs, we introduce a late feature fusion strategy into our SH-CDL model. Extensive experimental analysis shows that our proposed model outperforms state-of-the-art recommendation models, especially in out-of-town and cold-start recommendation scenarios.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 80
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Searching for associations between entities is needed in many domains, such as national security and bioinformatics. In recent years, it has been facilitated by the emergence of graph-structured semantic data on the Web, which offers computers structured semantic associations that are more explicit than those hidden in unstructured text. The increasing volume of semantic data often produces an excessive number of semantic associations and requires ranking techniques to identify the more important ones for users. Despite fruitful theoretical research on innovative ranking techniques, there is a lack of comprehensive empirical evaluation of these techniques. In this article, we carry out an extensive evaluation of eight techniques for ranking semantic associations, including two novel ones we propose. The practical effectiveness of these techniques is assessed based on 1,200 ground-truth rankings created by 30 human experts for real-life semantic associations and on the explanations given by the experts. Our findings also suggest a number of directions for improving existing techniques and developing novel ones in future work.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Many real-world problems involve learning models for rare classes in situations where there are no gold standard labels for training samples but imperfect labels are available for all instances. In this paper, we present RAPT, a three-step predictive modeling framework for classifying a rare class in such problem settings. The first step of the proposed framework learns a classifier that jointly optimizes precision and recall using only imperfectly labeled training samples. We also show that, under certain assumptions on the imperfect labels, the quality of this classifier is almost as good as that of one constructed using perfect labels. The second and third steps of the framework exploit the fact that imperfect labels are available for all instances to further improve the precision and recall of the rare class. We evaluate the RAPT framework on two real-world applications: mapping forest fires and mapping urban extent from Earth-observing satellite data. The experimental results indicate that RAPT can identify forest fires and urban areas with high precision and recall using imperfect labels, even though obtaining expert-annotated samples on a global scale is infeasible in these applications.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: In recent years, various data clustering algorithms have been proposed in the data mining and engineering communities. However, traditional clustering methods still have drawbacks that merit further investigation, such as clustering high-dimensional data, learning an ideal affinity matrix that optimally reveals the global data structure, discovering the intrinsic geometrical and discriminative properties of the data space, and reducing the influence of noise introduced by complex input data. In this paper, we propose a novel clustering algorithm called robust dual clustering with adaptive manifold regularization (RDC), which simultaneously performs dual matrix factorization tasks targeting an identical cluster indicator in both the original and the projected feature spaces. The $l_{2,1}$-norm is used instead of the conventional $l_{2}$-norm to measure the loss, which improves model robustness by reducing the influence of noise and outliers. To better account for the intrinsic geometrical and discriminative data structure, we incorporate a manifold regularization term on the cluster indicator using a specially learned affinity matrix that is better suited to the clustering task. Moreover, a novel augmented Lagrangian method (ALM) based procedure is designed to seek the optimal solution of the proposed RDC optimization effectively and efficiently. Numerous experiments on representative data sets demonstrate the superior performance of the proposed method compared to existing clustering algorithms.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 83
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Query auto-completion (QAC) is widely used by modern search engines to assist users by predicting their intended queries. Most QAC approaches rely on deterministic batch learning algorithms trained on past query log data. However, query popularity changes constantly, and QAC operates in a real-time scenario where users interact with the search engine continually. Ideally, therefore, QAC must be timely and adaptive enough to reflect time-sensitive changes in an online fashion. In addition, due to the vertical position bias, a query suggestion with a higher rank tends to attract more clicks regardless of the user’s original intention. Hence, in the long run, it is important to place some lower-ranked yet potentially more relevant queries in higher positions to collect more valuable user feedback. To tackle these issues, we propose to formulate QAC as a ranked Multi-Armed Bandits (MAB) problem, which enjoys theoretical soundness. To utilize prior knowledge from query logs, we propose to use Bayesian inference and Thompson Sampling to solve this MAB problem. Extensive experiments on large-scale datasets show that our QAC algorithm can adaptively learn temporal trends and outperforms existing QAC algorithms in ranking quality.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
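    To illustrate the general idea of Thompson Sampling with a prior, here is a minimal Beta-Bernoulli sketch for ranking query suggestions. It is not the ranked MAB formulation of the paper: the class name `BetaBandit`, the click/no-click reward, and the way priors are passed in are all assumptions made for this example.

```python
import random


class BetaBandit:
    """Beta-Bernoulli Thompson Sampling over candidate query suggestions."""

    def __init__(self, candidates, prior=None):
        # prior maps a candidate to (alpha, beta) pseudo-counts, e.g. derived
        # from historical query logs (hypothetical input); default is uniform.
        prior = prior or {}
        self.params = {q: list(prior.get(q, (1.0, 1.0))) for q in candidates}

    def rank(self, k, rng=random):
        # Draw one plausible click rate per candidate and rank by the draws.
        draws = {q: rng.betavariate(a, b) for q, (a, b) in self.params.items()}
        return sorted(draws, key=draws.get, reverse=True)[:k]

    def update(self, query, clicked):
        # A click increments alpha; a skipped impression increments beta.
        self.params[query][0 if clicked else 1] += 1.0


bandit = BetaBandit(["weather", "web", "west"], prior={"weather": (5.0, 2.0)})
shown = bandit.rank(2)
bandit.update(shown[0], clicked=True)
```

    Because the ranking is sampled rather than fixed, a lower-ranked suggestion occasionally surfaces, which is how this kind of scheme gathers feedback on candidates that position bias would otherwise hide.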
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: The collection of time series data increases as more monitoring and automation are deployed. These deployments range in scale from an Internet of Things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general-purpose Database Management Systems (DBMSs) for time series management. In this paper, we present a thorough analysis and classification of TSMSs developed through academic or industrial research and documented through publications. Our classification is organized into categories based on the architectures observed during our analysis. In addition, we provide an overview of each system with a focus on the motivating use case that drove its development, the storage and querying functionality it implements for time series, the components it is composed of, and its capabilities with regard to Stream Processing and Approximate Query Processing (AQP). Last, we summarize research directions proposed by other researchers in the field and present our vision for a next-generation TSMS.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Given a graph $G$ and a set $Q$ of query nodes, we examine the Steiner Maximum-Connected Subgraph (SMCS) problem. The SMCS, that is, the induced subgraph of $G$ that contains $Q$ and has the largest connectivity, can be useful for customer prediction, product promotion, and team assembling. Despite its importance, the SMCS problem has only recently been studied. Existing solutions evaluate the maximum SMCS, whose number of nodes is the largest among all the SMCSs of $Q$. However, the maximum SMCS, which may contain a lot of nodes, can be difficult to interpret. In this paper, we investigate the minimal SMCS, which is the minimal subgraph of $G$ with the maximum connectivity containing $Q$. The minimal SMCS contains far fewer nodes than its maximum counterpart and is thus easier to understand. However, the minimal SMCS can be costly to evaluate. We thus propose efficient Expand-Refine algorithms, as well as approximate versions with accuracy guarantees. We further develop a cache-based processing model to improve efficiency for the important case in which $Q$ consists of a single node. Extensive experiments on large real and synthetic graph datasets validate the effectiveness and efficiency of our approaches.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 86
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Clinical Guidelines (CGs) provide general evidence-based recommendations, and physicians often also have to resort to their Basic Medical Knowledge (BMK) to cope with specific patients. In this paper, we explore the interplay between CGs and BMK from the viewpoint of a-posteriori conformance analysis, understood as the adherence of a specific execution log to both the CG and the BMK. We also consider the temporal dimension: the guideline may include temporal constraints for the execution of actions, and its adaptation to a specific patient and context may add or modify conditions and temporal constraints for actions. We propose an approach for analyzing execution traces in Answer Set Programming with respect to a guideline and BMK. The approach points out discrepancies, including temporal discrepancies, with respect to the different knowledge sources, and explains how the applications of the CG and the BMK have interacted, especially in cases where strict adherence to both is not possible.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 87
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-10-07
    Description: Networked observational devices have proliferated in recent years, contributing to voluminous data streams from a variety of sources and problem domains. These streams often have a spatiotemporal component and include multidimensional features of interest. Processing such data offline using batch systems or data warehouses is costly in both storage and computation, and in many situations the insights derived from the data streams are useful only if they are timely. In this study, we propose Synopsis, an online, distributed sketch that is constructed from voluminous spatiotemporal data streams. The sketch summarizes feature values and inter-feature relationships in memory to facilitate real-time query evaluations and to serve as input to computations expressed using analytical engines. As the data streams evolve, Synopsis performs targeted dynamic scaling to ensure high accuracy and effective resource utilization. We evaluate our system on two real-world spatiotemporal datasets and demonstrate its efficacy in both scalability and query evaluations.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 88
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: In many applications, the top-k dominating query is an important operation that returns the k tuples with the highest domination scores in a potentially huge data space. Our analysis shows that existing algorithms have performance problems on massive data. This paper proposes a novel table-scan-based algorithm, TDTS, to compute top-k dominating results efficiently. TDTS first presorts the table to enable early termination. The paper proposes an early termination check, along with a theoretical analysis of the scan depth, and devises a pruning operation for tuples. The theoretical pruning effect shows that the number of tuples maintained in TDTS can be reduced substantially. Extensive experimental results on synthetic and real-life data sets show that TDTS significantly outperforms the existing algorithms.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
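    As a reference point for what a top-k dominating query returns, the following is a brute-force quadratic baseline, not the TDTS algorithm. It assumes smaller attribute values are better and defines a tuple's domination score as the number of tuples it dominates. The function names and sample data are illustrative.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every attribute and strictly
    better in at least one (smaller is assumed to be better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(tuples, k):
    """Return the k (score, tuple) pairs with the highest domination scores."""
    scored = [(sum(dominates(p, q) for q in tuples), p) for p in tuples]
    # Stable sort: ties keep their original table order.
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


data = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
result = top_k_dominating(data, 2)
```

    Here (1, 1) dominates the four other tuples, so it ranks first with score 4. The pairwise comparison costs O(n^2) dominance checks, which is the kind of cost that presorting, early termination, and pruning aim to avoid.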
  • 89
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: The continuous top-$k$ query over streaming data is a fundamental problem in databases. In this paper, we focus on the sliding window scenario, where a continuous top-$k$ query returns the top-$k$ objects within each query window on the data stream. Existing algorithms support this type of query by incrementally maintaining a subset of objects in the window and trying to retrieve the answer from this subset as often as possible whenever the window slides. However, since all the existing algorithms are sensitive to query parameters and data distribution, they all suffer from expensive incremental maintenance cost. In this paper, we propose a self-adaptive partition framework to support continuous top-$k$ queries. It partitions the window into sub-windows and maintains only a small number of candidates with the highest scores in each sub-window. Based on this framework, we have developed several partition algorithms to cater for different object distributions and query parameters. To the best of our knowledge, it is the first algorithm that achieves logarithmic complexity w.r.t. $k$ for incrementally maintaining the candidate set, even in worst-case scenarios.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
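    The partitioning idea in the abstract above can be sketched in a simplified form. This is not the paper's self-adaptive framework: it assumes a count-based window that slides in whole sub-window steps and uses fixed-size partitions. The class name `PartitionedTopK` and its parameters are hypothetical.

```python
import heapq
from collections import deque


class PartitionedTopK:
    """Top-k over a sliding window split into fixed-size sub-windows.

    Each completed sub-window keeps only its k highest scores, so a query
    merges a few short candidate lists instead of scanning the whole window.
    """

    def __init__(self, k, partition_size, num_partitions):
        self.k = k
        self.partition_size = partition_size
        # Candidate lists of completed sub-windows; the oldest list is
        # dropped automatically once the window is full.
        self.closed = deque(maxlen=num_partitions - 1)
        self.current = []  # raw scores of the sub-window being filled

    def insert(self, score):
        self.current.append(score)
        if len(self.current) == self.partition_size:
            self.closed.append(heapq.nlargest(self.k, self.current))
            self.current = []

    def query(self):
        candidates = [s for part in self.closed for s in part] + self.current
        return heapq.nlargest(self.k, candidates)


window = PartitionedTopK(k=2, partition_size=3, num_partitions=2)
for score in [5, 1, 9, 7]:
    window.insert(score)
first = window.query()    # covers sub-window [5, 1, 9] plus the new score 7
for score in [2, 3]:
    window.insert(score)
second = window.query()   # sub-window [5, 1, 9] has expired
```

    The merge is exact because the top-k of a union is always contained in the union of the per-partition top-k lists.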
  • 90
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Social networking services are prevalent in many online communities, such as Twitter.com and Weibo.com, where millions of users interact with each other every day. One interesting and important problem in social networking services is to rank users by their vitality in a timely fashion. An accurate ranking list of user vitality could benefit many parties in social network services, such as ad providers and site operators. Although obtaining a vitality-based ranking list of users is very promising, there are many technical challenges due to the large scale and dynamics of social networking data. In this paper, we propose a unique perspective to achieve this goal: quantifying user vitality by analyzing the dynamic interactions among users on social networks. Examples of social networks include, but are not limited to, social networks in microblog sites and academic collaboration networks. Intuitively, if a user has many interactions with his friends within a time period while most of his friends do not have many interactions with their own friends in the same period, it is very likely that this user has high vitality. Based on this idea, we develop quantitative measurements of user vitality and propose our first algorithm for ranking users based on vitality. We further consider the mutual influence between users when computing the vitality measurements and propose a second ranking algorithm, which computes user vitality iteratively. Beyond user vitality ranking, we also introduce a vitality prediction problem, which is of great importance for many applications in social networking services, and develop a customized prediction model to solve it. To evaluate the performance of our algorithms, we collect two dynamic social network data sets. The experimental results on both data sets clearly demonstrate the advantage of our ranking and prediction methods.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 91
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Big sensing data is prevalent in both industry and scientific research applications, where data is generated with high volume and velocity. Cloud computing provides a promising platform for big sensing data processing and storage, as it offers a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing data processing on the Cloud has adopted some data compression techniques. However, due to the high volume and velocity of big sensing data, traditional data compression techniques lack sufficient efficiency and scalability for data processing. Based on specific on-Cloud data compression requirements, we propose a novel scalable data compression approach based on calculating the similarity among partitioned data chunks. Instead of compressing basic data units, compression is conducted over partitioned data chunks. To restore the original data sets, restoration functions and predictions are designed. MapReduce is used for the algorithm implementation to achieve extra scalability on the Cloud. With experiments on real-world meteorological big sensing data on the U-Cloud platform, we demonstrate that the proposed scalable compression approach based on data chunk similarity can significantly improve data compression efficiency with an affordable loss of data accuracy.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 92
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Transfer learning has been proven effective for problems where training data from a source domain and test data from a target domain are drawn from different distributions. To reduce the distribution divergence between the source domain and the target domain, many previous studies have focused on designing and optimizing objective functions that use the Euclidean distance to measure dissimilarity between instances. However, in some real-world applications, the Euclidean distance may be inappropriate for capturing the intrinsic similarity or dissimilarity between instances. To deal with this issue, we propose a metric transfer learning framework (MTLF) that encodes metric learning in transfer learning. In MTLF, instance weights are learned and exploited to bridge the distributions of different domains, while a Mahalanobis distance is learned simultaneously to minimize the intra-class distances and maximize the inter-class distances for the target domain. Unlike previous work, where instance weights and the Mahalanobis distance are trained in a pipelined framework that potentially leads to error propagation across different components, MTLF attempts to learn instance weights and a Mahalanobis distance in a parallel framework to make knowledge transfer across domains more effective. Furthermore, we develop general solutions to both classification and regression problems on top of MTLF. We conduct extensive experiments on several real-world datasets for object recognition, handwriting recognition, and WiFi localization to verify the effectiveness of MTLF compared with a number of state-of-the-art methods.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 93
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: In this work, we focus on modeling pairs of user-generated reviews and overall ratings, and aim to identify semantic aspects and aspect-level sentiments from review data as well as to predict the overall sentiments of reviews. We propose a novel probabilistic supervised joint aspect and sentiment model (SJASM) to deal with these problems in one go under a unified framework. SJASM represents each review document in the form of opinion pairs and can simultaneously model the aspect terms and corresponding opinion words of the review for hidden aspect and sentiment detection. It also leverages the overall sentiment ratings that often come with online reviews as supervision data, and can infer semantic aspects and aspect-level sentiments that are not only meaningful but also predictive of the overall sentiments of reviews. Moreover, we develop an efficient inference method for parameter estimation of SJASM based on collapsed Gibbs sampling. We evaluate SJASM extensively on real-world review data, and experimental results demonstrate that the proposed model outperforms seven well-established baseline methods on sentiment analysis tasks.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 94
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Finding similar questions in historical archives has been applied to question answering, with sound theoretical underpinnings and great practical success. Nevertheless, each question in the returned candidate pool is often associated with multiple answers, and hence users have to painstakingly browse many of them before finding the correct one. To alleviate this problem, we present a novel scheme to rank answer candidates via pairwise comparisons. It consists of one offline learning component and one online search component. In the offline learning component, we first automatically establish positive, negative, and neutral training samples in terms of preference pairs, guided by our data-driven observations. We then present a novel model that jointly incorporates these three types of training samples, and derive its closed-form solution. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by using the offline-trained model to judge the preference orders. Extensive experiments on real-world vertical and general community-based question answering datasets demonstrate its robustness and promising performance. We have also released the code and data to support other researchers.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 95
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Much work has focused on automatically constructing conceptual taxonomies or semantic networks from large text corpora. In this paper, we use a state-of-the-art data-driven conceptual taxonomy, Probase, to show that missing links in taxonomies are the chief problem hindering their adoption by many real-life applications, because the missing links break the inferencing that the conceptual taxonomy claims to support. To solve this problem, we devise a collaborative filtering framework to infer missing links in taxonomies derived from text corpora. We implement our method mainly on Probase, creating a denser taxonomy containing 5.1 million (about 30 percent) more isA relationships, with an accuracy above 90 percent. We conduct comprehensive experiments to demonstrate the quality of the revised conceptual taxonomies.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 96
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Ensemble forecasting is a widely used numerical prediction method for modeling the evolution of nonlinear dynamic systems. To predict the future state of such systems, a set of ensemble member forecasts is generated from multiple runs of computer models, where each run is obtained by perturbing the starting condition or using a different model representation of the system. The ensemble mean or median is typically chosen as a point estimate for the ensemble member forecasts. These approaches are limited in that they assume each ensemble member is equally skillful and may not preserve the temporal autocorrelation of the predicted time series. To overcome these limitations, we present an online multi-task learning framework called ORION to estimate the optimal weights for combining the ensemble member forecasts. Unlike existing formulations, the proposed framework is novel in that its learning algorithm must backtrack and revise its previous forecasts before making future predictions if the earlier forecasts were incorrect when verified against new observation data. We term this strategy online learning with restart. Our proposed framework employs a graph Laplacian regularizer to ensure consistency of the predicted time series. It can also accommodate different types of loss functions, including $\epsilon$-insensitive and quantile loss functions, the latter of which is particularly useful for extreme value prediction. A theoretical proof demonstrating the convergence of our algorithm is also given. Experimental results on seasonal soil moisture forecasts from 12 major river basins in North America demonstrate the superiority of ORION over other baseline algorithms.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
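    The two loss functions named in the abstract above have standard textbook definitions, sketched below for a single prediction. This is not ORION itself; the default values of `epsilon` and `tau` are arbitrary choices for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors smaller than epsilon cost nothing; larger errors cost linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def quantile_loss(y_true, y_pred, tau=0.9):
    """Pinball loss: under-prediction is weighted by tau and
    over-prediction by (1 - tau)."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


under = quantile_loss(2.0, 1.0, tau=0.9)   # forecast too low
over = quantile_loss(1.0, 2.0, tau=0.9)    # forecast too high
```

    With `tau` close to 1, under-predicting costs far more than over-predicting by the same amount, which pushes the fitted forecast toward the upper tail of the distribution and explains why this loss suits extreme value prediction.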
  • 97
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: In recent years, recommender systems have become an indispensable component of many e-commerce websites. One of the major challenges that remains largely open is the cold-start problem, which can be viewed as a barrier that keeps cold-start users/items away from the existing ones. In this paper, we aim to break through this barrier for cold-start users/items with the assistance of existing ones. In particular, inspired by the classic Elo Rating System, which has been widely adopted in chess tournaments, we propose a novel rating comparison strategy (RaPare) to learn the latent profiles of cold-start users/items. The centerpiece of RaPare is a fine-grained calibration of the latent profiles of cold-start users/items that explores the differences between cold-start and existing users/items. As a generic strategy, it can be instantiated in existing recommender system methods. To show the capability of the RaPare strategy, we instantiate it on two prevalent methods in recommender systems, namely matrix factorization based and neighborhood based collaborative filtering. Experimental evaluations on five real data sets validate the superiority of our approach over existing methods in the cold-start scenario.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
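    For readers unfamiliar with the Elo Rating System that inspired the strategy above, here is the classic chess update rule as a short sketch. It shows only the standard Elo formulas, not the rating comparison strategy of the paper; the K-factor of 32 is a common convention and an assumption here.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, and 0.0 for a loss.
    """
    expected_a = elo_expected(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

    Each rating moves in proportion to the gap between the observed and the expected outcome, so a surprising result shifts the ratings more than an expected one.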
  • 98
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Sociologists have long agreed that the evolution of a Social Networking Service (SNS) is driven by the interplay between users’ preferences (reflected in user-item interaction behavior) and the social network structure (reflected in user-user interaction behavior). Nevertheless, traditional approaches either modeled these two kinds of behaviors in isolation or relied on a static assumption about an SNS. Thus, it is still unclear how the dynamic social network structure and users’ historical preferences affect the evolution of SNSs. Furthermore, can incorporating the underlying social theories into the platform evolution modeling process benefit both behavior prediction tasks? In this paper, we incorporate the underlying social theories to explain and model the evolution of users’ two kinds of behaviors in SNSs. Specifically, we present two kinds of representations for users’ behaviors: a direct (latent) representation that presumes users’ behaviors are represented directly (latently) by their historical behaviors. Under each representation, we associate each user’s two kinds of behaviors with two vectors at each time. Then, for each representation, we propose a corresponding learning model to fuse the interplay between users’ two kinds of behaviors. Finally, extensive experimental results demonstrate the effectiveness of our proposed models for both user preference prediction and social link suggestion.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 99
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-05-05
    Description: Fraudulent behaviors in Google Play, the most popular Android app market, fuel search rank abuse and malware proliferation. To identify malware, previous work has focused on app executable and permission analysis. In this paper, we introduce FairPlay, a novel system that discovers and leverages traces left behind by fraudsters to detect both malware and apps subjected to search rank fraud. FairPlay correlates review activities and uniquely combines detected review relations with linguistic and behavioral signals gleaned from Google Play app data (87K apps, 2.9M reviews, and 2.4M reviewers, collected over half a year) in order to identify suspicious apps. FairPlay achieves over 95 percent accuracy in classifying gold standard datasets of malware, fraudulent, and legitimate apps. We show that 75 percent of the identified malware apps engage in search rank fraud. FairPlay discovers hundreds of fraudulent apps that currently evade Google Bouncer's detection technology. FairPlay also helped discover more than 1,000 reviews, reported for 193 apps, that reveal a new type of “coercive” review campaign: users are harassed into writing positive reviews, and into installing and reviewing other apps.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science
  • 100
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication date: 2017-03-08
    Description: In recommender systems, one key task is to predict the personalized rating of a user for a new item and then return the new items with the top predicted ratings to the user. Recommender systems usually apply collaborative filtering techniques (e.g., matrix factorization) over a sparse user-item rating matrix to make rating predictions. However, collaborative filtering techniques are severely affected by the data sparsity of the underlying user-item rating matrix and often confront cold-start problems for new items and users. Since the attributes of items and social links between users are becoming increasingly accessible on the Internet, this paper exploits the rich attributes of items and social links of users to alleviate the rating sparsity effect and tackle the cold-start problems. Specifically, we first propose a Kernel-based Attribute-aware Matrix Factorization model called KAMF to integrate the attribute information of items into matrix factorization. KAMF can discover the nonlinear interactions among attributes, users, and items, which mitigates the rating sparsity effect and naturally deals with the cold-start problem for new items. Further, we extend KAMF to address the cold-start problem for new users by utilizing the social links between users. Finally, we conduct a comprehensive performance evaluation of KAMF using two large-scale real-world data sets recently released by Yelp and MovieLens. Experimental results show that KAMF achieves significantly superior performance compared with other state-of-the-art rating prediction techniques.
    Print ISSN: 1041-4347
    Digital ISSN: 1558-2191
    Subject: Computer science