ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Articles  (791)
  • 2015-2019  (791)
  • 1990-1994
  • 1915-1919
  • IEEE Transactions on Pattern Analysis and Machine Intelligence  (791)
  • 1275
  • Computer Science  (791)
  • Philosophy
  • 1
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Encouraged by the recent progress in pedestrian detection, we investigate the gap between current state-of-the-art methods and the “perfect single frame detector”. We enable our analysis by creating a human baseline for pedestrian detection (over the Caltech pedestrian dataset). After manually clustering the frequent errors of a top detector, we characterise both localisation and background-versus-foreground errors. To address localisation errors we study the impact of training annotation noise on the detector performance, and show that we can improve results even with a small portion of sanitised training data. To address background/foreground discrimination, we study convnets for pedestrian detection, and discuss which factors affect their performance. Other than our in-depth analysis, we report top performance on the Caltech pedestrian dataset, and provide a new sanitised set of training and test annotations.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: This paper addresses the problem of robust and efficient relative rotation averaging in the context of large-scale Structure from Motion. Relative rotation averaging finds global or absolute rotations for a set of cameras from a set of observed relative rotations between pairs of cameras. We propose a generalized framework of relative rotation averaging that can use different robust loss functions and jointly optimizes for all the unknown camera rotations. Our method uses a quasi-Newton optimization which results in an efficient iteratively reweighted least squares (IRLS) formulation that works in the Lie algebra of the 3D rotation group. We demonstrate the performance of our approach on a number of large-scale data sets. We show that our method outperforms existing methods in the literature both in terms of speed and accuracy.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2018-04-04
    Description: The capabilities of (I) learning transferable knowledge across domains; and (II) fine-tuning the pre-learned base knowledge towards tasks with considerably smaller data scale are extremely important. Many of the existing transfer learning techniques are supervised approaches, among which deep learning has the demonstrated power of learning domain transferrable knowledge with large scale network trained on massive amounts of labeled data. However, in many biomedical tasks, both the data and the corresponding label can be very limited, where the unsupervised transfer learning capability is urgently needed. In this paper, we proposed a novel multi-scale convolutional sparse coding (MSCSC) method, that (I) automatically learns filter banks at different scales in a joint fashion with enforced scale-specificity of learned patterns; and (II) provides an unsupervised solution for learning transferable base knowledge and fine-tuning it towards target tasks. Extensive experimental evaluation of MSCSC demonstrates the effectiveness of the proposed MSCSC in both regular and transfer learning tasks in various biomedical domains.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Learning visual representations from web data has recently attracted attention for object recognition. Previous studies have mainly focused on overcoming label noise and data bias and have shown promising results by learning directly from web data. However, we argue that it might be better to transfer knowledge from existing human labeling resources to improve performance at nearly no additional cost. In this paper, we propose a new semi-supervised method for learning via web data. Our method has the unique design of exploiting strong supervision, i.e., in addition to standard image-level labels, our method also utilizes detailed annotations including object bounding boxes and part landmarks. By transferring as much knowledge as possible from existing strongly supervised datasets to weakly supervised web images, our method can benefit from sophisticated object recognition algorithms and overcome several typical problems found in webly-supervised learning. We consider the problem of fine-grained visual categorization, in which existing training resources are scarce, as our main research objective. Comprehensive experimentation and extensive analysis demonstrate encouraging performance of the proposed approach, which, at the same time, delivers a new pipeline for fine-grained visual categorization that is likely to be highly effective for real-world applications.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: We are motivated by the need for a generic object proposal generation algorithm which achieves good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING++ , which inherits the virtue of good computational efficiency of BING [1] but significantly improves its proposal localization quality. At high level we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which our BING++ manages to improve the localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space for complexity reduction. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically our BING++ can run at half speed of BING on CPU, but significantly improve the localization quality by 18.5 and 16.7 percent on both VOC2007 and Microhsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ can achieve comparable performance, but run significantly faster.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Recent years have witnessed the success of deep neural networks in dealing with a plenty of practical problems. Dropout has played an essential role in many successful deep neural networks, by inducing regularization in the model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has the statistical trait: the regularizer induced by Shakeout adaptively combines $L_{0}$ , $L_{1}$ and $L_{2}$ regularization terms. Our classification experiments with representative deep architectures on image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also leads to the grouping effect of the input units in a layer. Considering the weights in reflecting the importance of connections, Shakeout is superior to Dropout, which is valuable for the deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of the deep architecture.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: The goal of this paper is to perform 3D object detection in the context of autonomous driving. Our method aims at generating a set of high-quality 3D object proposals by exploiting stereo imagery. We formulate the problem as minimizing an energy function that encodes object size priors, placement of objects on the ground plane as well as several depth informed features that reason about free space, point cloud densities and distance to the ground. We then exploit a CNN on top of these proposals to perform object detection. In particular, we employ a convolutional neural net (CNN) that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. When combined with the CNN, our approach outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes. Furthermore, we experiment also with the setting where LIDAR information is available, and show that using both LIDAR and stereo leads to the best result.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: In this paper, we seek to improve deep neural networks by generalizing the pooling operations that play a central role in the current architectures. We pursue a careful exploration of approaches to allow pooling to learn and to adapt to complex and variable patterns. The two primary directions lie in: (1) learning a pooling function via (two strategies of) combining of max and average pooling, and (2) learning a pooling function in the form of a tree-structured fusion of pooling filters that are themselves learned. In our experiments every generalized pooling operation we explore improves performance when used in place of average or max pooling. We experimentally demonstrate that the proposed pooling operations provide a boost in invariance properties relative to conventional pooling and set the state of the art on several widely adopted benchmark datasets. These benefits come with only a light increase in computational overhead during training (ranging from additional 5 to 15 percent in time complexity) and a very modest increase in the number of model parameters (e.g., additional 1, 9, and 27 parameters for mixed, gated, and 2-level tree pooling operators, respectively). To gain more insights about our proposed pooling methods, we also visualize the learned pooling masks and the embeddings of the internal feature responses for different pooling operations. Our proposed pooling operations are easy to implement and can be applied within various deep neural network architectures.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 11
    Publication Date: 2018-04-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 12
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Various perceptual domains have underlying compositional semantics that are rarely captured in current models. We suspect this is because directly learning the compositional structure has evaded these models. Yet, the compositional structure of a given domain can be grounded in a separate domain thereby simplifying its learning. To that end, we propose a new approach to modeling bimodal perceptual domains that explicitly relates distinct projections across each modality and then jointly learns a bimodal sparse representation. The resulting model enables compositionality across these distinct projections and hence can generalize to unobserved percepts spanned by this compositional basis. For example, our model can be trained on red triangles and blue squares ; yet, implicitly will also have learned red squares and blue triangles . The structure of the projections and hence the compositional basis is learned automatically; no assumption is made on the ordering of the compositional elements in either modality. Although our modeling paradigm is general, we explicitly focus on a tabletop building-blocks setting. To test our model, we have acquired a new bimodal dataset comprising images and spoken utterances of colored shapes (blocks) in the tabletop setting. Our experiments demonstrate the benefits of explicitly leveraging compositionality in both quantitative and human evaluation studies.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 13
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 14
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Building upon recent Deep Neural Network architectures, current approaches lying in the intersection of Computer Vision and Natural Language Processing have achieved unprecedented breakthroughs in tasks like automatic captioning or image retrieval. Most of these learning methods, though, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases where textual descriptions are loosely related to the images. We focus on the particular domain of news articles in which the textual content often expresses connotative and ambiguous relations that are only suggested but not directly inferred from images. We introduce an adaptive CNN architecture that shares most of the structure for multiple tasks including source detection, article illustration and geolocation of articles. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (such as GPS coordinates and user comments). We show this dataset to be appropriate to explore all aforementioned problems, for which we provide a baseline performance using various Deep Learning architectures, and different representations of the textual and visual features. We report very promising results and bring to light several limitations of current state-of-the-art in this kind of domain, which we hope will help spur progress in the field.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 15
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 16
    Publication Date: 2018-04-04
    Description: Machine learning algorithms for the analysis of time-series often depend on the assumption that utilised data are temporally aligned. Any temporal discrepancies arising in the data is certain to lead to ill-generalisable models, which in turn fail to correctly capture properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards temporal alignment are either applied directly on the observation space or simply utilise linear projections-thus failing to capture complex, hierarchical non-linear representations that may prove beneficial, especially when dealing with multi-modal data (e.g., visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method that automatically learns non-linear representations of multiple time-series that are (i) maximally correlated in a shared subspace, and (ii) temporally aligned. Furthermore, we extend DCTW to a supervised setting, where during training, available labels can be utilised towards enhancing the alignment process. By means of experiments on four datasets, we show that the representations learnt significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with heterogeneous feature sets, such as the temporal alignment of acoustic and visual information.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 17
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Domain adaptation between diverse source and target domains is challenging, especially in the real-world visual recognition tasks where the images and videos consist of significant variations in viewpoints, illuminations, qualities, etc. In this paper, we propose a new approach for domain generalization and domain adaptation based on exemplar SVMs. Specifically, we decompose the source domain into many subdomains, each of which contains only one positive training sample and all negative samples. Each subdomain is relatively less diverse, and is expected to have a simpler distribution. By training one exemplar SVM for each subdomain, we obtain a set of exemplar SVMs. To further exploit the inherent structure of source domain, we introduce a nuclear-norm based regularizer into the objective function in order to enforce the exemplar SVMs to produce a low-rank output on training samples. In the prediction process, the confident exemplar SVM classifiers are selected and reweigted according to the distribution mismatch between each subdomain and the test sample in the target domain. We formulate our approach based on the logistic regression and least square SVM algorithms, which are referred to as low rank exemplar SVMs (LRE-SVMs) and low rank exemplar least square SVMs (LRE-LSSVMs), respectively. A fast algorithm is also developed for accelerating the training of LRE-LSSVMs. We further extend Domain Adaptation Machine (DAM) to learn an optimal target classifier for domain adaptation, and show that our approach can also be applied to domain adaptation with evolving target domain, where the target data distribution is gradually changing. The comprehensive experiments for object recognition and action recognition demonstrate the effectiveness of our approach for domain generalization and domain adaptation with fixed and evolving target domains.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 18
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: In content-based image retrieval, SIFT feature and the feature from deep convolutional neural network (CNN) have demonstrated promising performance. To fully explore both visual features in a unified framework for effective and efficient retrieval, we propose a collaborative index embedding method to implicitly integrate the index matrices of them. We formulate the index embedding as an optimization problem from the perspective of neighborhood sharing and solve it with an alternating index update scheme. After the iterative embedding, only the embedded CNN index is kept for on-line query, which demonstrates significant gain in retrieval accuracy, with very economical memory cost. Extensive experiments have been conducted on the public datasets with million-scale distractor images. The experimental results reveal that, compared with the recent state-of-the-art retrieval algorithms, our approach achieves competitive accuracy performance with less memory overhead and efficient query computation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 19
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: In this paper, we propose a context-aware local binary feature learning (CA-LBFL) method for face recognition. Unlike existing learning-based local face descriptors such as discriminant face descriptor (DFD) and compact binary face descriptor (CBFD) which learn each feature code individually, our CA-LBFL exploits the contextual information of adjacent bits by constraining the number of shifts from different binary bits, so that more robust information can be exploited for face representation. Given a face image, we first extract pixel difference vectors (PDV) in local patches, and learn a discriminative mapping in an unsupervised manner to project each pixel difference vector into a context-aware binary vector. Then, we perform clustering on the learned binary codes to construct a codebook, and extract a histogram feature for each face image with the learned codebook as the final representation. In order to exploit local information from different scales, we propose a context-aware local binary multi-scale feature learning (CA-LBMFL) method to jointly learn multiple projection matrices for face representation. To make the proposed methods applicable for heterogeneous face recognition, we present a coupled CA-LBFL (C-CA-LBFL) method and a coupled CA-LBMFL (C-CA-LBMFL) method to reduce the modality gap of corresponding heterogeneous faces in the feature level, respectively. Extensive experimental results on four widely used face datasets clearly show that our methods outperform most state-of-the-art face descriptors.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 20
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of the RGB+D videos can help us to better study the complementary properties of these two types of modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 21
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: Recently, cross-modal search has attracted considerable attention but remains a very challenging task because of the integration complexity and heterogeneity of the multi-modal data. To address both challenges, in this paper, we propose a novel method termed hetero-manifold regularisation (HMR) to supervise the learning of hash functions for efficient cross-modal search. A hetero-manifold integrates multiple sub-manifolds defined by homogeneous data with the help of cross-modal supervision information. Taking advantages of the hetero-manifold, the similarity between each pair of heterogeneous data could be naturally measured by three order random walks on this hetero-manifold. Furthermore, a novel cumulative distance inequality defined on the hetero-manifold is introduced to avoid the computational difficulty induced by the discreteness of hash codes. By using the inequality, cross-modal hashing is transformed into a problem of hetero-manifold regularised support vector learning. Therefore, the performance of cross-modal search can be significantly improved by seamlessly combining the integrated information of the hetero-manifold and the strong generalisation of the support vector machine. Comprehensive experiments show that the proposed HMR achieve advantageous results over the state-of-the-art methods in several challenging cross-modal tasks.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 22
    Publication Date: 2018-04-04
    Description: We propose Multi-Task Learning with Low Rank Attribute Embedding (MTL-LORAE) to address the problem of person re-identification on multi-cameras. Re-identifications on different cameras are considered as related tasks, which allows the shared information among different tasks to be explored to improve the re-identification accuracy. The MTL-LORAE framework integrates low-level features with mid-level attributes as the descriptions for persons. To improve the accuracy of such description, we introduce the low-rank attribute embedding, which maps original binary attributes into a continuous space utilizing the correlative relationship between each pair of attributes. In this way, inaccurate attributes are rectified and missing attributes are recovered. The resulting objective function is constructed with an attribute embedding error and a quadratic loss concerning class labels. It is solved by an alternating optimization strategy. The proposed MTL-LORAE is tested on four datasets and is validated to outperform the existing methods with significant margins.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: In the early days, content-based image retrieval (CBIR) was studied with global features. Since 2003, image retrieval based on local descriptors ( de facto SIFT) has been extensively studied for over a decade due to the advantage of SIFT in dealing with image transformations. Recently, image representations based on the convolutional neural network (CNN) have attracted increasing interest in the community and demonstrated impressive performance. Given this time of rapid evolution, this article provides a comprehensive survey of instance retrieval over the last decade. Two broad categories, SIFT-based and CNN-based methods, are presented. For the former, according to the codebook size, we organize the literature into using large/medium-sized/small codebooks. For the latter, we discuss three lines of methods, i.e., using pre-trained or fine-tuned CNN models, and hybrid methods. The first two perform a single-pass of an image to the network, while the last category employs a patch-based feature extraction scheme. This survey presents milestones in modern instance retrieval, reviews a broad selection of previous works in different categories, and provides insights on the connection between SIFT and CNN-based methods. After analyzing and comparing retrieval performance of different categories on several datasets, we discuss promising directions towards generic and specialized instance retrieval.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 25
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: This paper proposes a method for line segment detection in digital images. We propose a novel linelet-based representation to model intrinsic properties of line segments in rasterized image space. Based on this, line segment detection, validation, and aggregation frameworks are constructed. For a numerical evaluation on real images, we propose a new benchmark dataset of real images with annotated lines called YorkUrban-LineSegment. The results show that the proposed method outperforms state-of-the-art methods numerically and visually. To our best knowledge, this is the first report of numerical evaluation of line segment detection on real images.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-04-04
    Description: This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better, or comparable with all previous published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Super resolving a low-resolution video, namely video super-resolution (SR), is usually handled by either single-image SR or multi-frame SR. Single-Image SR deals with each video frame independently, and ignores intrinsic temporal dependency of video frames which actually plays a very important role in video SR. Multi-Frame SR generally extracts motion information, e.g., optical flow, to model the temporal dependency, but often shows high computational cost. Considering that recurrent neural networks (RNNs) can model long-term temporal dependency of video sequences well, we propose a fully convolutional RNN named bidirectional recurrent convolutional network for efficient multi-frame SR. Different from vanilla RNNs, 1) the commonly-used full feedforward and recurrent connections are replaced with weight-sharing convolutional connections. So they can greatly reduce the large number of network parameters and well model the temporal dependency in a finer level, i.e., patch-based rather than frame-based, and 2) connections from input layers at previous timesteps to the current hidden layer are added by 3D feedforward convolutions, which aim to capture discriminate spatio-temporal patterns for short-term fast-varying motions in local adjacent frames. Due to the cheap convolutional operations, our model has a low computational complexity and runs orders of magnitude faster than other multi-frame SR methods. With the powerful temporal dependency modeling, our model can super resolve videos with complex motions and achieve well performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    Publication Date: 2018-03-06
    Description: Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 29
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Direct Sparse Odometry (DSO) is a visual odometry method based on a novel, highly accurate sparse and direct structure and motion formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry-represented as inverse depth in a reference frame-and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on essentially featureless walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Recognizing human actions from unknown and unseen (novel) views is a challenging problem. We propose a Robust Non-Linear Knowledge Transfer Model (R-NKTM) for human action recognition from novel views. The proposed R-NKTM is a deep fully-connected neural network that transfers knowledge of human actions from any unknown view to a shared high-level virtual view by finding a set of non-linear transformations that connects the views. The R-NKTM is learned from 2D projections of dense trajectories of synthetic 3D human models fitted to real motion capture data and generalizes to real videos of human actions. The strength of our technique is that we learn a single R-NKTM for all actions and all viewpoints for knowledge transfer of any real human action video without the need for re-training or fine-tuning the model. Thus, R-NKTM can efficiently scale to incorporate new action classes. R-NKTM is learned with dummy labels and does not require knowledge of the camera viewpoint at any stage. Experiments on three benchmark cross-view human action datasets show that our method outperforms existing state-of-the-art.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: The objective of this work is to detect shadows in images. We pose this as the problem of labeling image regions, where each region corresponds to a group of superpixels. To predict the label of each region, we train a kernel Least-Squares Support Vector Machine (LSSVM) for separating shadow and non-shadow regions. The parameters of the kernel and the classifier are jointly learned to minimize the leave-one-out cross validation error. Optimizing the leave-one-out cross validation error is typically difficult, but it can be done efficiently in our framework. Experiments on two challenging shadow datasets, UCF and UIUC, show that our region classifier outperforms more complex methods. We further enhance the performance of the region classifier by embedding it in a Markov Random Field (MRF) framework and adding pairwise contextual cues. This leads to a method that outperforms the state-of-the-art for shadow detection. In addition we propose a new method for shadow removal based on region relighting. For each shadow region we use a trained classifier to identify a neighboring lit region of the same material. Given a pair of lit-shadow regions we perform a region relighting transformation based on histogram matching of luminance values between the shadow region and the lit region. Once a shadow is detected, we demonstrate that our shadow removal approach produces results that outperform the state of the art by evaluating our method using a publicly available benchmark dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 32
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: In this paper, we present an attribute grammar for solving two coupled tasks: i) parsing a 2D image into semantic regions; and ii) recovering the 3D scene structures of all regions. The proposed grammar consists of a set of production rules, each describing a kind of spatial relation between planar surfaces in 3D scenes. These production rules are used to decompose an input image into a hierarchical parse graph representation where each graph node indicates a planar surface or a composite surface. Different from other stochastic image grammars, the proposed grammar augments each graph node with a set of attribute variables to depict scene-level global geometry, e.g., camera focal length, or local geometry, e.g., surface normal, contact lines between surfaces. These geometric attributes impose constraints between a node and its off-springs in the parse graph. Under a probabilistic framework, we develop a Markov Chain Monte Carlo method to construct a parse graph that optimizes the 2D image recognition and 3D scene reconstruction purposes simultaneously. We evaluated our method on both public benchmarks and newly collected datasets. Experiments demonstrate that the proposed method is capable of achieving state-of-the-art scene reconstruction of a single image.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffers from the local minimum defect. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone gets comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 34
    Publication Date: 2018-02-07
    Description: Online multi-object tracking aims at estimating the tracks of multiple objects instantly with each incoming frame and the information provided up to the moment. It still remains a difficult problem in complex scenes, because of the large ambiguity in associating multiple objects in consecutive frames and the low discriminability between objects appearances. In this paper, we propose a robust online multi-object tracking method that can handle these difficulties effectively. We first define the tracklet confidence using the detectability and continuity of a tracklet, and decompose a multi-object tracking problem into small subproblems based on the tracklet confidence. We then solve the online multi-object tracking problem by associating tracklets and detections in different ways according to their confidence values. Based on this strategy, tracklets sequentially grow with online-provided detections, and fragmented tracklets are linked up with others without any iterative and expensive association steps. For more reliable association between tracklets and detections, we also propose a deep appearance learning method to learn a discriminative appearance model from large training datasets, since the conventional appearance learning methods do not provide rich representation that can distinguish multiple objects with large appearance variations. In addition, we combine online transfer learning for improving appearance discriminability by adapting the pre-trained deep model during online tracking. Experiments with challenging public datasets show distinct performance improvement over other state-of-the-arts batch and online tracking methods, and prove the effect and usefulness of the proposed methods for online multi-object tracking.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    Publication Date: 2018-02-07
    Description: Active learning is an effective way of engaging users to interactively train models for visual recognition more efficiently. The vast majority of previous works focused on active learning with a single human oracle. The problem of active learning with multiple oracles in a collaborative setting has not been well explored. We present a collaborative computational model for active learning with multiple human oracles, the input from whom may possess different levels of noises. It leads to not only an ensemble kernel machine that is robust to label noises, but also a principled label quality measure to online detect irresponsible labelers. Instead of running independent active learning processes for each individual human oracle, our model captures the inherent correlations among the labelers through shared data among them. Our experiments with both simulated and real crowd-sourced noisy labels demonstrate the efficacy of our model.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about $20times$ better than the fastest sequential algorithm and speed-up goes up to $30-40$ on 64 threads.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Accompanied with the rising popularity of compressed sensing, the Alternating Direction Method of Multipliers (ADMM) has become the most widely used solver for linearly constrained convex problems with separable objectives. In this work, we observe that many existing ADMMs update the primal variable by minimizing different majorant functions with their convergence proofs given case by case. Inspired by the principle of majorization minimization, we respectively present the unified frameworks of Gauss-Seidel ADMMs and Jacobian ADMMs, which use different historical information for the current updating. Our frameworks generalize previous ADMMs to solve the problems with non-separable objectives. We also show that ADMMs converge faster when the used majorant function is tighter. We then propose the Mixed Gauss-Seidel and Jacobian ADMM (M-ADMM) which alleviates the slow convergence issue of Jacobian ADMMs by absorbing merits of the Gauss-Seidel ADMMs. M-ADMM can be further improved by backtracking and wise variable partition. We also propose to solve the multi-blocks problems by Proximal Gauss-Seidel ADMM which is of the Gauss-Seidel type. It convegences for non-strongly convex objective. Experiments on both synthesized and real-world data demonstrate the superiority of our new ADMMs. Finally, we release a toolbox that implements efficient ADMMs for many problems in compressed sensing.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: We propose a novel approach to generate a binary descriptor optimized for each image patch independently. The approach is inspired by the linear discriminant embedding that simultaneously increases inter and decreases intra class distances. A set of discriminative and uncorrelated binary tests is established from all possible tests in an offline training process. The patch adapted descriptors are then efficiently built online from a subset of features which lead to lower intra-class distances and thus, to a more robust descriptor. We perform experiments on three widely used benchmarks and demonstrate improvements in matching performance, and illustrate that per-patch optimization outperforms global optimization.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher dimensional representation useful for the image retrieval problem. First, motivated by the relationship between the linear approximation of a nonlinear function in high dimensional space and the state-of-the-art feature representation used in image retrieval, i.e., VLAD, we propose a new approach for the approximation. The embedded vectors resulted by the function approximation process are then aggregated to form a single representation for image retrieval. Second, in order to make the proposed embedding method applicable to large scale problem, we further derive its fast version in which the embedded vectors can be efficiently computed, i.e., in the closed-form. We compare the proposed embedding methods with the state of the art in the context of image search under various settings: when the images are represented by medium length vectors, short vectors, or binary vectors. The experimental results show that the proposed embedding methods outperform existing the state of the art on the standard public image retrieval benchmarks.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Median filtering is among the most utilized tools for smoothing real-valued data, as it is robust, edge-preserving, value-preserving, and yet can be computed efficiently. For data living on the unit circle, such as phase data or orientation data, a filter with similar properties is desirable. For these data, there is no unique means to define a median; so we discuss various possibilities. The arc distance median turns out to be the only variant which leads to robust, edge-preserving and value-preserving smoothing. However, there are no efficient algorithms for filtering based on the arc distance median. Here, we propose fast algorithms for filtering of signals and images with values on the unit circle based on the arc distance median. For non-quantized data, we develop an algorithm that scales linearly with the filter size. The runtime of our reference implementation is only moderately higher than the Matlab implementation of the classical median filter for real-valued data. For quantized data, we obtain an algorithm of constant complexity w.r.t. the filter size. We demonstrate the performance of our algorithms for real life data sets: phase images from interferometric synthetic aperture radar, planar flow fields from optical flow, and time series of wind directions.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2018-02-07
    Description: Multi-object tracking has been studied for decades. However, when it comes to tracking pedestrians in extremely crowded scenes, we are limited to only few works. This is an important problem which gives rise to several challenges. Pre-trained object detectors fail to localize targets in crowded sequences. This consequently limits the use of data-association based multi-target tracking methods which rely on the outcome of an object detector. Additionally, the small apparent target size makes it challenging to extract features to discriminate targets from their surroundings. Finally, the large number of targets greatly increases computational complexity which in turn makes it hard to extend existing multi-target tracking approaches to high-density crowd scenarios. In this paper, we propose a tracker that addresses the aforementioned problems and is capable of tracking hundreds of people efficiently. We formulate online crowd tracking as Binary Quadratic Programing. Our formulation employs target's individual information in the form of appearance and motion as well as contextual cues in the form of neighborhood motion, spatial proximity and grouping, and solves detection and data association simultaneously. In order to solve the proposed quadratic optimization efficiently, where state-of art commercial quadratic programing solvers fail to find the solution in a reasonable amount of time, we propose to use the most recent version of the Modified Frank Wolfe algorithm, which takes advantage of SWAP-steps to speed up the optimization. We show that the proposed formulation can track hundreds of targets efficiently and improves state-of-art results by significant margins on eleven challenging high density crowd sequences.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    Publication Date: 2018-02-07
    Description: Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions with high intensity or colour variation and large superpixels in content-sparse regions. Rather than using the conventional SLIC method that clusters pixels in $mathbb {R}^5$ , we map the input image $I$ to a 2-dimensional manifold $mathcal {M}subset mathbb {R}^5$ , whose area elements are a good measure of the content density in $I$ . We propose a simple method, called intrinsic manifold SLIC (IMSLIC), for computing a geodesic centroidal Voronoi tessellation (GCVT)—a uniform tessellation—on $mathcal {M}$ , which induces the content-sensitive superpixels in $I$ . In contrast to the existing algorithms, IMSLIC characterizes the content sensitivity b- measuring areas of Voronoi cells on $mathcal {M}$ . Using a simple and fast approximation to a closed-form solution, the method can compute the GCVT at a very low cost and guarantees that all Voronoi cells are simply connected. We thoroughly evaluate IMSLIC and compare it with eleven representative methods on the BSDS500 dataset and seven representative methods on the NYUV2 dataset. Computational results show that IMSLIC outperforms existing methods in terms of commonly used quality measures pertaining to superpixels such as compactness, adherence to boundaries, and achievable segmentation accuracy. We also evaluate IMSLIC and seven representative methods in an image contour closure application, and the results on two datasets, WHD and WSD, show that IMSLIC achieves the best foreground segmentation performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: Light-field cameras have recently emerged as a powerful tool for one-shot passive 3D shape capture. However, obtaining the shape of glossy objects like metals or plastics remains challenging, since standard Lambertian cues like photo-consistency cannot be easily applied. In this paper, we derive a spatially-varying (SV)BRDF-invariant theory for recovering 3D shape and reflectance from light-field cameras. Our key theoretical insight is a novel analysis of diffuse plus single-lobe SVBRDFs under a light-field setup. We show that, although direct shape recovery is not possible, an equation relating depths and normals can still be derived. Using this equation, we then propose using a polynomial (quadratic) shape prior to resolve the shape ambiguity. Once shape is estimated, we also recover the reflectance. We present extensive synthetic data on the entire MERL BRDF dataset, as well as a number of real examples to validate the theory, where we simultaneously recover shape and BRDFs from a single image taken with a Lytro Illum camera.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    Publication Date: 2018-02-07
    Description: We propose a novel approach to reconstructing curvilinear tree structures evolving over time, such as road networks in 2D aerial images or neural structures in 3D microscopy stacks acquired in vivo . To enforce temporal consistency, we simultaneously process all images in a sequence, as opposed to reconstructing structures of interest in each image independently. We formulate the problem as a Quadratic Mixed Integer Program and demonstrate the additional robustness that comes from using all available visual clues at once, instead of working frame by frame. Furthermore, when the linear structures undergo local changes over time, our approach automatically detects them.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: It is often desirable to be able to recognize when inputs to a recognition function learned in a supervised manner correspond to classes unseen at training time. With this ability, new class labels could be assigned to these inputs by a human operator, allowing them to be incorporated into the recognition function—ideally under an efficient incremental update mechanism. While good algorithms that assume inputs from a fixed set of classes exist, e.g. , artificial neural networks and kernel machines, it is not immediately obvious how to extend them to perform incremental learning in the presence of unknown query classes. Existing algorithms take little to no distributional information into account when learning recognition functions and lack a strong theoretical foundation. We address this gap by formulating a novel, theoretically sound classifier—the Extreme Value Machine (EVM). The EVM has a well-grounded interpretation derived from statistical Extreme Value Theory (EVT), and is the first classifier to be able to perform nonlinear kernel-free variable bandwidth incremental learning. Compared to other classifiers in the same deep network derived feature space, the EVM is accurate and efficient on an established benchmark partition of the ImageNet dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2018-02-07
    Description: Video text extraction plays an important role for multimedia understanding and retrieval. Most previous research efforts are conducted within individual frames. A few of recent methods, which pay attention to text tracking using multiple frames, however, do not effectively mine the relations among text detection, tracking and recognition. In this paper, we propose a generic Bayesian-based framework of Tracking based Text Detection And Recognition (T $^2$ DAR) from web videos for embedded captions, which is composed of three major components, i.e., text tracking, tracking based text detection, and tracking based text recognition. In this unified framework, text tracking is first conducted by tracking-by-detection. Tracking trajectories are then revised and refined with detection or recognition results. Text detection or recognition is finally improved with multi-frame integration. Moreover, a challenging video text (embedded caption text) database (USTB-VidTEXT) is constructed and publicly available. A variety of experiments on this dataset verify that our proposed approach largely improves the performance of text detection and recognition from web videos.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-02-07
    Description: We propose a novel minimal path method for the segmentation of 2D and 3D line structures. Minimal path methods perform propagation of a wavefront emanating from a start point at a speed derived from image features, followed by path extraction using backtracing. Usually, the computation of the speed and the propagation of the wave are two separate steps, and point features are used to compute a static speed. We introduce a new continuous minimal path method which steers the wave propagation progressively using dynamic speed based on path features . We present three instances of our method, using an appearance feature of the path, a geometric feature based on the curvature of the path, and a joint appearance and geometric feature based on the tangent of the wavefront. These features have not been used in previous continuous minimal path methods. We compute the features dynamically during the wave propagation, and also efficiently using a fast numerical scheme and a low-dimensional parameter space. Our method does not suffer from discretization or metrication errors. We performed qualitative and quantitative evaluations using 2D and 3D images from different application areas.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Presents the table of contents for this issue of the publication.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of million, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13 K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13 K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Deep unfolding provides an approach to integrate the probabilistic generative models and the deterministic neural networks. Such an approach is benefited by deep representation, easy interpretation, flexible learning and stochastic modeling. This study develops the unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, the unsupervised and supervised topic models are inferred via the variational inference algorithm where the model parameters are estimated by maximizing the lower bound of logarithm of marginal likelihood using input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and the tied model parameters across inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters in learning process via deep unfolding inference (DUI). The inference procedure is treated as the layer-wise learning in a deep neural network. The end performance is iteratively improved by using the estimated topic parameters according to the exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with increasing number of layers compared with variational inference in unsupervised as well as supervised topic models.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2018-01-10
    Description: Biometrics is the technique of automatically recognizing individuals based on their biological or behavioral characteristics. Various biometric traits have been introduced and widely investigated, including fingerprint, iris, face, voice, palmprint, gait and so forth. Apart from identity, biometric data may convey various other personal information, covering affect, age, gender, race, accent, handedness, height, weight, etc. Among these, analysis of demographics (age, gender, and race) has received tremendous attention owing to its wide real-world applications, with significant efforts devoted and great progress achieved. This survey first presents biometric demographic analysis from the standpoint of human perception, then provides a comprehensive overview of state-of-the-art advances in automated estimation from both academia and industry. Despite these advances, a number of challenging issues continue to inhibit its full potential. We second discuss these open problems, and finally provide an outlook into the future of this very active field of research by sharing some promising opportunities.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: In this paper, we investigate and exploit the influence of facial expressions on automatic age estimation. Different from existing approaches, our method jointly learns the age and expression by introducing a new graphical model with a latent layer between the age/expression labels and the features. This layer aims to learn the relationship between the age and expression and captures the face changes which induce the aging and expression appearance, and thus obtaining expression-invariant age estimation. Conducted on three age-expression datasets (FACES [1] , Lifespan [2] and NEMO [3] ), our experiments illustrate the improvement in performance when the age is jointly learnt with expression in comparison to expression-independent age estimation. The age estimation error is reduced by 14.43, 37.75 and 9.30 percent for the FACES, Lifespan and NEMO datasets respectively. The results obtained by our graphical model, without prior-knowledge of the expressions of the tested faces, are better than the best reported ones for all datasets. The flexibility of the proposed model to include more cues is explored by incorporating gender together with age and expression. The results show performance improvements for all cues.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations. We separate quantization from pairwise similarity preserving as the objective function is very different though quantization, as we show, can be derived from preserving the pairwise similarities. In addition, we present the evaluation protocols, and the general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2018-03-06
    Description: Graph matching is essential in several fields that use structured information, such as biology, chemistry, social networks, knowledge management, document analysis and others. Except for special classes of graphs, graph matching has in the worst-case an exponential complexity; however, there are algorithms that show an acceptable execution time, as long as the graphs are not too large and not too dense. In this paper we introduce a novel subgraph isomorphism algorithm, VF3, particularly efficient in the challenging case of graphs with thousands of nodes and a high edge density. Its performance, both in terms of time and memory, has been assessed on a large dataset of 12,700 random graphs with a size up to 10,000 nodes, made publicly available. VF3 has been compared with four other state-of-the-art algorithms, and the huge experimentation required more than two years of processing time. The results confirm that VF3 definitely outperforms the other algorithms when the graphs become huge and dense, but also has a very good performance on smaller or sparser graphs.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2018-03-06
    Description: Pose variation remains to be a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic part model. We extract local descriptors (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each descriptor with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of the face parts of all face images in the training corpus, namely the probabilistic elastic part (PEP) model. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms, which naturally defines a part. Given one or multiple face images of the same subject, the PEP-model builds its PEP representation by sequentially concatenating descriptors identified by each Gaussian component in a maximum likelihood sense. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that we achieve state-of-the-art face verification accuracy with the proposed representations on the Labeled Face in the Wild (LFW) dataset, the YouTube video face database, and the CMU MultiPIE dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: We present Convolutional Oriented Boundaries (COB), which produces multiscale oriented contours and region hierarchies starting from generic image classification Convolutional Neural Networks (CNNs). COB is computationally efficient, because it requires a single CNN forward pass for multi-scale contour detection and it uses a novel sparse boundary representation for hierarchical segmentation; it gives a significant leap in performance over the state-of-the-art, and it generalizes very well to unseen categories and datasets. Particularly, we show that learning to estimate not only contour strength but also orientation provides more accurate results. We perform extensive experiments for low-level applications on BSDS, PASCAL Context, PASCAL Segmentation, and NYUD to evaluate boundary detection performance, showing that COB provides state-of-the-art contours and region hierarchies in all datasets. We also evaluate COB on high-level tasks when coupled with multiple pipelines for object proposals, semantic contours, semantic segmentation, and object detection on MS-COCO, SBD, and PASCAL; showing that COB also improves the results for all tasks.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Inferring a probability density function (pdf) for shape from a population of point sets is a challenging problem. The lack of point-to-point correspondences and the non-linearity of the shape spaces undermine the linear models. Methods based on manifolds model the shape variations naturally, however, statistics are often limited to a single geodesic mean and an arbitrary number of variation modes. We relax the manifold assumption and consider a piece-wise linear form, implementing a mixture of distinctive shape classes. The pdf for point sets is defined hierarchically, modeling a mixture of Probabilistic Principal Component Analyzers (PPCA) in higher dimension. A Variational Bayesian approach is designed for unsupervised learning of the posteriors of point set labels, local variation modes, and point correspondences. By maximizing the model evidence, the numbers of clusters, modes of variations, and points on the mean models are automatically selected. Using the predictive distribution, we project a test shape to the spaces spanned by the local PPCA's. The method is applied to point sets from: i) synthetic data, ii) healthy versus pathological heart morphologies, and iii) lumbar vertebrae. The proposed method selects models with expected numbers of clusters and variation modes, achieving lower generalization-specificity errors compared to state-of-the-art.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2018-03-06
    Description: We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures on their experiences, much online visual information exists in the form of image streams, for which it would better take into consideration of the whole image stream to produce natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both input and output dimension to a sequence of images and a sequence of sentences. For retrieving a coherent flow of multiple sentences for a photo stream, we propose a multimodal neural architecture called coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach directly learns from vast user-generated resource of blog posts as text-image parallel training data. We collect more than 22 K unique blog posts with 170 K associated images for the travel topics of NYC , Disneyland , Australia , and Hawaii . We demonstrate that our approach outperforms other state-of-the-art image captioning methods for text sequence generation, using both quantitative measures and user studies via Amazon Mechanical Turk.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: sGLOH (shifting GLOH) is a histogram-based keypoint descriptor that can be associated to multiple quantized rotations of the keypoint patch without any recomputation. This property can be exploited to define the best distance between two descriptor vectors, thus avoiding computing the dominant orientation. In addition, sGLOH can reject incongruous correspondences by adding a global constraint on the rotations either as an a priori knowledge or based on the data. This paper thoroughly reconsiders sGLOH and improves it in terms of robustness, speed and descriptor dimension. The revised sGLOH embeds more quantized rotations, thus yielding more correct matches. A novel fast matching scheme is also designed, which significantly reduces both computation time and memory usage. In addition, a new binarization technique based on comparisons inside each descriptor histogram is defined, yielding a more compact, faster, yet robust alternative. Results on an exhaustive comparative experimental evaluation show that the revised sGLOH descriptor incorporating the above ideas and combining them according to task requirements, improves in most cases the state of the art in both image matching and object recognition.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: We propose a new method to add an uncalibrated node into a network of calibrated cameras using only pairwise point correspondences. While previous methods perform this task using triple correspondences, these are often difficult to establish when there is limited overlap between different views. In such challenging cases we must rely on pairwise correspondences and our solution becomes more advantageous. Our method includes an 11-point minimal solution for the intrinsic and extrinsic calibration of a camera from pairwise correspondences with other two calibrated cameras, and a new inlier selection framework that extends the traditional RANSAC family of algorithms to sampling across multiple datasets. Our method is validated on different application scenarios where a lack of triple correspondences might occur: addition of a new node to a camera network; calibration and motion estimation of a moving camera inside a camera network; and addition of views with limited overlap to a Structure-from-Motion model.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Publication Date: 2018-03-06
    Description: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First , we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second , we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third , we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Age progression is defined as aesthetically re-rendering the aging face at any future age for an individual face. In this work, we aim to automatically render aging faces in a personalized way. Basically, for each age group, we learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles), where the dictionary bases corresponding to the same index yet from two neighboring aging dictionaries form a particular aging pattern cross these two age groups, and a linear combination of all these patterns expresses a particular personalized aging process. Moreover, two factors are taken into consideration in the dictionary learning process. First, beyond the aging dictionaries, each person may have extra personalized facial characteristics, e.g., mole, which are invariant in the aging process. Second, it is challenging or even impossible to collect faces of all age groups for a particular person, yet much easier and more practical to get face pairs from neighboring age groups. To this end, we propose a novel Bi-level Dictionary Learning based Personalized Age Progression (BDL-PAP) method. Here, bi-level dictionary learning is formulated to learn the aging dictionaries based on face pairs from neighboring age groups. Extensive experiments well demonstrate the advantages of the proposed BDL-PAP over other state-of-the-arts in term of personalized age progression, as well as the performance gain for cross-age face verification by synthesizing aging faces.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Face alignment acts as an important task in computer vision. Regression-based methods currently dominate the approach to solving this problem, which generally employ a series of mapping functions from the face appearance to iteratively update the face shape hypothesis. One keypoint here is thus how to perform the regression procedure. In this work, we formulate this regression procedure as a sparse coding problem. We learn two relational dictionaries, one for the face appearance and the other one for the face shape, with coupled reconstruction coefficient to capture their underlying relationships. To deploy this model for face alignment, we derive the relational dictionaries in a stage-wised manner to perform close-loop refinement of themselves, i.e., the face appearance dictionary is first learned from the face shape dictionary and then used to update the face shape hypothesis, and the updated face shape dictionary from the shape hypothesis is in return used to refine the face appearance dictionary. To improve the model accuracy, we extend this model hierarchically from the whole face shape to face part shapes, thus both the global and local view variations of a face are captured. To locate facial landmarks under occlusions, we further introduce an occlusion dictionary into the face appearance dictionary to recover face shape from partially occluded face appearance. The occlusion dictionary is learned in a data driven manner from background images to represent a set of elemental occlusion patterns, a sparse combination of which models various practical partial face occlusions. By integrating all these technical innovations, we obtain a robust and accurate approach to locate facial landmarks under different face views and possibly severe occlusions for face images in the wild. Extensive experimental analyses and evaluations on different benchmark datasets, as well as two new datasets built by ourselves, have demonstrated the robustness and accuracy of our proposed - odel, especially for face images with large view variations and/or severe occlusions.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: Recent deep learning based approaches have achieved great success on handwriting recognition. Chinese characters are among the most widely adopted writing systems in the world. Previous research has mainly focused on recognizing handwritten Chinese characters. However, recognition is only one aspect for understanding a language, another challenging and interesting task is to teach a machine to automatically write (pictographic) Chinese characters. In this paper, we propose a framework by using the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters. To recognize Chinese characters, previous methods usually adopt the convolutional neural network (CNN) models which require transforming the online handwriting trajectory into image-like representations. Instead, our RNN based approach is an end-to-end system which directly deals with the sequential structure and does not require any domain-specific knowledge. With the RNN system (combining an LSTM and GRU), state-of-the-art performance can be achieved on the ICDAR-2013 competition database. Furthermore, under the RNN framework, a conditional generative model with character embedding is proposed for automatically drawing recognizable Chinese characters. The generated characters (in vector format) are human-readable and also can be recognized by the discriminative RNN model with high accuracy. Experimental results verify the effectiveness of using RNNs as both generative and discriminative models for the tasks of drawing and recognizing Chinese characters.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-03-06
    Description: To describe trans-dimensional observations in sample spaces of different dimensions, we propose a probabilistic model, called the trans-dimensional random field (TRF) by explicitly mixing a collection of random fields. In the framework of stochastic approximation (SA), we develop an effective training algorithm, called augmented SA, which jointly estimates the model parameters and normalizing constants while using trans-dimensional mixture sampling to generate observations of different dimensions. Furthermore, we introduce several statistical and computational techniques to improve the convergence of the training algorithm and reduce computational cost, which together enable us to successfully train TRF models on large datasets. The new model and training algorithm are thoroughly evaluated in a number of experiments. The word morphology experiment provides a benchmark test to study the convergence of the training algorithm and to compare with other algorithms, because log-likelihoods and gradients can be exactly calculated in this experiment. For language modeling, our experiments demonstrate the superiority of the TRF approach in being computationally more efficient in computing data probabilities by avoiding local normalization and being able to flexibly integrate a richer set of features, when compared with n-gram models and neural network models.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Rank minimization can be converted into tractable surrogate problems, such as Nuclear Norm Minimization (NNM) and Weighted NNM (WNNM). The problems related to NNM, or WNNM, can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT), or Weighted SVT, but they suffer from high computational cost of Singular Value Decomposition (SVD) at each iteration. We propose a fast and accurate approximation method for SVT, that we call fast randomized SVT (FRSVT), with which we avoid direct computation of SVD. The key idea is to extract an approximate basis for the range of the matrix from its compressed matrix. Given the basis, we compute partial singular values of the original matrix from the small factored matrix. In addition, by developping a range propagation method, our method further speeds up the extraction of approximate basis at each iteration. Our theoretical analysis shows the relationship between the approximation bound of SVD and its effect to NNM via SVT. Along with the analysis, our empirical results quantitatively and qualitatively show that our approximation rarely harms the convergence of the host algorithms. We assess the efficiency and accuracy of the proposed method on various computer vision problems, e.g., subspace clustering, weather artifact removal, and simultaneous multi-image alignment and rectification.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings being built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion. While a large number of distance metric/subspace learning models have been developed for re-id, the cross-view transformations they learned are view-generic and thus potentially less effective in quantifying the feature distortion inherent to each camera view. Learning view-specific feature transformations for re-id (i.e., view-specific re-id), an under-studied approach, becomes an alternative resort for this problem. In this work, we formulate a novel view-specific person re-identification framework from the feature augmentation point of view, called C amera co R relation A ware F eature augmen T ation (CRAFT). Specifically, CRAFT performs cross-view adaptation by automatically measuring camera correlation from cross-view visual data distribution and adaptively conducting feature augmentation to transform the original features into a new adaptive space. Through our augmentation framework, view-generic learning algorithms can be readily generalized to learn and optimize view-specific sub-models whilst simultaneously modelling view-generic discrimination information. Therefore, our framework not only inherits the strength of view-generic model learning but also provides an effective way to take into account view specific characteristics. Our CRAFT framework can be extended to jointly learn view-specific feature transformations for person re-id across a large network with more than two cameras, a largely under-investigated but realistic re-id setting. Additionally, we present a domain-generic deep person appearance representation which is designed particularly to be towards view invariant for facilitating cross-view adaptation by CRAFT. We conducted extensively comparative experiments to validate the- superiority and advantages of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: In this paper we introduce the Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from an instance to a category level, and still retain efficiency. First, we define binary features on the histogram of oriented gradients-domain (as opposed to intensity-), allowing for a better representation of intra-class variability. Second, both the positions where ferns are evaluated within the sliding window, and the location of the binary features for each fern are not chosen completely at random, but instead we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, that is to adapt the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. And finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be very efficiently trained, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing times. We demonstrate the effectiveness of our approach by thorough experimentation in publicly available datasets in which we compare against state-of-the-art, and for tasks of both 2D detection and 3D multi-view estimation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    Publication Date: 2018-01-10
    Description: The analysis of thin curvilinear objects in 3D images is a complex and challenging task. In this article, we introduce a new, non-linear operator, called RORPO (Ranking the Orientation Responses of Path Operators). Inspired by the multidirectional paradigm currently used in linear filtering for thin structure analysis, RORPO is built upon the notion of path operator from mathematical morphology. This operator, unlike most operators commonly used for 3D curvilinear structure analysis, is discrete, non-linear and non-local. From this new operator, two main curvilinear structure characteristics can be estimated: an intensity feature, that can be assimilated to a quantitative measure of curvilinearity; and a directional feature, providing a quantitative measure of the structure's orientation. We provide a full description of the structural and algorithmic details for computing these two features from RORPO, and we discuss computational issues. We experimentally assess RORPO by comparison with three of the most popular curvilinear structure analysis filters, namely Frangi Vesselness, Optimally Oriented Flux, and Hybrid Diffusion with Continuous Switch. In particular, we show that our method provides up to 8 percent more true positive and 50 percent less false positives than the next best method, on synthetic and real 3D images.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2018-01-10
    Description: This paper presents a simple yet effective supervised deep hash approach that constructs binary hash codes from labeled data for large-scale image search. We assume that the semantic labels are governed by several latent attributes with each attribute on or off , and classification relies on these attributes. Based on this assumption, our approach, dubbed supervised semantics-preserving deep hashing (SSDH), constructs hash functions as a latent layer in a deep network and the binary codes are learned by minimizing an objective function defined over classification error and other desirable hash codes properties. With this design, SSDH has a nice characteristic that classification and retrieval are unified in a single learning model. Moreover, SSDH performs joint learning of image representations, hash codes, and classification in a point-wised manner, and thus is scalable to large-scale datasets. SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets. Compared with state-of-the-art approaches, SSDH achieves higher retrieval accuracy, while the classification performance is not sacrificed.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Subspace clustering is an important problem in machine learning with many applications in computer vision and pattern recognition. Prior work has studied this problem using algebraic, iterative, statistical, low-rank and sparse representation techniques. While these methods have been applied to both linear and affine subspaces, theoretical results have only been established in the case of linear subspaces. For example, algebraic subspace clustering (ASC) is guaranteed to provide the correct clustering when the data points are in general position and the union of subspaces is transversal . In this paper we study in a rigorous fashion the properties of ASC in the case of affine subspaces. Using notions from algebraic geometry, we prove that the homogenization trick , which embeds points in a union of affine subspaces into points in a union of linear subspaces, preserves the general position of the points and the transversality of the union of subspaces in the embedded space, thus establishing the correctness of ASC for affine subspaces.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2018-01-10
    Description: In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). Through arming the DNN with better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN produces better performance over several state-of-the-art approaches. Competitive results are reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: There is a large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities which comprises multiple basic level actions in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We consider the video as a sequence of short-term action clips, which contains human-words and object-words. An activity is about a set of action-topics and object-topics indicating which actions are present and which objects are interacting with. We then propose a new probabilistic model relating the words and the topics. It allows us to model long-range action relations that commonly exist in the composite activities, which is challenging in previous works. We apply our model to the unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching. For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects. Moreover, we develop a robotic system that watches and reminds people using our action patching algorithm. Our robotic setup can be easily deployed on any assistive robots.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: We propose a template matching method for the detection of 2D image objects that are characterized by orientation patterns. Our method is based on data representations via orientation scores, which are functions on the space of positions and orientations, and which are obtained via a wavelet-type transform. This new representation allows us to detect orientation patterns in an intuitive and direct way, namely via cross-correlations. Additionally, we propose a generalized linear regression framework for the construction of suitable templates using smoothing splines. Here, it is important to recognize a curved geometry on the position-orientation domain, which we identify with the Lie group SE(2): the roto-translation group. Templates are then optimized in a B-spline basis, and smoothness is defined with respect to the curved geometry. We achieve state-of-the-art results on three different applications: detection of the optic nerve head in the retina (99.83 percent success rate on 1,737 images), of the fovea in the retina (99.32 percent success rate on 1,616 images), and of the pupil in regular camera images (95.86 percent on 1,521 images). The high performance is due to inclusion of both intensity and orientation features with effective geometric priors in the template matching. Moreover, our method is fast due to a cross-correlation based matching approach.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Reconstructing the shape of a 3D object from multi-view images under unknown, general illumination is a fundamental problem in computer vision. High quality reconstruction is usually challenging especially when fine detail is needed and the albedo of the object is non-uniform. This paper introduces vertex overall illumination vectors to model the illumination effect and presents a total variation (TV) based approach for recovering surface details using shading and multi-view stereo (MVS). Behind the approach are the two important observations: (1) the illumination over the surface of an object often appears to be piecewise smooth and (2) the recovery of surface orientation is not sufficient for reconstructing the surface, which was often overlooked previously. Thus we propose to use TV to regularize the overall illumination vectors and use visual hull to constrain partial vertices. The reconstruction is formulated as a constrained TV-minimization problem that simultaneously treats the shape and illumination vectors as unknowns. An augmented Lagrangian method is proposed to quickly solve the TV-minimization problem. As a result, our approach is robust, stable and is able to efficiently recover high-quality surface details even when starting with a coarse model obtained using MVS. These advantages are demonstrated by extensive experiments on the state-of-the-art MVS database, which includes challenging objects with varying albedo.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Learning-based hashing algorithms are “hot topics” because they can greatly increase the scale at which existing methods operate. In this paper, we propose a new learning-based hashing method called “fast supervised discrete hashing” (FSDH) based on “supervised discrete hashing” (SDH). Regressing the training examples (or hash code) to the corresponding class labels is widely used in ordinary least squares regression. Rather than adopting this method, FSDH uses a very simple yet effective regression of the class labels of training examples to the corresponding hash code to accelerate the algorithm. To the best of our knowledge, this strategy has not previously been used for hashing. Traditional SDH decomposes the optimization into three sub-problems, with the most critical sub-problem - discrete optimization for binary hash codes - solved using iterative discrete cyclic coordinate descent (DCC), which is time-consuming. However, FSDH has a closed-form solution and only requires a single rather than iterative hash code-solving step, which is highly efficient. Furthermore, FSDH is usually faster than SDH for solving the projection matrix for least squares regression, making FSDH generally faster than SDH. For example, our results show that FSDH is about 12-times faster than SDH when the number of hashing bits is 128 on the CIFAR-10 data base, and FSDH is about 151-times faster than FastHash when the number of hashing bits is 64 on the MNIST data-base. Our experimental results show that FSDH is not only fast, but also outperforms other comparative methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Digitally unwrapping images of paper sheets is crucial for accurate document scanning and text recognition. This paper presents a method for automatically rectifying curved or folded paper sheets from a few images captured from multiple viewpoints. Prior methods either need expensive 3D scanners or model deformable surfaces using over-simplified parametric representations. In contrast, our method uses regular images and is based on general developable surface models that can represent a wide variety of paper deformations. Our main contribution is a new robust rectification method based on ridge-aware 3D reconstruction of a paper sheet and unwrapping the reconstructed surface using properties of developable surfaces via $ell _1$ conformal mapping. We present results on several examples including book pages, folded letters and shopping receipts.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Advertisement, IEEE.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-10
    Description: Presents the table of contents for this issue of the publication.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 92
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2018-01-12
    Description: Linear discriminant analysis (LDA) is a classical method for discriminative dimensionality reduction. The original LDA may degrade in its performance for non-Gaussian data, and may be unable to extract sufficient features to satisfactorily explain the data when the number of classes is small. Two prominent extensions to address these problems are subclass discriminant analysis (SDA) and mixture subclass discriminant analysis (MSDA). They divide every class into subclasses and re-define the within-class and between-class scatter matrices on the basis of subclass. In this paper we study the issue of how to obtain subclasses more effectively in order to achieve higher class separation. We observe that there is significant overlap between models of the subclasses, which we hypothesise is undesirable. In order to reduce their overlap we propose an extension of LDA, separability oriented subclass discriminant analysis (SSDA), which employs hierarchical clustering to divide a class into subclasses using a separability oriented criterion, before applying LDA optimisation using re-defined scatter matrices. Extensive experiments have shown that SSDA has better performance than LDA, SDA and MSDA in most cases. Additional experiments have further shown that SSDA can project data into LDA space that has higher class separation than LDA, SDA and MSDA in most cases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 93
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: The quality assessment of sets of features extracted from patterns of epidermal ridges on our fingers is a biometric challenge problem with implications on questions concerning security, privacy and identity fraud. In this work, we introduced a new methodology to analyze the quality of high-resolution fingerprint images containing sets of fingerprint pores. Our approach takes into account the spatial interrelationship between the considered features and some basic transformations involving point process and anisotropic analysis. We proposed two new quality index algorithms following spatial and structural classes of analysis. These algorithms have proved to be effective as a performance predictor and as a filter excluding low-quality features in a recognition process. The experiments using error reject curves show that the proposed approaches outperform the state-of-the-art quality assessment algorithm for high-resolution fingerprint recognition, besides defining a new method for reconstructing their friction ridge phases in a very consistent way.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: In this paper, we address the problem of registering a distorted image and a reference image of the same scene by estimating the camera motion that had caused the distortion. We simultaneously detect the regions of changes between the two images. We attend to the coalesced effect of rolling shutter and motion blur that occurs frequently in moving CMOS cameras. We first model a general image formation framework for a 3D scene following a layered approach in the presence of rolling shutter and motion blur. We then develop an algorithm which performs layered registration to detect changes. This algorithm includes an optimisation problem that leverages the sparsity of the camera trajectory in the pose space and the sparsity of changes in the spatial domain. We create a synthetic dataset for change detection in the presence of motion blur and rolling shutter effect covering different types of camera motion for both planar and 3D scenes. We compare our method with existing registration methods and also show several real examples captured with CMOS cameras.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 96
    Publication Date: 2017-09-06
    Description: Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommending systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a named entities’ network where each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 97
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: Presents the table of contents for this issue of the publication.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: A striking example of brain organisation is the stereotyped arrangement of cell preferences in the visual cortex for edges of particular orientations in the visual image. These “orientation preference maps” appear to have remarkably consistent statistical properties across many species. However fine scale analysis of these properties requires the accurate reconstruction of maps from imaging data which is highly noisy. A new approach for solving this reconstruction problem is to use Bayesian Gaussian process methods, which produce more accurate results than classical techniques. However, so far this work has not considered the fact that maps for several other features of visual input coexist with the orientation preference map and that these maps have mutually dependent spatial arrangements. Here we extend the Gaussian process framework to the multiple output case, so that we can consider multiple maps simultaneously. We demonstrate that this improves reconstruction of multiple maps compared to both classical techniques and the single output approach, can encode the empirically observed relationships, and is easily extendible. This provides the first principled approach for studying the spatial relationships between feature maps in visual cortex.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2017-09-06
    Description: We present a spatio-temporal energy minimization formulation for simultaneous video object discovery and co-segmentation across multiple videos containing irrelevant frames. Our approach overcomes a limitation that most existing video co-segmentation methods possess, i.e., they perform poorly when dealing with practical videos in which the target objects are not present in many frames. Our formulation incorporates a spatio-temporal auto-context model, which is combined with appearance modeling for superpixel labeling. The superpixel-level labels are propagated to the frame level through a multiple instance boosting algorithm with spatial reasoning, based on which frames containing the target object are identified. Our method only needs to be bootstrapped with the frame-level labels for a few video frames (e.g., usually 1 to 3) to indicate if they contain the target objects or not. Extensive experiments on four datasets validate the efficacy of our proposed method: 1) object segmentation from a single video on the SegTrack dataset, 2) object co-segmentation from multiple videos on a video co-segmentation dataset, and 3) joint object discovery and co-segmentation from multiple videos containing irrelevant frames on the MOViCS dataset and XJTU-Stevens, a new dataset that we introduce in this paper. The proposed method compares favorably with the state-of-the-art in all of these experiments.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    Publication Date: 2017-09-06
    Description: Leveraging the compressive sensing (CS) theory, coded aperture snapshot spectral imaging (CASSI) provides an efficient solution to recover 3D hyperspectral data from a 2D measurement. The dual-camera design of CASSI, by adding an uncoded panchromatic measurement, enhances the reconstruction fidelity while maintaining the snapshot advantage. In this paper, we propose an adaptive nonlocal sparse representation (ANSR) model to boost the performance of dual-camera compressive hyperspectral imaging (DCCHI). Specifically, the CS reconstruction problem is formulated as a 3D cube based sparse representation to make full use of the nonlocal similarity in both the spatial and spectral domains. Our key observation is that, the panchromatic image, besides playing the role of direct measurement, can be further exploited to help the nonlocal similarity estimation. Therefore, we design a joint similarity metric by adaptively combining the internal similarity within the reconstructed hyperspectral image and the external similarity within the panchromatic image. In this way, the fidelity of CS reconstruction is greatly enhanced. Both simulation and hardware experimental results show significant improvement of the proposed method over the state-of-the-art.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...