ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Collection
  • Articles  (791)
Publisher
  • Institute of Electrical and Electronics Engineers (IEEE)  (791)
Years
  • 2015-2019  (791)
  • 1990-1994
  • 1945-1949
  • 2010-2014  (475)
Topic
  • Computer Science  (791)
  • Geography
  • Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
  • 1
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: The goal of cross-domain matching (CDM) is to find correspondences between two sets of objects in different domains in an unsupervised way. CDM has various interesting applications, including photo album summarization, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system, and temporal alignment, which aligns sequences such as videos that are potentially expressed using different features. In this paper, we propose an information-theoretic CDM framework based on squared-loss mutual information (SMI). The proposed approach can directly handle non-linearly related objects/sequences with different dimensions, and its hyper-parameters can be objectively optimized by cross-validation. We apply the proposed method to several real-world problems including image matching, unpaired voice conversion, photo album summarization, cross-feature video and cross-domain video-to-mocap alignment, and Kinect-based action recognition, and experimentally demonstrate that the proposed method is a promising alternative to state-of-the-art CDM methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: The skeleton of a 2D shape is an important geometric structure in pattern analysis and computer vision. In this paper we study the skeleton of a 2D shape in a two-manifold $\mathcal{M}$, based on a geodesic metric. We present a formal definition of the skeleton $S(\Omega)$ for a shape $\Omega$ in $\mathcal{M}$ and show several properties that make $S(\Omega)$ distinct from its Euclidean counterpart in $\mathbb{R}^2$. We further prove that for a shape sequence $\{\Omega_i\}$ that converges to a shape $\Omega$ in $\mathcal{M}$, the mapping $\Omega \rightarrow \overline{S}(\Omega)$ is lower semi-continuous. A direct application of this result is that we can use a set $P$ of sample points to approximate the boundary of a 2D shape $\Omega$ in $\mathcal{M}$, and the Voronoi diagram of $P$ inside $\Omega \subset \mathcal{M}$ gives a good approximation to the skeleton $S(\Omega)$. Examples of skeleton computation in topography and brain morphometry are illustrated.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: A widely used approach for locating points on deformable objects in images is to generate feature response images for each point, and then to fit a shape model to these response images. We demonstrate that Random Forest regression-voting can be used to generate high quality response images quickly. Rather than using a generative or a discriminative model to evaluate each pixel, a regressor is used to cast votes for the optimal position of each point. We show that this leads to fast and accurate shape model matching when applied in the Constrained Local Model framework. We evaluate the technique in detail, and compare it with a range of commonly used alternatives across application areas: the annotation of the joints of the hands in radiographs and the detection of feature points in facial images. We show that our approach outperforms alternative techniques, achieving what we believe to be the most accurate results yet published for hand joint annotation and state-of-the-art performance for facial feature point detection.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: We present a novel method to recognise planar structures in a single image and estimate their 3D orientation. This is done by exploiting the relationship between image appearance and 3D structure, using machine learning methods with supervised training data. As such, the method does not require specific features or use geometric cues, such as vanishing points. We employ general feature representations based on spatiograms of gradients and colour, coupled with relevance vector machines for classification and regression. We first show that using hand-labelled training data, we are able to classify pre-segmented regions as being planar or not, and estimate their 3D orientation. We then incorporate the method into a segmentation algorithm to detect multiple planar structures from a previously unseen image.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 5
    Publication Date: 2015-08-04
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 6
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Multiple view segmentation consists in segmenting objects simultaneously in several views. A key issue in that respect, compared to monocular settings, is to ensure propagation of segmentation information between views while minimizing complexity and computational cost. In this work, we first investigate the idea that examining measurements at the projections of a sparse set of 3D points is sufficient to achieve this goal. The proposed algorithm softly assigns each of these 3D samples to the scene background if it projects on the background region in at least one view, or to the foreground if it projects on the foreground region in all views. Second, we show how other modalities such as depth may be seamlessly integrated in the model and benefit the segmentation. The paper presents a detailed set of experiments used to validate the algorithm, showing results comparable with the state of the art, with reduced computational complexity. We also discuss the use of different modalities for specific situations, such as dealing with a low number of viewpoints or a scene with color ambiguities between foreground and background.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
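The assignment rule described in the abstract of record 6 (background if a 3D sample projects on background in at least one view, foreground only if it projects on foreground in all views) can be illustrated with a small sketch. This is not the paper's model: the function names are hypothetical, and treating views as independent so that per-view foreground probabilities multiply is an assumption made here purely for illustration.

```python
import numpy as np

def foreground_probability(view_fg_probs):
    """Combine per-view foreground probabilities for each 3D sample.

    view_fg_probs: array of shape (num_samples, num_views), entries in [0, 1].
    Assumption (illustrative only): views are treated as independent.
    """
    probs = np.atleast_2d(np.asarray(view_fg_probs, dtype=float))
    # Foreground needs agreement from every view, so the probabilities multiply;
    # a single confident background view drives the product towards zero.
    return probs.prod(axis=1)

def hard_label(view_is_fg):
    """Hard version of the rule: foreground only if all views say foreground."""
    flags = np.atleast_2d(np.asarray(view_is_fg, dtype=bool))
    return np.where(flags.all(axis=1), "foreground", "background")
```

For example, `foreground_probability([[0.9, 0.0, 0.8]])` yields a foreground probability of zero for that sample, because one view places it firmly on the background.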
  • 7
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks, while most current research efforts only focus on horizontal or near horizontal scene text. In this paper, first we present a unified distance metric learning framework for adaptive hierarchical clustering, which can simultaneously learn similarity weights (to adaptively combine different feature similarities) and the clustering threshold (to automatically determine the number of clusters). Then, we propose an effective multi-orientation scene text detection system, which constructs text candidates by grouping characters based on this adaptive clustering. Our text candidates construction method consists of several sequential coarse-to-fine grouping steps: morphology-based grouping via single-link clustering, orientation-based grouping via divisive hierarchical clustering, and projection-based grouping also via divisive clustering. The effectiveness of our proposed system is evaluated on several public scene text databases, e.g., ICDAR Robust Reading Competition data sets (2011 and 2013), MSRA-TD500 and NEOCR. Specifically, on the multi-orientation text data set MSRA-TD500, the $f$ measure of our system is $71$ percent, much better than the state-of-the-art performance. We also construct and release a practical challenging multi-orientation scene text data set (USTB-SV1K), which is available at http://prir.ustb.edu.cn/TexStar/MOMV-text-detection/.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 8
    Publication Date: 2015-08-04
    Description: A novel approach for event summarization and rare event detection is proposed. Unlike conventional methods that deal with event summarization and rare event detection independently, our method solves them in a single framework by transforming them into a graph editing problem. In our approach, a video is represented by a graph, each node of which indicates an event obtained by segmenting the video spatially and temporally. The edges between nodes describe the relationship between events. Based on the degree of relations, edges have different weights. After learning the graph structure, our method finds subgraphs that represent event summarization and rare events in the video by editing the graph, that is, merging its subgraphs or pruning its edges. The graph is edited to minimize a predefined energy model with the Markov Chain Monte Carlo (MCMC) method. The energy model consists of several parameters that represent the causality, frequency, and significance of events. We design a specific energy model that uses these parameters to satisfy each objective of event summarization and rare event detection. The proposed method is extended to obtain event summarization and rare event detection results across multiple videos captured from multiple views. For this purpose, the proposed method independently learns and edits each graph of individual videos for event summarization or rare event detection. Then, the method matches the extracted multiple graphs to each other, and constructs a single composite graph that represents event summarization or rare events from multiple views. Experimental results show that the proposed approach accurately summarizes multiple videos in a fully unsupervised manner. Moreover, the experiments demonstrate that the approach is advantageous in detecting rare transition of events.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Fused Lasso is a popular regression technique that encodes the smoothness of the data. It has been applied successfully to many applications with a smooth feature structure. However, the computational cost of the existing solvers for fused Lasso is prohibitive when the feature dimension is extremely large. In this paper, we propose novel screening rules that are able to quickly identify the adjacent features with the same coefficients. As a result, the number of variables to be estimated can be significantly reduced, leading to substantial savings in computational cost and memory usage. To the best of our knowledge, the proposed approach is the first attempt to develop screening methods for the fused Lasso problem with general data matrix. Our major contributions are: 1) we derive a new dual formulation of fused Lasso that comes with several desirable properties; 2) we show that the new dual formulation of fused Lasso is equivalent to that of the standard Lasso by two affine transformations; 3) we propose a novel framework for developing effective and efficient screening rules for fused Lasso via the monotonicity of the subdifferentials (FLAMS). Some appealing features of FLAMS are: 1) our methods are safe in the sense that the detected adjacent features are guaranteed to have the same coefficients; 2) the dataset needs to be scanned only once to run the screening, whose computational cost is negligible compared to that of solving the fused Lasso; 3) FLAMS is independent of the solvers and can be integrated with any existing solvers. We have evaluated the proposed FLAMS rules on both synthetic and real datasets. The experiments indicate that FLAMS is very effective in identifying the adjacent features with the same coefficients. The speedup gained by FLAMS can be orders of magnitude.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
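The abstract of record 10 states that once adjacent features are known to share a coefficient, the number of variables to estimate shrinks. The sketch below shows only that reduction step, not the FLAMS screening rules themselves: columns of the data matrix belonging to one run of fused features are summed, which leaves the fitted values unchanged. The function names and the `boundaries` argument are hypothetical, introduced here for illustration.

```python
import numpy as np

def merge_fused_columns(X, boundaries):
    """Sum the columns of X inside each run of adjacent fused features.

    boundaries: sorted start indices of the runs, beginning with 0
    (hypothetical input; a screening rule would supply it in practice).
    """
    edges = list(boundaries) + [X.shape[1]]
    merged = [X[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])]
    return np.stack(merged, axis=1)

def fusion_penalty(beta):
    """Sum of absolute differences between neighbouring coefficients."""
    return float(np.abs(np.diff(beta)).sum())

# Six features grouped into three runs of lengths 2, 3 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))
beta_reduced = np.array([1.5, -2.0, 0.5])
beta_full = np.repeat(beta_reduced, [2, 3, 1])
X_reduced = merge_fused_columns(X, [0, 2, 5])
# The reduced problem reproduces the same predictions with 3 variables instead of 6.
assert np.allclose(X @ beta_full, X_reduced @ beta_reduced)
```

Any solver can then be run on `X_reduced`, which is consistent with the abstract's remark that the screening is independent of the solver.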
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Hidden conditional random fields (HCRFs) are discriminative latent variable models which have been shown to successfully learn the hidden structure of a given classification problem. An infinite hidden conditional random field is a hidden conditional random field with a countably infinite number of hidden states, which rids us not only of the necessity to specify a priori a fixed number of hidden states available but also of the problem of overfitting. Markov chain Monte Carlo (MCMC) sampling algorithms are often employed for inference in such models. However, convergence of such algorithms is rather difficult to verify, and as the complexity of the task at hand increases the computational cost of such algorithms often becomes prohibitive. These limitations can be overcome by variational techniques. In this paper, we present a generalized framework for infinite HCRF models, and a novel variational inference approach on a model based on coupled Dirichlet Process Mixtures, the HCRF-DPM. We show that the variational HCRF-DPM is able to converge to a correct number of represented hidden states, and performs as well as the best parametric HCRFs, chosen via cross-validation, for the difficult tasks of recognizing instances of agreement, disagreement, and pain in audiovisual sequences.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 12
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: In this paper, we address the challenging problem of detecting pedestrians who appear in groups. A new approach is proposed for single-pedestrian detection aided by two-pedestrian detection. A mixture model of two-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and two-pedestrian detectors, and to refine the single-pedestrian detection result using two-pedestrian detection. The two-pedestrian detector can integrate with any single-pedestrian detector. Twenty-five state-of-the-art single-pedestrian detection approaches are combined with the two-pedestrian detector on three widely used public datasets: Caltech, TUD-Brussels, and ETH. Experimental results show that our framework improves all these approaches. The average improvement is $9$ percent on the Caltech-Test dataset, $11$ percent on the TUD-Brussels dataset and $17$ percent on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from $37$ to percent on the Caltech-Test dataset, from $55$ to $50$ percent on the TUD-Brussels dataset and from $43$ to $38$ percent on the ETH dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 13
    Publication Date: 2015-06-06
    Description: Systems based on bag-of-words models from image features collected at maxima of sparse interest point operators have been used successfully for both computer visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in ‘saccade and fixate’ regimes, the methodology and emphasis in the human and the computer vision communities remains sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large scale dynamic computer vision annotated datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge these are the first large human eye tracking datasets to be collected and made publicly available for video, vision.imar.ro/eyetracking (497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, (c) task control, as well as free-viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point image sampling strategies and the human fixations, as well as their impact for visual recognition performance, but also demonstrate that human fixations can be accurately predicted, and when used in an end-to-end automatic system, leveraging some of the advanced computer vision practice, can lead to state of the art results.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-06
    Description: We consider the problem of parameter estimation and energy minimization for a region-based semantic segmentation model. The model divides the pixels of an image into non-overlapping connected regions, each of which is assigned to a semantic class. In the context of energy minimization, the main problem we face is the large number of putative pixel-to-region assignments. We address this problem by designing an accurate linear programming based approach for selecting the best set of regions from a large dictionary. The dictionary is constructed by merging and intersecting segments obtained from multiple bottom-up over-segmentations. The linear program is solved efficiently using dual decomposition. In the context of parameter estimation, the main problem we face is the lack of fully supervised data. We address this issue by developing a principled framework for parameter estimation using diverse data. More precisely, we propose a latent structural support vector machine formulation, where the latent variables model any missing information in the human annotation. Of particular interest to us are three types of annotations: (i) images segmented using generic foreground or background classes; (ii) images with bounding boxes specified for objects; and (iii) images labeled to indicate the presence of a class. Using large, publicly available datasets we show that our methods are able to significantly improve the accuracy of the region-based model.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-06
    Description: In this paper we address the problem of finding the most probable state of a discrete Markov random field (MRF), also known as the MRF energy minimization problem. The task is known to be NP-hard in general and its practical importance motivates numerous approximate algorithms. We propose a submodular relaxation approach (SMR) based on a Lagrangian relaxation of the initial problem. Unlike the dual decomposition approach of Komodakis et al. [29], SMR does not decompose the graph structure of the initial problem but constructs a submodular energy that is minimized within the Lagrangian relaxation. Our approach is applicable to both pairwise and high-order MRFs and can take into account global potentials of certain types. We study theoretical properties of the proposed approach and evaluate it experimentally.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-06
    Description: Higher-order Markov Random Fields, which can capture important properties of natural images, have become increasingly important in computer vision. While graph cuts work well for first-order MRF’s, until recently they have rarely been effective for higher-order MRF’s. Ishikawa’s graph cut technique [1], [2] shows great promise for many higher-order MRF’s. His method transforms an arbitrary higher-order MRF with binary labels into a first-order one with the same minima. If all the terms are submodular the exact solution can be easily found; otherwise, pseudoboolean optimization techniques can produce an optimal labeling for a subset of the variables. We present a new transformation with better performance than [1], [2], both theoretically and experimentally. While [1], [2] transforms each higher-order term independently, we use the underlying hypergraph structure of the MRF to transform a group of terms at once. For $n$ binary variables, each of which appears in terms with $k$ other variables, at worst we produce $n$ non-submodular terms, while [1], [2] produces $O(nk)$. We identify a local completeness property under which our method performs even better, and show that under certain assumptions several important vision problems (including common variants of fusion moves) have this property. We show experimentally that our method produces smaller weight of non-submodular edges, and that this metric is directly related to the effectiveness of QPBO [3]. Running on the same field of experts dataset used in [1], [2] we optimally label significantly more variables (96 versus 80 percent) and converge more rapidly to a lower energy. Preliminary experiments suggest that some other higher-order MRF’s used in stereo [4] and segmentation [5] are also locally complete and would thus benefit from our work.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-06
    Description: We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M$^3$N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 18
    Publication Date: 2015-08-04
    Description: Improving the quality of degraded images is a key problem in image processing, but the breadth of the problem leads to domain-specific approaches for tasks such as super-resolution and compression artifact removal. Recent approaches have shown that a general approach is possible by learning application-specific models from examples; however, learning models sophisticated enough to generate high-quality images is computationally expensive, and so specific per-application or per-dataset models are impractical. To solve this problem, we present an efficient semi-local approximation scheme to large-scale Gaussian processes. This allows efficient learning of task-specific image enhancements from example images without reducing quality. As such, our algorithm can be easily customized to specific applications and datasets, and we show the efficiency and effectiveness of our approach across five domains: single-image super-resolution for scene, human face, and text images, and artifact removal in JPEG- and JPEG 2000-encoded images.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 19
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Parsimony, including sparsity and low rank, has been shown to successfully model data in numerous machine learning and signal processing tasks. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function with parsimony-promoting terms. The inherently sequential structure and data-dependent complexity and latency of iterative optimization constitute a major limitation in many applications requiring real-time performance or involving large-scale data. Another limitation encountered by these modeling techniques is the difficulty of their inclusion in discriminative learning scenarios. In this work, we propose to move the emphasis from the model to the pursuit algorithm, and develop a process-centric view of parsimonious modeling, in which a learned deterministic fixed-complexity pursuit process is used in lieu of iterative optimization. We show a principled way to construct learnable pursuit process architectures for structured sparse and robust low rank models, derived from the iteration of proximal descent algorithms. These architectures learn to approximate the exact parsimonious representation at a fraction of the complexity of the standard optimization methods. We also show that appropriate training regimes allow parsimonious models to be extended naturally to discriminative settings. State-of-the-art results are demonstrated on several challenging problems in image and audio processing with several orders of magnitude speed-up compared to the exact optimization algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 20
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified; however, the determination of tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information of latent factors, as well as missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth of CP rank and prevent the overfitting problem, even when a large number of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-04
    Description: Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 $\times$ 224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102 $\times$ faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
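The fixed-length property described in the abstract of entry 21 can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a list of lists, max pooling in every bin, and pyramid levels of 1, 2 and 4 bins per side; the function name `spatial_pyramid_pool` and these defaults are hypothetical choices for illustration, not the authors' implementation.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins
    and the maximum of each bin is kept, so the output length is
    sum(n * n for n in levels) regardless of the input size.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin rows span floor(i*h/n) .. ceil((i+1)*h/n), so bins are never empty.
            row_start = (i * height) // n
            row_end = -((-(i + 1) * height) // n)
            for j in range(n):
                col_start = (j * width) // n
                col_end = -((-(j + 1) * width) // n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(row_start, row_end)
                    for c in range(col_start, col_end)
                ))
    return pooled


# Two inputs of different sizes yield vectors of the same length (1 + 4 + 16).
small = [[r * 3 + c for c in range(3)] for r in range(5)]
large = [[r * 9 + c for c in range(9)] for r in range(7)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

In a real network the same pooling would be applied per channel of the last convolutional layer and the results concatenated; that step is omitted here for brevity.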
  • 22
    Publication Date: 2015-08-04
    Description: We present efficient graph cut algorithms for three problems: (1) finding a region in an image, so that the histogram (or distribution) of an image feature within the region most closely matches a given model; (2) co-segmentation of image pairs and (3) interactive image segmentation with a user-provided bounding box. Each algorithm seeks the optimum of a global cost function based on the Bhattacharyya measure, a convenient alternative to other matching measures such as the Kullback–Leibler divergence. Our functionals are not directly amenable to graph cut optimization as they contain non-linear functions of fractional terms, which make the ensuing optimization problems challenging. We first derive a family of parametric bounds of the Bhattacharyya measure by introducing an auxiliary labeling. Then, we show that these bounds are auxiliary functions of the Bhattacharyya measure, a result which allows us to solve each problem efficiently via graph cuts. We show that the proposed optimization procedures converge within very few graph cut iterations. Comprehensive and various experiments, including quantitative and comparative evaluations over two databases, demonstrate the advantages of the proposed algorithms over related works in regard to optimality, computational load, accuracy and flexibility.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
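The matching measure named in the abstract of entry 22 can be sketched in a few lines. The example below computes the standard Bhattacharyya coefficient between two normalized histograms; it shows only the measure itself, not the graph cut optimization or the auxiliary bounds from the paper, and the helper names `normalize` and `bhattacharyya_coefficient` are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw bin counts into a distribution that sums to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no mass")
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """Return sum_i sqrt(p_i * q_i) for two distributions over the same bins.

    The value is 1 for identical distributions and 0 when their supports
    do not overlap.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


model = normalize([4, 4, 0, 0])
region_same = normalize([2, 2, 0, 0])
region_disjoint = normalize([0, 0, 5, 5])
print(bhattacharyya_coefficient(model, region_same))
print(bhattacharyya_coefficient(model, region_disjoint))
```

A segmentation method built on this measure would search for the region whose feature histogram maximizes the coefficient with respect to the model histogram.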
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Provides a listing of board members, committee members, editors, and society officers.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 24
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Certain inner feelings and physiological states like pain are subjective states that cannot be directly measured, but can be estimated from spontaneous facial expressions. Since they are typically characterized by subtle movements of facial parts, analysis of the facial details is required. To this end, we formulate a new regression method for continuous estimation of the intensity of facial behavior interpretation, called Doubly Sparse Relevance Vector Machine (DSRVM). DSRVM enforces double sparsity by jointly selecting the most relevant training examples (a.k.a. relevance vectors) and the most important kernels associated with facial parts relevant for interpretation of observed facial expressions. This advances prior work on multi-kernel learning, where sparsity of relevant kernels is typically ignored. Empirical evaluation on challenging Shoulder Pain videos, and the benchmark DISFA and SEMAINE datasets demonstrate that DSRVM outperforms competing approaches with a multi-fold reduction of running times in training and testing.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Modeling, detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatio-temporal derivatives of raindrops. To accomplish this, we first model adherent raindrops using the laws of physics, and detect raindrops based on these models in combination with motion and intensity temporal derivatives of the input video. Having detected the raindrops, we remove them and restore the images based on the observation that some areas of a raindrop completely occlude the scene, while other areas occlude it only partially. For partially occluding areas, we restore them by retrieving as much information about the scene as possible, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity derivative. For completely occluding areas, we recover them by using a video completion technique. Experimental results using various real videos show the effectiveness of our method.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 26
    Publication Date: 2016-08-05
    Description: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled ‘seed’ image patch. In contrast to supervised network training, the resulting feature representation is not class specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While features learned with our approach cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 27
    Publication Date: 2016-08-05
    Description: We address the video-based face association problem, in which one attempts to extract the face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty in handling this task, especially when challenging nuisance factors like motion blur, low resolution or significant camera motions are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Unlike many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling false positives/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. However, in some scenarios, the number of causes or the number of states of the FHMM cannot be known or limited a priori. In this paper, we propose an infinite factorial unbounded-state hidden Markov model (IFUHMM), in which the number of parallel hidden Markov models (HMMs) and states in each HMM are potentially unbounded. We rely on a Bayesian nonparametric (BNP) prior over integer-valued matrices, in which the columns represent the Markov chains, the rows the time indexes, and the integers the state for each chain and time instant. First, we extend the existing infinite factorial binary-state HMM to allow for any number of states. Then, we modify this model to allow for an unbounded number of states and derive an MCMC-based inference algorithm that properly deals with the trade-off between the unbounded number of states and chains. We illustrate the performance of our proposed models in the power disaggregation problem.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 29
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: The speed with which intelligent systems can react to an action depends on how soon it can be recognized. The ability to recognize ongoing actions is critical in many applications, for example, spotting criminal activity. It is challenging, since decisions have to be made based on partial videos of temporally incomplete action executions. In this paper, we propose a novel discriminative multi-scale kernelized model for predicting the action class from a partially observed video. The proposed model captures temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller temporal segments. A compositional kernel is proposed to hierarchically capture the relationships between partial observations as well as the temporal segments, respectively. We develop a new learning formulation, which elegantly captures the temporal evolution over time, and enforces the label consistency between segments and corresponding partial videos. We prove that the proposed learning formulation minimizes the upper bound of the empirical risk. Experimental results on four public datasets show that the proposed approach outperforms state-of-the-art action prediction methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Graph matching (GM) is a fundamental problem in computer science, and it plays a central role in solving correspondence problems in computer vision. GM problems that incorporate pairwise constraints can be formulated as a quadratic assignment problem (QAP). Although widely used, solving the correspondence problem through GM has two main limitations: (1) the QAP is NP-hard and difficult to approximate; (2) GM algorithms do not incorporate geometric constraints between nodes that are natural in computer vision problems. To address these problems, this paper proposes factorized graph matching (FGM). FGM factorizes the large pairwise affinity matrix into smaller matrices that encode the local structure of each graph and the pairwise affinity between edges. Four benefits follow from this factorization: (1) there is no need to compute the costly (in space and time) pairwise affinity matrix; (2) the factorization allows the use of a path-following optimization algorithm, which leads to improved optimization strategies and matching performance; (3) given the factorization, it becomes straightforward to incorporate geometric transformations (rigid and non-rigid) into the GM problem; (4) using a matrix formulation for the GM problem and the factorization, it is easy to reveal commonalities and differences between different GM methods. The factorization also provides a clean connection with other matching algorithms such as iterative closest point. Experimental results on synthetic and real databases illustrate how FGM outperforms state-of-the-art algorithms for GM. The code is available at http://humansensing.cs.cmu.edu/fgm .
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Existing data association techniques mostly focus on matching pairs of data-point sets and then repeating this process along space-time to achieve long term correspondences. However, in many problems such as person re-identification, a set of data-points may be observed at multiple spatio-temporal locations and/or by multiple agents in a network and simply combining the local pairwise association results between sets of data-points often leads to inconsistencies over the global space-time horizons. In this paper, we propose a Novel Network Consistent Data Association (NCDA) framework formulated as an optimization problem that not only maintains consistency in association results across the network, but also improves the pairwise data association accuracies. The proposed NCDA can be solved as a binary integer program leading to a globally optimal solution and is capable of handling the challenging data-association scenario where the number of data-points varies across different sets of instances in the network. We also present an online implementation of NCDA method that can dynamically associate new observations to already observed data-points in an iterative fashion, while maintaining network consistency. We have tested both the batch and the online NCDA in two application areas—person re-identification and spatio-temporal cell tracking and observed consistent and highly accurate data association results in all the cases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 33
    Publication Date: 2016-08-05
    Description: This paper presents a method for learning an And-Or model to represent context and occlusion for car detection and viewpoint estimation. The learned And-Or model represents car-to-car context and occlusion configurations at three levels: (i) spatially-aligned cars, (ii) single car under different occlusion configurations, and (iii) a small number of parts. The And-Or model embeds a grammar for representing large structural and appearance variations in a reconfigurable hierarchy. The learning process consists of two stages in a weakly supervised way (i.e., only bounding boxes of single cars are annotated). First, the structure of the And-Or model is learned with three components: (a) mining multi-car contextual patterns based on layouts of annotated single car bounding boxes, (b) mining occlusion configurations between single cars, and (c) learning different combinations of part visibility based on CAD simulations. The And-Or model is organized in a directed and acyclic graph which can be inferred by Dynamic Programming. Second, the model parameters (for appearance, deformation and bias) are jointly trained using Weak-Label Structural SVM. In experiments, we test our model on four car detection datasets (the KITTI dataset [1], the PASCAL VOC2007 car dataset [2], and two self-collected car datasets, namely the Street-Parking car dataset and the Parking-Lot car dataset) and three datasets for car viewpoint estimation (the PASCAL VOC2006 car dataset [2], the 3D car dataset [3], and the PASCAL3D+ car dataset [4]). Compared with state-of-the-art variants of deformable part-based models and other methods, our model achieves significant improvement consistently on the four detection datasets, and comparable performance on car viewpoint estimation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, the mAP reaches 90.5% by HCP only and 93.2% after the fusion with our complementary result in [12] based on hand-crafted features on the VOC 2012 dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
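The aggregation step described in the abstract of entry 34 (per-hypothesis outputs combined by max pooling) can be sketched as follows. This is a minimal illustration that assumes each hypothesis has already been scored by a shared classifier into a list of per-label scores; the function name `aggregate_hypotheses` and the example scores are hypothetical, and the CNN itself is not modeled.

```python
def aggregate_hypotheses(hypothesis_scores):
    """Element-wise max over per-hypothesis label scores.

    hypothesis_scores is a list with one entry per hypothesis, each a list
    of scores with one value per label. A label scores highly in the output
    if at least one hypothesis supports it, which is what lets noisy or
    redundant hypotheses be tolerated.
    """
    if not hypothesis_scores:
        raise ValueError("at least one hypothesis is required")
    num_labels = len(hypothesis_scores[0])
    if any(len(scores) != num_labels for scores in hypothesis_scores):
        raise ValueError("all hypotheses must score the same labels")
    return [max(column) for column in zip(*hypothesis_scores)]


# Three hypotheses scored over three labels (values are made up).
scores = [
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.1],
]
print(aggregate_hypotheses(scores))
```

The number of hypotheses can vary from image to image while the output keeps one score per label, which matches the "arbitrary number of object segment hypotheses" mentioned in the abstract.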
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Wiberg matrix factorization breaks a matrix $Y$ into low-rank factors $U$ and $V$ by solving for $V$ in closed form given $U$, linearizing $V(U)$ about $U$, and iteratively minimizing $||Y - UV(U)||_2$ with respect to $U$ only. This approach factors the matrix while effectively removing $V$ from the minimization. Recently Eriksson and van den Hengel extended this approach to $L_1$, minimizing $||Y - UV(U)||_1$. We generalize their approach beyond factorization to minimize $||Y - f(U, V)||_1$ for more general functions $f(U, V)$ that are nonlinear in each of two sets of variables. We demonstrate the idea with a practical Wiberg algorithm for $L_1$ bundle adjustment. One Wiberg minimization can be nested inside another, effectively removing two of three sets of variables from a minimization. We demonstrate this idea with a nested Wiberg algorithm for $L_1$ projective bundle adjustment, solving for camera matrices, points, and projective depths. Wiberg minimization also generalizes to handle nonlinear constraints, and we demonstrate this idea with Constrained Wiberg Minimization for Multiple Instance Learning (CWM-MIL), which removes one set of variable
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: In this paper, we propose a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) for supervised dimensionality reduction of local features. LFDP is able to efficiently seek a subspace to improve the discriminability of local features for classification. We make three novel contributions. First, the proposed LFDP is a general supervised subspace learning algorithm which provides an efficient way for dimensionality reduction of large-scale local feature descriptors. Second, we introduce the Differential Scatter Discriminant Criterion (DSDC) to the subspace learning of local feature descriptors which avoids the matrix singularity problem. Third, we propose a generalized orthogonalization method to impose on projections, leading to a more compact and less redundant subspace. Extensive experimental validation on three benchmark datasets including UIUC-Sports, Scene-15 and MIT Indoor demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 37
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: We propose a real-time method to accurately track the human head pose in the 3-dimensional (3D) world. Using an RGB-Depth camera, a face template is reconstructed by fitting a 3D morphable face model, and the head pose is determined by registering this user-specific face template to the input depth video.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 38
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criterion of interest. The requirements that the solution both is geodesic and must pass through the mean tend to imply that the methods only work well when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves from Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 39
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: An end-to-end real-time text localization and recognition method is presented. Its real-time performance is achieved by posing the character detection and segmentation problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust against blur, low contrast and illumination, color and texture variation. In the first stage, the probability of each ER being a character is estimated using features calculated by a novel algorithm in constant time and only ERs with locally maximal probability are selected for the second stage, where the classification accuracy is improved using computationally more expensive features. A highly efficient clustering algorithm then groups ERs into text lines and an OCR classifier trained on synthetic fonts is exploited to label character regions. The most probable character sequence is selected in the last stage when the context of each character is known. The method was evaluated on three public datasets. On the ICDAR 2013 dataset the method achieves state-of-the-art results in text localization; on the more challenging SVT dataset, the proposed method significantly outperforms the state-of-the-art methods and demonstrates that the proposed pipeline can incorporate additional prior knowledge about the detected text. The proposed method was exploited as the baseline in the ICDAR 2015 Robust Reading competition, where it compares favourably to the state of the art.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 40
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 41
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: This paper addresses modelling data using the Watson distribution. The Watson distribution is one of the simplest distributions for analyzing axially symmetric data. This distribution has gained some attention in recent years due to its modeling capability. However, its Bayesian inference is fairly understudied due to difficulty in handling the normalization factor. Recent development of Markov chain Monte Carlo (MCMC) sampling methods can be applied for this purpose. However, these methods can be prohibitively slow for practical applications. A deterministic alternative is provided by variational methods that convert inference problems into optimization problems. In this paper, we present a variational inference for Watson mixture models. First, the variational framework is used to side-step the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded in order to avoid intractable moment computation. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding over-fitting. We discuss potential applications of the modeling with Watson distributions in the problem of blind source separation, and clustering gene expression data sets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion, as facial appearance distorts owing to camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL, a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a-priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and associated pose classifiers. Upon determining the target's position using a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 43
    Publication Date: 2016-05-06
    Description: Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on a large, real-world video database, McGillFaces [1], of 18,000 video frames reveal that the proposed framework outperforms alternative approaches by up to 16.96% and 10.13% for the facial attributes of gender and facial hair, respectively.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 45
    Publication Date: 2016-05-06
    Description: Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the prescribed range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). We propose a novel ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC using structured learning. In addition, in order to achieve high object detection performance, we propose a new approach to extracting low-level visual features based on spatial pooling. Incorporating spatial pooling improves the translational invariance and thus the robustness of the detection process. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method with spatially pooled features. The result is the current best reported performance on the Caltech-USA pedestrian detection dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 46
    Publication Date: 2016-05-06
    Description: This paper addresses the problem of matching common node correspondences among multiple graphs referring to an identical or related structure. This multi-graph matching problem involves two correlated components: i) the local pairwise matching affinity across pairs of graphs; ii) the global matching consistency that measures the uniqueness of the pairwise matchings by different composition orders. Previous studies typically either enforce the matching consistency constraints at the beginning of an iterative optimization, which may propagate matching error both over iterations and across graph pairs, or separate affinity optimization and consistency enforcement into two steps. This paper is motivated by the observation that matching consistency can serve as a regularizer in the affinity objective function, especially when the function is biased due to noise or inappropriate modeling. We propose composition-based multi-graph matching methods that incorporate the two aspects by optimizing the affinity score while gradually infusing the consistency. We also propose two mechanisms to elicit the common inliers against outliers. Compelling results on synthetic and real images show the competency of our algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Standard edge detection operators such as the Laplacian of Gaussian and the gradient of Gaussian can be used to track contours in image sequences. When using edge operators, a contour, which is determined on a frame of the sequence, is simply used as a starting contour to locate the nearest contour on the subsequent frame. However, the strategy used to look for the nearest edge points may not work when tracking contours of non-isolated gray level discontinuities. In these cases, strategies derived from the optical flow equation, which look for similar gray level distributions, appear to be more appropriate since these can work with a lower frame rate than that needed for strategies based on pure edge detection operators. However, an optical flow strategy tends to propagate the localization errors through the sequence, and an additional edge detection procedure is essential to compensate for such a drawback. In this paper, a spatio-temporal intensity moment is proposed which integrates the two basic functions of edge detection and tracking.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Design and development of efficient and accurate feature descriptors is critical for the success of many computer vision applications. This paper proposes a new feature descriptor, referred to as DoN, for 2D palmprint matching. The descriptor is extracted for each point on the palmprint. It is based on the ordinal measure which partially describes the difference of the neighboring points’ normal vectors. DoN has at least two advantages: 1) it describes the 3D information, which is expected to be highly stable under commonly occurring illumination variations during contactless imaging; 2) the size of DoN for each point is only one bit, which is computationally simple to extract, easy to match, and efficient to store. We show that such 3D information can be extracted from a single 2D palmprint image. An analysis of the effectiveness of the ordinal measure for palmprint matching is also provided. Four publicly available 2D palmprint databases are used to evaluate the effectiveness of DoN for both identification and verification. Our method achieves state-of-the-art performance on all these databases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Searching for matches to high-dimensional vectors using hard/soft vector quantization is the most computationally expensive part of various computer vision algorithms including the bag of visual words (BoW). This paper proposes a fast computation method, Neighbor-to-Neighbor (NTN) search [1], which skips some calculations based on the similarity of input vectors. For example, in image classification using dense SIFT descriptors, the NTN search seeks similar descriptors from a point on a grid to an adjacent point. Applications of the NTN search to vector quantization, a Gaussian mixture model, sparse coding, and a kernel codebook for extracting image or video representation are presented in this paper. We evaluated the proposed method on image and video benchmarks: the PASCAL VOC 2007 Classification Challenge and the TRECVID 2010 Semantic Indexing Task. NTN-VQ reduced the coding cost by 77.4 percent, and NTN-GMM reduced it by 89.3 percent, without any significant degradation in classification performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 51
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: A new data structure for efficient similarity search in very large datasets of high-dimensional vectors is introduced. This structure, called the inverted multi-index, generalizes the inverted index idea by replacing the standard quantization within inverted indices with product quantization. For very similar retrieval complexity and pre-processing time, inverted multi-indices achieve a much denser subdivision of the search space compared to inverted indices, while retaining their memory efficiency. Our experiments with large datasets of SIFT and GIST vectors demonstrate that because of the denser subdivision, inverted multi-indices are able to return much shorter candidate lists with higher recall. Augmented with a suitable reranking procedure, multi-indices were able to significantly improve the speed of approximate nearest neighbor search on the dataset of 1 billion SIFT vectors compared to the best previously published systems, while achieving better recall and incurring only a few percent of memory overhead.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 52
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: While 3D object-centered shape-based models are appealing in comparison with 2D viewer-centered appearance-based models for their lower model complexities and potentially better view generalizabilities, the learning and inference of 3D models has been much less studied in the recent literature due to two factors: i) the enormous complexities of 3D shapes in geometric space; and ii) the gap between 3D shapes and their appearances in images. This paper aims at tackling the two problems by studying an And-Or Tree (AoT) representation that consists of two parts: i) a geometry-AoT quantizing the geometry space, i.e. the possible compositions of 3D volumetric parts and 2D surfaces within the volumes; and ii) an appearance-AoT quantizing the appearance space, i.e. the appearance variations of those shapes in different views. In this AoT, an And-node decomposes an entity into constituent parts, and an Or-node represents alternative ways of decompositions. Thus it can express a combinatorial number of geometry and appearance configurations through small dictionaries of 3D shape primitives and 2D image primitives. In the quantized space, the problem of learning a 3D object template is transformed to a structure search problem which can be efficiently solved in a dynamic programming algorithm by maximizing the information gain. We focus on learning 3D car templates from the AoT and collect a new car dataset featuring more diverse views. The learned car templates integrate both the shape-based model and the appearance-based model to combine the benefits of both. 
    In experiments, we show three aspects: 1) the AoT is more efficient than the frequently used octree method in space representation; 2) the learned 3D car template matches the state-of-the-art performance on car detection and pose estimation in a public multi-view car dataset; and 3) in our new dataset, the learned 3D template solves the joint task of simultaneous object detection, pose/view estimation, and part localization. It can generalize over unseen views and performs better than version 5 of the DPM model in terms of object detection and semantic part localization.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 53
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: Semantic segmentation and object detection are nowadays dominated by methods operating on regions obtained as a result of a bottom-up grouping process (segmentation) but use feature extractors developed for recognition on fixed-form (e.g. rectangular) patches, with full images as a special case. This is most likely suboptimal. In this paper we focus on feature extraction and description over free-form regions and study the relationship with their fixed-form counterparts. Our main contributions are novel pooling techniques that capture the second-order statistics of local descriptors inside such free-form regions. We introduce second-order generalizations of average and max-pooling that together with appropriate non-linearities, derived from the mathematical structure of their embedding space, lead to state-of-the-art recognition performance in semantic segmentation experiments without any type of local feature coding. In contrast, we show that codebook-based local feature coding is more important when feature extraction is constrained to operate over regions that include both foreground and large portions of the background, as typical in image classification settings, whereas for high-accuracy localization setups, second-order pooling over free-form regions produces results superior to those of the winning systems in the contemporary semantic segmentation challenges, with models that are much faster in both training and testing.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 54
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: Autoencoders are popular feature learning models that are conceptually simple, easy to train and allow for efficient inference. Recent work has shown how certain autoencoders can be associated with an energy landscape, akin to negative log-probability in a probabilistic model, which measures how well the autoencoder can represent regions in the input space. The energy landscape has been commonly inferred heuristically, by using a training criterion that relates the autoencoder to a probabilistic model such as a Restricted Boltzmann Machine (RBM). In this paper we show how most common autoencoders are naturally associated with an energy function, independent of the training procedure, and that the energy landscape can be inferred analytically by integrating the reconstruction function of the autoencoder. For autoencoders with sigmoid hidden units, the energy function is identical to the free energy of an RBM, which helps shed light on the relationship between these two types of model. We also show that the autoencoder energy function allows us to explain common regularization procedures, such as contractive training, from the perspective of dynamical systems. As a practical application of the energy function, a generative classifier based on class-specific autoencoders is presented.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 55
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: This paper introduces a new high dynamic range (HDR) imaging algorithm which utilizes rank minimization. Assuming a camera responds linearly to scene radiance, the input low dynamic range (LDR) images captured with different exposure times exhibit a linear dependency and form a rank-1 matrix when the intensities of each corresponding pixel are stacked together. In practice, misalignments caused by camera motion, the presence of moving objects, saturation, and image noise break the rank-1 structure of the LDR images. To address these problems, we present a rank minimization algorithm which simultaneously aligns LDR images and detects outliers for robust HDR generation. We evaluate the performance of our algorithm systematically using synthetic examples and qualitatively compare our results with results from the state-of-the-art HDR algorithms using challenging real-world examples.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 56
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: Random forests work by averaging several predictions of de-correlated trees. We show a conceptually radical approach to generate a random forest: random sampling of many trees from a prior distribution, and subsequently performing a weighted ensemble of predictive probabilities. Our approach uses priors that allow sampling of decision trees even before looking at the data, and a power likelihood that explores the space spanned by combinations of decision trees. While each tree performs Bayesian inference to compute its predictions, our aggregation procedure uses the power likelihood rather than the likelihood and is therefore, strictly speaking, not Bayesian. Nonetheless, we refer to it as a Bayesian random forest, but with built-in safety. The safety comes from its good predictive performance even if the underlying probabilistic model is wrong. We demonstrate empirically that our Safe-Bayesian random forest outperforms MCMC or SMC based Bayesian decision trees in terms of speed and accuracy, and achieves performance competitive with entropy- or Gini-optimised random forests, yet is very simple to construct.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 57
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 58
    Publication Date: 2015-05-09
    Description: This paper proposes a deterministic explanation for mutual-information-based image registration (MI registration). The explanation is that MI registration works because it aligns certain image partitions. This notion of aligning partitions is new, and is shown to be related to Schur- and quasi-convexity. The partition-alignment theory of this paper goes beyond explaining mutual information. It suggests other objective functions for registering images. Some of these newer objective functions are not entropy-based. Simulations with noisy images show that the newer objective functions work well for registration, lending support to the theory. The theory proposed in this paper opens a number of directions for further research in image registration. These directions are also discussed.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 59
    Publication Date: 2015-05-09
    Description: Connected operators provide well-established solutions for digital image processing, typically in conjunction with hierarchical schemes. In graph-based frameworks, such operators basically rely on symmetric adjacency relations between pixels. In this article, we introduce a notion of directed connected operators for hierarchical image processing, by also considering non-symmetric adjacency relations. The induced image representation models are no longer partition hierarchies (i.e., trees), but directed acyclic graphs that generalize standard morphological tree structures such as component trees, binary partition trees or hierarchical watersheds. We describe how to efficiently build and handle these richer data structures, and we illustrate the versatility of the proposed framework in image filtering and image segmentation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: Demographic estimation entails automatic estimation of age, gender and race of a person from his face image, which has many potential applications ranging from forensics to social media. Automatic demographic estimation, particularly age estimation, remains a challenging problem because persons belonging to the same demographic group can be vastly different in their facial appearances due to intrinsic and extrinsic factors. In this paper, we present a generic framework for automatic demographic (age, gender and race) estimation. Given a face image, we first extract demographic informative features via a boosting algorithm, and then employ a hierarchical approach consisting of between-group classification and within-group regression. Quality assessment is also developed to identify low-quality face images from which it is difficult to obtain reliable demographic estimates. Experimental results on a diverse set of face image databases, FG-NET (1K images), FERET (3K images), MORPH II (75K images), PCSO (100K images), and a subset of LFW (4K images), show that the proposed approach has superior performance compared to the state of the art. Finally, we use crowdsourcing to study the human perception ability of estimating demographics from face images. A side-by-side comparison of the demographic estimates from crowdsourced data and the proposed algorithm provides a number of insights into this challenging problem.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 61
    Publication Date: 2015-05-09
    Description: Automatic affect analysis has attracted great interest in various contexts including the recognition of action units and basic or non-basic emotions. In spite of major efforts, there are several open questions on what the important cues to interpret facial expressions are and how to encode them. In this paper, we review the progress across a range of affect recognition applications to shed light on these fundamental questions. We analyse the state-of-the-art solutions by decomposing their pipelines into fundamental components, namely face registration, representation, dimensionality reduction and recognition. We discuss the role of these components and highlight the models and new trends that are followed in their design. Moreover, we provide a comprehensive analysis of facial representations by uncovering their advantages and limitations; we elaborate on the type of information they encode and discuss how they deal with the key challenges of illumination variations, registration errors, head-pose variations, occlusions, and identity bias. This survey allows us to identify open issues and to define future directions for designing real-world affect recognition systems.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: The high complexity of multi-scale, category-level object detection in cluttered scenes is efficiently handled by Hough voting methods. However, the main shortcoming of the approach is that mutually dependent local observations are independently casting their votes for intrinsically global object properties such as object scale. Object hypotheses are then assumed to be a mere sum of their part votes. Popular representation schemes are, however, based on a dense sampling of semi-local image features, which are consequently mutually dependent. We take advantage of part dependencies and incorporate them into probabilistic Hough voting by deriving an objective function that connects three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem conjointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than based on local observations alone. Early commitments are avoided by not restricting parts to only a single vote for a locally best correspondence and we learn a weighting of parts during training to reflect their differing relevance for an object. Experiments successfully demonstrate the benefit of incorporating part dependencies through grouping into Hough voting. The joint optimization of groupings, correspondences, and votes not only improves the detection accuracy over standard Hough voting and a sliding window baseline, but it also reduces the computational complexity by significantly decreasing the number of candidate hypotheses.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 63
    Publication Date: 2015-05-09
    Description: A proper temporal model is essential to analysis tasks involving sequential data. In computer-assisted surgical training, which is the focus of this study, obtaining accurate temporal models is a key step towards automated skill-rating. Conventional learning approaches can have only limited success in this domain due to an insufficient amount of data with accurate labels. We propose a novel formulation termed Relative Hidden Markov Model and develop algorithms for obtaining a solution under this formulation. The method requires only relative ranking between input pairs, which are readily available from training sessions in the target application, hence alleviating the data labeling requirement. The proposed algorithm learns a model from the training data so that the attribute under consideration is linked to the likelihood of the input, hence supporting the comparison of new sequences. For evaluation, synthetic data are first used to assess the performance of the approach, and then we experiment with real videos from a widely adopted surgical training platform. Experimental results suggest that the proposed approach provides a promising solution to video-based motion skill evaluation. To further illustrate the potential of generalizing the method to other applications of temporal analysis, we also report experiments using our model on speech-based emotion recognition.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 64
    Publication Date: 2015-05-09
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: We present a fully automatic system for extracting the semantic structure of a typical academic presentation video, which captures the whole presentation stage with abundant camera motions such as panning, tilting, and zooming. Our system automatically detects and tracks both the projection screen and the presenter whenever they are visible in the video. By analyzing the image content of the tracked screen region, our system is able to detect slide progressions and extract a high-quality, non-occluded, geometrically-compensated image for each slide, resulting in a list of representative images that reconstruct the main presentation structure. Afterwards, our system recognizes text content and extracts keywords from the slides, which can be used for keyword-based video retrieval and browsing. Experimental results show that our system is able to generate more stable and accurate screen localization results than commonly-used object tracking methods. Our system also extracts more accurate presentation structures than general video summarization methods, for this specific type of video.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-09
    Description: A robust and effective specular highlight removal method is proposed in this paper. It is based on a key observation: the maximum fraction of the diffuse colour component in diffuse local patches in colour images changes smoothly. The specular pixels can thus be treated as noise in this case. This property allows the specular highlights to be removed in an image denoising fashion: an edge-preserving low-pass filter (e.g., the bilateral filter) can be used to smooth the maximum fraction of the colour components of the original image to remove the noise contributed by the specular pixels. Recent developments in fast bilateral filtering techniques enable the proposed method to run over $200\times$ faster than state-of-the-art techniques on a standard CPU and differentiate it from previous work.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 67
    Publication Date: 2015-05-09
    Description: We propose a face alignment framework that relies on the texture model generated by the responses of discriminatively trained part-based filters. Unlike standard texture models built from pixel intensities or responses generated by generic filters (e.g. Gabor), our framework has two important advantages. First, by virtue of discriminative training, invariance to external variations (like identity, pose, illumination and expression) is achieved. Second, we show that the responses generated by discriminatively trained filters (or patch-experts) are sparse and can be modeled using a very small number of parameters. As a result, the optimization methods based on the proposed texture model can better cope with unseen variations. We illustrate this point by formulating both part-based and holistic approaches for generic face alignment and show that our framework outperforms the state-of-the-art on multiple “wild” databases. The code and dataset annotations are available for research purposes from http://ibug.doc.ic.ac.uk/resources.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-04
    Description: We propose a new family of message passing techniques for MAP estimation in graphical models which we call Sequential Reweighted Message Passing (SRMP). Special cases include well-known techniques such as Min-Sum Diffusion (MSD) and a faster Sequential Tree-Reweighted Message Passing (TRW-S). Importantly, our derivation is simpler than the original derivation of TRW-S, and does not involve a decomposition into trees. This allows easy generalizations. The new family of algorithms can be viewed as a generalization of TRW-S from pairwise to higher-order graphical models. We test SRMP on several real-world problems with promising results.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-04
    Description: Objects occupy physical space and obey physical laws. To truly understand a scene, we must reason about the space that objects in it occupy, and how each object is stably supported by the others. In other words, we seek to understand which objects would, if moved, cause other objects to fall. This 3D volumetric reasoning is important for many scene understanding tasks, ranging from segmentation of objects to perception of rich, physically well-founded 3D interpretations of the scene. In this paper, we propose a new algorithm to parse a single RGB-D image with 3D block units while jointly reasoning about the segments, volumes, supporting relationships, and object stability. Our algorithm is based on the intuition that a good 3D representation of the scene is one that fits the depth data well, and is a stable, self-supporting arrangement of objects (i.e., one that does not topple). We design an energy function for representing the quality of the block representation based on these properties. Our algorithm fits 3D blocks to the depth values corresponding to image segments, and iteratively optimizes the energy function. Our proposed algorithm is the first to consider stability of objects in complex arrangements for reasoning about the underlying structure of the scene. Experimental results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-04
    Description: Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth listwise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 71
    Publication Date: 2015-04-04
    Description: Human re-identification across cameras with non-overlapping fields of view is one of the most important and difficult problems in video surveillance and analysis. However, current algorithms are likely to fail in real-world scenarios for several reasons. For example, surveillance cameras are typically mounted high above the ground plane, causing serious perspective changes. Also, most algorithms approach matching across images using the same descriptors, regardless of camera viewpoint or human pose. Here, we introduce a re-identification algorithm that addresses both problems. We build a model for human appearance as a function of pose, using training data gathered from a calibrated camera. We then apply this “pose prior” in online re-identification to make matching and identification more robust to viewpoint. We further integrate person-specific features learned over the course of tracking to improve the algorithm’s performance. We evaluate the performance of the proposed algorithm and compare it to several state-of-the-art algorithms, demonstrating superior performance on standard benchmarking datasets as well as a challenging new airport surveillance scenario.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 72
    Publication Date: 2015-04-04
    Description: Pushbroom cameras are widely used for earth observation applications. This sensor acquires 1D images over time and uses the straight motion of the satellite to sweep out a region of space and build a 2D image. The stability of the satellite is critical during the pushbroom acquisition process. Therefore its attitude is assumed to be constant over time. However, the recent manufacture of smaller and lighter satellites to reduce launching cost has weakened this assumption. Small oscillations of the satellite’s attitude can result in noticeable warps in images, and geolocation information is lost as the satellite does not capture what it ought to. Current solutions use inertial sensors to control the attitude and correct the images, but they are costly and of limited precision. As the warped images do contain information about attitude variations, we suggest using image registration to estimate them. We exploit the geometry of the focal plane and the stationary nature of the disturbances to recover undistorted images. We embed the estimation in a Bayesian framework where image registration, a prior on attitude variations and a radiometric correction model are fused to retrieve the motion of the satellite. We illustrate the performance of our algorithm on four satellite datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-02-07
    Description: In this paper we propose a new method for generating synthetic handwritten signature images for biometric applications. The procedures we introduce imitate the mechanism of motor equivalence which divides human handwriting into two steps: the working out of an effector independent action plan and its execution via the corresponding neuromuscular path. The action plan is represented as a trajectory on a spatial grid. This contains both the signature text and its flourish, if there is one. The neuromuscular path is simulated by applying a kinematic Kaiser filter to the trajectory plan. The length of the filter depends on the pen speed which is generated using a scalar version of the sigma lognormal model. An ink deposition model, applied pixel by pixel to the pen trajectory, provides realistic static signature images. The lexical and morphological properties of the synthesized signatures as well as the range of the synthesis parameters have been estimated from real databases of real signatures such as the MCYT Off-line and the GPDS960GraySignature corpuses. The performance experiments show that by tuning only four parameters it is possible to generate synthetic identities with different stability and forgers with different skills. Therefore it is possible to create datasets of synthetic signatures with a performance similar to databases of real signatures. Moreover, we can customize the created dataset to produce skilled forgeries or simple forgeries which are easier to detect, depending on what the researcher needs. Perceptual evaluation gives an average confusion of 44.06 percent between real and synthetic signatures which shows the realism of the synthetic ones. The utility of the synthesized signatures is demonstrated by studying the influence of the pen type and number of users on an automatic signature verifier.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 74
    Publication Date: 2015-02-07
    Description: Provides instructions and guidelines to prospective authors who wish to submit manuscripts.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-02-07
    Description: We present a new approach for matching sets of branching curvilinear structures that form graphs embedded in $\mathbb{R}^2$ or $\mathbb{R}^3$ and may be subject to deformations. Unlike earlier methods, ours does not rely on local appearance similarity, nor does it require a good initial alignment. Furthermore, it can cope with non-linear deformations, topological differences, and partial graphs. To handle arbitrary non-linear deformations, we use Gaussian process regressions to represent the geometrical mapping relating the two graphs. In the absence of appearance information, we iteratively establish correspondences between points, update the mapping accordingly, and use it to estimate where to find the most likely correspondences that will be used in the next step. To make the computation tractable for large graphs, the set of new potential matches considered at each iteration is not selected at random as with many RANSAC-based algorithms. Instead, we introduce a so-called Active Testing Search strategy that performs a priority search to favor the most likely matches and speed up the process. We demonstrate the effectiveness of our approach first on synthetic cases and then on angiography data, retinal fundus images, and microscopy image stacks acquired at very different resolutions.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image's feature maps in a whitened feature space. Based on a Gestalt principle of figure-ground segregation, BMS computes a saliency map by discovering surrounded regions via topological analysis of Boolean maps. Furthermore, we draw a connection between BMS and the Minimum Barrier Distance to provide insight into why and how BMS can properly capture the surroundedness cue via Boolean maps. The strength of BMS is verified by its simplicity, efficiency and superior performance compared with 10 state-of-the-art methods on seven eye tracking benchmark datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We seek a practical method for establishing dense correspondences between two images with similar content, but possibly different 3D scenes. One of the challenges in designing such a system is the local scale differences of objects appearing in the two images. Previous methods often considered only a few image pixels, matching only pixels for which stable scales may be reliably estimated. Recently, others have considered dense correspondences, but with substantial costs associated with generating, storing and matching scale invariant descriptors. Our work is motivated by the observation that pixels in the image have contexts (the pixels around them) which may be exploited in order to reliably estimate local scales. We make the following contributions. (i) We show that scales estimated in sparse interest points may be propagated to neighboring pixels where this information cannot be reliably determined. Doing so allows scale invariant descriptors to be extracted anywhere in the image. (ii) We explore three means for propagating this information: using the scales at detected interest points, using the underlying image information to guide scale propagation in each image separately, and using both images together. Finally, (iii) we provide extensive qualitative and quantitative results, demonstrating that scale propagation allows for accurate dense correspondences to be obtained even between very different images, with little computational cost beyond that required by existing methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: A robust algorithm is proposed for tracking a target object in dynamic conditions including motion blurs, illumination changes, pose variations, and occlusions. To cope with these challenging factors, multiple trackers based on different feature representations are integrated within a probabilistic framework. Each view of the proposed multiview (multi-channel) feature learning algorithm is concerned with one particular feature representation of a target object from which a tracker is developed with different levels of reliability. With the multiple trackers, the proposed algorithm exploits tracker interaction and selection for robust tracking performance. In the tracker interaction, a transition probability matrix is used to estimate dependencies between trackers. Multiple trackers communicate with each other by sharing information of sample distributions. The tracker selection process determines the most reliable tracker with the highest probability. To account for object appearance changes, the transition probability matrix and tracker probability are updated in a recursive Bayesian framework by reflecting the tracker reliability measured by a robust tracker likelihood function that learns to account for both transient and stable appearance changes. Experimental results on benchmark datasets demonstrate that the proposed interacting multiview algorithm performs robustly and favorably against state-of-the-art methods in terms of several quantitative metrics.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 80
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: In this study, we show that landmark detection or face alignment task is not a single and independent problem. Instead, its robustness can be greatly improved with auxiliary information. Specifically, we jointly optimize landmark detection together with the recognition of heterogeneous but subtly correlated facial attributes, such as gender, expression, and appearance attributes. This is non-trivial since different attribute inference tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model, which not only learns the inter-task correlation but also employs dynamic task coefficients to facilitate the optimization convergence when learning multiple complex tasks. Extensive evaluations show that the proposed task-constrained learning (i) outperforms existing face alignment methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) reduces model complexity drastically compared to the state-of-the-art methods based on cascaded deep model.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Archetypal analysis is a popular exploratory tool that explains a set of observations as compositions of a few ‘pure’ patterns. The standard formulation of archetypal analysis addresses this problem for real valued observations by finding the approximate convex hull. Recently, a probabilistic formulation has been suggested which extends this framework to other observation types such as binary and count. In this article we further extend this framework to address the general case of nominal observations which includes, for example, multiple-option questionnaires. We view archetypal analysis in a generative framework: this allows explicit control over choosing a suitable number of archetypes by assigning appropriate prior information, and finding efficient update rules using variational Bayes. We demonstrate the efficacy of this approach extensively on simulated data, and three real world examples: Austrian guest survey dataset, German credit dataset, and SUN attribute image dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: In this paper, we propose a visual tracker based on a metric-weighted linear representation of appearance. In order to capture the interdependence of different feature dimensions, we develop two online distance metric learning methods using proximity comparison information and structured output learning. The learned metric is then incorporated into a linear representation of appearance. We show that online distance metric learning significantly improves the robustness of the tracker, especially on those sequences exhibiting drastic appearance changes. In order to bound growth in the number of training samples, we design a time-weighted reservoir sampling method. Moreover, we enable our tracker to automatically perform object identification during the process of object tracking, by introducing a collection of static template samples belonging to several object classes of interest. Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame. Experimental results on challenging video sequences demonstrate the effectiveness of the method for both inter-frame tracking and object identification.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 83
    Publication Date: 2016-04-01
    Description: Natural images are scale invariant with structures at all length scales. We formulate a geometric view of scale invariance in natural images using percolation theory, which describes the behavior of connected clusters on graphs. We map images to the percolation model by defining clusters on a binary representation for images. We show that critical percolating structures emerge in natural images and study their scaling properties by identifying fractal dimensions and exponents for the scale-invariant distributions of clusters. This formulation leads to a method for identifying clusters in images from underlying structures as a starting point for image segmentation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: Compact and discriminative visual codebooks are preferred in many visual recognition tasks. In the literature, a number of works have taken the approach of hierarchically merging visual words of an initial large-sized codebook, but implemented this approach with different merging criteria. In this work, we propose a single probabilistic framework to unify these merging criteria, by identifying two key factors: the function used to model the class-conditional distribution and the method used to estimate the distribution parameters. More importantly, by adopting new distribution functions and/or parameter estimation methods, our framework can readily produce a spectrum of novel merging criteria. Three of them are specifically discussed in this paper. For the first criterion, we adopt the multinomial distribution with the Bayesian method. For the second criterion, we integrate the Gaussian distribution with maximum likelihood parameter estimation. For the third criterion, which shows the best merging performance, we propose a max-margin-based parameter estimation method and apply it with the multinomial distribution. An extensive experimental study is conducted to systematically analyze the performance of the above three criteria and compare them with existing ones. As demonstrated, the best criterion within our framework achieves the overall best merging performance among the compared merging criteria developed in the literature.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 86
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: We propose a method to address challenges in unconstrained face detection, such as arbitrary pose variations and occlusions. First, a new image feature called Normalized Pixel Difference (NPD) is proposed. The NPD feature is computed as the difference-to-sum ratio between two pixel values, inspired by the Weber Fraction in experimental psychology. The new feature is scale invariant, bounded, and is able to reconstruct the original image. Second, we propose a deep quadratic tree to learn the optimal subset of NPD features and their combinations, so that complex face manifolds can be partitioned by the learned rules. This way, only a single soft-cascade classifier is needed to handle unconstrained face detection. Furthermore, we show that the NPD features can be efficiently obtained from a lookup table, and the detection template can be easily scaled, making the proposed face detector very fast. Experimental results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the proposed method achieves state-of-the-art performance in detecting unconstrained faces with arbitrary pose variations and occlusions in cluttered scenes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 87
    Publication Date: 2016-01-08
    Description: This work presents a deformable point set registration algorithm that seeks an optimal set of radial basis functions to describe the registration. A novel, global optimization approach is introduced composed of simulated annealing with a particle filter based generator function to perform the registration. It is shown how constraints can be incorporated into this framework. A constraint on the deformation is enforced whose role is to ensure physically meaningful fields (i.e., invertible). Further, examples in which landmark constraints serve to guide the registration are shown. Results on 2D and 3D data demonstrate the algorithm’s robustness to noise and missing information.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 88
    Publication Date: 2016-01-08
    Description: In this paper we present an algorithmic approach for fitting isotonic models under convex, yet non-differentiable, loss functions. It is a generalization of the greedy non-regret approach proposed by Luss and Rosset (2014) for differentiable loss functions, taking into account the sub-gradiental extensions required. We prove that our suggested algorithm solves the isotonic modeling problem while maintaining favorable computational and statistical properties. As our suggested algorithm may be used for any non-differentiable loss function, we focus our interest on isotonic modeling for either regression or two-class classification with appropriate log-likelihood loss and lasso penalty on the fitted values. This combination allows us to maintain the non-parametric nature of isotonic modeling, while controlling model complexity through regularization. We demonstrate the efficiency and usefulness of this approach on both synthetic and real world data. An implementation of our suggested solution is publicly available from the first author's website (https://sites.google.com/site/amichaipainsky/software).
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 89
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: We present a novel non-parametric Bayesian model to jointly discover the dynamics of low-level actions and high-level behaviors of tracked objects. In our approach, actions capture both linear, low-level object dynamics, and an additional spatial distribution on where the dynamic occurs. Furthermore, behavior classes capture high-level temporal motion dependencies in Markov chains of actions, thus each learned behavior is a switching linear dynamical system. The number of actions and behaviors is discovered from the data itself using Dirichlet Processes. We are especially interested in cases where tracks can exhibit large kinematic and spatial variations, e.g. person tracks in open environments, as found in the visual surveillance and intelligent vehicle domains. The model handles real-valued features directly, so no information is lost by quantizing measurements into ‘visual words’, and variations in standing, walking and running can be discovered without discrete thresholds. We describe inference using Markov Chain Monte Carlo sampling and validate our approach on several artificial and real-world pedestrian track datasets from the surveillance and intelligent vehicle domain. We show that our model can distinguish between relevant behavior patterns that an existing state-of-the-art hierarchical model for clustering and simpler model variants cannot. The software and the artificial and surveillance datasets are made publicly available for benchmarking purposes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 90
    Publication Date: 2016-01-08
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 91
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Semantic segmentation is the problem of assigning an object label to each pixel. It unifies the image segmentation and object recognition problems. The importance of using contextual information in semantic segmentation frameworks has been widely realized in the field. We propose a contextual framework, called contextual hierarchical model (CHM), which learns contextual information in a hierarchical framework for semantic segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. This training strategy allows for optimization of a joint posterior probability at multiple resolutions through the hierarchy. The contextual hierarchical model is based purely on the input image patches and does not make use of any fragments or shape examples. Hence, it is applicable to a variety of problems such as object segmentation and edge detection. We demonstrate that CHM performs on par with the state of the art on the Stanford background and Weizmann horse datasets. It also outperforms state-of-the-art edge detection methods on the NYU depth dataset and achieves state-of-the-art results on the Berkeley segmentation dataset (BSDS 500).
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 92
    Publication Date: 2016-04-01
    Description: Hyperspectral imaging is beneficial to many applications, but most traditional methods do not consider fluorescent effects, which are present in everyday items ranging from paper to even our food. Furthermore, everyday fluorescent items exhibit a mix of reflection and fluorescence, so proper separation of these components is necessary for analyzing them. In recent years, effective imaging methods have been proposed, but most require capturing the scene under multiple illuminants. In this paper, we demonstrate efficient separation and recovery of reflectance and fluorescence emission spectra through the use of two high-frequency illuminations in the spectral domain. With the fluorescence emission spectra obtained from our high-frequency illuminants, we then describe how to estimate the fluorescence absorption spectrum of a material given its emission spectrum. In addition, we provide an in-depth analysis of our method and also show that filters can be used in conjunction with standard light sources to generate the required high-frequency illuminants. We also test our method under ambient light and demonstrate an application of our method to synthetic relighting of real scenes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 93
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We propose a new approach to simultaneously recover camera pose and 3D shape of non-rigid and potentially extensible surfaces from a monocular image sequence. For this purpose, we make use of the Extended Kalman Filter based Simultaneous Localization And Mapping (EKF-SLAM) formulation, a Bayesian optimization framework traditionally used in mobile robotics for estimating camera pose and reconstructing rigid scenarios. In order to extend the problem to a deformable domain, we represent the object's surface mechanics by means of Navier's equations, which are solved using a Finite Element Method (FEM). With these main ingredients, we can further model the material's stretching, allowing us to go a step further than most current techniques, which are typically constrained to surfaces undergoing isometric deformations. We extensively validate our approach in both real and synthetic experiments, and demonstrate its advantages with respect to competing methods. More specifically, we show that besides simultaneously retrieving camera pose and non-rigid shape, our approach is adequate for both isometric and extensible surfaces, requires neither batch processing of all the frames nor tracking points over the whole sequence, and runs at several frames per second.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 94
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We consider the problem of image representation from the perspective of statistical design. Recent studies have shown that images are possibly sampled from a low-dimensional manifold despite the fact that the ambient space is usually very high dimensional. Learning low-dimensional image representations is crucial for many image processing tasks such as recognition and retrieval. Most of the existing approaches for learning low-dimensional representations, such as principal component analysis (PCA) and locality preserving projections (LPP), aim at discovering the geometrical or discriminant structures in the data. In this paper, we take a different perspective from statistical experimental design, and propose a novel dimensionality reduction algorithm called A-Optimal Projection (AOP). AOP is based on a linear regression model. Specifically, AOP finds the optimal basis functions so that the expected prediction error of the regression model can be minimized if the new representations are used for training the model. Experimental results suggest that the proposed approach provides a better representation and achieves higher accuracy in image retrieval.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 95
    Publication Date: 2016-04-01
    Description: Many problems in computer vision can be formulated as multidimensional ellipsoid-specific fitting, which is to minimize the residual error such that the underlying quadratic surface is a multidimensional ellipsoid. In this paper, we present a fast and robust algorithm for solving ellipsoid-specific fitting directly. Our method is based on the alternating direction method of multipliers, which does not introduce extra positive semi-definiteness constraints. The computational complexity is thus significantly lower than that of semi-definite programming (SDP) based methods. More specifically, to fit $n$ data points into a $p$-dimensional ellipsoid, our complexity is $O(p^6 + np^4) + O(p^3)$, where the former $O$ results from preprocessing the data once, while that of the state-of-the-art SDP method is $O(p^6 + np^4 + n^{\frac{3}{2}}p^2)$ for each iteration. The storage complexity of our algorithm is about $\frac{1}{2}np^2$, which is at most $1/4$ of that of SDP methods. Extensive experiments testify to the great speed and accuracy advantages of our method over the state-of-the-art approaches. The implementation of our method is also much simpler than that of SDP-based methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 96
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: This paper introduces a novel solution to the hand-eye calibration problem. It uses camera measurements directly and, at the same time, requires neither prior knowledge of the external camera calibrations nor a known calibration target. Our algorithm uses a branch-and-bound approach to minimize an objective function based on the epipolar constraint. Further, it employs Linear Programming to decide the bounding step of the algorithm. Our technique is able to recover both the unknown rotation and translation simultaneously, and the solution is guaranteed to be globally optimal with respect to the $L_{\infty}$-norm.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 97
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We propose a completely automatic approach for recognizing low-resolution face images captured in uncontrolled environments. The approach uses multidimensional scaling to learn a common transformation matrix for the entire face, which simultaneously transforms the facial features of the low-resolution and the high-resolution training images such that the distance between them approximates the distance had both the images been captured under the same controlled imaging conditions. Stereo matching cost is used to obtain the similarity of two images in the transformed space. Though this gives very good recognition performance, the time taken for computing the stereo matching cost is significant. To overcome this limitation, we propose a reference-based approach in which each face image is represented by its stereo matching cost from a few reference images. Experimental evaluation on challenging real-world databases and comparison with state-of-the-art super-resolution, classifier-based and cross-modal synthesis techniques show the effectiveness of the proposed algorithm.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 98
    Publication Date: 2016-04-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 99
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and Heat-maps. To adhere to sociological observations, we introduce a loss function ($G$-MITRE) able to deal with the complexity of evaluating group detection performances. We show our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
  • 100
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-07-04
    Description: Person re-identification in a non-overlapping multicamera scenario is an open challenge in computer vision because of the large changes in appearances caused by variations in viewing angle, lighting, background clutter, and occlusion over multiple cameras. As a result of these variations, features describing the same person get transformed between cameras. To model the transformation of features, the feature space is nonlinearly warped to get the “warp functions”. The warp functions between two instances of the same target form the set of feasible warp functions, while those between instances of different targets form the set of infeasible warp functions. In this work, we build upon the observation that feature transformations between cameras lie in a nonlinear function space of all possible feature transformations. The space consisting of all the feasible and infeasible warp functions is the warp function space (WFS). We propose to learn a discriminating surface separating these two sets of warp functions in the WFS and to re-identify persons by classifying a test warp function as feasible or infeasible. Towards this objective, a Random Forest (RF) classifier is employed, which effectively chooses the warp function components according to their importance in separating the feasible and the infeasible warp functions in the WFS. Extensive experiments on five datasets are carried out to show the superior performance of the proposed approach over state-of-the-art person re-identification methods. We show that our approach outperforms all other methods when large illumination variations are considered. At the same time, our method reaches the best average performance over multiple combinations of the datasets, thus showing that it is not designed only to address a specific challenge posed by a particular dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science