ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Articles  (227)
  • Institute of Electrical and Electronics Engineers (IEEE)  (227)
  • 2015-2019  (227)
  • 1990-1994
  • 1945-1949
  • 2016  (227)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence  (227)
  • 1275
  • Computer Science  (227)
  • Geography
  • Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
Collection
  • Articles  (227)
Publisher
  • Institute of Electrical and Electronics Engineers (IEEE)  (227)
Years
  • 2015-2019  (227)
  • 1990-1994
  • 1945-1949
Year
Topic
  • Computer Science  (227)
  • Geography
  • Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
  • 1
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Provides a listing of board members, committee members, editors, and society officers.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Certain inner feelings and physiological states like pain are subjective states that cannot be directly measured, but can be estimated from spontaneous facial expressions. Since they are typically characterized by subtle movements of facial parts, analysis of the facial details is required. To this end, we formulate a new regression method for continuous estimation of the intensity of facial behavior interpretation, called Doubly Sparse Relevance Vector Machine (DSRVM). DSRVM enforces double sparsity by jointly selecting the most relevant training examples (a.k.a. relevance vectors) and the most important kernels associated with facial parts relevant for interpretation of observed facial expressions. This advances prior work on multi-kernel learning, where sparsity of relevant kernels is typically ignored. Empirical evaluation on challenging Shoulder Pain videos, and the benchmark DISFA and SEMAINE datasets demonstrate that DSRVM outperforms competing approaches with a multi-fold reduction of running times in training and testing.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Modeling, detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatio-temporal derivatives of raindrops. To accomplish the idea, we first model adherent raindrops using law of physics, and detect raindrops based on these models in combination with motion and intensity temporal derivatives of the input video. Having detected the raindrops, we remove them and restore the images based on an analysis that some areas of raindrops completely occludes the scene, and some other areas occlude only partially. For partially occluding areas, we restore them by retrieving as much as possible information of the scene, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity derivative. For completely occluding areas, we recover them by using a video completion technique. Experimental results using various real videos show the effectiveness of our method.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2016-08-05
    Description: Deep convolutional networks have proven to be very successful in learning task specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges, when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled ‘seed’ image patch. In contrast to supervised network training, the resulting feature representation is not class specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While features learned with our approach cannot compete with class specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2016-08-05
    Description: We address the video-based face association problem, in which one attempts to extract the face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty in handling this task, especially when challenging nuisance factors like motion blur, low resolution or significant camera motions are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Different from many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling false positves/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. However, in some scenarios, the number of causes or the number of states of the FHMM cannot be known or limited a priori. In this paper, we propose an infinite factorial unbounded-state hidden Markov model (IFUHMM), in which the number of parallel hidden Markovmodels (HMMs) and states in each HMM are potentially unbounded. We rely on a Bayesian nonparametric (BNP) prior over integer-valued matrices, in which the columns represent the Markov chains, the rows the time indexes, and the integers the state for each chain and time instant. First, we extend the existent infinite factorial binary-state HMM to allow for any number of states. Then, we modify this model to allow for an unbounded number of states and derive an MCMC-based inference algorithm that properly deals with the trade-off between the unbounded number of states and chains. We illustrate the performance of our proposed models in the power disaggregation problem.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: The speed with which intelligent systems can react to an action depends on how soon it can be recognized. The ability to recognize ongoing actions is critical in many applications, for example, spotting criminal activity. It is challenging, since decisions have to be made based on partial videos of temporally incomplete action executions. In this paper, we propose a novel discriminative multi-scale kernelized model for predicting the action class from a partially observed video. The proposed model captures temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller temporal segments. A compositional kernel is proposed to hierarchically capture the relationships between partial observations as well as the temporal segments, respectively. We develop a new learning formulation, which elegantly captures the temporal evolution over time, and enforces the label consistency between segments and corresponding partial videos. We prove that the proposed learning formulation minimizes the upper bound of the empirical risk. Experimental results on four public datasets show that the proposed approach outperforms state-of-the-art action prediction methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Graph matching (GM) is a fundamental problem in computer science, and it plays a central role to solve correspondence problems in computer vision. GM problems that incorporate pairwise constraints can be formulated as a quadratic assignment problem (QAP). Although widely used, solving the correspondence problem through GM has two main limitations: (1) the QAP is NP-hard and difficult to approximate; (2) GM algorithms do not incorporate geometric constraints between nodes that are natural in computer vision problems. To address aforementioned problems, this paper proposes factorized graph matching (FGM). FGM factorizes the large pairwise affinity matrix into smaller matrices that encode the local structure of each graph and the pairwise affinity between edges. Four are the benefits that follow from this factorization: (1) There is no need to compute the costly (in space and time) pairwise affinity matrix; (2) The factorization allows the use of a path-following optimization algorithm, that leads to improved optimization strategies and matching performance; (3) Given the factorization, it becomes straight-forward to incorporate geometric transformations (rigid and non-rigid) to the GM problem. (4) Using a matrix formulation for the GM problem and the factorization, it is easy to reveal commonalities and differences between different GM methods. The factorization also provides a clean connection with other matching algorithms such as iterative closest point; Experimental results on synthetic and real databases illustrate how FGM outperforms state-of-the-art algorithms for GM. The code is available at http://humansensing.cs.cmu.edu/fgm .
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Existing data association techniques mostly focus on matching pairs of data-point sets and then repeating this process along space-time to achieve long term correspondences. However, in many problems such as person re-identification, a set of data-points may be observed at multiple spatio-temporal locations and/or by multiple agents in a network and simply combining the local pairwise association results between sets of data-points often leads to inconsistencies over the global space-time horizons. In this paper, we propose a Novel Network Consistent Data Association (NCDA) framework formulated as an optimization problem that not only maintains consistency in association results across the network, but also improves the pairwise data association accuracies. The proposed NCDA can be solved as a binary integer program leading to a globally optimal solution and is capable of handling the challenging data-association scenario where the number of data-points varies across different sets of instances in the network. We also present an online implementation of NCDA method that can dynamically associate new observations to already observed data-points in an iterative fashion, while maintaining network consistency. We have tested both the batch and the online NCDA in two application areas—person re-identification and spatio-temporal cell tracking and observed consistent and highly accurate data association results in all the cases.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 11
    Publication Date: 2016-08-05
    Description: This paper presents a method for learning an And-Or model to represent context and occlusion for car detection and viewpoint estimation. The learned And-Or model represents car-to-car context and occlusion configurations at three levels: (i) spatially-aligned cars, (ii) single car under different occlusion configurations, and (iii) a small number of parts. The And-Or model embeds a grammar for representing large structural and appearance variations in a reconfigurable hierarchy. The learning process consists of two stages in a weakly supervised way (i.e., only bounding boxes of single cars are annotated). First, the structure of the And-Or model is learned with three components: (a) mining multi-car contextual patterns based on layouts of annotated single car bounding boxes, (b) mining occlusion configurations between single cars, and (c) learning different combinations of part visibility based on CAD simulations. The And-Or model is organized in a directed and acyclic graph which can be inferred by Dynamic Programming. Second, the model parameters (for appearance, deformation and bias) are jointly trained using Weak-Label Structural SVM. In experiments, we test our model on four car detection datasets—the KITTI dataset [1] , the PASCAL VOC2007 car dataset  [2] , and two self-collected car datasets, namely the Street-Parking car dataset and the Parking-Lot car dataset, and three datasets for car viewpoint estimation—the PASCAL VOC2006 car dataset  [2] , the 3D car dataset  [3] , and the PASCAL3D+ car dataset  [4] . Compared with state-of-the-art variants of deformable part-based models and other methods, our model achieves significant improvement consistently on the four detection datasets, and comparable performance on car viewpo- nt estimation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 12
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on Pascal VOC 2007 and VOC 2012 multi-label image datasets well demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-arts. In particular, the mAP reaches 90.5% by HCP only and 93.2% after the fusion with our complementary result in  [12] based on hand-crafted features on the VOC 2012 dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 13
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Wiberg matrix factorization breaks a matrix $Y$ into low-rank factors $U$ and $V$ by solving for $V$ in closed form given $U$ , linearizing $V(U)$ about $U$ , and iteratively minimizing $||Y - UV(U)||_2$ with respect to $U$ only. This approach factors the matrix while effectively removing $V$ from the minimization. Recently Eriksson and van den Hengel extended this approach to $L_1$ , minimizing $||Y - UV(U)||_1$ . We generalize their approach beyond factorization to minimize $||Y - f(U, V)||_1$ for more general functions $f(U, V)$ that are nonlinear in each of two sets of variables. We demonstrate the idea with a practical Wiberg algorithm for $L_1$ bundle adjustment. One Wiberg minimization can be nested inside another, effectively removing two of three sets of variables from a minimization. We demonstrate this idea with a nested Wiberg algorithm for $L_1$ projective bundle adjustment, solving for camera matrices, points, and projective depths. Wiberg minimization also generalizes to handle nonlinear constraints, and we demonstrate this idea with Constrained Wiberg Minimization for Multiple Instance Learning (CWM-MIL), which removes one set of variable
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 14
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: In this paper, we propose a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) for supervised dimensionality reduction of local features. LFDP is able to efficiently seek a subspace to improve the discriminability of local features for classification. We make three novel contributions. First, the proposed LFDP is a general supervised subspace learning algorithm which provides an efficient way for dimensionality reduction of large-scale local feature descriptors. Second, we introduce the Differential Scatter Discriminant Criterion (DSDC) to the subspace learning of local feature descriptors which avoids the matrix singularity problem. Third, we propose a generalized orthogonalization method to impose on projections, leading to a more compact and less redundant subspace. Extensive experimental validation on three benchmark datasets including UIUC-Sports, Scene-15 and MIT Indoor demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 15
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: We propose a real-time method to accurately track the human head pose in the 3-dimensional (3D) world. Using a RGB-Depth camera, a face template is reconstructed by fitting a 3D morphable face model, and the head pose is determined by registering this user-specific face template to the input depth video.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 16
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criteria of interest. The requirements that the solution both is geodesic and must pass through the mean tend to imply that the methods only work well when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves from Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 17
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: An end-to-end real-time text localization and recognition method is presented. Its real-time performance is achieved by posing the character detection and segmentation problem as an efficient sequential selection from the set of Extremal Regions. The ER detector is robust against blur, low contrast and illumination, color and texture variation. In the first stage, the probability of each ER being a character is estimated using features calculated by a novel algorithm in constant time and only ERs with locally maximal probability are selected for the second stage, where the classification accuracy is improved using computationally more expensive features. A highly efficient clustering algorithm then groups ERs into text lines and an OCR classifier trained on synthetic fonts is exploited to label character regions. The most probable character sequence is selected in the last stage when the context of each character is known. The method was evaluated on three public datasets. On the ICDAR 2013 dataset the method achieves state-of-the-art results in text localization; on the more challenging SVT dataset, the proposed method significantly outperforms the state-of-the-art methods and demonstrates that the proposed pipeline can incorporate additional prior knowledge about the detected text. The proposed method was exploited as the baseline in the ICDAR 2015 Robust Reading competition, where it compares favourably to the state-of-the art.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 18
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 19
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-05
    Description: This paper addresses modelling data using the Watson distribution. The Watson distribution is one of the simplest distributions for analyzing axially symmetric data. This distribution has gained some attention in recent years due to its modeling capability. However, its Bayesian inference is fairly understudied due to difficulty in handling the normalization factor. Recent development of Markov chain Monte Carlo (MCMC) sampling methods can be applied for this purpose. However, these methods can be prohibitively slow for practical applications. A deterministic alternative is provided by variational methods that convert inference problems into optimization problems. In this paper, we present a variational inference for Watson mixture models. First, the variational framework is used to side-step the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded in order to avoid intractable moment computation. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding over-fitting. We discuss potential applications of the modeling with Watson distributions in the problem of blind source separation, and clustering gene expression data sets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 20
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion , as facial appearance distorts owing to camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL , a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a-priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and associated pose classifiers. Upon determining the target's position using a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 21
    Publication Date: 2016-05-06
    Description: Recent literature shows that facial attributes, i.e., contextual facial information, can be beneficial for improving the performance of real-world applications, such as face verification, face recognition, and image search. Examples of face attributes include gender, skin color, facial hair, etc. How to robustly obtain these facial attributes (traits) is still an open problem, especially in the presence of the challenges of real-world environments: non-uniform illumination conditions, arbitrary occlusions, motion blur and background clutter. What makes this problem even more difficult is the enormous variability presented by the same subject, due to arbitrary face scales, head poses, and facial expressions. In this paper, we focus on the problem of facial trait classification in real-world face videos. We have developed a fully automatic hierarchical and probabilistic framework that models the collective set of frame class distributions and feature spatial information over a video sequence. The experiments are conducted on a large real-world face video database that we have collected, labelled and made publicly available. The proposed method is flexible enough to be applied to any facial classification problem. Experiments on a large, real-world video database McGillFaces  [1] of 18,000 video frames reveal that the proposed framework outperforms alternative approaches, by up to 16.96 and 10.13%, for the facial attributes of gender and facial hair, respectively.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 22
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    Publication Date: 2016-05-06
    Description: Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the prescribed range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). We propose a novel ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC using structured learning. In addition, in order to achieve high object detection performance, we propose a new approach to extracting low-level visual features based on spatial pooling. Incorporating spatial pooling improves the translational invariance and thus the robustness of the detection process. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method with spatially pooled features. The result is the current best reported performance on the Caltech-USA pedestrian detection dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
    Publication Date: 2016-05-06
    Description: This paper addresses the problem of matching common node correspondences among multiple graphs referring to an identical or related structure. This multi-graph matching problem involves two correlated components: i) the local pairwise matching affinity across pairs of graphs; ii) the global matching consistency that measures the uniqueness of the pairwise matchings by different composition orders. Previous studies typically either enforce the matching consistency constraints in the beginning of an iterative optimization, which may propagate matching error both over iterations and across graph pairs; or separate affinity optimization and consistency enforcement into two steps. This paper is motivated by the observation that matching consistency can serve as a regularizer in the affinity objective function especially when the function is biased due to noises or inappropriate modeling. We propose composition-based multi-graph matching methods to incorporate the two aspects by optimizing the affinity score, meanwhile gradually infusing the consistency. We also propose two mechanisms to elicit the common inliers against outliers. Compelling results on synthetic and real images show the competency of our algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 25
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Standard edge detection operators such as the Laplacian of Gaussian and the gradient of Gaussian can be used to track contours in image sequences. When using edge operators, a contour, which is determined on a frame of the sequence, is simply used as a starting contour to locate the nearest contour on the subsequent frame. However, the strategy used to look for the nearest edge points may not work when tracking contours of non isolated gray level discontinuities. In these cases, strategies derived from the optical flow equation, which look for similar gray level distributions, appear to be more appropriate since these can work with a lower frame rate than that needed for strategies based on pure edge detection operators. However, an optical flow strategy tends to propagate the localization errors through the sequence and an additional edge detection procedure is essential to compensate for such a drawback. In this paper a spatio-temporal intensity moment is proposed which integrates the two basic functions of edge detection and tracking.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Design and development of efficient and accurate feature descriptors is critical for the success of many computer vision applications. This paper proposes a new feature descriptor, referred to as DoN, for the 2D palmprint matching. The descriptor is extracted for each point on the palmprint. It is based on the ordinal measure which partially describes the difference of the neighboring points’ normal vectors. DoN has at least two advantages: 1) it describes the 3D information, which is expected to be highly stable under commonly occurring illumination variations during contactless imaging; 2) the size of DoN for each point is only one bit, which is computationally simple to extract, easy to match, and efficient to storage. We show that such 3D information can be extracted from a single 2D palmprint image. The analysis for the effectiveness of ordinal measure for palmprint matching is also provided. Four publicly available 2D palmprint databases are used to evaluate the effectiveness of DoN, both for identification and the verification. Our method on all these databases achieves the state-of-the-art performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-06
    Description: Searching for matches to high-dimensional vectors using hard/soft vector quantization is the most computationally expensive part of various computer vision algorithms including the bag of visual word (BoW). This paper proposes a fast computation method, Neighbor-to-Neighbor (NTN) search [1] , which skips some calculations based on the similarity of input vectors. For example, in image classification using dense SIFT descriptors, the NTN search seeks similar descriptors from a point on a grid to an adjacent point. Applications of the NTN search to vector quantization, a Gaussian mixture model, sparse coding, and a kernel codebook for extracting image or video representation are presented in this paper. We evaluated the proposed method on image and video benchmarks: the PASCAL VOC 2007 Classification Challenge and the TRECVID 2010 Semantic Indexing Task. NTN-VQ reduced the coding cost by 77.4 percent, and NTN-GMM reduced it by 89.3 percent, without any significant degradation in classification performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 29
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, rather belongs to a curved Riemannian manifold,existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image's feature maps in a whitened feature space. Based on a Gestalt principle of figure-ground segregation, BMS computes a saliency map by discovering surrounded regions via topological analysis of Boolean maps. Furthermore, we draw a connection between BMS and the Minimum Barrier Distance to provide insight into why and how BMS can properly captures the surroundedness cue via Boolean maps. The strength of BMS is verified by its simplicity, efficiency and superior performance compared with 10 state-of-the-art methods on seven eye tracking benchmark datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We seek a practical method for establishing dense correspondences between two images with similar content, but possibly different 3D scenes. One of the challenges in designing such a system is the local scale differences of objects appearing in the two images. Previous methods often considered only few image pixels; matching only pixels for which stable scales may be reliably estimated. Recently, others have considered dense correspondences, but with substantial costs associated with generating, storing and matching scale invariant descriptors. Our work is motivated by the observation that pixels in the image have contexts—the pixels around them—which may be exploited in order to reliably estimate local scales. We make the following contributions. (i) We show that scales estimated in sparse interest points may be propagated to neighboring pixels where this information cannot be reliably determined. Doing so allows scale invariant descriptors to be extracted anywhere in the image. (ii) We explore three means for propagating this information: using the scales at detected interest points, using the underlying image information to guide scale propagation in each image separately, and using both images together. Finally, (iii), we provide extensive qualitative and quantitative results, demonstrating that scale propagation allows for accurate dense correspondences to be obtained even between very different images, with little computational costs beyond those required by existing methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 32
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: A robust algorithm is proposed for tracking a target object in dynamic conditions including motion blurs, illumination changes, pose variations, and occlusions. To cope with these challenging factors, multiple trackers based on different feature representations are integrated within a probabilistic framework. Each view of the proposed multiview (multi-channel) feature learning algorithm is concerned with one particular feature representation of a target object from which a tracker is developed with different levels of reliability. With the multiple trackers, the proposed algorithm exploits tracker interaction and selection for robust tracking performance. In the tracker interaction, a transition probability matrix is used to estimate dependencies between trackers. Multiple trackers communicate with each other by sharing information of sample distributions. The tracker selection process determines the most reliable tracker with the highest probability. To account for object appearance changes, the transition probability matrix and tracker probability are updated in a recursive Bayesian framework by reflecting the tracker reliability measured by a robust tracker likelihood function that learns to account for both transient and stable appearance changes. Experimental results on benchmark datasets demonstrate that the proposed interacting multiview algorithm performs robustly and favorably against state-of-the-art methods in terms of several quantitative metrics.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: In this study, we show that landmark detection or face alignment task is not a single and independent problem. Instead, its robustness can be greatly improved with auxiliary information. Specifically, we jointly optimize landmark detection together with the recognition of heterogeneous but subtly correlated facial attributes, such as gender, expression, and appearance attributes. This is non-trivial since different attribute inference tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model, which not only learns the inter-task correlation but also employs dynamic task coefficients to facilitate the optimization convergence when learning multiple complex tasks. Extensive evaluations show that the proposed task-constrained learning (i) outperforms existing face alignment methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) reduces model complexity drastically compared to the state-of-the-art methods based on cascaded deep model.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 34
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Archetypal analysis is a popular exploratory tool that explains a set of observations as compositions of few ‘pure’ patterns. The standard formulation of archetypal analysis addresses this problem for real valued observations by finding the approximate convex hull. Recently, a probabilistic formulation has been suggested which extends this framework to other observation types such as binary and count. In this article we further extend this framework to address the general case of nominal observations which includes, for example, multiple-option questionnaires. We view archetypal analysis in a generative framework: this allows explicit control over choosing a suitable number of archetypes by assigning appropriate prior information, and finding efficient update rules using variational Bayes’. We demonstrate the efficacy of this approach extensively on simulated data, and three real world examples: Austrian guest survey dataset, German credit dataset, and SUN attribute image dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: In this paper, we propose a visual tracker based on a metric-weighted linear representation of appearance. In order to capture the interdependence of different feature dimensions, we develop two online distance metric learning methods using proximity comparison information and structured output learning. The learned metric is then incorporated into a linear representation of appearance. We show that online distance metric learning significantly improves the robustness of the tracker, especially on those sequences exhibiting drastic appearance changes. In order to bound growth in the number of training samples, we design a time-weighted reservoir sampling method. Moreover, we enable our tracker to automatically perform object identification during the process of object tracking, by introducing a collection of static template samples belonging to several object classes of interest. Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame. Experimental results on challenging video sequences demonstrate the effectiveness of the method for both inter-frame tracking and object identification.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    Publication Date: 2016-04-01
    Description: Natural images are scale invariant with structures at all length scales.We formulated a geometric view of scale invariance in natural images using percolation theory, which describes the behavior of connected clusters on graphs.We map images to the percolation model by defining clusters on a binary representation for images. We show that critical percolating structures emerge in natural images and study their scaling properties by identifying fractal dimensions and exponents for the scale-invariant distributions of clusters. This formulation leads to a method for identifying clusters in images from underlying structures as a starting point for image segmentation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: Compact and discriminative visual codebooks are preferred in many visual recognition tasks. In the literature, a number of works have taken the approach of hierarchically merging visual words of an initial large-sized codebook, but implemented this approach with different merging criteria. In this work, we propose a single probabilistic framework to unify these merging criteria, by identifying two key factors: the function used to model the class-conditional distribution and the method used to estimate the distribution parameters. More importantly, by adopting new distribution functions and/or parameter estimation methods, our framework can readily produce a spectrum of novel merging criteria. Three of them are specifically discussed in this paper. For the first criterion, we adopt the multinomial distribution with the Bayesian method; For the second criterion, we integrate the Gaussian distribution with maximum likelihood parameter estimation. For the third criterion, which shows the best merging performance, we propose a max-margin-based parameter estimation method and apply it with the multinomial distribution. Extensive experimental study is conducted to systematically analyze the performance of the above three criteria and compare them with existing ones. As demonstrated, the best criterion within our framework achieves the overall best merging performance among the compared merging criteria developed in the literature.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: We propose a method to address challenges in unconstrained face detection, such as arbitrary pose variations and occlusions. First, a new image feature called Normalized Pixel Difference (NPD) is proposed. NPD feature is computed as the difference to sum ratio between two pixel values, inspired by the Weber Fraction in experimental psychology. The new feature is scale invariant, bounded, and is able to reconstruct the original image. Second, we propose a deep quadratic tree to learn the optimal subset of NPD features and their combinations, so that complex face manifolds can be partitioned by the learned rules. This way, only a single soft-cascade classifier is needed to handle unconstrained face detection. Furthermore, we show that the NPD features can be efficiently obtained from a look up table, and the detection template can be easily scaled, making the proposed face detector very fast. Experimental results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the proposed method achieves state-of-the-art performance in detecting unconstrained faces with arbitrary pose variations and occlusions in cluttered scenes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    Publication Date: 2016-01-08
    Description: This work presents a deformable point set registration algorithm that seeks an optimal set of radial basis functions to describe the registration. A novel, global optimization approach is introduced composed of simulated annealing with a particle filter based generator function to perform the registration. It is shown how constraints can be incorporated into this framework. A constraint on the deformation is enforced whose role is to ensure physically meaningful fields (i.e., invertible). Further, examples in which landmark constraints serve to guide the registration are shown. Results on 2D and 3D data demonstrate the algorithm’s robustness to noise and missing information.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Publication Date: 2016-01-08
    Description: In this paper we present an algorithmic approach for fitting isotonic models under convex, yet non-differentiable, loss functions. It is a generalization of the greedy non-regret approach proposed by Luss and Rosset (2014) for differentiable loss functions, taking into account the sub-gradiental extensions required. We prove that our suggested algorithm solves the isotonic modeling problem while maintaining favorable computational and statistical properties. As our suggested algorithm may be used for any non-differentiable loss function, we focus our interest on isotonic modeling for either regression or two-class classification with appropriate log-likelihood loss and lasso penalty on the fitted values. This combination allows us to maintain the non-parametric nature of isotonic modeling, while controlling model complexity through regularization. We demonstrate the efficiency and usefulness of this approach on both synthetic and real world data. An implementation of our suggested solution is publicly available from the first author's website (https://sites.google.com/site/amichaipainsky/software).
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-01-08
    Description: We present a novel non-parametric Bayesian model to jointly discover the dynamics of low-level actions and high-level behaviors of tracked objects. In our approach, actions capture both linear, low-level object dynamics, and an additional spatial distribution on where the dynamic occurs. Furthermore, behavior classes capture high-level temporal motion dependencies in Markov chains of actions, thus each learned behavior is a switching linear dynamical system. The number of actions and behaviors is discovered from the data itself using Dirichlet Processes. We are especially interested in cases where tracks can exhibit large kinematic and spatial variations, e.g. person tracks in open environments, as found in the visual surveillance and intelligent vehicle domains. The model handles real-valued features directly, so no information is lost by quantizing measurements into ‘visual words’, and variations in standing, walking and running can be discovered without discrete thresholds. We describe inference using Markov Chain Monte Carlo sampling and validate our approach on several artificial and real-world pedestrian track datasets from the surveillance and intelligent vehicle domain. We show that our model can distinguish between relevant behavior patterns that an existing state-of-the-art hierarchical model for clustering and simpler model variants cannot. The software and the artificial and surveillance datasets are made publicly available for benchmarking purposes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2016-01-08
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Semantic segmentation is the problem of assigning an object label to each pixel. It unifies the image segmentation and object recognition problems. The importance of using contextual information in semantic segmentation frameworks has been widely realized in the field. We propose a contextual framework, called contextual hierarchical model (CHM) , which learns contextual information in a hierarchical framework for semantic segmentation. At each level of the hierarchy, a classifier is trained based on downsampled input images and outputs of previous levels. Our model then incorporates the resulting multi-resolution contextual information into a classifier to segment the input image at original resolution. This training strategy allows for optimization of a joint posterior probability at multiple resolutions through the hierarchy. Contextual hierarchical model is purely based on the input image patches and does not make use of any fragments or shape examples. Hence, it is applicable to a variety of problems such as object segmentation and edge detection. We demonstrate that CHM performs at par with state-of-the-art on Stanford background and Weizmann horse datasets. It also outperforms state-of-the-art edge detection methods on NYU depth dataset and achieves state-of-the-art on Berkeley segmentation dataset (BSDS 500).
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Publication Date: 2016-04-01
    Description: Hyperspectral imaging is beneficial to many applications but most traditional methods do not consider fluorescent effects which are present in everyday items ranging from paper to even our food. Furthermore, everyday fluorescent items exhibit a mix of reflection and fluorescence so proper separation of these components is necessary for analyzing them. In recent years, effective imaging methods have been proposed but most require capturing the scene under multiple illuminants. In this paper, we demonstrate efficient separation and recovery of reflectance and fluorescence emission spectra through the use of two high frequency illuminations in the spectral domain. With the obtained fluorescence emission spectra from our high frequency illuminants, we then describe how to estimate the fluorescence absorption spectrum of a material given its emission spectrum. In addition, we provide an in depth analysis of our method and also show that filters can be used in conjunction with standard light sources to generate the required high frequency illuminants. We also test our method under ambient light and demonstrate an application of our method to synthetic relighting of real scenes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We propose a new approach to simultaneously recover camera pose and 3D shape of non-rigid and potentially extensible surfaces from a monocular image sequence. For this purpose, we make use of the Extended Kalman Filter based Simultaneous Localization And Mapping (EKF-SLAM) formulation, a Bayesian optimization framework traditionally used in mobile robotics for estimating camera pose and reconstructing rigid scenarios. In order to extend the problem to a deformable domain we represent the object's surface mechanics by means of Navier's equations, which are solved using a Finite Element Method (FEM). With these main ingredients, we can further model the material's stretching, allowing us to go a step further than most of current techniques, typically constrained to surfaces undergoing isometric deformations. We extensively validate our approach in both real and synthetic experiments, and demonstrate its advantages with respect to competing methods. More specifically, we show that besides simultaneously retrieving camera pose and non-rigid shape, our approach is adequate for both isometric and extensible surfaces, does not require neither batch processing all the frames nor tracking points over the whole sequence and runs at several frames per second.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We consider the problem of image representation from the perspective of statistical design. Recent studies have shown that images are possibly sampled from a low dimensional manifold despite of the fact that the ambient space is usually very high dimensional. Learning low dimensional image representations is crucial for many image processing tasks such as recognition and retrieval. Most of the existing approaches for learning low dimensional representations, such as principal component analysis (PCA) and locality preserving projections (LPP), aim at discovering the geometrical or discriminant structures in the data. In this paper, we take a different perspective from statistical experimental design, and propose a novel dimensionality reduction algorithm called A-Optimal Projection (AOP). AOP is based on a linear regression model. Specifically, AOP finds the optimal basis functions so that the expected prediction error of the regression model can be minimized if the new representations are used for training the model. Experimental results suggest that the proposed approach provides a better representation and achieves higher accuracy in image retrieval.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Publication Date: 2016-04-01
    Description: Many problems in computer vision can be formulated as multidimensional ellipsoid-specific fitting, which is to minimize the residual error such that the underlying quadratic surface is a multidimensional ellipsoid. In this paper, we present a fast and robust algorithm for solving ellipsoid-specific fitting directly. Our method is based on the alternating direction method of multipliers, which does not introduce extra positive semi-definiteness constraints. The computation complexity is thus significantly lower than those of semi-definite programming (SDP) based methods. More specifically, to fit $n$ data points into a $p$ dimensional ellipsoid, our complexity is $O(p^6 + np^4)+O(p^3)$ , where the former $O$ results from preprocessing data once , while that of the state-of-the-art SDP method is $O(p^6 + np^4 + n^{frac{3}{2}}p^2)$ for each iteration . The storage complexity of our algorithm is about $frac{1}{2}np^2$ , which - s at most $1/4$ of those of SDP methods. Extensive experiments testify to the great speed and accuracy advantages of our method over the state-of-the-art approaches. The implementation of our method is also much simpler than SDP based methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: This paper introduces a novel solution to the hand-eye calibration problem. It uses camera measurements directly and, at the same time, requires neither prior knowledge of the external camera calibrations nor a known calibration target. Our algorithm uses branch-and-bound approach to minimize an objective function based on the epipolar constraint. Further, it employs Linear Programming to decide the bounding step of the algorithm.Our technique is able to recover both the unknown rotation and translation simultaneously and the solution is guaranteed to be globally optimal with respect to the $L_{infty}$ -norm.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: We propose a completely automatic approach for recognizing low resolution face images captured in uncontrolled environment. The approach uses multidimensional scaling to learn a common transformation matrix for the entire face which simultaneously transforms the facial features of the low resolution and the high resolution training images such that the distance between them approximates the distance had both the images been captured under the same controlled imaging conditions. Stereo matching cost is used to obtain the similarity of two images in the transformed space. Though this gives very good recognition performance, the time taken for computing the stereo matching cost is significant. To overcome this limitation, we propose a reference-based approach in which each face image is represented by its stereo matching cost from a few reference images. Experimental evaluation on the real world challenging databases and comparison with the state-of-the-art super-resolution, classifier based and cross modal synthesis techniques show the effectiveness of the proposed algorithm.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Publication Date: 2016-04-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-01
    Description: Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals. In this work, we propose a novel algorithm for detecting social groups in crowds by means of a Correlation Clustering procedure on people trajectories. The affinity between crowd members is learned through an online formulation of the Structural SVM framework and a set of specifically designed features characterizing both their physical and social identity, inspired by Proxemic theory, Granger causality, DTW and Heat-maps. To adhere to sociological observations, we introduce a loss function ( $G$ -MITRE) able to deal with the complexity of evaluating group detection performances. We show our algorithm achieves state-of-the-art results when relying on both ground truth trajectories and tracklets previously extracted by available detector/tracker systems.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: Detection and tracking humans in videos have been long-standing problems in computer vision. Most successful approaches (e.g., deformable parts models) heavily rely on discriminative models to build appearance detectors for body joints and generative models to constrain possible body configurations (e.g., trees). While these $2$ D models have been successfully applied to images (and with less success to videos), a major challenge is to generalize these models to cope with camera views. In order to achieve view-invariance, these $2$ D models typically require a large amount of training data across views that is difficult to gather and time-consuming to label. Unlike existing $2$ D models, this paper formulates the problem of human detection in videos as spatio-temporal matching (STM) between a $3$ D motion capture model and trajectories in videos. Our algorithm estimates the camera view and selects a subset of tracked trajectories that matches the motion of the $3$ D model. The STM is efficiently solved with linear programming, and it is robust to tracking mismatches, occlusions and outliers. To the best of our knowledg- this is the first paper that solves the correspondence between video and $3$ D motion capture data for human pose detection. Experiments on the CMU motion capture, Human3.6M, Berkeley MHAD and CMU MAD databases illustrate the benefits of our method over state-of-the-art approaches.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: This article tackles the problem of estimating non-rigid human 3D shape and motion from image sequences taken by uncalibrated cameras. Similar to other state-of-the-art solutions we factorize 2D observations in camera parameters, base poses and mixing coefficients. Existing methods require sufficient camera motion during the sequence to achieve a correct 3D reconstruction. To obtain convincing 3D reconstructions from arbitrary camera motion, our method is based on a-priorly trained base poses. We show that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns. For the extension to non-periodic motion we propose a novel regularization term based on temporal bone length constancy. In contrast to other works, the proposed method does not use a predefined skeleton or anthropometric constraints and can handle arbitrary camera motion. We achieve convincing 3D reconstructions, even under the influence of noise and occlusions. Multiple experiments based on a 3D error metric demonstrate the stability of the proposed method. Compared to other state-of-the-art methods our algorithm shows a significant improvement.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: In this work, we present an approach to fuse video with sparse orientation data obtained from inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for accurate estimation of limb orientations even under fast motions. However, accurate position information cannot be obtained in continuous operation. Therefore, we propose a hybrid tracker that combines video with a small number of inertial units to compensate for the drawbacks of each sensor type: on the one hand, we obtain drift-free and accurate position information from video data and, on the other hand, we obtain accurate limb orientations and good performance under fast motions from inertial sensors. In several experiments we demonstrate the increased performance and stability of our human motion tracker.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2016-07-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2016-07-01
    Description: We present a new gesture recognition method that is based on the conditional random field (CRF) model using multiple feature matching. Our approach solves the labeling problem, determining gesture categories and their temporal ranges at the same time. A generative probabilistic model is formalized and probability densities are nonparametrically estimated by matching input features with a training dataset. In addition to the conventional skeletal joint-based features, the appearance information near the active hand in an RGB image is exploited to capture the detailed motion of fingers. The estimated likelihood function is then used as the unary term for our CRF model. The smoothness term is also incorporated to enforce the temporal coherence of our solution. Frame-wise recognition results can then be obtained by applying an efficient dynamic programming technique. To estimate the parameters of the proposed CRF model, we incorporate the structured support vector machine (SSVM) framework that can perform efficient structured learning by using large-scale datasets. Experimental results demonstrate that our method provides effective gesture recognition results for challenging real gesture datasets. By scoring 0.8563 in the mean Jaccard index, our method has obtained the state-of-the-art results for the gesture recognition track of the 2014 ChaLearn Looking at People (LAP) Challenge.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition where skeleton joint information, depth and RGB images, are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernouilli Deep Belief Network ( DBN ) to handle skeletal dynamics, and a 3D Convolutional Neural Network ( 3DCNN ) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2016-07-01
    Description: Availability of handy RGB-D sensors has brought about a surge of gesture recognition research and applications. Among various approaches, one shot learning approach is advantageous because it requires minimum amount of data. Here, we provide a thorough review about one-shot learning gesture recognition from RGB-D data and propose a novel spatiotemporal feature extracted from RGB-D data, namely mixed features around sparse keypoints (MFSK). In the review, we analyze the challenges that we are facing, and point out some future research directions which may enlighten researchers in this field. The proposed MFSK feature is robust and invariant to scale, rotation and partial occlusions. To alleviate the insufficiency of one shot training samples, we augment the training samples by artificially synthesizing versions of various temporal scales, which is beneficial for coping with gestures performed at varying speed. We evaluate the proposed method on the Chalearn gesture dataset (CGD). The results show that our approach outperforms all currently published approaches on the challenging data of CGD, such as translated, scaled and occluded subsets. When applied to the RGB-D datasets that are not one-shot (e.g., the Cornell Activity Dataset-60 and MSR Daily Activity 3D dataset), the proposed feature also produces very promising results under leave-one-out cross validation or one-shot learning.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2016-07-01
    Description: Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state of the art methods accordingly. We also present the important datasets and the bench-marking of most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed  ModDrop ) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Futhermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party ) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA , a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone , accelerometer , bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals’ personality as well as their position , head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactio- s. SALSA is available at h ttp://tev.fbk.eu/salsa.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: Recent introduction of low-cost depth cameras triggered a number of interesting works, pushing forward the state-of-the-art in human body pose extraction and tracking. However, despite the remarkable progress, many of the contemporary methods cope inadequately with complex scenarios, involving multiple interacting users, under the presence of severe inter- and intra-occlusions. In this work, we present a model-based approach for markerless articulated full body pose extraction and tracking in RGB-D sequences. A cylinder-based model is employed to represent the human body. For each body part a set of hypotheses is generated and tracked over time by a Particle Filter. To evaluate each hypothesis, we employ a novel metric that considers the reprojected Top View of the corresponding body part. The latter, in conjunction with depth information, effectively copes with difficult and ambiguous cases, such as severe occlusions. For evaluation purposes, we conducted several series of experiments using data from a public human action database, as well as own-collected data involving varying number of interacting users. The performance of the proposed method has been further compared against that of the Microsoft's Kinect SDK and NiTE $^{mathrm {TM}}$ using ground truth information. The results obtained attest for the effectiveness of our approach.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: Understanding human behavior through nonverbal-based features, is interesting in several applications such as surveillance, ambient assisted living and human-robot interaction. In this article in order to analyze human behaviors in social context, we propose a new approach which explores interrelations between body part motions in scenarios with people doing a conversation. The novelty of this method is that we analyze body motion-based features in frequency domain to estimate different human social patterns: Interpersonal Behaviors (IBs) and a Social Role (SR). To analyze the dynamics and interrelations of people's body motions, a human movement descriptor is used to extract discriminative features, and a multi-layer Dynamic Bayesian Network (DBN) technique is proposed to model the existent dependencies. Laban Movement Analysis (LMA) is a well-known human movement descriptor, which provides efficient mid-level information of human body motions. The mid-level information is useful to extract the complex interdependencies. The DBN technique is tested in different scenarios to model the mentioned complex dependencies. The study is applied for obtaining four IBs ( Interest , Indicator , Empathy and Emphasis ) to estimate one SR ( Leading ).The obtained results give a good indication of the capabilities of the proposed approach for people interaction analysis with potential applications in human-robot interaction.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them onto $4$ applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) thetemporal alignment of facial expressions. Experimental results on $2$ synthetic and $7$ real world datasets indicate the robustness and effectiveness of the proposed methodson these application domains, outperforming other state-of-the-art methods in the field.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-05
    Description: Automatic behavior analysis from video is a major topic in many areas of research, including computer vision, multimedia, robotics, biology, cognitive science, social psychology, psychiatry, and linguistics. Two major problems are of interest when analyzing behavior. First, we wish to automatically categorize observed behaviors into a discrete set of classes (i.e., classification). For example, to determine word production from video sequences in sign language. Second, we wish to understand the relevance of each behavioral feature in achieving this classification (i.e., decoding). For instance, to know which behavior variables are used to discriminate between the words apple and onion in American Sign Language (ASL). The present paper proposes to model behavior using a labeled graph, where the nodes define behavioral features and the edges are labels specifying their order (e.g., before, overlaps, start). In this approach, classification reduces to a simple labeled graph matching. Unfortunately, the complexity of labeled graph matching grows exponentially with the number of categories we wish to represent. Here, we derive a graph kernel to quickly and accurately compute this graph similarity. This approach is very general and can be plugged into any kernel-based classifier. Specifically, we derive a Labeled Graph Support Vector Machine (LGSVM) and a Labeled Graph Logistic Regressor (LGLR) that can be readily employed to discriminate between many actions (e.g., sign language concepts). The derived approach can be readily used for decoding too, yielding invaluable information for the understanding of a problem (e.g., to know how to teach a sign language). The derived algorithms allow us to achieve higher accuracy results than those of state-of-the-art algorithms in a fraction of the time. We show experimental results on a variety of problems and datasets, including multimodal data.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    Publication Date: 2016-07-01
    Description: In this paper we present a novel real-time algorithm for simultaneous pose and shape estimation for articulated objects, such as human beings and animals. The key of our pose estimation component is to embed the articulated deformation model with exponential-maps-based parametrization into a Gaussian Mixture Model. Benefiting from this probabilistic measurement model, our algorithm requires no explicit point correspondences as opposed to most existing methods. Consequently, our approach is less sensitive to local minimum and handles fast and complex motions well. Moreover, our novel shape adaptation algorithm based on the same probabilistic model automatically captures the shape of the subjects during the dynamic pose estimation process. The personalized shape model in turn improves the tracking accuracy. Furthermore, we propose novel approaches to use either a mesh model or a sphere-set model as the template for both pose and shape estimation under this unified framework. Extensive evaluations on publicly available data sets demonstrate that our method outperforms most state-of-the-art pose estimation algorithms with large margin, especially in the case of challenging motions. Furthermore, our shape estimation method achieves comparable accuracy with state of the arts, yet requires neither statistical shape model nor extra calibration procedure. Our algorithm is not only accurate but also fast, we have implemented the entire processing pipeline on GPU. It can achieve up to 60 frames per second on a middle-range graphics card.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-01
    Description: In this paper, we propose a novel binary local representation for RGB-D video data fusion with a structure-preserving projection. Our contribution consists of two aspects. Toacquire a general feature for the video data, we convert the problem to describing the gradient fields of RGB and depth information of video sequences. With the local fluxes of the gradient fields, which include the orientation and the magnitude of the neighborhood of each point, a new kind of continuous local descriptor called Local Flux Feature(LFF) is obtained. Then the LFFs from RGB and depth channels are fused into a Hamming space via the Structure Preserving Projection (SPP). Specifically, an orthogonal projection matrix is applied to preserve the pairwise structure with a shape constraint to avoid the collapse of data structure in the projected space. Furthermore, a bipartite graph structure of data is taken into consideration, which is regarded as a higher level connection between samples and classes than the pairwise structure of local features. Theextensive experiments show not only the high efficiency of binary codes and the effectiveness of combining LFFs from RGB-D channels via SPP on various action recognition benchmarks of RGB-D data, but also the potential power of LFF for general action recognition.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Publication Date: 2016-07-05
    Description: Combining multimodal concept streams from heterogeneous sensors is a problem superficially explored for activity recognition. Most studies explore simple sensors in nearly perfect conditions, where temporal synchronization is guaranteed. Sophisticated fusion schemes adopt problem-specific graphical representations of events that are generally deeply linked with their training data and focused on a single sensor. This paper proposes a hybrid framework between knowledge-driven and probabilistic-driven methods for event representation and recognition. It separates semantic modeling from raw sensor data by using an intermediate semantic representation, namely concepts. It introduces an algorithm for sensor alignment that uses concept similarity as a surrogate for the inaccurate temporal information of real life scenarios. Finally, it proposes the combined use of an ontology language, to overcome the rigidity of previous approaches at model definition, and a probabilistic interpretation for ontological models, which equips the framework with a mechanism to handle noisy and ambiguous concept observations, an ability that most knowledge-driven methods lack. We evaluate our contributions in multimodal recordings of elderly people carrying out IADLs. Results demonstrated that the proposed framework outperforms baseline methods both in event recognition performance and in delimiting the temporal boundaries of event instances.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Provides a listing of the editors, board members, and current staff for this issue of the publication.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: The papers in this special section were presented at the IEEE Computer Vision and Pattern Recognition (CVPR), June, 2014, jointly sponsored by the IEEE and the Computer Vision Foundation.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2016-06-03
    Description: Psychophysical studies show motion cues inform about shape even with unknown reflectance. Recent works in computer vision have considered shape recovery for an object of unknown BRDF using light source or object motions. This paper proposes a theory that addresses the remaining problem of determining shape from the (small or differential) motion of the camera, for unknown isotropic BRDFs. Our theory derives a differential stereo relation that relates camera motion to surface depth, which generalizes traditional Lambertian assumptions. Under orthographic projection, we show differential stereo may not determine shape for general BRDFs, but suffices to yield an invariant for several restricted (still unknown) BRDFs exhibited by common materials. For the perspective case, we show that differential stereo yields the surface depth for unknown isotropic BRDF and unknown directional lighting, while additional constraints are obtained with restrictions on the BRDF or lighting. The limits imposed by our theory are intrinsic to the shape recovery problem and independent of choice of reconstruction method. We also illustrate trends shared by theories on shape from differential motion of light source, object or camera, to relate the hardness of surface reconstruction to the complexity of imaging setup.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: In recent years, sparse coding has been widely used in many applications ranging from image processing to pattern recognition. Most existing sparse coding based applications require solving a class of challenging non-smooth and non-convex optimization problems. Despite the fact that many numerical methods have been developed for solving these problems, it remains an open problem to find a numerical method which is not only empirically fast, but also has mathematically guaranteed strong convergence. In this paper, we propose an alternating iteration scheme for solving such problems. A rigorous convergence analysis shows that the proposed method satisfies the global convergence property: the whole sequence of iterates is convergent and converges to a critical point. Besides the theoretical soundness, the practical benefit of the proposed method is validated in applications including image restoration and recognition. Experiments show that the proposed method achieves similar results with less computation when compared to widely used methods such as K-SVD.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: In recent years, fluorescence analysis of scenes has received attention in computer vision. Fluorescence can provide additional information about scenes, and has been used in applications such as camera spectral sensitivity estimation, 3D reconstruction, and color relighting. In particular, hyperspectral images of reflective-fluorescent scenes provide a rich amount of data. However, due to the complex nature of fluorescence, hyperspectral imaging methods rely on specialized equipment such as hyperspectral cameras and specialized illuminants. In this paper, we propose a more practical approach to hyperspectral imaging of reflective-fluorescent scenes using only a conventional RGB camera and varied colored illuminants. The key idea of our approach is to exploit a unique property of fluorescence: the chromaticity of fluorescent emissions are invariant under different illuminants. This allows us to robustly estimate spectral reflectance and fluorescent emission chromaticity. We then show that given the spectral reflectance and fluorescent chromaticity, the fluorescence absorption and emission spectra can also be estimated. We demonstrate in results that all scene spectra can be accurately estimated from RGB images. Finally, we show that our method can be used to accurately relight scenes under novel lighting.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Distributed algorithms have recently gained immense popularity. With regards to computer vision applications, distributed multi-target tracking in a camera network is a fundamental problem. The goal is for all cameras to have accurate state estimates for all targets. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Vision-based distributed multi-target state estimation has at least two characteristics that distinguishes it from other applications. First, cameras are directional sensors and often neighboring sensors may not be sensing the same targets, i.e., they are naive with respect to that target. Second, in the presence of clutter and multiple targets, each camera must solve a data association problem. This paper presents an information-weighted, consensus-based, distributed multi-target tracking algorithm referred to as the Multi-target Information Consensus (MTIC) algorithm that is designed to address both the naivety and the data association problems. It converges to the centralized minimum mean square error estimate. The proposed MTIC algorithm and its extensions to non-linear camera models, termed as the Extended MTIC (EMTIC), are robust to false measurements and limited resources like power, bandwidth and the real-time operational requirements. Simulation and experimental analysis are provided to support the theoretical results.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, that scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from:(i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate cameras viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest on joint object recognition and 3D reconstruction from a single image.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: We consider the problem of deliberately manipulating the direct and indirect light flowing through a time-varying, general scene in order to simplify its visual analysis. Our approach rests on a crucial link between stereo geometry and light transport: while direct light always obeys the epipolar geometry of a projector-camera pair, indirect light overwhelmingly does not. We show that it is possible to turn this observation into an imaging method that analyzes light transport in real time in the optical domain, prior to acquisition. This yields three key abilities that we demonstrate in an experimental camera prototype: (1) producing a live indirect-only video stream for any scene, regardless of geometric or photometric complexity; (2) capturing images that make existing structured-light shape recovery algorithms robust to indirect transport; and (3) turning them into one-shot methods for dynamic 3D shape capture.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: We propose a novel algorithm to cluster and annotate a set of input images jointly, where the images are clustered into several discriminative groups and each group is identified with representative labels automatically. For these purposes, each input image is first represented by a distribution of candidate labels based on its similarity to images in a labeled reference image database. A set of these label-based representations are then refined collectively through a non-negative matrix factorization with sparsity and orthogonality constraints; the refined representations are employed to cluster and annotate the input images jointly. The proposed approach demonstrates performance improvements in image clustering over existing techniques, and illustrates competitive image labeling accuracy in both quantitative and qualitative evaluation. In addition, we extend our joint clustering and labeling framework to solving the weakly-supervised image classification problem and obtain promising results.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: We consider the energy minimization problem for undirected graphical models, also known as MAP-inference problem for Markov random fields which is NP-hard in general. We propose a novel polynomial time algorithm to obtain a part of its optimal non-relaxed integral solution. Our algorithm is initialized with variables taking integral values in the solution of a convex relaxation of the MAP-inference problem and iteratively prunes those, which do not satisfy our criterion for partial optimality. We show that our pruning strategy is in a certain sense theoretically optimal. Also empirically our method outperforms previous approaches in terms of the number of persistently labelled variables. The method is very general, as it is applicable to models with arbitrary factors of an arbitrary order and can employ any solver for the considered relaxed problem. Our method's runtime is determined by the runtime of the convex relaxation solver for the MAP-inference problem.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Pipelines to recognize 3D objects despite clutter and occlusions usually end up with a final verification stage whereby recognition hypotheses are validated or dismissed based on how well they explain sensor measurements. Unlike previous work, we propose a Global Hypothesis Verification (GHV) approach which regards all hypotheses jointly so as to account for mutual interactions. GHV provides a principled framework to tackle the complexity of our visual world by leveraging on a plurality of recognition paradigms and cues. Accordingly, we present a 3D object recognition pipeline deploying both global and local 3D features as well as shape and color. Thereby, and facilitated by the robustness of the verification process, diverse object hypotheses can be gathered and weak hypotheses need not be suppressed too early to trade sensitivity for specificity. Experiments demonstrate the effectiveness of our proposal, which significantly improves over the state-of-art and attains ideal performance (no false negatives, no false positives) on three out of the six most relevant and challenging benchmark datasets.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Nonlinear dimensionality reduction methods have demonstrated top-notch performance in many pattern recognition and image classification tasks. Despite their popularity, they suffer from highly expensive time and memory requirements, which render them inapplicable to large-scale datasets. To leverage such cases we propose a new method called “Path-Based Isomap”. Similar to Isomap, we exploit geodesic paths to find the low-dimensional embedding. However, instead of preserving pairwise geodesic distances, the low-dimensional embedding is computed via a path-mapping algorithm. Due to the much fewer number of paths compared to number of data points, a significant improvement in time and memory complexity with a comparable performance is achieved. The method demonstrates state-of-the-art performance on well-known synthetic and real-world datasets, as well as in the presence of noise.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of or in addition to attributes, such as, e.g., class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings from zero-shot learning to regular learning with a large number of labeled examples.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Provides a listing of board members, committee members, editors, and society officers.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Finding the centerline and estimating the radius of linear structures is a critical first step in many applications, ranging from road delineation in 2D aerial images to modeling blood vessels, lung bronchi, and dendritic arbors in 3D biomedical image stacks. Existing techniques rely either on filters designed to respond to ideal cylindrical structures or on classification techniques. The former tend to become unreliable when the linear structures are very irregular while the latter often has difficulties distinguishing centerline locations from neighboring ones, thus losing accuracy. We solve this problem by reformulating centerline detection in terms of a regression problem. We first train regressors to return the distances to the closest centerline in scale-space, and we apply them to the input images or volumes. The centerlines and the corresponding scale then correspond to the regressors local maxima, which can be easily identified. We show that our method outperforms state-of-the-art techniques for various 2D and 3D datasets. Moreover, our approach is very generic and also performs well on contour detection. We show an improvement above recent contour detection algorithms on the BSDS500 dataset.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: We describe a learning-based approach to blind image deconvolution. It uses a deep layered architecture, parts of which are borrowed from recent work on neural network learning, and parts of which incorporate computations that are specific to image deconvolution. The system is trained end-to-end on a set of artificially generated training examples, enabling competitive performance in blind deconvolution, both with respect to quality and runtime.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: This paper tackles the supervised evaluation of image segmentation and object proposal algorithms. It surveys, structures, and deduplicates the measures used to compare both segmentation results and object proposals with a ground truth database; and proposes a new measure: the precision-recall for objects and parts. To compare the quality of these measures, eight state-of-the-art object proposal techniques are analyzed and two quantitative meta-measures involving nine state of the art segmentation methods are presented. The meta-measures consist in assuming some plausible hypotheses about the results and assessing how well each measure reflects these hypotheses. As a conclusion of the performed experiments, this paper proposes the tandem of precision-recall curves for boundaries and for objects-and-parts as the tool of choice for the supervised evaluation of image segmentation. We make the datasets and code of all the measures publicly available.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-06-03
    Description: Linear discriminant analysis (LDA) is a powerful technique in pattern recognition to reduce the dimensionality of data vectors. It maximizes discriminability by retaining only those directions that minimize the ratio of within-class and between-class variance. In this paper, using the same principles as for conventional LDA, we propose to employ uncertainties of the noisy or distorted input data in order to estimate maximally discriminant directions. We demonstrate the efficiency of the proposed uncertain LDA on two applications using state-of-the-art techniques. First, we experiment with an automatic speech recognition task, in which the uncertainty of observations is imposed by real-world additive noise. Next, we examine a full-scale speaker recognition system, considering the utterance duration as the source of uncertainty in authenticating a speaker. The experimental results show that when employing an appropriate uncertainty estimation algorithm, uncertain LDA outperforms its conventional LDA counterpart.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 92
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-09-02
    Description: We address the problem of 3D pose estimation of multiple humans from multiple views. The transition from single to multiple human pose estimation and from the 2D to 3D space is challenging due to a much larger state space, occlusions and across-view ambiguities when not knowing the identity of the humans in advance. To address these problems, we first create a reduced state space by triangulation of corresponding pairs of body parts obtained by part detectors for each camera view. In order to resolve ambiguities of wrong and mixed parts of multiple humans after triangulation and also those coming from false positive detections, we introduce a 3D pictorial structures (3DPS) model. Our model builds on multi-view unary potentials, while a prior model is integrated into pairwise and ternary potential functions. To balance the potentials’ influence, the model parameters are learnt using a Structured SVM (SSVM). The model is generic and applicable to both single and multiple human pose estimation. To evaluate our model on single and multiple human pose estimation, we rely on four different datasets. We first analyse the contribution of the potentials and then compare our results with related work where we demonstrate superior performance.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 93
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-09-02
    Description: This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs [1] that have substantially impacted the computer vision community. Unlike previous methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into account. We develop an effective solution to the resulting nonlinear optimization problem without the need of stochastic gradient descent (SGD). More importantly, while previous methods mainly focus on optimizing one or two layers, our nonlinear method enables an asymmetric reconstruction that reduces the rapidly accumulated error when multiple (e.g., $ge$ 10) layers are approximated. For the widely used very deep VGG-16 model [1] , our method achieves a whole-model speedup of 4 $times$ with merely a 0.3 percent increase of top-5 error in ImageNet classification. Our 4 $times$ accelerated VGG-16 model also shows a graceful accuracy degradation for object detection when plugged into the Fast R-CNN detector [2] .
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract “Multi-Directional Multi-Level Dual-Cross Patterns” (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PERL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 96
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: Large image datasets such as ImageNet or open-ended photo websites like Flickr are revealing new challenges to image classification that were not apparent in smaller, fixed sets. In particular, the efficient handling of dynamically growing datasets, where not only the amount of training data but also the number of classes increases over time, is a relatively unexplored problem. In this challenging setting, we study how two variants of Random Forests (RF) perform under four strategies to incorporate new classes while avoiding to retrain the RFs from scratch. The various strategies account for different trade-offs between classification accuracy and computational efficiency. In our extensive experiments, we show that both RF variants, one based on Nearest Class Mean classifiers and the other on SVMs, outperform conventional RFs and are well suited for incrementally learning new classes. In particular, we show that RFs initially trained with just 10 classes can be extended to 1,000 classes with an acceptable loss of accuracy compared to training from the full data and with great computational savings compared to retraining for each new batch of classes.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 97
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large amount of pairwise labels to be collected. More importantly as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning to rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability $rho in [0,0.5)$ , and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate $rho$ . We show that the rate is upper bounded by the conditional probability $P(hat{Y}|X)$ of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interests in computer vision. In a real-world application scenario, a watch-list (gallery set) of a handful of known target people are provided with very few (in many cases only a single) image(s) (shots) per target. Existing re-id methods are largely unsuitable to address this open-world re-id challenge because they are designed for (1) a closed-world scenario where the gallery and probe sets are assumed to contain exactly the same people, (2) person-wise identification whereby the model attempts to verify exhaustively against each individual in the gallery set, and (3) learning a matching model using multi-shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer learning based re-id methods.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-05
    Description: The recently proposed low-rank representation (LRR) method has been empirically shown to be useful in various tasks such as motion segmentation, image segmentation, saliency detection and face recognition. While potentially powerful, LRR depends heavily on the configuration of its key parameter, $lambda$ . In realistic environments where the prior knowledge about data is lacking, however, it is still unknown how to choose $lambda$ in a suitable way. Even more, there is a lack of rigorous analysis about the success conditions of the method, and thus the significance of LRR is a little bit vague. In this paper we therefore establish a theoretical analysis for LRR, striving for figuring out under which conditions LRR can be successful, and deriving a moderately good estimate to the key parameter $lambda$ as well. Simulations on synthetic data points and experiments on real motion sequences verify our claims.
    Print ISSN: 0162-8828
    Electronic ISSN: 1939-3539
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...