ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • EURASIP Journal on Image and Video Processing  (316)
  • EURASIP Journal on Audio, Speech, and Music Processing  (198)
Collection
  • Articles  (1,827)
  • Data
Publisher
  • Springer  (1,827)
Years
Topic
  • Electrical Engineering, Measurement and Control Technology  (1,827)
  • 1
    Publication Date: 2020-08-31
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 2
    Publication Date: 2020-07-08
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 3
    Publication Date: 2015-08-11
    Description: The interaction of users with search services has been recognized as an important mechanism for expressing and handling user information needs. One traditional approach for supporting such interactive search relies on exploiting relevance feedback (RF) in the searching process. For large-scale multimedia collections, however, the user effort required in RF search sessions is considerable. In this paper, we address this issue by proposing a novel semi-supervised approach for implementing RF-based search services. In our approach, supervised learning is performed taking advantage of relevance labels provided by users. Later, an unsupervised learning step is performed with the objective of extracting useful information from the intrinsic dataset structure. Furthermore, our hybrid learning approach considers feedback from different users in collaborative image retrieval (CIR) scenarios. In these scenarios, the relationships among the feedback provided by different users are exploited, further reducing the collective effort. Experiments involving shape, color, and texture datasets demonstrate the effectiveness of the proposed approach. Similar results are also observed in experiments considering multimodal image retrieval tasks.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 4
    Publication Date: 2015-08-08
    Description: Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 5
    Publication Date: 2015-08-14
    Description: Support vector machines (SVMs) have played an important role in the state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and achieve similar or much better generalization performance at much faster learning speed than traditional SVM. Inspired by the excellent feature of ELM, in this paper, we propose a novel method called regularized minimum class variance extreme learning machine (RMCVELM) for language recognition. The RMCVELM aims at minimizing empirical risk, structural risk, and the intra-class variance of the training data in the decision space simultaneously. The proposed method, which is computationally inexpensive compared to SVM, suggests a new classifier for language recognition and is evaluated on the 2009 National Institute of Standards and Technology (NIST) language recognition evaluation (LRE). Experimental results show that the proposed RMCVELM obtains much better performance than SVM. In addition, the RMCVELM can also be applied to the popular i-vector space and get comparable results to the existing scoring methods.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 6
    Publication Date: 2015-08-15
    Description: We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy significantly in a leave-one-singer-out cross-validation.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 7
    Publication Date: 2015-09-11
    Description: This paper proposes a local intensity distribution equalization (LIDE) method for image enhancement. LIDE applies the idea of histogram equalization to a parametric model in order to enhance an image using local information. It reduces the amount of computational resources required by traditional methods such as adaptive histogram equalization, but allows enhancing detail similarly to the latter technique. An integral image is used to efficiently estimate the local statistics needed by the parametric model. This data structure drastically reduces the computational cost, especially for megapixel images where a large local window is preferred. It should be noted that, with a large local window, the intensity distribution could contain multiple peaks. LIDE can handle such complex distributions via a mixture of parametric models. To speed up the mixture parameter estimation, we propose an EM algorithm that is also based on the integral image data structure. Experimental results show that LIDE produces an enhanced image with greater detail and lower noise compared to several existing methods.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 8
    Publication Date: 2015-09-15
    Description: In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and classify non-speech acoustic events occurring in conversation scenes including those produced by both humans and the objects that surround us. In AED, deep learning has enabled modeling of detail-rich features, and among these, high resolution spectrograms have shown a significant advantage over existing predefined features (e.g., Mel-filter bank) that compress and reduce detail. In this paper, we further assess the importance of feature extraction for deep learning-based acoustic event detection. AED, based on spectrogram-input deep neural networks, exploits the fact that sounds have “global” spectral patterns, but sounds also have “local” properties such as being more transient or smoother in the time-frequency domain. These can be exposed by adjusting the time-frequency resolution used to compute the spectrogram, or by using a model that exploits locality, leading us to explore two different feature extraction strategies in the context of deep learning: (1) using multiple resolution spectrograms simultaneously and analyzing the overall and event-wise influence to combine the results, and (2) introducing the use of convolutional neural networks (CNN), a state-of-the-art 2D feature extraction model that exploits local structures, with log power spectrogram input for AED. An experimental evaluation shows that the approaches we describe outperform our state-of-the-art deep learning baseline with a noticeable gain in the CNN case and provides insights regarding CNN-based spectrogram characterization for AED.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 9
    Publication Date: 2015-09-26
    Description: The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be best captured through an analysis that encompasses both time and frequency domains, with a focus on the modulations or changes in the signal in the spectrotemporal space. This representation mimics the spectrotemporal receptive field (STRF) analysis believed to underlie processing in the central mammalian auditory system, particularly at the level of primary auditory cortex. How well this STRF representation captures the timbral identity of musical instruments in continuous solo recordings remains unclear. The current work investigates the applicability of the STRF feature space for instrument recognition in solo musical phrases and explores best approaches to leveraging knowledge from isolated musical notes for instrument recognition in solo recordings. The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end classifiers using support vector machines to achieve a generalization of instrument recognition to off-the-shelf, commercially available solo music.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 10
    Publication Date: 2015-11-22
    Description: This paper presents a detailed study of different algorithmic configurations for estimating soft biometric traits. In particular, a recently introduced common framework is the starting point of the study: it includes an initial facial detection, the subsequent facial traits description, the data reduction step, and the final classification step. The algorithmic configurations are characterized by different descriptors and different strategies to build the training dataset and to scale the input data to the classifier. Experiments have been carried out on both publicly available datasets and image sequences specifically acquired in order to evaluate the performance even under real-world conditions, i.e., in the presence of scaling and rotation.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 11
    Publication Date: 2015-11-26
    Description: The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number of parallel exemplars. In our previous work, a VC technique using non-negative matrix factorization (NMF) for noisy environments was proposed. This method requires parallel exemplars (which consist of the source exemplars and target exemplars that have the same texts uttered by the source and target speakers) for dictionary construction. In the framework of conventional Gaussian mixture model (GMM)-based VC, some approaches that do not need parallel exemplars have been proposed. However, in the framework of exemplar-based VC for noisy environments, such a method has never been proposed. In this paper, an adaptation matrix in an NMF framework is introduced to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. We refer to this method as affine NMF, and the effectiveness of this method has been confirmed by comparing its effectiveness with that of a conventional NMF-based method and a GMM-based method in noisy environments.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 12
    Publication Date: 2016-07-15
    Description: This paper presents a video summarization method that is specifically for the static summary of consumer videos. Considering that the consumer videos usually have unclear shot boundaries and many low-quality o...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 13
    Publication Date: 2016-07-09
    Description: A new voice activity detection algorithm based on long-term pitch divergence is presented. The long-term pitch divergence not only decomposes speech signals with a bionic decomposition but also makes full use ...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 14
    Publication Date: 2016-03-24
    Description: Besides a high distinctiveness, robustness (or invariance) to image degradations is very desirable for texture feature extraction methods in real-world applications. In this paper, focus is on making arbitrary...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 15
    Publication Date: 2016-07-10
    Description: During the last two decades, satisfactory results have been obtained for face identification techniques based on frontal pose. However, face identification from uncontrolled pose remains a challenging open pro...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 16
    Publication Date: 2016-07-26
    Description: Image super-resolution has wide applications in biomedical imaging, computer vision, image recognition, etc. In this paper, we present a fast single-image super-resolution method based on deconvolution strateg...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 17
    Publication Date: 2016-07-28
    Description: The depth-image-based rendering (DIBR) algorithms used for 3D video applications introduce new types of artifacts mostly located around the disoccluded regions. As the DIBR algorithms involve geometric transfo...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 18
    Publication Date: 2013-09-17
    Description: Excessive depth perception in 3D video is one of the major factors that causes discomfort to the viewer and that can decrease the viewer's quality perception of 3D video. With the idea of real-time quality control of 3D videos, we propose an edge-based sparse disparity estimation algorithm with a novel similarity metric. The comparative assessment with four other state-of-the-art similarity metrics, implemented within the proposed edge-based disparity estimator, showed higher performance for the novel metric. User tests are conducted to assess the relation between certain disparity statistics and user perception of 3D scene quality, which is a retrospective subjective experience of quality. Subjective tests indicate that viewer discomfort can be predicted best by using the maximum and slew rate of the 95th-percentile scene disparities together.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 19
    Publication Date: 2013-09-18
    Description: Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation held as a part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric, and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems, aiming to establish the best technique for this difficult task and to define promising directions for this relatively novel task.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 20
    Publication Date: 2013-09-24
    Description: Objective metrics for visual quality assessment often base their reliability on the explicit modeling of the highly non-linear behavior of human perception; as a result, they may be complex and computationally expensive. Conversely, machine learning (ML) paradigms allow tackling the quality assessment task from a different perspective, as the eventual goal is to mimic quality perception instead of designing an explicit model of the human visual system. Several studies have already proved the ability of ML-based approaches to address visual quality assessment; nevertheless, these paradigms are highly prone to overfitting, and their overall reliability may be questionable. In fact, a prerequisite for successfully using ML in modeling perceptual mechanisms is a profound understanding of the advantages and limitations that characterize learning machines. This paper illustrates and exemplifies the good practices to be followed.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 21
    Publication Date: 2013-06-07
    Description: This paper has two main contributions. The first is a Bayesian framework for removing two common types of degradations on video known as blotches and line scratches. Most removal techniques assume complete obliteration of the original data at the corrupted sites. This often leads to the introduction of restoration artifacts during removal. Our framework is based on modeling corruption as a semi-transparent layer. This model was introduced earlier by Ahmed et al. (ICIP 2009) for the problem of blotch removal. We show many more blotch removal results than the previous work, and we extend the semi-transparent corruption model to the problem of line removal. The second contribution of this paper is an automated technique for ground-truth generation from infrared scans of corruptions. Previous ground-truth generation efforts require manually inpainting the corrupted regions. The restoration results are evaluated by comparing the reconstructed data against the ground-truth estimates. Comparisons with current blotch and line removal techniques show that the proposed corruption removal framework produces better removal and generates fewer restoration artifacts.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 22
    Publication Date: 2013-06-07
    Description: A novel speech bandwidth extension method based on audio watermark is presented in this paper. The time-domain and frequency-domain envelope parameters are extracted from the high-frequency components of speech signal, and then these parameters are embedded in the corresponding narrowband speech bit stream by the modified least significant bit watermark method which uses perception property. At the decoder, the wideband speech is reproduced with the reconstruction of high-frequency components based on the parameters extracted from bit stream of the narrowband speech. The proposed method can decrease poor auditory effect caused by large local distortion. The simulation results show that the synthesized wideband speech has low spectral distortion and its speech perception quality is greatly improved.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 23
    Publication Date: 2013-04-03
    Description: In this paper, we suggest a general model for the fixed-valued impulse noise and propose a two-stage method for high density noise suppression while preserving the image details. In the first stage, we apply an iterative impulse detector, exploiting the image entropy, to identify the corrupted pixels and then employ an Adaptive Iterative Mean filter to restore them. The filter is adaptive in terms of the number of iterations, which is different for each noisy pixel, according to the Euclidean distance from the nearest uncorrupted pixel. Experimental results show that the proposed filter is fast and outperforms the best existing techniques in both objective and subjective performance measures.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 24
    Publication Date: 2015-05-11
    Description: In this paper, we propose a new distributed video coding (DVC) method, with hierarchical group of picture (GOP) structure. Coding gain of DVC can be significantly improved by enlarging GOP size for slow-moving frames. The proposed DVC decoder estimates a side information (SI) frame and transmits motion vectors (MVs) of the SI to the proposed encoder. Using the received MVs from the decoder, the proposed encoder can generate a predicted SI (PSI), which is the same as the SI in the decoder, and estimate the quality of PSI with minimal computational complexity. The proposed method decides the best coding mode among key, Wyner-Ziv (WZ), and skip modes, by estimating rate-distortion costs. Based on the selected best coding mode, the best GOP size can be automatically determined. As the GOP size is adaptively decided depending on the SI quality, entropy and parity bits can be effectively consumed. Experimental results show that the proposed algorithm is around 0.80 dB better in Bjøntegaard delta (BD) bitrate than an existing conventional DVC system.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 25
    Publication Date: 2015-05-13
    Description: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of these two approaches is proposed. For the DNN-based bottleneck feature, we noted that DNNs can transform the reverberant speech feature to a new feature space with greater discriminative classification ability for distant-talking speaker recognition. Conversely, cepstral domain DAE-based dereverberation tries to suppress the reverberation by mapping the cepstrum of reverberant speech to that of clean speech with the expectation of improving the performance of distant-talking speaker recognition. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation have a strong complementary nature, the combination of these two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set, with reverberant environments differing from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as the multichannel least mean squares (MCLMS). Compared with the MCLMS, we obtained a reduction in relative error rates of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, the combination of likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved the performance.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 26
    Publication Date: 2015-05-08
    Description: Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach that improves upon our previous work and is now able to estimate the DOAs of multiple mobile speech sources, while being light in resources, both hardware-wise (only using three microphones) and software-wise. This approach takes advantage of the fact that simultaneous speech sources do not completely overlap each other. To evaluate the performance of this approach, a multi-DOA estimation evaluation system was developed based on a corpus collected from different acoustic scenarios named Acoustic Interactions for Robot Audition (AIRA).
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 27
    Publication Date: 2015-04-27
    Description: Many emerging applications in the field of assisted and autonomous driving rely on accurate position information. Satellite-based positioning is not always sufficiently reliable and accurate for these tasks. Visual odometry can provide a solution to some of these shortcomings. Current systems mainly focus on the use of stereo cameras, which are impractical for large-scale application in consumer vehicles due to their reliance on accurate calibration. Existing monocular solutions, on the other hand, have significantly lower accuracy. In this paper, we present a novel monocular visual odometry method based on the robust tracking of features in the ground plane. The key concepts behind the method are the modeling of the uncertainty associated with the inverse perspective projection of image features and a parameter space voting scheme to find a consensus on the vehicle state among tracked features. Our approach differs from traditional visual odometry methods by applying 2D scene and motion constraints at the lowest level instead of solving for the 3D pose change. Evaluation on both the public KITTI benchmark and our own dataset shows that this is a viable approach for visual odometry which outperforms basic 3D pose estimation due to the exploitation of the largely planar structure of road environments.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 28
    Publication Date: 2015-04-18
    Description: Adequate models of the bread crumb structure can be critical for understanding flow and transport processes in bread manufacturing, creating synthetic bread crumb images for photo-realistic rendering, evaluating similarities, and establishing quality features of different bread crumb types. In this article, multifractal analysis, employing the multifractal spectrum (MFS), has been applied to study the structure of the bread crumb in four varieties of bread (baguette, sliced, bran, and sandwich). The computed spectrum can be used to discriminate among bread crumbs from different types. Also, high correlations were found between some of these parameters and the porosity, coarseness, and heterogeneity of the samples. These results demonstrate that the MFS is an appropriate tool for characterising the internal structure of the bread crumb, and thus, it may be used to establish important quality properties it should have. The MFS has been shown to provide local and global image features that are both robust and low-dimensional, leading to feature vectors that capture essential information for classification tasks. Results show that the MFS-based classification is able to distinguish different bread crumbs with very high accuracy. Multifractal modelling of the underlying structure can be an appropriate method for parameterising and simulating the appearance of different bread crumbs.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 29
    Publication Date: 2016-04-13
    Description: Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 30
    Publication Date: 2015-12-31
    Description: Using a proper distribution function for the speech signal or for its representations is of crucial importance in statistical-based speech processing algorithms. Although the most commonly used probability density function (pdf) for speech signals is Gaussian, recent studies have shown the superiority of super-Gaussian pdfs. A large research effort has focused on the investigation of the univariate case of speech signal distribution; however, in this paper, we study the multivariate distributions of the speech signal and its representations using the conventional distribution functions, e.g., multivariate Gaussian and multivariate Laplace, and the copula-based multivariate distributions as candidates. The copula-based technique is a powerful method for modeling non-Gaussian multivariate distributions with non-linear inter-dimensional dependency. The level of similarity between the candidate pdfs and the real speech pdf in different domains is evaluated using the energy goodness-of-fit test. In our evaluations, the best-fitted distributions for speech signal vectors with different lengths in various domains are determined. A similar experiment is performed for different classes of English phonemes (fricatives, nasals, stops, vowels, and semivowels/glides). The evaluation results demonstrate that the multivariate distribution of speech signals in different domains is mostly super-Gaussian, except for Mel-frequency cepstral coefficients. Also, the results confirm that the distribution of the different phoneme classes is better statistically modeled by a mixture of Gaussian and Laplace pdfs. The copula-based distributions provide better statistical modeling of vectors representing discrete Fourier transform (DFT) amplitude of speech vectors with a length shorter than 500 ms.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 31
    Publication Date: 2016-01-11
    Description: There is great interest in driver assistance systems that use the head pose as an indicator of the visual focus of attention and the mental state. Head pose estimation is a technique for deducing head orientation relative to the camera view and can be performed by model-based or appearance-based approaches. Model-based approaches use a geometrical face model usually obtained from facial features, whereas appearance-based techniques use the whole face image characterized by a descriptor and generally consider the pose estimation as a classification problem. Appearance-based methods are faster and better suited to discrete pose estimation. However, their performance depends strongly on the head descriptor, which should be well chosen in order to reduce the information about identity and lighting contained in the face appearance. In this paper, we propose an appearance-based discrete head pose estimation aiming to determine the driver attention level from monocular visible spectrum images, even if the facial features are not visible. Explicitly, we first propose a novel descriptor resulting from the fusion of the four most relevant orientation-based head descriptors, namely the steerable filters, the histogram of oriented gradients (HOG), the Haar features, and an adapted version of the speeded up robust feature (SURF) descriptor. Second, in order to derive a compact, relevant, and consistent subset of the descriptor’s features, a comparative study is conducted on some well-known feature selection algorithms. Finally, the obtained subset is subject to the classification process, performed by the support vector machine (SVM), to learn head pose variations. As we show in experiments with the public database (Pointing’04) as well as with our real-world sequence, our approach describes the head with a high accuracy and provides robust estimation of the head pose, compared to state-of-the-art methods.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 32
    Publication Date: 2016-03-06
    Description: Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberati...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 33
    Publication Date: 2016-03-09
    Description: Computational stereo is in the fields of computer vision and photogrammetry. In the computational stereo and surface reconstruction paradigms, it is very important to achieve appropriate epipolar constraints d...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 34
    Publication Date: 2016-03-09
    Description: The delivery of video over wireless, error-prone transmission channels requires careful allocation of channel and source code rates, given the available bandwidth. In this paper, we present a theoretical frame...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 35
    Publication Date: 2016-01-06
    Description: A challenging area of pattern recognition is the recognition of handwritten texts in different languages and the reduction of a volume of data to the greatest extent while preserving associations (or dependencies) between objects of the original data. Until now, only a few studies have been carried out in the area of dimensionality reduction for handedness detection from off-line handwriting textual data. Nevertheless, further investigating new techniques to reduce the large amount of processed data in this field is worthwhile. In this paper, we demonstrate that it is important to select only the most characterizing features from handwritings and reject all those that do not contribute effectively to the process of handwriting recognition. To achieve this goal, the proposed approach is based mainly on fuzzy conceptual reduction by applying the Lukasiewicz implication. Handwritten texts in both Arabic and English are considered in this study. To evaluate the effectiveness of our proposed approach, classification is carried out using a K-Nearest-Neighbors (K-NN) classifier on a database of 121 writers. We consider left/right handedness as parameters for the evaluation where we determine the recall/precision and F-measure of each writer. Then, we apply dimensionality reduction based on fuzzy conceptual reduction by using the Lukasiewicz implication. Our novel feature reduction method achieves a maximum reduction rate of 83.43 %, thus making the testing phase much faster. The proposed fuzzy conceptual reduction algorithm is able to reduce the feature vector dimension by 31.3 % compared to the original “best of all combined features” algorithm.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 36
    Publication Date: 2016-03-25
    Description: In this paper, we address 3D reconstruction of surfaces deforming isometrically. Given that an isometric surface is represented by means of a triangular mesh and that feature/point correspondences on an image ...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 37
    Publication Date: 2016-04-02
    Description: In this paper, rotation invariance and the influence of rotation interpolation methods on texture recognition using several local binary patterns (LBP) variants are investigated.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 38
    Publication Date: 2019
    Description: We propose a new method for music detection from broadcasting contents using the convolutional neural networks with a Mel-scale kernel. In this detection task, music segments should be annotated from the broad...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 39
    Publication Date: 2019
    Description: Container yards have been facing the increase of freight volume. In order to improve the efficiency of container handling, automatic stations have been established in many terminals. However, current container...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 40
    Publication Date: 2019
    Description: In this paper, we present a novel image steganography method which is based on color palette transformation in color space. Most of the existing image steganography methods modify separate image pixels, and ra...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 41
    Publication Date: 2019
    Description: It is a great challenge to maintain details while suppressing and eliminating noise of the image. Considering the nonconvexity property of the diffusion function and the hypersensitivity of the Laplace operato...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 42
    Publication Date: 2019
    Description: Cirrhosis is a liver disease that is considered to be among the most common diseases in healthcare. Due to its non-invasive nature, ultrasound (US) imaging is a widely accepted technology for the diagnosis of ...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 43
    Publication Date: 2019
    Description: Studying animal locomotion improves our understanding of motor control and aids in the treatment of motor impairment. Mice are a premier model of human disease and are the model system of choice for much of ba...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 44
    Publication Date: 2019
    Description: A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architec...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 45
    Publication Date: 2019
    Description: Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a spee...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 46
    Publication Date: 2019
    Description: Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may ...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 47
    Publication Date: 2015-08-06
    Description: The purpose of this paper is to improve the robustness of traditional image watermarking based on singular value decomposition (SVD) by using optimization-based quantization on multiple singular values in the wavelet domain. In this work, we divide the middle-frequency parts of discrete-time wavelet transform (DWT) into several square blocks and then use multiple singular value quantizations to embed a watermark bit. To minimize the difference between original and watermarked singular values, an optimized-quality formula is proposed. First, the peak signal-to-noise ratio (PSNR) is defined as a performance index in a matrix form. Then, an optimized-quality functional that relates the performance index to the quantization technique is obtained. Finally, the Lagrange Principle is utilized to obtain the optimized-quality formula and then the formula is applied to watermarking. Experimental results show that the watermarked image can keep a high PSNR and achieve better bit-error rate (BER) even when the number of coefficients for embedding a watermark bit increases.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 48
    Publication Date: 2015-08-07
    Description: This paper presents an image denoising algorithm, which applies bilateral filtering (BLF) in the Laplacian subbands. It is noted that the subband images have a wider area of photometric similarity than the original, and hence, they can benefit more from the BLF than the original. Specifically, an image is Gaussian filtered to obtain a low band image, and the low band image is subtracted from the original to obtain the high band signal, which forms the Laplacian subbands. For the high band image denoising, we derive an adaptive kernel that is dependent on the edge intensity and photometric similarity of subband images. The high band image is convolved with this kernel and then added to the denoised low band signal, which produces the denoised image. We also propose to process the denoised high band signal by the gradient histogram preservation method, for sharpening the edges with less noise amplification. Experimental results show that the proposed denoising method provides higher PSNR than the original BLF and other multi-resolution denoising algorithms. Since the high band image is also effectively denoised in this process, the image sharpened by high band modification is also visually more pleasing when compared with the results of the conventional sharpening methods.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
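    The band split described in this abstract (Gaussian low band, residual high band) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `sigma` value, the 3-sigma kernel radius, and the reflect padding are all assumptions made for illustration, and the adaptive bilateral kernel itself is not reproduced.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Normalized 1D Gaussian kernel (hypothetical helper).
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma=1.5):
    # Separable Gaussian blur with reflect padding (assumed border handling).
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def split_bands(image, sigma=1.5):
    # Low band: Gaussian-filtered image. High band: original minus low band.
    low = gaussian_blur(image, sigma)
    high = np.asarray(image, dtype=float) - low
    return low, high
```

    By construction the two bands sum back to the original image, so any denoising applied to each band separately can be recombined by simple addition.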
  • 49
    Publication Date: 2015-10-21
    Description: No description available
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 50
    Publication Date: 2015-10-22
    Description: In this paper, a semi-fragile and blind digital speech watermarking technique for online speaker recognition systems based on the discrete wavelet packet transform (DWPT) and quantization index modulation (QIM) has been proposed that enables embedding of the watermark within an angle of the wavelet’s sub-bands. To minimize the degradation effects of the watermark, these sub-bands were selected from frequency ranges where little speaker-specific information was available (500–3500 Hz and 6000–7000 Hz). Experimental results on the TIMIT, MIT, and MOBIO speech databases show that the degradation results for speaker verification and identification are 0.39 and 0.97 %, respectively, which are negligible. In addition, the proposed watermark technique can provide the appropriate fragility required for different signal processing operations.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
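    As a rough illustration of quantization index modulation (QIM), the sketch below embeds one bit per scalar value by snapping it to one of two interleaved quantization lattices. It is a generic textbook form under stated assumptions, not the paper's scheme: the paper embeds in an angle of the wavelet sub-bands, whereas here the host values, the step size `delta`, and the function names are hypothetical.

```python
import numpy as np

def qim_embed(values, bits, delta=0.5):
    # Bit 0 uses the lattice {k * delta}; bit 1 uses the lattice shifted by delta / 2.
    values = np.asarray(values, dtype=float)
    offsets = np.asarray(bits) * (delta / 2.0)
    return delta * np.round((values - offsets) / delta) + offsets

def qim_decode(values, delta=0.5):
    # Pick, per value, the bit whose lattice has the nearest point.
    values = np.asarray(values, dtype=float)
    zeros = np.zeros(values.shape, dtype=int)
    ones = np.ones(values.shape, dtype=int)
    d0 = np.abs(values - qim_embed(values, zeros, delta))
    d1 = np.abs(values - qim_embed(values, ones, delta))
    return (d1 < d0).astype(int)
```

    In this scalar form, the embedding distortion is at most `delta / 2` per value, and decoding tolerates perturbations smaller than `delta / 4`, which is the usual trade-off between imperceptibility and robustness that the step size controls.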
  • 51
    Publication Date: 2015-10-22
    Description: The presence of physical task stress induces changes in the speech production system which in turn produces changes in speaking behavior. This results in measurable acoustic correlates including changes to formant center frequencies, breath pause placement, and fundamental frequency. Many of these changes are due to the subject’s internal competition between speaking and breathing during the performance of the physical task, which has a corresponding impact on muscle control and airflow within the glottal excitation structure as well as vocal tract articulatory structure. This study considers the effect of physical task stress on voice quality. Three signal processing-based values which include (i) the normalized amplitude quotient (NAQ), (ii) the harmonic richness factor (HRF), and (iii) the fundamental frequency are used to measure voice quality. The effects of physical stress on voice quality depend on the speaker as well as the specific task. While some speakers do not exhibit changes in voice quality, a subset exhibits changes in NAQ and HRF measures of similar magnitude to those observed in studies of soft, loud, and pressed speech. For those speakers demonstrating voice quality changes, the observed changes tend toward breathy or soft voicing as observed in other studies. The effect of physical stress on the fundamental frequency is correlated with the effect of physical stress on the HRF (r = −0.34) and the NAQ (r = −0.53). Also, the inter-speaker variation in baseline NAQ is significantly higher than the variation in NAQ induced by physical task stress. The results illustrate systematic changes in speech production under physical task stress, which in theory will impact subsequent speech technology such as speech recognition, speaker recognition, and voice diarization systems.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 52
    Publication Date: 2015-07-17
    Description: Acoustic data transmission (ADT) forms a branch of the audio data hiding techniques with its capability of communicating data in short-range aerial space between a loudspeaker and a microphone. In this paper, we propose an acoustic data transmission system extending our previous studies and give an in-depth analysis of its performance. The proposed technique utilizes the phases of modulated complex lapped transform (MCLT) coefficients of the audio signal. To achieve a good trade-off between the audio quality and the data transmission performance, the enhanced segmental SNR adjustment (SSA) algorithm is proposed. Moreover, we also propose a scheme to use multiple microphones for ADT technique. This multi-microphone ADT technique further enhances the transmission performance while ensuring compatibility with the single microphone system. From a series of experimental results, it has been found that the transmission performance improves when the length of the MCLT frame gets longer at the cost of the audio quality degradation. In addition, a good trade-off between the audio quality and data transmission performance is achieved by means of SSA algorithm. The experimental results also reveal that the proposed multi-microphone method is useful in enhancing the transmission performance.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 53
    Publication Date: 2015-07-17
    Description: While face analysis from images is a well-studied area, little work has explored the dependence of facial appearance on the geographic location from which the image was captured. To fill this gap, we constructed GeoFaces, a large dataset of geotagged face images, and used it to examine the geo-dependence of facial features and attributes, such as ethnicity, gender, or the presence of facial hair. Our analysis illuminates the relationship between raw facial appearance, facial attributes, and geographic location, both globally and in selected major urban areas. Some of our experiments, and the resulting visualizations, confirm prior expectations, such as the predominance of ethnically Asian faces in Asia, while others highlight novel information that can be obtained with this type of analysis, such as the major city with the highest percentage of people with a mustache.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 54
    Publication Date: 2015-06-13
    Description: In this paper, we propose a new variational model for image restoration by incorporating a nonlocal TV regularizer and a nonlocal Laplacian regularizer on the image. The two regularizing terms make use of nonlocal comparisons between pairs of patches in the image. The new model can be seen as a nonlocal version of the CEP-L2 model. Subsequently, an algorithm combining the alternating directional minimization and the split Bregman iteration is presented to solve the new model. Numerical results verify that the proposed method performs better for image restoration than the CEP-L2 model, especially for images with low noise.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 55
    Publication Date: 2015-05-17
    Description: The occurrence of antinuclear antibodies (ANAs) in patient serum is significantly related to some specific autoimmune diseases. Indirect immunofluorescence (IIF) on human epithelial type 2 (HEp-2) cells is the recommended methodology for detecting ANAs in clinical practice. However, the currently practiced manual detection system suffers from serious problems due to subjective evaluation. In this paper, we present an automated system for HEp-2 cell classification. We adopt a bag-of-words (BoW) framework, which has shown impressive performance in image classification tasks because it can obtain a discriminative and effective image representation. However, information loss is inevitable in the coding process. Therefore, we propose a linear local distance coding (LLDC) method to capture more discriminative information. Our LLDC method transforms the original local feature into a more discriminative local distance vector by searching for the few nearest local neighbors of the local feature in the class-specific manifolds. The obtained local distance vector is further encoded and pooled together to get a salient image representation. The LLDC method is combined with the traditional coding methods to achieve higher classification accuracy. Incorporated with a linear support vector machine classifier, our proposed method demonstrated its effectiveness on two public datasets, namely, the International Conference on Pattern Recognition (ICPR) 2012 dataset and the International Conference on Image Processing (ICIP) 2013 training dataset. Experimental results show that the LLDC framework can achieve superior performance to the state-of-the-art coding methods for staining pattern classification of HEp-2 cells.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 56
    Publication Date: 2016-08-10
    Description: Substantial amounts of resources are usually required to robustly develop a language model for an open vocabulary speech recognition system as out-of-vocabulary (OOV) words can hurt recognition accuracy. In th...
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 57
    Publication Date: 2016-08-21
    Description: Gait is known to be an effective behavioral biometric trait for the identification of individuals. However, clothing has a dramatic influence on the recognition rate. Researchers have attempted to deal with th...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 58
    Publication Date: 2015-10-13
    Description: This paper presents a simple camera calibration method for estimating human height in video surveillance. Given that most cameras for video surveillance are installed in high positions at a slightly tilted angle, it is possible to retain only three calibration parameters in the original camera model, namely the focal length, the tilting angle and the camera height. These parameters can be directly estimated using a nonlinear regression model from the observed head and foot points of a walking human instead of estimating the vanishing line and point in the image, which is extremely sensitive to noise in practice. With only three unknown parameters, the nonlinear regression model can fit data efficiently. The experimental results show that the proposed method can predict the human height with a mean absolute error of only about 1.39 cm from ground truth data.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 59
    Publication Date: 2015-12-16
    Description: This paper presents an evaluation of high-dynamic-range (HDR) video tone mapping on a small screen device (SSD) under reflections. Reflections are common on mobile devices as these devices are predominantly used on the go. With this evaluation, we study the impact of reflections on the screen and how different HDR video tone mapping operators (TMOs) perform under reflective conditions as well as understand if there is a need to develop a new or hybrid TMO that can deal with reflections better. Two well-known HDR video TMOs were evaluated in order to test their performance with and without on-screen reflections. Ninety participants were asked to rank the TMOs for a number of tone-mapped HDR video sequences on an SSD against a reference HDR display. The results show that the greater the area exposed to reflections, the larger the negative impact on a TMO’s perceptual accuracy. The results also show that under observed conditions, when reflections are present, the hybrid TMOs do not perform better than the standard TMOs.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 60
    Publication Date: 2015-12-25
    Description: Handwriting has remained one of the most frequently occurring patterns that we come across in everyday life. Handwriting offers a number of interesting pattern classification problems including handwriting recognition, writer identification, signature verification, writer demographics classification and script recognition, etc. Research in these and similar related problems requires the availability of handwritten samples for validation of the developed techniques and algorithms. Like any other scientific domain, the handwriting recognition community has developed a large number of standard databases allowing development, evaluation and comparison of different techniques developed for a variety of recognition tasks. This paper is intended to provide a comprehensive survey of the handwriting databases developed during the last two decades. In addition to the statistics of the discussed databases, we also present a comparison of these databases on a number of dimensions. The ground truth information of the databases along with the supported tasks is also discussed. It is expected that this paper would not only allow researchers in handwriting recognition to objectively compare different databases but will also provide them the opportunity to select the most appropriate database(s) for evaluation of their developed systems.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 61
    Publication Date: 2016-06-05
    Description: Human action recognition applications benefit greatly from the use of commodity depth sensors that are capable of skeleton tracking. Some of these applications (e.g., customizable gesture interfaces) req...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 62
    Publication Date: 2016-06-05
    Description: Due to the demand of 3D visualization and lack of 3D video content, a method converting the 2D to 3D video plays an important role. In this paper, a low-cost and high efficiency post processing method is prese...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 63
    Publication Date: 2016-09-02
    Description: The multi-view video plus depth format is the main representation of a three-dimensional (3D) scene. In the 3D extension of high-efficiency video coding (3D-HEVC), the main framework for depth video is similar...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 64
    Publication Date: 2016-09-02
    Description: Algorithms that predict the degree of visual discomfort experienced when viewing stereoscopic 3D (S3D) images usually first execute some form of disparity calculation. Following that, features are extracted on...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 65
    Publication Date: 2016-09-16
    Description: Virtual view synthesis renders a virtual view image from several pre-collected viewpoint images. The most active topic in virtual view synthesis is depth image-based rendering (DIBR), which has low one-ti...
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 66
    Publication Date: 2015-05-22
    Description: This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception using a spectro-temporal measure of similarity between a reference and a test speech signal. The metric has been particularly designed to be robust for quality issues associated with Voice over IP (VoIP) transmission. This paper describes the algorithm and compares the quality predictions with the ITU-T standard metrics PESQ and POLQA for common problems in VoIP: clock drift, associated time warping, and playout delays. The results indicate that ViSQOL and POLQA significantly outperform PESQ, with ViSQOL competing well with POLQA. An extensive benchmarking against PESQ, POLQA, and simpler distance metrics using three speech corpora (NOIZEUS and E4 and the ITU-T P.Sup. 23 database) is also presented. These experiments benchmark the performance for a wide range of quality impairments, including VoIP degradations, a variety of background noise types, speech enhancement methods, and SNR levels. The results and subsequent analysis show that both ViSQOL and POLQA have some performance weaknesses and under-predict perceived quality in certain VoIP conditions. Both have a wider application and robustness to conditions than PESQ or more trivial distance metrics. ViSQOL is shown to offer a useful alternative to POLQA in predicting speech quality in VoIP scenarios.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 67
    Publication Date: 2015-06-24
    Description: Automatic segmentation of the epidermis area in skin histopathological images is an essential step for computer-aided diagnosis of various skin cancers. This paper presents a robust technique for epidermis segmentation in whole-slide skin histopathological images. The proposed technique first performs a coarse epidermis segmentation using global thresholding and shape analysis. The epidermis thickness is then measured by a series of line segments perpendicular to the main axis of the initially segmented epidermis mask. If the segmented epidermis mask has a thickness greater than a predefined threshold, the segmentation is assumed to be inaccurate. A second pass of fine segmentation using the k-means algorithm is then carried out over the coarsely segmented result to enhance the performance. Experimental results on 64 different skin histopathological images show that the proposed technique provides superior performance compared to existing techniques.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 68
    Publication Date: 2015-06-27
    Description: In this paper, we consider the image super-resolution (SR) reconstruction problem. The main goal is to obtain a high-resolution (HR) image from a set of low-resolution (LR) ones. To that end, we propose a novel approach based on a regularized criterion. The criterion combines the classical generalized total variation (TV) with a bilateral filter (BTV) regularizer. Our approach centers on the derivation and use of an efficient combined deblurring and denoising stage that is applied to the high-resolution image. We demonstrate the existence of minimizers of the combined variational problem in the bounded variation space, and we propose a minimization algorithm. The numerical results obtained by our approach are compared with the classical robust super-resolution (RSR) algorithm and the SR with TV regularization. They confirm that the proposed combined approach efficiently overcomes the blurring effect while removing the noise.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 69
    Publication Date: 2015-06-28
    Description: Over recent years, the i-vector-based framework has been proven to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-dimensional feature vector. Channel compensation techniques are carried out in this low-dimensional feature space. Most of the compensation techniques take the sets of extracted i-vectors as input. By constructing between-class covariance and within-class covariance, we attempt to minimize the within-class variance mainly caused by channel effects and to maximize the variance between speakers. In real-world applications, enrollment and test data from each user (or speaker) are always scarce. Although it is widely thought that session variability is mostly caused by channel effects, phonetic variability, as a factor that causes session variability, is still a matter to be considered. We propose in this paper a new i-vector extraction algorithm from the total factor matrix which we term component reduction analysis (CRA). This new algorithm contributes to better modelling of session variability in the total factor space. We report results on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation (SRE) dataset. As measured both by equal error rate and the minimum values of the NIST detection cost function, a 10–15 % relative improvement is achieved compared to the baseline of a traditional i-vector-based system.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 70
    Publication Date: 2015-06-26
    Description: Singer identification is a difficult topic in music information retrieval because background instrumental music accompanies the singing voice, which reduces system performance. One of the main disadvantages of existing systems is that vocals and instrumentals are separated manually and only vocals are used to build the training model. The research presented in this paper automatically recognizes a singer without separating instrumental and singing sounds, using audio features such as timbre coefficients, pitch class, mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, and loudness of an audio signal from Indian video songs (IVS). Initially, various IVS of distinct playback singers (PS) are collected. After that, 53 audio features (12-dimensional timbre audio feature vectors, 12 pitch classes, 13 MFCC coefficients, 13 LPC coefficients, and 3 loudness features of an audio signal) are extracted from each segment. The dimension of the extracted audio features is reduced using principal component analysis (PCA). The playback singer model (PSM) is trained using multiclass classification algorithms such as back propagation, AdaBoost.M2, the k-nearest neighbor (KNN) algorithm, the naïve Bayes classifier (NBC), and the Gaussian mixture model (GMM). The proposed approach is tested on various combinations of datasets and different combinations of audio feature vectors with songs by various Indian male and female PS.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 71
    Publication Date: 2015-06-27
    Description: Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using this data as an auxiliary data source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits DNN as a discriminatively trained feature extractor outperforms the best performing HMM/DNN hybrid by about 5 % relative (in terms of WER). An accumulated relative gain with respect to the MFCC-HMM/GMM baseline is about 30 % WER.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 72
    Publication Date: 2015-06-10
    Description: Optimal automatic speech recognition (ASR) takes place when the recognition system is tested under circumstances identical to those in which it was trained. However, in the real world, there are many sources of mismatch between the training and testing environments. These mismatches can be due to the sources of noise that exist in real environments. Speech enhancement techniques have been developed to make ASR systems robust against these sources of noise. In this work, a method based on histogram equalization (HEQ) is proposed to compensate for the nonlinear distortions in speech representation. This approach utilizes stereo simultaneous recordings of clean speech and its corresponding noisy speech to compute a stereo Gaussian mixture model (GMM). The stereo GMM is used to compute the cumulative density function (CDF) for both clean speech and noisy speech using a sigmoid function instead of the order statistics used in other HEQ-based methods. In the implementation, we show two choices to apply HEQ, hard decision HEQ and soft decision HEQ. The latter is based on minimum mean square error (MMSE) clean speech estimation. The experimental work shows that the soft HEQ and hard HEQ achieve better recognition results than the other HEQ approaches such as tabular HEQ, quantile HEQ and polynomial fit HEQ. It also shows that soft HEQ achieves notably better recognition results than hard HEQ. The results of the experimental work also show that using HEQ improves the efficiency of other speech enhancement techniques such as stereo piece-wise linear compensation for environment (SPLICE) and vector Taylor series (VTS). The results also show that using HEQ in multi style training (MST) significantly improves the ASR system performance.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 73
    Publication Date: 2015-06-12
    Description: In this paper, we propose a cascade classifier for high-performance on-road vehicle detection. The proposed system deliberately selects constituent weak classifiers that are expected to show good performance in real detection environments. The weak classifiers selected at a cascade stage using AdaBoost are assessed for their effectiveness in vehicle detection. By applying the selected weak classifiers with their own confidence levels to another set of image samples, the system observes the resultant weights of those samples to assess the biasing of the selected weak classifiers. Once they are estimated as biased toward either positive or negative samples, the weak classifiers are discarded, and the selection process is restarted after adjusting the weights of the training samples. Experimental results show that a cascade classifier using weak classifiers selected by the proposed method has a higher detection performance.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 74
    Publication Date: 2015-06-12
    Description: In this paper, we extend the Richardson-Lucy (RL) method to block-iterative versions, separated BI-RL, and interlaced BI-RL, for image deblurring applications. We propose combining algorithms for separated BI-RL to form block artifact-free output images from separately deblurred block images. For interlaced BI-RL to accelerate the iteration, we propose an interlaced block-iteration algorithm on down-sampled blocks of the observed image. Simulation studies show that separated BI-RL and interlaced BI-RL achieve desired goals in Gaussian and diagonal deblurrings.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
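The record above builds on the Richardson-Lucy (RL) method. As background only, here is a minimal sketch of the classical, non-block RL iteration on a 1D signal in plain Python. It is not the separated or interlaced BI-RL algorithm of the paper; the helper names, the zero-padded boundary handling, and the flat initial estimate are assumptions made for illustration.

```python
def convolve_same(x, k):
    """Zero-padded 'same' convolution of list x with an odd-length kernel k."""
    half = len(k) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, kv in enumerate(k):
            idx = i + half - j
            if 0 <= idx < len(x):
                acc += kv * x[idx]
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Classical RL deblurring for non-negative data (illustrative sketch).

    observed: blurred, non-negative signal
    psf:      odd-length blur kernel that sums to 1
    """
    psf_flipped = psf[::-1]  # correlation with psf == convolution with flipped psf
    # Assumed initialisation: a flat, strictly positive estimate.
    start = max(sum(observed) / len(observed), eps)
    estimate = [start] * len(observed)
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        # Ratio of the data to the current re-blurred estimate.
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        # Multiplicative update keeps the estimate non-negative.
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate
```

Because the update is multiplicative, an estimate that starts positive never becomes negative, which is one reason the method is popular for intensity images.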
  • 75
    Publication Date: 2015-02-13
    Description: In this paper, a low-complexity algorithm is proposed to reduce the complexity of depth map compression in the high-efficiency video coding (HEVC)-based 3D video coding (3D-HEVC). Since the depth map and the corresponding texture video represent the same scene in a 3D video, there is a high correlation among the coding information from depth map and texture video. An experimental analysis is performed to study depth map and texture video correlation in the coding information such as the motion vector and prediction mode. Based on the correlation, we propose three efficient low-complexity approaches, including early termination mode decision, adaptive search range motion estimation (ME), and fast disparity estimation (DE). Experimental results show that the proposed algorithm can reduce computational complexity by about 66% with negligible rate-distortion (RD) performance loss in comparison with the original 3D-HEVC encoder.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 76
    Publication Date: 2015-02-16
    Description: The quality of biometric raw data is one of the main factors affecting the overall performance of biometric systems. Poor biometric samples increase the enrollment failure and decrease the system performance. Hence, controlling the quality of the acquired biometric raw data is essential in order to have useful biometric authentication systems. Towards this goal, we present a generic methodology for the quality assessment of image-based biometric modalities combining two types of information: 1) image quality and 2) pattern-based quality using the scale-invariant feature transformation (SIFT) descriptor. The associated metric has the advantages of being multimodal (face, fingerprint, and hand veins) and independent of the authentication system used. Six benchmark databases and one biometric verification system are used to illustrate the benefits of the proposed metric. A comparison study with the National Institute of Standards and Technology (NIST) fingerprint image quality (NFIQ) metric shows the benefits of the presented metric.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 77
    Publication Date: 2015-01-20
    Description: Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the model. Reported results are consistent with previous knowledge in that respiratory tremor mainly causes amplitude modulation of the voice signal while laryngeal tremor causes both amplitude and frequency modulation. In turn, articulatory tremor is commonly assumed to produce only amplitude modulations but the simulation results indicate that it also produces a high-frequency modulation of the output signal. Furthermore, articulatory tremor affects the frequency response of the vocal tract and it might thus be detected by analyzing the spectral envelope of the acoustic signal.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 78
    Publication Date: 2015-01-23
    Description: In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. For optimizing the initial feature vector, a genetic algorithm (GA)-based approach is proposed and compared with the well-known principal component analysis (PCA) approach. The artificial neural network (ANN) with the different learning algorithms is used as the classifier. Some experiments are carried out for evaluating and comparing the classification accuracies which are obtained by the use of the different learning algorithms and the different feature vectors (the initial and the optimized ones). Finally, a hybrid of the ANN with the 'trainscg' training algorithm and the genetic algorithm is proposed for the vocal fold pathology diagnosis. Also, the performance of the proposed method is compared with the recent works. The experiments' results show a better performance (the higher classification accuracy) of the proposed method in comparison with the others.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 79
    Publication Date: 2015-02-12
    Description: The spatio-temporal-prediction (STP) method for multichannel speech enhancement has recently been proposed. This approach makes it theoretically possible to attenuate the residual noise without distorting speech. In addition, the STP method depends only on the second-order statistics and can be implemented using a simple linear filtering framework. Unfortunately, some numerical problems can arise when estimating the filter matrix in transients. In such a case, the speech correlation matrix is usually rank deficient, so that no solution exists. In this paper, we propose to implement the spatio-temporal-prediction method using a signal subspace approach. This allows for nullifying the noise subspace and processing only the noisy signal in the signal-plus-noise subspace. As a result, we are able to not only regularize the solution in transients but also to achieve higher attenuation of the residual noise. The experimental results also show that the signal subspace approach distorts speech less than the conventional method.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 80
    Publication Date: 2015-02-12
    Description: Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence severe performance degradation in adverse acoustic conditions such as those with high ambient noise. We propose a noisy training approach to tackle this problem: by injecting moderate noise into the training data intentionally and randomly, more generalizable DNN models can be learned. This ‘noise injection’ technique, although known to the neural computation community already, has not been studied with DNNs which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
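The noise-injection idea in the record above can be illustrated with a short sketch in plain Python. It shows one plausible way to corrupt a training utterance with a randomly chosen noise clip at a random signal-to-noise ratio. The function names, the SNR range, and the corruption probability are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    The noise clip is repeated or truncated to the speech length.
    Assumes both signals have non-zero power.
    """
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    scale = math.sqrt(power(speech) / (power(tiled) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, tiled)]


def noisy_copy(speech, noise_bank, rng, snr_range=(5.0, 20.0), p_corrupt=0.5):
    """Return speech unchanged or corrupted by a random noise at a random SNR.

    snr_range and p_corrupt are hypothetical settings, not values from the paper.
    """
    if rng.random() >= p_corrupt:
        return list(speech)
    noise = rng.choice(noise_bank)
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(speech, noise, snr_db)
```

In a training loop, `noisy_copy` would be applied to each utterance before feature extraction, so the network sees a different corruption on every pass over the data.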
  • 81
    Publication Date: 2015-02-21
    Description: Spiking neural networks (SNN) have gained popularity in embedded applications such as robotics and computer vision. The main advantages of SNN are the temporal plasticity, ease of use in neural interface circuits and reduced computation complexity. SNN have been successfully used for image classification. They provide a model for the mammalian visual cortex, image segmentation and pattern recognition. Different spiking neuron mathematical models exist, but their computational complexity makes them ill-suited for hardware implementation. In this paper, a novel, simplified and computationally efficient model of spike response model (SRM) neuron with spike-time dependent plasticity (STDP) learning is presented. Frequency spike coding based on receptive fields is used for data representation; images are encoded by the network and processed in a similar manner as the primary layers in visual cortex. The network output can be used as a primary feature extractor for further refined recognition or as a simple object classifier. Results show that the model can successfully learn and classify black and white images with added noise or partially obscured samples with up to ×20 computing speed-up at an equivalent classification ratio when compared to classic SRM neuron membrane models. The proposed solution combines spike encoding, network topology, neuron membrane model and STDP learning.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 82
    Publication Date: 2015-02-13
    Description: Music identification via audio fingerprinting has been an active research field in recent years. In the real-world environment, music queries are often deformed by various interferences which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Therefore, robustness plays a crucial role in music identification technique. In this paper, we propose to use scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against serious time stretching and pitch shifting simultaneously. In addition, a locality sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factor of time stretching and pitch shifting that music queries might have experienced can be accurately estimated.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 83
    Publication Date: 2015-01-31
    Description: Special issue on Animal and Insect Behaviour Understanding in Image Sequences
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 84
    Publication Date: 2015-01-21
    Description: No description available
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 85
    Publication Date: 2015-01-30
    Description: Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated a better success than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, thereby losing important local information in the process which affects the performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within and cross-corpus experiments conducted over acted and spontaneous expressions. In a within-corpus scenario, sRSM outperforms competing LTMs, while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered to be a difficult task using only speech. Further analyses with respect to the turn duration show that the improvement is even more significant, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In a cross-corpus scenario, two novel adaptation-based approaches, instance selection, and weight regularization are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance - only 2.6% and 2.7%, thereby highlighting the generalization ability of the proposed features.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 86
    Publication Date: 2015-07-15
    Description: Because the reliability of feature for every pixel determines the accuracy of classification, it is important to design a specialized feature mining algorithm for hyperspectral image classification. We propose a feature learning algorithm, contextual deep learning, which is extremely effective for hyperspectral image classification. On the one hand, the learning-based feature extraction algorithm can characterize information better than the pre-defined feature extraction algorithm. On the other hand, spatial contextual information is effective for hyperspectral image classification. Contextual deep learning explicitly learns spectral and spatial features via a deep learning architecture and promotes the feature extractor using a supervised fine-tune strategy. Extensive experiments show that the proposed contextual deep learning algorithm is an excellent feature learning algorithm and can achieve good performance with only a simple classifier.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 87
    Publication Date: 2015-07-17
    Description: Optical flow methods are accurate algorithms for estimating the displacement and velocity fields of objects in a wide variety of applications, with their performance depending on the configuration of a set of parameters. Since there is a lack of research that aims to automatically tune such parameters, in this work we propose an optimization-based framework for this task based on social-spider optimization, harmony search, particle swarm optimization, and the Nelder-Mead algorithm. The proposed framework employs the well-known large displacement optical flow (LDOF) approach as the base algorithm over the Middlebury and Sintel public datasets, with promising results relative to the baseline proposed by the authors of LDOF.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 88
    Publication Date: 2015-07-17
    Description: The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in applications where the sound source of interest may move around a wide angular range. However, in contrast with the conventional filter-and-sum beamformer, the passband steerability of the FSBB is achieved at the cost of high structural complexity, i.e., a greatly increased number of tap weights. Moreover, it has been shown that the FSBB is sensitive to microphone mismatches, and robust FSBB design is of interest to practical applications. To deal with the aforementioned problems, this paper studies the robust design of the FSBB with sparse tap weights via convex optimization by considering some a priori knowledge of microphone mismatches. It is shown that although the worst-case performance (WCP) optimization has been successfully applied to the design of robust filter-and-sum beamformers with bounded microphone mismatches, it may become inapplicable to robust FSBB design due to its over-conservative nature. When limited knowledge of the mean and variance of microphone mismatches is available, a robust FSBB design approach based on the worst-case mean performance optimization with the passband response variance (PRV) constraint is devised. Unlike the WCP optimization design, this approach performs well with the capability of passband stability control of array response. Finally, the robust FSBB design with sparse tap weights has been studied. It is shown that there is redundancy in the tap weights of the FSBB, i.e., robust FSBB design with sparse tap weights is viable, and thus leads to a low-complexity FSBB.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 89
    Publication Date: 2012-12-20
    Description: In this article, a new method for the recognition of obscene video content is presented. In the proposed algorithm, different episodes of a video file, each starting at a key frame, are classified independently by using the proposed features. We present three novel sets of features for the classification of video episodes, including (1) features based on the information of single video frames, (2) features based on the 3D spatiotemporal volume (STV), and (3) features based on motion and periodicity characteristics. Furthermore, we propose the connected components' relation tree to find the spatiotemporal relationship between the connected components in consecutive frames for suitable feature extraction. To divide an input video into video episodes, a new key frame extraction algorithm is utilized, which combines the color histogram of the frames with the entropy of motion vectors. We compare the results of the proposed algorithm with those of other methods. The results reveal that the proposed algorithm increases the recognition rate by more than 9.34% in comparison with existing methods.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 90
    Publication Date: 2013-02-21
    Description: The rapid spread of digital data usage in many real-life applications has created an urgent need for new and effective ways to ensure data security. Efficient secrecy can be achieved, at least in part, by implementing steganography techniques. Novel and versatile audio steganographic methods have been proposed. The goal of steganographic systems is to obtain a secure and robust way to conceal a high rate of secret data. We focus in this paper on digital audio steganography, which has emerged as a prominent source of data hiding across novel telecommunication technologies such as covered voice-over-IP, audio conferencing, etc. The multitude of steganographic criteria has led to a great diversity in the design techniques of these systems. In this paper, we review current digital audio steganographic techniques and evaluate their performance based on robustness, security, and hiding capacity indicators. Another contribution of this paper is a robustness-based classification of steganographic models depending on their occurrence in the embedding process. A survey of major trends in audio steganography applications is also presented.
    Print ISSN: 1687-4714
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 91
    Publication Date: 2012-12-11
    Description: This article presents a shadow removal algorithm that uses a background difference method based on shadow position and edge attributes. First, a novel background subtraction method is proposed to obtain moving objects. This method includes three main parts: detecting the moving regions approximately by calculating the inter-frame differences of symmetrical frames and counting the static index of each probable moving point; modeling the background from the statistics of brightness information and updating this model using motion templates; and extracting the moving objects and their edges. Second, based on the above processing, we first suppress shadows in the HSV color space. The direction of the shadow is then determined from the shadow edges and positions, combined with the horizontal and vertical projections of the edge image. The position of the shadow is located accurately through a proportion method, and finally the shadow is removed. Experimental results indicate that the proposed method is easy to implement and can determine the direction of the shadow adaptively, then eliminate the shadow and extract the whole moving object accurately, especially when the chrominance invariant principle is ineffective.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 92
    Publication Date: 2012-12-11
    Description: Digital video stabilization (DVS) allows video sequences to be acquired without disturbing jerkiness by removing unwanted camera movements. A good DVS should remove the unwanted camera movements while maintaining the intentional camera movements. In this article, we propose a novel DVS algorithm that compensates for camera jitter by applying an adaptive fuzzy filter to the global motion of video frames. The adaptive fuzzy filter is a simple infinite impulse response filter which is tuned by a fuzzy system adaptively to the camera motion characteristics. The fuzzy system is also tuned during operation according to the amount of camera jitter. The fuzzy system uses two inputs which are quantitative representations of the unwanted and the intentional camera movements. The global motion of video frames is estimated from the block motion vectors produced by the video encoder during motion estimation. Experimental results indicate good performance for the proposed algorithm.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
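    The infinite impulse response filtering mentioned in the description above can be illustrated with a minimal sketch. It assumes a first-order low-pass filter with a fixed coefficient `alpha` applied to one axis of per-frame global motion; the function names and the fixed coefficient are hypothetical, and the fuzzy tuning described in the abstract is not reproduced here.

```python
def smooth_global_motion(motion, alpha=0.9):
    """First-order IIR low-pass filter over per-frame global motion values.

    alpha is a fixed smoothing coefficient (an assumption for this sketch;
    the abstract describes a coefficient tuned adaptively by a fuzzy system).
    """
    smoothed = []
    previous = motion[0]  # start the filter state at the first sample
    for value in motion:
        # y[n] = alpha * y[n-1] + (1 - alpha) * x[n]
        previous = alpha * previous + (1.0 - alpha) * value
        smoothed.append(previous)
    return smoothed


def jitter_corrections(motion, alpha=0.9):
    """Per-frame offset between the smoothed motion and the measured motion."""
    smoothed = smooth_global_motion(motion, alpha)
    return [s - m for s, m in zip(smoothed, motion)]
```

    A larger `alpha` suppresses more jitter but also follows intentional motion such as panning more slowly, which is presumably the trade-off an adaptive scheme is meant to manage.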
  • 93
    Publication Date: 2012-12-11
    Description: In this article, we propose a novel blind image deconvolution method developed within the Bayesian framework. We concentrate on the restoration of blurred photographs taken by commercial cameras to show its effectiveness. The proposed method is based on a non-convex ℓp quasi-norm with 0 < p < 1 that is used for the image, and a total variation (TV) based prior that is utilized for the blur. Bayesian inference is carried out by utilizing bounds for both the image and blur priors using a majorization-minimization principle. Maximum a posteriori estimates of the unknown image, blur and model parameters are calculated. Experimental results (i.e., restorations of more than 30 blurred photographs) are presented to demonstrate the advantage of the proposed method compared to existing ones.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 94
    Publication Date: 2012-12-11
    Description: In this article, we propose a new method for localizing the optic disc in retinal images. Localizing the optic disc and its center is the first step of most vessel segmentation, disease diagnostic, and retinal recognition algorithms. We use the optic discs of the first four retinal images in the DRIVE dataset to extract the histograms of each color component. Then, we calculate the average of the histograms for each color as a template for localizing the center of the optic disc. The DRIVE and STARE datasets and a local dataset including 273 retinal images are used to evaluate the proposed algorithm. The success rates were 100%, 91.36%, and 98.9%, respectively.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
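    The histogram averaging step in the description above can be sketched as follows for a single color channel. The function names are hypothetical, and the L1 distance used to compare a candidate region with the template is an assumption made for illustration; the abstract does not state which matching measure is used.

```python
def channel_histogram(pixels, bins=256):
    """Count how often each intensity value occurs in one color channel."""
    hist = [0] * bins
    for value in pixels:
        hist[value] += 1
    return hist


def average_histogram(histograms):
    """Bin-wise mean of several histograms, used here as a template."""
    count = len(histograms)
    return [sum(column) / count for column in zip(*histograms)]


def histogram_distance(a, b):
    """L1 distance between two histograms (assumed matching measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

    Under these assumptions, a candidate window whose per-channel histograms have the smallest distance to the templates would be taken as the most likely optic disc location.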
  • 95
    Publication Date: 2012-12-11
    Description: Mobile video streaming services are challenging, as they obey several system constraints, such as random access facilities, efficient server storage, and flexible rate adaptation. Rate adaptation can be performed by means of seamless switching among different encoded bitstreams. The H.264 video coding standard explicitly supports bitstream switching using specific frame coding modes, namely switching pictures (SP). Locations of SP frames affect the overall bit rate and quality of streamed video. In this study, we address the issue of optimal joint selection of the SP frame locations and bit budget allocation at the frame layer. The optimization is carried out via a game theoretic approach under assigned system constraints on the overall streaming rate and the maximum random access delay. Numerical simulations show that our frame layer optimal encoding procedure brings advantages in terms of several characteristics of the streamed video, encompassing enhanced rate-distortion, reduced transmission buffer occupancy, equalization of the transmission delays, and more efficient switching.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 96
    Publication Date: 2012-12-11
    Description: Facial expressions are a valuable source of information that accompanies facial biometrics. Early detection of physiological and psycho-emotional data from facial expressions is linked to the situational awareness module of any advanced biometric system for personal state re/identification. In this article, a new method that utilizes both texture and geometric information of facial fiducial points is presented. We investigate Gauss-Laguerre wavelets, which have rich frequency extraction capabilities, to extract texture information of various facial expressions. Rotation invariance and the multiscale approach of these wavelets make the feature extraction robust. Moreover, geometric positions of fiducial points provide valuable information for upper/lower face action units. The combination of these two types of features is used for facial expression classification. The performance of this system has been validated on three public databases: the JAFFE, the Cohn-Kanade, and the MMI image.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 97
    Publication Date: 2012-12-11
    Description: We propose a novel image segmentation algorithm using a piecewise smooth (PS) approximation to the image. The proposed algorithm is inspired by four well-known active contour models, i.e., Chan and Vese's piecewise constant (PC)/smooth models, the region-scalable fitting model, and the local image fitting model. The four models share the same algorithm structure to find a PC/smooth approximation to the original image; the main difference is how the energy functional to be minimized and the PC/smooth function are defined. In this article, pursuing the same idea, we introduce a different energy functional and PS function to search for the optimal PS approximation of the original image. The initial function in our model can be chosen as a constant function, which implies that the proposed algorithm is robust to initialization or even free of manual initialization. Experiments show that the proposed algorithm is appropriate for a wider range of images, including images with intensity inhomogeneity and infrared ship images with low contrast and complex backgrounds.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 98
    Publication Date: 2012-12-11
    Description: Facial expressions (FE) are one of the important cognitive load markers in the context of car driving. Any muscular activity can be coded as an action unit (AU), and AUs are the building blocks of FE. Precise facial point tracking is crucial since it is a necessary step for AU detection. Here, we present our progress in FE analysis based on AU detection on face infrared videos in the context of a car driving simulator. First, we propose a real-time facial points tracking method (HCPF-AAM) using a modified particle filter (PF) based on Harris corner samples, which is optimized and combined with an Active Appearance Model (AAM) approach. The robustness of the PF, the precision of the Harris corner-based samples, and the optimization of the AAM result in powerful facial points tracking on very low-contrast images acquired under near-infrared (NIR) illumination. Second, detection of the most common AUs in the context of car driving, identified by a certified Facial Action Coding System coder, is presented. For detection of each specified AU, a spatio-temporal analysis of the related tracked facial points is performed. Then, a combination of a rule-based scheme with Probabilistic Actively Learned Support Vector Machines is developed to classify the features calculated from the related tracked facial points. Results show that with such a scheme, we can obtain more than 91% precision in the detection of the five most common AUs for low-contrast NIR images and 90% precision on the MMI dataset.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
  • 99
    Publication Date: 2012-12-11
    Description: It is well known that the DWT (discrete wavelet transform) is shift-sensitive, which means that a slight shift of a feature in the original signal may cause unpredictable changes in the analysis subbands. Some modified DWTs can reduce the shift sensitivity; however, they are all redundant. In this paper, we show that the shift sensitivity is caused by the aliasing terms formed in the downsampling operation during the analysis process. A novel scheme is proposed to design the wavelet, which can reduce the effect of aliasing terms as much as possible within the general framework of the DWT. A few biorthogonal wavelets have been designed and applied in the simulation examples. The results of the examples demonstrate the efficiency of the designed wavelets in terms of shift-insensitivity and nonredundancy.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer
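    The shift sensitivity mentioned in the description above can be seen in a small sketch using a single-level Haar DWT, chosen here only because it is the simplest wavelet; it is not one of the wavelets designed in the paper, and the function names are hypothetical. Shifting a step edge by one sample moves energy into the detail subband.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT: filter, then keep every second output sample."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # downsampling by two
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def energy(coefficients):
    """Sum of squared coefficients in one subband."""
    return sum(value * value for value in coefficients)


# A step edge aligned with the sample pairs, and the same edge one sample later.
step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
shifted = [0.0] + step[:-1]

_, detail_step = haar_dwt(step)
_, detail_shifted = haar_dwt(shifted)
print(energy(detail_step), energy(detail_shifted))
```

    For the aligned edge the detail subband is empty, while the one-sample shift places the edge inside a sample pair and produces a nonzero detail coefficient, so the subband energy changes even though the signal content is the same.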
  • 100
    Publication Date: 2012-12-11
    Description: In this work we evaluate the use of several real-time dense stereo algorithms as a passive 3D sensing technology for potential use as part of a driver assistance system or autonomous vehicle guidance. A key limitation of prior work in this area is that, although significant comparative work has been done on dense stereo algorithms using de facto laboratory test sets, only limited work has been done on evaluation in real-world environments such as those found in potential automotive usage. This comparative study aims to provide an empirical comparison using automotive environment video imagery and to compare this against dense stereo results on standard test sequences, in addition to considering the computational requirement against performance in real time. We evaluate five chosen algorithms: Block Matching [1], Semi-Global Matching [2], No-Maximal Disparity [3], Cross-Based Local Approach [4], and Adaptive Aggregation with Dynamic Programming [5]. Our comparison shows a contrast between the results obtained on standard test sequences and those for automotive application imagery, where a Semi-Global Matching approach gave the best empirical performance. From our study we can conclude that the noise present in automotive applications can impact the quality of the depth information output from more complex algorithms (No-Maximal Disparity, Cross-Based Local Approach, Adaptive Aggregation with Dynamic Programming), with the result that in practice the disparity maps produced are comparable with those of simpler approaches such as Block Matching and Semi-Global Matching, which empirically perform better on the automotive environment test sequences. This empirical result on automotive environment data contradicts the comparative result found on standard dense stereo test sequences using a statistical comparison methodology, leading to interesting observations regarding current relative evaluation approaches.
    Print ISSN: 1687-5281
    Electronic ISSN: 1687-5176
    Topics: Electrical Engineering, Measurement and Control Technology
    Published by Springer