ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filters
  • Articles  (1,135)
  • Research data
  • Springer  (1,135)
  • EURASIP Journal on Audio, Speech, and Music Processing  (198)
  • EURASIP Journal on Embedded Systems  (121)
  • Electrical engineering, electronics, communications engineering  (1,135)
Collection
  • Articles  (1,135)
  • Research data
Publisher
  • Springer  (1,135)
Publication period
Subject
  • Electrical engineering, electronics, communications engineering  (1,135)
  • Computer science  (520)
  • 1
    Publication date: 2015-08-08
    Description: Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 2
    Publication date: 2015-08-14
    Description: Support vector machines (SVMs) have played an important role in state-of-the-art language recognition systems. The recently developed extreme learning machine (ELM) tends to have better scalability and achieve similar or much better generalization performance at much faster learning speed than traditional SVM. Inspired by the excellent features of ELM, in this paper, we propose a novel method called regularized minimum class variance extreme learning machine (RMCVELM) for language recognition. The RMCVELM aims at minimizing empirical risk, structural risk, and the intra-class variance of the training data in the decision space simultaneously. The proposed method, which is computationally inexpensive compared to SVM, suggests a new classifier for language recognition and is evaluated on the 2009 National Institute of Standards and Technology (NIST) language recognition evaluation (LRE). Experimental results show that the proposed RMCVELM obtains much better performance than SVM. In addition, the RMCVELM can also be applied to the popular i-vector space and get comparable results to the existing scoring methods.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 3
    Publication date: 2015-08-15
    Description: We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy in a leave-one-singer-out cross validation significantly.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 4
    Publication date: 2015-09-15
    Description: In recent years, deep learning has not only permeated the computer vision and speech recognition research fields but also fields such as acoustic event detection (AED). One of the aims of AED is to detect and classify non-speech acoustic events occurring in conversation scenes including those produced by both humans and the objects that surround us. In AED, deep learning has enabled modeling of detail-rich features, and among these, high resolution spectrograms have shown a significant advantage over existing predefined features (e.g., Mel-filter bank) that compress and reduce detail. In this paper, we further assess the importance of feature extraction for deep learning-based acoustic event detection. AED, based on spectrogram-input deep neural networks, exploits the fact that sounds have “global” spectral patterns, but sounds also have “local” properties such as being more transient or smoother in the time-frequency domain. These can be exposed by adjusting the time-frequency resolution used to compute the spectrogram, or by using a model that exploits locality, leading us to explore two different feature extraction strategies in the context of deep learning: (1) using multiple resolution spectrograms simultaneously and analyzing the overall and event-wise influence to combine the results, and (2) introducing the use of convolutional neural networks (CNN), a state-of-the-art 2D feature extraction model that exploits local structures, with log power spectrogram input for AED. An experimental evaluation shows that the approaches we describe outperform our state-of-the-art deep learning baseline with a noticeable gain in the CNN case and provides insights regarding CNN-based spectrogram characterization for AED.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 5
    Publication date: 2015-09-26
    Description: The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be best captured through an analysis that encompasses both time and frequency domains, with a focus on the modulations or changes in the signal in the spectrotemporal space. This representation mimics the spectrotemporal receptive field (STRF) analysis believed to underlie processing in the central mammalian auditory system, particularly at the level of primary auditory cortex. How well this STRF representation captures the timbral identity of musical instruments in continuous solo recordings remains unclear. The current work investigates the applicability of the STRF feature space for instrument recognition in solo musical phrases and explores best approaches to leveraging knowledge from isolated musical notes for instrument recognition in solo recordings. The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end classifiers using support vector machines to achieve a generalization of instrument recognition to off-the-shelf, commercially available solo music.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 6
    Publication date: 2015-11-26
    Description: The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number of parallel exemplars. In our previous work, a VC technique using non-negative matrix factorization (NMF) for noisy environments was proposed. This method requires parallel exemplars (which consist of the source exemplars and target exemplars that have the same texts uttered by the source and target speakers) for dictionary construction. In the framework of conventional Gaussian mixture model (GMM)-based VC, some approaches that do not need parallel exemplars have been proposed. However, in the framework of exemplar-based VC for noisy environments, such a method has never been proposed. In this paper, an adaptation matrix in an NMF framework is introduced to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. We refer to this method as affine NMF, and its effectiveness has been confirmed by comparison with a conventional NMF-based method and a GMM-based method in noisy environments.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 7
    Publication date: 2016-07-17
    Description: While mining topics in a document collection, in order to capture the relationships between words and further improve the effectiveness of discovered topics, this paper proposed a feedback recurrent neural net...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 8
    Publication date: 2016-07-26
    Description: As the adoption of sensing and control networks rises to encompass the most diverse fields, the need for simple, efficient interconnection between many different devices will become ever more pressing. Though ...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 9
    Publication date: 2016-07-26
    Description: In real-time data-intensive multimedia processing applications, data transfer and storage significantly influence, if not dominate, all the major cost parameters of the design space, namely energy consumption, ...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 10
    Publication date: 2016-07-31
    Description: Improving energy efficiency and reducing energy wastage is an important topic of our time. But it is quite difficult to figure out how much of our total electricity bill can be mapped to which device or at wha...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 11
    Publication date: 2016-07-15
    Description: We present a hybrid system spanning a fixed-function microarchitecture and a general-purpose microprocessor, designed to amplify the throughput and decrease the power dissipation of collision detection relativ...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 12
    Publication date: 2016-07-09
    Description: A new voice activity detection algorithm based on long-term pitch divergence is presented. The long-term pitch divergence not only decomposes speech signals with a bionic decomposition but also makes full use ...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 13
    Publication date: 2016-05-06
    Description: In order to recover data from embedded real-time main memory databases effectively and efficiently, this paper proposes a real-time log-based recovery approach. With respect to the real-time requirement in emb...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 14
    Publication date: 2016-05-11
    Description: Electric vehicles (EVs) are a promising solution to reduce the transportation dependency on oil, as well as the environmental concerns. Realization of E-transportation relies on providing electrical energy to ...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 15
    Publication date: 2016-05-06
    Description: Ultra-wideband (UWB) technology is foreseen as a promising solution to overcome the limits of ultra-high frequency (UHF) techniques toward the development of green radio frequency identification (RFID) systems...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 16
    Publication date: 2016-07-19
    Description: Large fractions of today’s embedded systems’ power consumption can be attributed to the memory subsystem. In order to reduce this fraction, we propose a mathematical model to optimize on-chip memory configurat...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 17
    Publication date: 2016-07-19
    Description: Optimizing energy consumption in modern mobile handheld devices plays a very important role as lowering energy consumption impacts battery life and system reliability. With next-generation smartphones and tabl...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 18
    Publication date: 2013-09-18
    Description: Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation held as a part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the metric, and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems, aiming at establishing the best technique for dealing with that difficult task and at defining promising directions for this relatively novel task.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 19
    Publication date: 2013-06-07
    Description: A novel speech bandwidth extension method based on an audio watermark is presented in this paper. The time-domain and frequency-domain envelope parameters are extracted from the high-frequency components of the speech signal, and then these parameters are embedded in the corresponding narrowband speech bit stream by the modified least significant bit watermark method, which uses perceptual properties. At the decoder, the wideband speech is reproduced with the reconstruction of high-frequency components based on the parameters extracted from the bit stream of the narrowband speech. The proposed method can decrease the poor auditory effect caused by large local distortion. The simulation results show that the synthesized wideband speech has low spectral distortion and its speech perception quality is greatly improved.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 20
    Publication date: 2015-05-13
    Description: Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a combination of these two approaches is proposed. For the DNN-based bottleneck feature, we noted that DNNs can transform the reverberant speech feature to a new feature space with greater discriminative classification ability for distant-talking speaker recognition. Conversely, cepstral domain DAE-based dereverberation tries to suppress the reverberation by mapping the cepstrum of reverberant speech to that of clean speech with the expectation of improving the performance of distant-talking speaker recognition. Since the DNN-based discriminant bottleneck feature and DAE-based dereverberation have a strong complementary nature, the combination of these two methods is expected to be very effective for distant-talking speaker identification. A speaker identification experiment was performed on a distant-talking speech set, with reverberant environments differing from the training environments. In suppressing late reverberation, our method outperformed some state-of-the-art dereverberation approaches such as the multichannel least mean squares (MCLMS). Compared with the MCLMS, we obtained a reduction in relative error rates of 21.4% for the bottleneck feature and 47.0% for the autoencoder feature. Moreover, the combination of likelihoods of the DNN-based bottleneck feature and DAE-based dereverberation further improved the performance.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 21
    Publication date: 2015-05-08
    Description: Estimating the directions of arrival (DOAs) of multiple simultaneous mobile sound sources is an important step for various audio signal processing applications. In this contribution, we present an approach that improves upon our previous work and is now able to estimate the DOAs of multiple mobile speech sources, while being light in resources, both hardware-wise (only using three microphones) and software-wise. This approach takes advantage of the fact that simultaneous speech sources do not completely overlap each other. To evaluate the performance of this approach, a multi-DOA estimation evaluation system was developed based on a corpus collected from different acoustic scenarios named Acoustic Interactions for Robot Audition (AIRA).
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 22
    Publication date: 2016-04-13
    Description: Automatic speech recognition is becoming more ubiquitous as recognition performance improves, capable devices increase in number, and areas of new application open up. Neural network acoustic models that can u...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 23
    Publication date: 2016-04-13
    Description: In this paper, we propose two adaptive frame size Aloha algorithms, namely adaptive frame size Aloha 1 (AFSA1) and adaptive frame size Aloha 2 (AFSA2), for solving radio frequency identification (RFID) multipl...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 24
    Publication date: 2015-12-31
    Description: Using a proper distribution function for speech signal or for its representations is of crucial importance in statistical-based speech processing algorithms. Although the most commonly used probability density function (pdf) for speech signals is Gaussian, recent studies have shown the superiority of super-Gaussian pdfs. A large research effort has focused on the investigation of a univariate case of speech signal distribution; however, in this paper, we study the multivariate distributions of speech signal and its representations using the conventional distribution functions, e.g., multivariate Gaussian and multivariate Laplace, and the copula-based multivariate distributions as candidates. The copula-based technique is a powerful method in modeling non-Gaussian multivariate distributions with non-linear inter-dimensional dependency. The level of similarity between the candidate pdfs and the real speech pdf in different domains is evaluated using the energy goodness-of-fit test. In our evaluations, the best-fitted distributions for speech signal vectors with different lengths in various domains are determined. A similar experiment is performed for different classes of English phonemes (fricatives, nasals, stops, vowels, and semivowels/glides). The evaluation results demonstrate that the multivariate distribution of speech signals in different domains is mostly super-Gaussian, except for Mel-frequency cepstral coefficients. Also, the results confirm that the distribution of the different phoneme classes is better statistically modeled by a mixture of Gaussian and Laplace pdfs. The copula-based distributions provide better statistical modeling of vectors representing discrete Fourier transform (DFT) amplitude of speech vectors with a length shorter than 500 ms.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 25
    Publication date: 2016-03-06
    Description: Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberati...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 26
    Publication date: 2015-12-27
    Description: In order to solve problems such as the demand for geographic information services, the short life of embedded systems, and network collapse, an embedded mobile crowd service system based on opportunistic geological grid and dynamical split is proposed. Firstly, based on the characteristics of geographical spatial information resources and service time series, a mobile geographic crowd service system was established for providing the sensing data with the mobile geographic crowd service model. Then, according to the complex data of the embedded equipment in the geographic crowd service system, and the relationship between the geographic information service object and the user, the embedded system was proposed based on the opportunity geological grid. Finally, the optimization of the geographic crowd system was realized by the dynamic segmentation of the opportunity geographic grid. The experimental results for equipment utilization, the life cycle of the crowd network, user satisfaction, and control complexity show that the proposed scheme is more suitable for the embedded network geographic information system.
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 27
    Publication date: 2016-03-31
    Description: Nowadays, many enterprises provide cloud services based on their own Hadoop clusters. Because the resources of a Hadoop cluster are limited, the Hadoop cluster must select some specific tasks to allocate limit...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 28
    Publication date: 2019
    Description: We propose a new method for music detection from broadcasting contents using the convolutional neural networks with a Mel-scale kernel. In this detection task, music segments should be annotated from the broad...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 29
    Publication date: 2019
    Description: A method called joint connectionist temporal classification (CTC)-attention-based speech recognition has recently received increasing focus and has achieved impressive performance. A hybrid end-to-end architec...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 30
    Publication date: 2019
    Description: Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a spee...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 31
    Publication date: 2019
    Description: Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may ...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 32
    Publication date: 2015-10-21
    Description: No description available
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 33
    Publication date: 2015-10-22
    Description: In this paper, a semi-fragile and blind digital speech watermarking technique for online speaker recognition systems is proposed. It is based on the discrete wavelet packet transform (DWPT) and quantization index modulation (QIM) and enables embedding of the watermark within an angle of the wavelet’s sub-bands. To minimize the degradation effects of the watermark, these sub-bands were selected from frequency ranges where little speaker-specific information was available (500–3500 Hz and 6000–7000 Hz). Experimental results on the TIMIT, MIT, and MOBIO speech databases show that the degradation results for speaker verification and identification are 0.39 and 0.97 %, respectively, which are negligible. In addition, the proposed watermark technique can provide the appropriate fragility required for different signal processing operations.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 34
    Publication date: 2015-10-22
    Description: Physical task stress induces changes in the speech production system, which in turn produce changes in speaking behavior. This results in measurable acoustic correlates, including changes to formant center frequencies, breath pause placement, and fundamental frequency. Many of these changes are due to the subject's internal competition between speaking and breathing during the performance of the physical task, which has a corresponding impact on muscle control and airflow within the glottal excitation structure as well as the vocal tract articulatory structure. This study considers the effect of physical task stress on voice quality. Three signal processing-based values are used to measure voice quality: (i) the normalized amplitude quotient (NAQ), (ii) the harmonic richness factor (HRF), and (iii) the fundamental frequency. The effects of physical stress on voice quality depend on the speaker as well as the specific task. While some speakers do not exhibit changes in voice quality, a subset exhibits changes in NAQ and HRF measures of similar magnitude to those observed in studies of soft, loud, and pressed speech. For those speakers demonstrating voice quality changes, the observed changes tend toward breathy or soft voicing, as observed in other studies. The effect of physical stress on the fundamental frequency is correlated with the effect of physical stress on the HRF (r = −0.34) and the NAQ (r = −0.53). Also, the inter-speaker variation in baseline NAQ is significantly higher than the variation in NAQ induced by physical task stress. The results illustrate systematic changes in speech production under physical task stress, which in theory will impact subsequent speech technology such as speech recognition, speaker recognition, and voice diarization systems.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 35
    Publication date: 2015-07-17
    Description: Acoustic data transmission (ADT) is a branch of audio data hiding that communicates data over short-range aerial space between a loudspeaker and a microphone. In this paper, we propose an acoustic data transmission system that extends our previous studies, and we give an in-depth analysis of its performance. The proposed technique utilizes the phases of the modulated complex lapped transform (MCLT) coefficients of the audio signal. To achieve a good trade-off between audio quality and data transmission performance, an enhanced segmental SNR adjustment (SSA) algorithm is proposed. Moreover, we propose a scheme that uses multiple microphones for the ADT technique. This multi-microphone ADT technique further enhances the transmission performance while ensuring compatibility with the single-microphone system. A series of experiments shows that the transmission performance improves as the MCLT frame gets longer, at the cost of degraded audio quality. In addition, a good trade-off between audio quality and data transmission performance is achieved by means of the SSA algorithm. The experimental results also reveal that the proposed multi-microphone method is useful in enhancing the transmission performance.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 36
    Publication date: 2016-08-13
    Description: Smart grid, smart metering, electromobility, and the regulation of the power network are keywords of the transition in energy politics. In the future, the power grid will be smart. Based on different works, th...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 37
    Publication date: 2016-08-10
    Description: Substantial amounts of resources are usually required to robustly develop a language model for an open vocabulary speech recognition system as out-of-vocabulary (OOV) words can hurt recognition accuracy. In th...
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 38
    Publication date: 2016-08-11
    Description: This paper presents a didactic framework in embedded electronics systems that is used to raise awareness among students and engineers of the design issues arising in the realization of a class of underactuated...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 39
    Publication date: 2016-08-12
    Description: Rock acoustic emission is often used to study the evolution of brittle materials. The cause of internal rock damage can be monitored continuously and in real time by sensing rock acoustic waves. However, the key...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 40
    Publication date: 2016-08-12
    Description: Due to its relative simplicity, the JPEG compression algorithm requires fewer hardware or software resources than newer compression algorithms, for example JPEG2000 and JPEG XR. This makes it s...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 41
    Publication date: 2015-09-30
    Description: This paper presents SymRT, a tool based on a combination of symbolic execution and real-time model checking for timing analysis of Java systems. Symbolic execution is used to generate a safe and tight timing model of the analyzed system that captures the feasible execution paths. The model is combined with suitable execution environment models capturing the timing behavior of the target host platform, including the Java virtual machine and complex hardware features such as caching. The complete timing model is a network of timed automata, which allows safe estimates of worst and best case execution time to be determined directly using the Uppaal model checker. Furthermore, the integration of the proposed techniques into the TetaSARTS tool facilitates reasoning about additional timing properties such as the schedulability of periodically and sporadically released Java real-time tasks (under specific scheduling policies), worst case response time, and more.
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 42
    Publication date: 2016-06-15
    Description: Dynamic voltage and frequency scaling (DVFS) is a means to adjust the computing capacity and power consumption of computing systems to the application demands. DVFS is generally useful to...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 43
    Publication date: 2016-06-03
    Description: Big data in biological engineering and mobile control increases the complexity of system control. In order to resolve these problems and improve biological engineering system performance, this paper propose...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 44
    Publication date: 2016-09-01
    Description: Simultaneous Internet services for large numbers of users can lead to server overload and information failure. A static content recommendation system cannot adapt to the dynamic similarity characteristics of users. S...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 45
    Publication date: 2016-09-17
    Description: In recent years, the use of multiprocessor systems has become increasingly common. Even in the embedded domain, the development of platforms based on multiprocessor systems or the porting of legacy single-core...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 46
    Publication date: 2016-08-23
    Description: In order to improve the efficiency of mechanical and hydraulic control of mechanical equipment, an analysis scheme for mechanical hydraulic characteristics based on lightweight crowd data was proposed in m...
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 47
    Publication date: 2015-05-22
    Description: This paper presents an objective speech quality model, ViSQOL, the Virtual Speech Quality Objective Listener. It is a signal-based, full-reference, intrusive metric that models human speech quality perception using a spectro-temporal measure of similarity between a reference and a test speech signal. The metric has been designed specifically to be robust to quality issues associated with Voice over IP (VoIP) transmission. This paper describes the algorithm and compares its quality predictions with those of the ITU-T standard metrics PESQ and POLQA for common problems in VoIP: clock drift, associated time warping, and playout delays. The results indicate that ViSQOL and POLQA significantly outperform PESQ, with ViSQOL competing well with POLQA. An extensive benchmarking against PESQ, POLQA, and simpler distance metrics using three speech corpora (NOIZEUS, E4, and the ITU-T P.Sup. 23 database) is also presented. These experiments benchmark the performance for a wide range of quality impairments, including VoIP degradations, a variety of background noise types, speech enhancement methods, and SNR levels. The results and subsequent analysis show that both ViSQOL and POLQA have some performance weaknesses and under-predict perceived quality in certain VoIP conditions. Both have wider applicability and are robust to more conditions than PESQ or more trivial distance metrics. ViSQOL is shown to offer a useful alternative to POLQA in predicting speech quality in VoIP scenarios.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 48
    Publication date: 2015-06-28
    Description: In recent years, the i-vector-based framework has been shown to provide state-of-the-art performance in speaker verification. Each utterance is projected onto a total factor space and is represented by a low-dimensional feature vector. Channel compensation techniques are carried out in this low-dimensional feature space. Most of the compensation techniques take the sets of extracted i-vectors as input. By constructing between-class covariance and within-class covariance, we attempt to minimize the within-class variance, mainly caused by channel effects, and to maximize the variance between speakers. In real-world applications, enrollment and test data from each user (or speaker) are always scarce. Although it is widely thought that session variability is mostly caused by channel effects, phonetic variability, as a factor that causes session variability, is still a matter to be considered. In this paper, we propose a new i-vector extraction algorithm from the total factor matrix, which we term component reduction analysis (CRA). This new algorithm contributes to better modelling of session variability in the total factor space. We report results on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation (SRE) dataset. As measured both by equal error rate and the minimum values of the NIST detection cost function, a 10–15 % relative improvement is achieved compared to the baseline of a traditional i-vector-based system.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 49
    Publication date: 2015-06-26
    Description: Singer identification is a difficult topic in music information retrieval because the singing voice is mixed with background instrumental music, which reduces system performance. One of the main disadvantages of existing systems is that vocals and instrumentals are separated manually and only the vocals are used to build the training model. The research presented in this paper automatically recognizes a singer without separating instrumental and singing sounds, using audio features such as timbre coefficients, pitch class, mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, and loudness of an audio signal from Indian video songs (IVS). Initially, various IVS of distinct playback singers (PS) are collected. After that, 53 audio features (12-dimensional timbre audio feature vectors, 12 pitch classes, 13 MFCC coefficients, 13 LPC coefficients, and 3 loudness features of an audio signal) are extracted from each segment. The dimension of the extracted audio features is reduced using principal component analysis (PCA). The playback singer model (PSM) is trained using multiclass classification algorithms such as back propagation, AdaBoost.M2, the k-nearest neighbor (KNN) algorithm, the naïve Bayes classifier (NBC), and the Gaussian mixture model (GMM). The proposed approach is tested on various combinations of datasets and different combinations of audio feature vectors with songs of various Indian male and female PS.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 50
    Publication date: 2015-06-27
    Description: Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using this data as an auxiliary data source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits a DNN as a discriminatively trained feature extractor, outperforms the best performing HMM/DNN hybrid by about 5 % relative (in terms of WER). The accumulated relative gain with respect to the MFCC-HMM/GMM baseline is about 30 % WER.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 51
    Publication date: 2015-06-10
    Description: Automatic speech recognition (ASR) performs best when the recognition system is tested under circumstances identical to those in which it was trained. In the real world, however, there are many sources of mismatch between the training environment and the testing environment. These mismatches can be due to the sources of noise that exist in real environments. Speech enhancement techniques have been developed to make ASR systems robust against these sources of noise. In this work, a method based on histogram equalization (HEQ) is proposed to compensate for the nonlinear distortions in the speech representation. This approach utilizes simultaneous stereo recordings of clean speech and its corresponding noisy speech to compute a stereo Gaussian mixture model (GMM). The stereo GMM is used to compute the cumulative density function (CDF) for both clean speech and noisy speech using a sigmoid function, instead of the order statistics used in other HEQ-based methods. In the implementation, we show two ways to apply HEQ: hard decision HEQ and soft decision HEQ. The latter is based on minimum mean square error (MMSE) clean speech estimation. The experimental work shows that soft HEQ and hard HEQ achieve better recognition results than other HEQ approaches such as tabular HEQ, quantile HEQ, and polynomial fit HEQ. It also shows that soft HEQ achieves notably better recognition results than hard HEQ. The results also show that using HEQ improves the efficiency of other speech enhancement techniques such as stereo piece-wise linear compensation for environment (SPLICE) and vector Taylor series (VTS), and that using HEQ in multi-style training (MST) significantly improves ASR system performance.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
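The abstract above contrasts its sigmoid-based CDF with the order statistics used in other HEQ methods. The order-statistics idea can be sketched as follows: a noisy feature value is mapped to the clean value that has the same empirical CDF rank. This is a minimal illustration only, not the paper's stereo-GMM method; the function name `make_heq_mapper` and the toy reference lists are hypothetical.

```python
from bisect import bisect_right


def make_heq_mapper(clean_ref, noisy_ref):
    """Return a function that maps a noisy feature value onto the clean scale.

    Both arguments are non-empty lists of scalar feature values (hypothetical
    reference data). The mapping matches empirical CDFs via order statistics.
    """
    clean_sorted = sorted(clean_ref)
    noisy_sorted = sorted(noisy_ref)
    n_clean = len(clean_sorted)
    n_noisy = len(noisy_sorted)

    def equalize(x):
        # Rank of x in the noisy reference, i.e. n_noisy times its empirical CDF.
        rank = bisect_right(noisy_sorted, x)
        # Inverse empirical CDF of the clean reference at the same probability
        # (integer ceiling avoids floating-point rounding issues).
        idx = (rank * n_clean + n_noisy - 1) // n_noisy - 1
        idx = min(n_clean - 1, max(0, idx))
        return clean_sorted[idx]

    return equalize


# Toy usage: the noisy values are a shifted and scaled copy of the clean ones.
clean = [0.0, 1.0, 2.0, 3.0, 4.0]
noisy = [10.0, 12.0, 14.0, 16.0, 18.0]
equalize = make_heq_mapper(clean, noisy)
restored = [equalize(v) for v in noisy]
```

In this toy case the mapping undoes the shift and scale exactly, because the distortion is monotonic and the two reference sets are aligned.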
  • 52
    Publication date: 2015-01-20
    Description: Vocal tremor has been simulated using a high-dimensional discrete vocal fold model. Specifically, respiratory, phonatory, and articulatory tremors have been modeled as instabilities in six parameters of the model. Reported results are consistent with previous knowledge in that respiratory tremor mainly causes amplitude modulation of the voice signal while laryngeal tremor causes both amplitude and frequency modulation. In turn, articulatory tremor is commonly assumed to produce only amplitude modulations but the simulation results indicate that it also produces a high-frequency modulation of the output signal. Furthermore, articulatory tremor affects the frequency response of the vocal tract and it might thus be detected by analyzing the spectral envelope of the acoustic signal.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 53
    Publication date: 2015-01-23
    Description: This paper addresses the design of embedded systems for outdoor augmented reality (AR) applications integrated into see-through glasses. The set of tasks includes object positioning, graphic computation, and wireless communications, and we consider constraints such as real-time operation, low power, and low footprint. We introduce an original sailor assistance application as a typical, useful, and complex outdoor AR application, where context-dependent virtual objects must be placed in the user's field of view according to head motions and ambient information. Our study demonstrates that it is worth working on power optimization, since the embedded system based on a standard general-purpose processor (GPP) + graphics processing unit (GPU) consumes more than high-luminosity see-through glasses. This work then presents three main contributions. The first is the choice and combination of position and attitude algorithms that fit the application context. The second is the architecture of the embedded system, which introduces a fast and simple object processor (OP) optimized for the domain of mobile AR. Finally, the OP implements a new pixel rendering method, the incremental pixel shader (IPS), which is implemented in hardware and takes full advantage of the OpenGL ES light model. A complete GP+OP(s) architecture is described and prototyped on a field-programmable gate array (FPGA). It includes hardware/software partitioning based on the analysis of application requirements and ergonomics.
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 54
    Publication date: 2015-01-23
    Description: In this paper, an initial feature vector based on the combination of the wavelet packet decomposition (WPD) and the Mel frequency cepstral coefficients (MFCCs) is proposed. To optimize the initial feature vector, a genetic algorithm (GA)-based approach is proposed and compared with the well-known principal component analysis (PCA) approach. An artificial neural network (ANN) with different learning algorithms is used as the classifier. Experiments are carried out to evaluate and compare the classification accuracies obtained with the different learning algorithms and the different feature vectors (the initial and the optimized ones). Finally, a hybrid of the ANN with the 'trainscg' training algorithm and the genetic algorithm is proposed for vocal fold pathology diagnosis. The performance of the proposed method is also compared with recent works. The experimental results show that the proposed method performs better (higher classification accuracy) than the others.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 55
    Publication date: 2015-02-12
    Description: The spatio-temporal-prediction (STP) method for multichannel speech enhancement has recently been proposed. This approach makes it theoretically possible to attenuate the residual noise without distorting speech. In addition, the STP method depends only on the second-order statistics and can be implemented using a simple linear filtering framework. Unfortunately, some numerical problems can arise when estimating the filter matrix in transients. In such a case, the speech correlation matrix is usually rank deficient, so that no solution exists. In this paper, we propose to implement the spatio-temporal-prediction method using a signal subspace approach. This allows for nullifying the noise subspace and processing only the noisy signal in the signal-plus-noise subspace. As a result, we are able not only to regularize the solution in transients but also to achieve higher attenuation of the residual noise. The experimental results also show that the signal subspace approach distorts speech less than the conventional method.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 56
    Publication date: 2015-02-12
    Description: Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence severe performance degradation in adverse acoustic conditions such as those with high ambient noise. We propose a noisy training approach to tackle this problem: by injecting moderate noise into the training data intentionally and randomly, more generalizable DNN models can be learned. This ‘noise injection’ technique, although already known to the neural computation community, has not been studied with DNNs, which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
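The noise-injection idea in the abstract above can be illustrated with a small sketch that mixes a randomly chosen noise recording into a clean utterance at a randomly chosen signal-to-noise ratio. Scaling by SNR is an assumption made here for illustration; the abstract does not specify how the noise level is set. The names `mix_at_snr` and `noisy_copy` are hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the result has the requested SNR in dB.

    Assumes the noise has at least as many samples as the speech and that
    both signals have non-zero power.
    """
    segment = noise[:len(speech)]
    # Choose gain so that power(speech) / power(gain * segment) = 10^(snr_db / 10).
    gain = math.sqrt(power(speech) / (power(segment) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]


def noisy_copy(speech, noise_bank, snr_choices_db, rng):
    """Pick a noise type and an SNR at random and return (noisy speech, SNR)."""
    noise = rng.choice(noise_bank)
    snr_db = rng.choice(snr_choices_db)
    return mix_at_snr(speech, noise, snr_db), snr_db


# Toy usage with a synthetic tone as "speech" and Gaussian noise.
rng = random.Random(0)
speech = [math.sin(0.1 * i) for i in range(1000)]
white = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy, chosen_snr = noisy_copy(speech, [white], [5.0, 10.0, 15.0], rng)
```

Drawing a fresh noise type and SNR for each utterance is one simple way to make the corruption random rather than fixed across the training set.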
  • 57
    Publication date: 2015-02-13
    Description: Music identification via audio fingerprinting has been an active research field in recent years. In real-world environments, music queries are often deformed by various interferences, which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Robustness therefore plays a crucial role in music identification techniques. In this paper, we propose to use scale invariant feature transform (SIFT) local descriptors computed from a spectrogram image as sub-fingerprints for music identification. Experiments show that these sub-fingerprints exhibit strong robustness against serious time stretching and pitch shifting simultaneously. In addition, a locality sensitive hashing (LSH)-based nearest sub-fingerprint retrieval method and a matching determination mechanism are applied for robust sub-fingerprint matching, which makes the identification efficient and precise. Finally, as an auxiliary function, we demonstrate that by comparing the time-frequency locations of corresponding SIFT keypoints, the factor of time stretching and pitch shifting that music queries might have experienced can be accurately estimated.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
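The LSH-based retrieval mentioned in the abstract above can be illustrated with a generic random-hyperplane sketch: descriptors that point in similar directions tend to receive the same bit signature, so a query only needs to be compared against the candidates in its own bucket. The hash family, the single hash table, and all names below are assumptions for illustration; the abstract does not say which LSH scheme the paper uses.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, planes):
    """One bit per hyperplane: the side of the plane on which vec falls."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(descriptors, planes):
    """Group descriptor ids by their LSH key (a single hash table)."""
    index = {}
    for item_id, vec in descriptors.items():
        index.setdefault(lsh_key(vec, planes), []).append(item_id)
    return index


def query(vec, index, descriptors, planes):
    """Return the id of the nearest descriptor in the query's bucket, or None."""
    candidates = index.get(lsh_key(vec, planes), [])
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptors[i], vec)),
    )


# Toy usage with 4-dimensional stand-ins for sub-fingerprint descriptors.
descriptors = {"a": [1.0, 0.0, 0.0, 0.0], "b": [-1.0, 0.0, 0.0, 0.0]}
planes = make_hyperplanes(dim=4, n_bits=8)
index = build_index(descriptors, planes)
match = query([2.0, 0.0, 0.0, 0.0], index, descriptors, planes)
```

A practical system would typically use several hash tables to reduce the chance that a true neighbor falls into a different bucket; a single table keeps the sketch short.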
  • 58
    Publication date: 2015-01-21
    Description: No description available
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 59
    Publication date: 2015-01-30
    Description: Owing to the suprasegmental behavior of emotional speech, turn-level features have been more successful than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, thereby losing important local information in the process, which affects performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within-corpus and cross-corpus experiments conducted over acted and spontaneous expressions. In a within-corpus scenario, sRSM outperforms competing LTMs, while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered to be a difficult task using only speech. Further analyses with respect to the turn duration show that the improvement is even more significant, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In a cross-corpus scenario, two novel adaptation-based approaches, instance selection and weight regularization, are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance of only 2.6% and 2.7%, thereby highlighting the generalization ability of the proposed features.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 60
    Publication date: 2015-07-17
    Description: The Farrow-structure-based steerable broadband beamformer (FSBB) is particularly useful in applications where the sound source of interest may move around a wide angular range. However, in contrast with the conventional filter-and-sum beamformer, the passband steerability of the FSBB is achieved at the cost of high structural complexity, i.e., a greatly increased number of tap weights. Moreover, it has been shown that the FSBB is sensitive to microphone mismatches, so robust FSBB design is of interest for practical applications. To deal with these problems, this paper studies the robust design of the FSBB with sparse tap weights via convex optimization by considering some a priori knowledge of microphone mismatches. It is shown that although worst-case performance (WCP) optimization has been successfully applied to the design of robust filter-and-sum beamformers with bounded microphone mismatches, it may become inapplicable to robust FSBB design due to its over-conservative nature. When limited knowledge of the mean and variance of microphone mismatches is available, a robust FSBB design approach based on worst-case mean performance optimization with a passband response variance (PRV) constraint is devised. Unlike the WCP optimization design, this approach performs well and is capable of controlling the passband stability of the array response. Finally, the robust FSBB design with sparse tap weights is studied. It is shown that there is redundancy in the tap weights of the FSBB, i.e., robust FSBB design with sparse tap weights is viable and thus leads to a low-complexity FSBB.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 61
    Publication date: 2013-02-21
    Description: The rapid spread of digital data usage in many real-life applications has called for new and effective ways to ensure data security. Efficient secrecy can be achieved, at least in part, by implementing steganography techniques. Novel and versatile audio steganographic methods have been proposed. The goal of steganographic systems is to obtain a secure and robust way to conceal a high rate of secret data. In this paper, we focus on digital audio steganography, which has emerged as a prominent source of data hiding across novel telecommunication technologies such as covered voice-over-IP, audio conferencing, etc. The multitude of steganographic criteria has led to a great diversity in the design techniques of these systems. We review current digital audio steganographic techniques and evaluate their performance based on robustness, security, and hiding capacity indicators. Another contribution of this paper is a robustness-based classification of steganographic models depending on their occurrence in the embedding process. A survey of major trends in audio steganography applications is also presented.
    Print ISSN: 1687-4714
    Subject: Electrical engineering, electronics, communications engineering
    Published by Springer
  • 62
    Publication date: 2012-12-11
    Description: This paper discusses the space-time coding (STC) problem for RFID MIMO systems. First, a mathematical model for this kind of system is developed from the viewpoint of signal processing, which makes it easy to design the STC schemes. Then two STC schemes, namely Scheme I and Scheme II, are proposed. Simulation results illustrate that the proposed approaches can greatly improve the symbol-error rate (SER) performance of RFID systems, compared to the non-space-time-encoded RFID system. The SER performance of Scheme I and Scheme II is thoroughly compared. It is found that Scheme II with the innate real-symbol constellation yields better SER performance than Scheme I. Some design guidelines for RFID-MIMO systems are pointed out.
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 63
    Publication date: 2012-12-11
    Description: To tackle the growing complexity of, and huge demand for, tailored domestic video surveillance systems, along with highly demanding time-to-market expectations, engineers at IVV Automation, LDA are treating the video surveillance domain as families of systems that can be developed in a pay-as-you-go fashion rather than developing each new product ex nihilo. Several different new functionalities are required for each new product's hardware platforms (e.g., ranging from mobile phone and PDA to desktop PC) and operating systems (e.g., flavors of Linux, Windows and MAC OS X). Some of these functionalities have special economical constraints of speed and footprint. To better accommodate all of the requirements listed above, a model-driven generative software development paradigm supported by mainstream tools is proposed. It offers significant leverage in hiding commonalities and configuring variabilities across families of video surveillance products while maintaining the quality of the new product.
    Print ISSN: 1687-3955
    Digital ISSN: 1687-3963
    Subject: Electrical engineering, electronics, communications engineering; Computer science
    Published by Springer
  • 64
    Publikationsdatum: 2012-12-11
    Beschreibung: As technology scales for increased circuit density and performance, the management of power consumption in embedded systems is becoming critical. Because the operating system (OS) is a basic component of the embedded system, the reduction and characterization of its energy consumption is a main challenge for the designers. In this work, a flow of low power OS energy characterization is introduced. The variation of the energy and power consumption of the embedded OS services is studied. The remainder of this article details the methods used to determine energy and power overheads of a set of basic services of the embedded OS: scheduling, context switch and inter-process communication. The impact of hardware and software parameters like processor frequency and scheduling policy on the energy consumption is analyzed. Also, models and laws of the power and energy are extracted. Then, to quantify the low power OS energetic overhead, the obtained models are integrated in the system level design. Our method allows estimating the energy consumption of the low power OS services when running an application on a specific hardware platform.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 65
    Publikationsdatum: 2012-12-11
    Beschreibung: Simultaneous Localization And Mapping (SLAM) is a technique widely used by autonomous robots operating in unknown environments. The research community has developed numerous SLAM algorithms in the last ten years. Several works have presented many algorithm optimizations. However, they have not explored a system optimization from the system hardware architecture to the algorithmic development level. New computing technologies (SIMD coprocessors, DSP, multi-cores) can greatly accelerate the system processing but require rethinking the algorithm implementation. This paper presents an efficient implementation of the EKF-SLAM algorithm on a multi-processor architecture. The algorithm-architecture adequacy aims to optimize the implementation of the SLAM algorithm on a low-cost and heterogeneous architecture (implementing an ARM processor with SIMD coprocessor and a DSP core). Experiments were conducted with an instrumented platform. Results aim to demonstrate that an optimized implementation of the algorithm, resulting from an optimization methodology, can help to design embedded systems implementing a low-cost multiprocessor architecture operating under real-time constraints.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 66
    Publikationsdatum: 2012-12-11
    Beschreibung: The dependability deficiencies and bandwidth constraints of the controller area network (CAN) can prevent its use in safety-relevant and performance-demanding applications. This paper introduces mechanisms for fault detection and fault isolation based on an intelligent CAN router, which exploits a priori knowledge about the permitted behavior of attached electronic control units (ECUs) in order to detect and contain failures. Experiments using an FPGA-based implementation of the CAN router evaluate these mechanisms under different failure modes (e.g., timing failures, masquerading failures). Due to its compatibility with the CAN standard, the router can improve the dependability and performance of systems with existing ECUs. In addition, we extend the application areas of CAN to systems with higher performance and dependability requirements than can be supported with a conventional bus-based network.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 67
    Publikationsdatum: 2012-12-11
    Beschreibung: In this article we present an ASIP design for a discrete Fourier transform (DFT)/discrete cosine transform (DCT)/finite impulse response (FIR) filter engine. The engine is intended for use in an accelerator-chain implementation of wireless communication systems. The engine offers a very high degree of flexibility, accepting and accelerating any-number DFT and inverse discrete Fourier transform, one- and two-dimensional DCT, and even general implementations of FIR equations. Performance approaches that of dedicated implementations of such algorithms. A customized yet flexible redundant memory map allows processor-like access while keeping the pipeline full in a dedicated architecture-like manner. The engine is supported by a proprietary software tool that automatically sets the rounding pattern for the accelerator rounder to maintain a required signal-to-quantization noise or output RMS for any given algorithm. Programming of the processor is done through a mid-level language that combines register-specific instructions with DFT/DCT/FIR-specific instructions. Overall the engine allows users to program a very wide range of applications with software-like ease, while delivering performance very close to hardware. This puts the engine in an excellent spot in the current wireless communications environment with its profusion of multi-mode and emerging standards.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 68
    Publikationsdatum: 2012-12-11
    Beschreibung: Embedded reconfigurable architectures are currently attracting increasing attention in the wireless communications industry due to the escalating number of wireless standards in today's market. Application specific instruction-set processors (ASIPs) present a reconfigurable solution that offers a compromise between programmability and low power consumption. In this article, the design and implementation of an embedded synchronization and acquisition ASIP for OFDM-based systems is proposed. The engine architecture is presented and the programming model is explained in detail. The proposed engine is scalable and it can be configured to support a multitude of synchronization algorithms and OFDM standards. While applicable to many OFDM systems, the proposed architecture was successfully verified on long term evolution (LTE Rel. 8) and WiMAX 802.16e systems. A partial list of synchronization and acquisition algorithms is tested on the engine for the two standards, and the results highlight the capabilities of the engine. The processor has been synthesized with a 0.18 μm standard cell CMOS library. It is estimated to occupy 1.1 mm² and the projected power consumption is 7.9 mW at 120 MHz, which meets the speed requirements of the tested standards. More results are included within the article.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 69
    Publikationsdatum: 2012-12-11
    Beschreibung: As the Editor-in-Chief, it is my pleasure to open this new Chapter in the development of EURASIP Journal on Embedded Systems.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 70
    Publikationsdatum: 2012-12-11
    Beschreibung: Conventional parametric stereo (PS) audio coding employs inter-channel phase difference and overall phase difference as phase parameters. In this article, it is shown that those parameters cannot correctly represent the phase relationship between the stereo channels when inter-channel correlation (ICC) is less than one, which is common in practical situations. To solve this problem, we introduce new phase parameters, channel phase differences (CPDs), defined as the phase differences between the mono downmix and the stereo channels. Since CPDs have a descriptive relationship with ICC as well as inter-channel intensity difference, they are more relevant to represent the phase difference between the channels in practical situations. We also propose methods of synthesizing CPDs at the decoder. Through computer simulations and subjective listening tests, it is confirmed that the proposed methods produce significantly lower phase errors than conventional PS, and that they can noticeably improve sound quality for stereo inputs with low ICCs.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 71
    Publikationsdatum: 2012-12-11
    Beschreibung: In this paper, the authors propose an optimally designed fixed Beamformer (BF) for Stereophonic Acoustic Echo Cancellation (SAEC) in real hands-free communication applications. Several contributions related to the combination of beamforming and echo cancellation have appeared in the literature so far, but, to the best of the authors' knowledge, the idea of using optimal fixed BFs in a real-time SAEC system both for echo reduction and stereophonic audio rendering is first addressed in this contribution. The employment of such designed BFs allows positively addressing both issues, as the several simulated and real tests seem to confirm. In particular, the audio stereo-recording quality attainable through the proposed approach has been preliminarily evaluated by means of subjective listening tests. Moreover, the overall system robustness against microphone array imperfections and noise presence has been experimentally evaluated. This allowed the authors to implement a real hands-free communication system in which the proposed beamforming technique has proved its superiority over the usual two-microphone one in terms of echo reduction, while guaranteeing a comparable spaciousness effect. Moreover, the proposed framework requires only a low computational cost increment with regard to the baseline approach, since only a few extra filtering operations with short filters need to be executed. Nevertheless, according to the performed simulations, the BF-based SAEC configuration seems not to require the signal decorrelation module, resulting in an overall computational saving.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 72
    Publikationsdatum: 2012-12-11
    Beschreibung: The rapid spread of digital data usage in many real-life applications has urged new and effective ways to ensure their security. Efficient secrecy can be achieved, at least in part, by implementing steganography techniques. Novel and versatile audio steganographic methods have been proposed. The goal of steganographic systems is to obtain a secure and robust way to conceal a high rate of secret data. We focus in this paper on digital audio steganography, which has emerged as a prominent source of data hiding across novel telecommunication technologies such as covered voice-over-IP, audio conferencing, etc. The multitude of steganographic criteria has led to a great diversity in these system design techniques. In this paper, we review current digital audio steganographic techniques and evaluate their performance based on robustness, security and hiding capacity indicators. Another contribution of this paper is the provision of a robustness-based classification of steganographic models depending on their occurrence in the embedding process. A survey of major trends of audio steganography applications is also discussed in this paper.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 73
    Publikationsdatum: 2012-12-11
    Beschreibung: Mood is an important aspect of music and knowledge of mood can be used as a basic feature in music recommender and retrieval systems. A listening experiment was carried out establishing ratings for various moods and a number of attributes, e.g., valence and arousal. The analysis of these data covers the issues of the number of basic dimensions in music mood, their relation to valence and arousal, the distribution of moods in the valence-arousal plane, distinctiveness of the labels, and appropriate (number of) labels for full coverage of the plane. It is also shown that subject-averaged valence and arousal ratings can be predicted from music features by a linear model.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 74
    Publikationsdatum: 2012-12-11
    Beschreibung: A vast number of audio features have been proposed in the literature to characterize the content of audio signals. In order to overcome specific problems related to the existing features (such as lack of discriminative power), as well as to reduce the need for manual feature selection, in this article, we propose an evolutionary feature synthesis technique with a built-in feature selection scheme. The proposed synthesis process searches for optimal linear/nonlinear operators and feature weights from a pre-defined multi-dimensional search space to generate a highly discriminative set of new (artificial) features. The evolutionary search process is based on a stochastic optimization approach in which a multi-dimensional particle swarm optimization algorithm, along with fractional global best formation and heterogeneous particle behavior techniques, is applied. Unlike many existing feature generation approaches, the dimensionality of the synthesized feature vector is also searched and optimized within a set range in order to better meet the varying requirements set by many practical applications and classifiers. The new features generated by the proposed synthesis approach are compared with typical low-level audio features in several classification and retrieval tasks. The results demonstrate a clear improvement of up to 15–20% in average retrieval performance. Moreover, the proposed synthesis technique surpasses the synthesis performance of evolutionary artificial neural networks, exhibiting a considerable capability to accurately distinguish among different audio classes.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 75
    Publikationsdatum: 2012-12-11
    Beschreibung: Humans exhibit a remarkable ability to reliably classify sound sources in the environment even in the presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted with channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, understanding of which would presumably improve the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically-motivated multi-resolution speaker information representation obtained by performing an intricate yet computationally-efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features in a speaker verification task performed on NIST SRE 2010 data. The biomimetic approach yields significant robustness in the presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 76
    Publikationsdatum: 2012-12-11
    Beschreibung: We present our earlier results (not included in Hars and Petruska due to space and time limitations), as well as some updated versions of those, and a few more recent pseudorandom number generator designs. These tell a systems designer which computer word lengths are suitable for certain high-quality pseudorandom number generators, and which constructions of a large family of designs provide long cycles, the most important property of such generators. The employed mathematical tools could help assess the bit-mixing and mapping properties of a large class of iterated functions that perform only non-multiplicative computer operations: SHIFT, ROTATE, ADD, and XOR.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
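The cycle-length question raised in this abstract can be illustrated for very small word lengths by brute force. The following is a minimal sketch, not one of the generators from the article: `arx_step`, its constant `0x9D`, and its rotation and shift amounts are hypothetical choices made only to show an iterated function built from ADD, ROTATE, SHIFT, and XOR.

```python
def rotl(x, r, w):
    """Rotate the w-bit word x left by r bits."""
    mask = (1 << w) - 1
    r %= w
    return ((x << r) | (x >> (w - r))) & mask


def arx_step(x, w=8, c=0x9D, r=3, s=2):
    """One step of a hypothetical generator using only ADD, ROTATE, SHIFT, XOR."""
    mask = (1 << w) - 1
    x = (x + c) & mask      # ADD a constant, modulo 2**w
    x ^= rotl(x, r, w)      # XOR with a ROTATEd copy
    x ^= x >> s             # XOR with a right-SHIFTed copy
    return x & mask


def cycle_length(f, seed):
    """Length of the cycle that the sequence seed, f(seed), f(f(seed)), ... falls into."""
    seen = {}               # state -> index of first visit
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = f(x)
        i += 1
    return i - seen[x]      # steps between the two visits of the repeated state


if __name__ == "__main__":
    for w in (4, 6, 8, 10, 12):
        length = cycle_length(lambda x, w=w: arx_step(x, w, c=0x9D & ((1 << w) - 1) | 1), 1)
        print(f"word length {w:2d}: cycle length {length} of at most {2 ** w}")
```

Exhaustive search like this is feasible only for toy word lengths; for realistic word lengths the cycle structure has to be derived analytically, which is the kind of analysis the abstract describes.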
  • 77
    Publikationsdatum: 2012-12-11
    Beschreibung: Wireless Sensor Networks (WSNs) require an extremely energy-efficient design. As sensor nodes carry limited power sources, the problem of autonomy is crucial. Energy harvesting provides a potential solution to this problem. However, as current energy harvesters produce only a small amount of energy and the storage capacity is limited, efficient power management techniques must also be considered. In this article we address the problem of modeling and simulating energy harvesting WSN nodes with efficient power management policies. To that end, we propose a framework for describing and simulating an energy harvesting sensor node. A high level modeling approach based on the power consumption and the energy harvesting is proposed. The node architectural parameters as well as the on-line power management techniques can also be specified. Two novel power management architectures are then introduced taking into account energy-neutral and negative-energy conditions. Simulation results show that they can improve the throughput of a sensor node by about 50% compared to a state-of-the-art power management algorithm for solar harvesting WSN. The simulation framework is then used to find an efficient system sizing for a solar energy harvesting WSN node.
    Print ISSN: 1687-3955
    Digitale ISSN: 1687-3963
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik , Informatik
    Publiziert von Springer
  • 78
    Publikationsdatum: 2012-12-11
    Beschreibung: Dance movements are a complex class of human behavior which convey forms of non-verbal and subjective communication that are performed as cultural vocabularies in all human cultures. The singularity of dance forms imposes fascinating challenges to computer animation and robotics, which in turn presents outstanding opportunities to deepen our understanding about the phenomenon of dance by means of developing models, analyses and syntheses of motion patterns. In this article, we formalize a model for the analysis and representation of popular dance styles of repetitive gestures by specifying the parameters and validation procedures necessary to describe the spatiotemporal elements of the dance movement in relation to its music temporal structure (musical meter). Our representation model is able to precisely describe the structure of dance gestures according to the structure of musical meter, at different temporal resolutions, and is flexible enough to convey the variability of the spatiotemporal relation between music structure and movement in space. It results in a compact and discrete mid-level representation of the dance that can be further applied to algorithms for the generation of movements in different humanoid dancing characters. The validation of our representation model relies upon two hypotheses: (i) the impact of metric resolution and (ii) the impact of variability towards fully and naturally representing a particular dance style of repetitive gestures. We numerically and subjectively assess these hypotheses by analyzing solo dance sequences of Afro-Brazilian samba and American Charleston, captured with a MoCap (Motion Capture) system. From these analyses, we build a set of dance representations modeled with different parameters, and re-synthesize motion sequence variations of the represented dance styles. 
For specifically assessing the metric hypothesis, we compare the captured dance sequences with repetitive sequences of a fixed dance motion pattern, synthesized at different metric resolutions for both dance styles. In order to evaluate the hypothesis of variability, we compare the same repetitive sequences with others synthesized with variability, by generating and concatenating stochastic variations of the represented dance pattern. The observed results validate the proposition that different dance styles of repetitive gestures might require a minimum and sufficient metric resolution to be fully represented by the proposed representation model. Yet, these also suggest that additional information may be required to synthesize variability in the dance sequences while assuring the naturalness of the performance. Nevertheless, we found evidence that supports the use of the proposed dance representation for flexibly modeling and synthesizing dance sequences from different popular dance styles, with potential developments for the generation of expressive and natural movement profiles onto humanoid dancing characters.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 79
    Publikationsdatum: 2012-12-11
    Beschreibung: In this article, we present the evaluation results for the task of speaker diarization of broadcast news, which was part of the Albayzin 2010 evaluation campaign of language and speech technologies. The evaluation data consists of a subset of the Catalan broadcast news database recorded from the 3/24 TV channel. The description of five submitted systems from five different research labs is given, marking the common as well as the distinctive system features. The diarization performance is analyzed in the context of the diarization error rate, the number of detected speakers and also the acoustic background conditions. An effort is also made to put the achieved results in relation to the particular system design features.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 80
    Publikationsdatum: 2012-12-11
    Beschreibung: In this paper, we propose a speaker-dependent model interpolation method for statistical emotional speech synthesis. The basic idea is to combine the neutral model set of the target speaker and an emotional model set selected from a pool of speakers. For model selection and interpolation weight determination, we propose to use a novel monophone-based Mahalanobis distance, which is a proper distance measure between two Hidden Markov Model sets. We design Latin-square evaluation to reduce the systematic bias in the subjective listening tests. The proposed interpolation method achieves sound performance on the emotional expressiveness, the naturalness, and the target speaker similarity. Moreover, such performance is achieved without the need to collect the emotional speech of the target speaker, saving the cost of data collection and labeling.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 81
    Publikationsdatum: 2012-12-11
    Beschreibung: A new method to secure speech communication using the discrete wavelet transform (DWT) and the fast Fourier transform is presented in this article. In the first phase of the hiding technique, we separate the speech high-frequency components from the low-frequency components using the DWT. In a second phase, we exploit the low-pass spectral properties of the speech spectrum to hide another secret speech signal in the low-amplitude high-frequency regions of the cover speech signal. The proposed method allows hiding a large amount of secret information while rendering the steganalysis more complex. Experimental results prove the efficiency of the proposed hiding technique since the stego signals are perceptually indistinguishable from the equivalent cover signal, while the secret speech message can be recovered with slight degradation in quality.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
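The two-phase idea in this abstract (split the cover into low- and high-frequency bands, then hide the secret in the high band) can be sketched with a one-level Haar DWT. This is a deliberately simplified illustration under stated assumptions: the Haar wavelet, the `scale` factor, and the outright replacement of the detail band are choices made here, not the article's method, and the FFT stage it mentions is omitted.

```python
def haar_forward(signal):
    """One-level Haar DWT: return (approximation, detail) for an even-length signal."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out


def hide(cover, secret, scale=0.125):
    """Replace the cover's high-frequency band with a scaled copy of the secret."""
    approx, detail = haar_forward(cover)
    if len(secret) != len(detail):
        raise ValueError("secret must be half as long as the cover")
    return haar_inverse(approx, [s * scale for s in secret])


def recover(stego, scale=0.125):
    """Read the secret back out of the stego signal's high-frequency band."""
    _, detail = haar_forward(stego)
    return [d / scale for d in detail]


if __name__ == "__main__":
    cover = [4.0, 2.0, 6.0, 6.0]
    secret = [0.5, -0.25]
    stego = hide(cover, secret)
    print(stego, recover(stego))
```

A small `scale` keeps the embedded band low in amplitude, which mirrors the abstract's point that the high-frequency regions of speech carry little energy; the low-frequency band of the cover is left untouched.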
  • 82
    Publikationsdatum: 2013-02-02
    Beschreibung: A comprehensive system for facial animation of generic 3D head models driven by speech is presented in this article. In the training stage, audio-visual information is extracted from audio-visual training data, and then used to compute the parameters of a single joint audio-visual hidden Markov model (AV-HMM). In contrast to most of the methods in the literature, the proposed approach does not require segmentation/classification processing stages of the audio-visual data, avoiding the error propagation related to these procedures. The trained AV-HMM provides a compact representation of the audio-visual data, without the need of phoneme (word) segmentation, which makes it adaptable to different languages. Visual features are estimated from the speech signal based on the inversion of the AV-HMM. The estimated visual speech features are used to animate a simple face model. The animation of a more complex head model is then obtained by automatically mapping the deformation of the simple model to it, using a small number of control points for the interpolation. The proposed algorithm allows the animation of 3D head models of arbitrary complexity through a simple setup procedure. The resulting animation is evaluated in terms of intelligibility of visual speech through perceptual tests, showing a promising performance. The computational complexity of the proposed system is analyzed, showing the feasibility of its real-time implementation.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 83
    Publikationsdatum: 2015-10-08
    Beschreibung: In this paper, a single-channel speech enhancement method based on Bayesian decision and spectral amplitude estimation is proposed, in which the speech detection module and spectral amplitude estimation module are included, and the two modules are strongly coupled. First, under the decisions of speech presence and speech absence, the optimal speech amplitude estimators are obtained by minimizing a combined Bayesian risk function, respectively. Second, using the obtained spectral amplitude estimators, the optimal speech detector is achieved by further minimizing the combined Bayesian risk function. Finally, according to the detection results of speech detector, the optimal decision rule is made and the optimal spectral amplitude estimator is chosen for enhancing noisy speech. Furthermore, by considering both detection and estimation errors, we propose a combined cost function which incorporates two general weighted distortion measures for the speech presence and speech absence of the spectral amplitudes, respectively. The cost parameters in the cost function are employed to balance the speech distortion and residual noise caused by missed detection and false alarm, respectively. In addition, we propose two adaptive calculation methods for the perceptual weighted order p and the spectral amplitude order β concerned in the proposed cost function, respectively. The objective and subjective test results indicate that the proposed method can achieve a more significant segmental signal-noise ratio (SNR) improvement, a lower log-spectral distortion, and a better speech quality than the reference methods.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 84
    Publikationsdatum: 2016-01-22
    Beschreibung: In woodwind instruments such as a flute, producing a higher-pitched tone than a standard tone by increasing the blowing pressure is called overblowing, and this allows several distinct fingerings for the same notes. This article presents a method that attempts to learn acoustic features that are more appropriate than conventional features such as mel-frequency cepstral coefficients (MFCCs) in detecting the fingering from a flute sound using unsupervised feature learning. To do so, we first extract a spectrogram from the audio and convert it to a mel scale. Then, we concatenate four consecutive mel-spectrogram frames to include short temporal information and use it as a front end for the sparse filtering algorithm. The learned feature is then max-pooled, resulting in a final feature vector for the classifier that has extra robustness. We demonstrate the advantages of the proposed method in a twofold manner: we first visualize and analyze the differences in the learned features between the tones generated by standard and overblown fingerings. We then perform a quantitative evaluation through classification tasks on six selected pitches with up to five different fingerings that include a variety of octave-related and non-octave-related fingerings. The results confirm that the learned features using the proposed method significantly outperform the conventional MFCCs and the residual noise spectrum in every experimental condition for the classification tasks.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
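The front end and pooling steps named in this abstract (concatenating four consecutive mel-spectrogram frames, then max-pooling the learned feature) can be sketched as follows. This is only an outline under stated assumptions: the sparse filtering stage is not implemented and is stood in for by the identity, and the function names `stack_frames` and `max_pool` are hypothetical.

```python
def stack_frames(frames, context=4):
    """Concatenate `context` consecutive frames into one vector per start position."""
    return [
        [value for frame in frames[i:i + context] for value in frame]
        for i in range(len(frames) - context + 1)
    ]


def max_pool(features):
    """Max over time for each feature dimension, giving one vector per excerpt."""
    return [max(column) for column in zip(*features)]


if __name__ == "__main__":
    # Five toy "mel" frames with two bands each (real frames would have many more).
    mel_frames = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]]
    stacked = stack_frames(mel_frames)   # two vectors of length 8
    learned = stacked                    # placeholder for the sparse filtering output
    print(max_pool(learned))
```

Stacking gives each input vector a short temporal context, and pooling over time makes the final vector independent of the excerpt length, which is one plausible reading of the "extra robustness" the abstract attributes to max-pooling.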
  • 85
    Publikationsdatum: 2016-01-23
    Beschreibung: The goal of voice conversion is to modify a source speaker’s speech to sound as if spoken by a target speaker. Common conversion methods are based on Gaussian mixture modeling (GMM). They aim to statistically model the spectral structure of the source and target signals and require relatively large training sets (typically dozens of sentences) to avoid over-fitting. Moreover, they often lead to muffled synthesized output signals, due to excessive smoothing of the spectral envelopes. Mobile applications are characterized by low resources in terms of training data, memory footprint, and computational complexity. As technology advances, computational and memory requirements become less limiting; however, the amount of available training data still presents a great challenge, as a typical mobile user is willing to record himself saying just a few sentences. In this paper, we propose the grid-based (GB) conversion method for such low resource environments, which is successfully trained using very few sentences (5–10). The GB approach is based on sequential Bayesian tracking, by which the conversion process is expressed as a sequential estimation problem of tracking the target spectrum based on the observed source spectrum. The converted Mel frequency cepstrum coefficient (MFCC) vectors are sequentially evaluated using a weighted sum of the target training vectors used as grid points. The training process includes simple computations of Euclidean distances between the training vectors and is easily performed even in cases of very small training sets. We use global variance (GV) enhancement to improve the perceived quality of the synthesized signals obtained by the proposed and the GMM-based methods. Using just 10 training sentences, our enhanced GB method leads to converted sentences having closer GV values to those of the target and to lower spectral distances at the same time, compared to the enhanced version of the GMM-based conversion method. 
Furthermore, subjective evaluations show that signals produced by the enhanced GB method are perceived as more similar to the target speaker than the enhanced GMM signals, at the expense of a small degradation in the perceived quality.
    Print ISSN: 1687-4714
    Thema: Elektrotechnik, Elektronik, Nachrichtentechnik
    Publiziert von Springer
  • 86
    Publication date: 2016-02-24
    Description: The goal of voice conversion is to modify a source speaker’s speech to sound as if spoken by a target speaker. Common conversion methods are based on Gaussian mixture modeling (GMM). They aim to statistically ...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 87
    Publication date: 2016-02-24
    Description: In order to solve problems such as the demand for geographic information services, the short life of embedded systems, and network collapse, the embedded mobile crowd service system...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 88
    Publication date: 2016-03-02
    Description: Today, a large amount of audio data is available on the web in the form of audiobooks, podcasts, video lectures, video blogs, news bulletins, etc. In addition, we can effortlessly record and store audio data s...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 89
    Publication date: 2016-02-24
    Description: Providing a Quality of Experience (QoE) guarantee is an important factor in improving the system performance of the mobile Internet. First, based on artificial neural network and adaptive cross-layer perc...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 90
    Publication date: 2016-02-24
    Description: Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due t...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 91
    Publication date: 2016-02-24
    Description: This paper addresses the design of embedded systems for outdoor augmented reality (AR) applications integrated into see-through glasses. The set of tasks includes object positioning, graphic computation, as well...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 92
    Publication date: 2016-02-24
    Description: In order to improve the intelligence and robustness of the power grid management system, an opportunistic embedded architecture was proposed for power network measurement with mobile service aw...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 93
    Publication date: 2016-02-25
    Description: It is known that the collection of the specific needs of mobile users and location management in an electronic commerce recommendation system are important indicators used to evaluate user satisfaction and sys...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 94
    Publication date: 2016-02-24
    Description: This paper presents SymRT, a tool based on a combination of symbolic execution and real-time model checking for timing analysis of Java systems. Symbolic execution is used for the generation of a safe and tight t...
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 95
    Publication date: 2016-02-24
    Description: Indian classical music, including its two varieties, Carnatic and Hindustani music, has a rich musical tradition and enjoys a wide audience from various parts of the world. The Carnatic music which is more popul...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 96
    Publication date: 2016-02-24
    Description: Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, the unit selection approach has its disadvantages. One of the most significant disadv...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 97
    Publication date: 2016-02-24
    Description: In woodwind instruments such as the flute, producing a higher-pitched tone than a standard tone by increasing the blowing pressure is called overblowing, and this allows several distinct fingerings for the same ...
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 98
    Publication date: 2016-02-03
    Description: Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, the unit selection approach has its disadvantages. One of the most significant is the sudden discontinuities in speech that distract listeners (Speech Commun 51:1039–1064, 2009). The second is that significant expertise and large amounts of data are needed to build a high-quality synthesis system, which is costly and time-consuming. The statistical speech synthesis (SSS) approach is a promising alternative synthesis technique. Not only are the spurious errors observed in unit selection systems mostly absent in SSS, but building voice models is also far less expensive and faster than building a unit selection system. However, the resulting speech is typically not as natural-sounding as speech that is synthesized with a high-quality unit selection system. There are hybrid methods that attempt to take advantage of both SSS and unit selection systems. However, existing hybrid methods still require development of a high-quality unit selection system. Here, we propose a novel hybrid statistical/unit selection system for Turkish that aims at improving the quality of the baseline SSS system by improving prosodic parameters such as intonation and stress. Commonly occurring suffixes in Turkish are stored in the unit selection database and used in the proposed system. As opposed to existing hybrid systems, the proposed system was developed without building a complete unit selection synthesis system. Therefore, the proposed method can be used without collecting large amounts of data or relying on the substantial expertise and time-consuming tuning typically required to build unit selection systems. Listeners preferred the hybrid system over the baseline system in the AB preference tests.
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer
  • 99
    Publication date: 2016-02-04
    Description: To improve the intelligence and robustness of power grid management systems, an opportunistic embedded architecture with a mobile service-aware scheme was proposed for power network measurement. First, a mobile crowd sensing network for power grid management was proposed to realize intelligent power grid management. Then, we designed the mobile service-aware opportunistic embedded system based on the requirements of intelligent power grid management and the deployment of the mobile crowd sensing network. Third, the grid of embedded systems was demonstrated for intelligent management. The experimental results show that the proposed scheme has clear advantages in system complexity, execution efficiency, intelligent power grid management level, etc.
    Print ISSN: 1687-3955
    Electronic ISSN: 1687-3963
    Subject: Electrical Engineering, Electronics, Telecommunications; Computer Science
    Published by Springer
  • 100
    Publication date: 2015-12-02
    Description: Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and is therefore of clear research interest. This paper describes the database, the metric, the systems, and the results for the Albayzín-2014 audio segmentation campaign. In contrast to previous evaluations, where the task was the segmentation of non-overlapping classes, the Albayzín-2014 evaluation proposes delimiting the presence of speech, music, and/or noise, which can occur simultaneously. The database used in the evaluation was created by fusing different media and noises in order to increase the difficulty of the task. Seven segmentation systems from four different research groups were evaluated and combined. Their experimental results were analyzed and compared with the aim of providing a benchmark and highlighting promising directions in this field.
    Print ISSN: 1687-4714
    Subject: Electrical Engineering, Electronics, Telecommunications
    Published by Springer