Similar articles
 18 similar articles found (search time: 296 ms)
1.
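Through comparative experiments, this paper improves the feature parameters used for speech recognition. Its contribution is a modified set of Mel-frequency cepstral coefficients (MFCC): the 12th-order MFCC set is reduced to 11 coefficients, and experiments show that the modified parameters raise the recognition rate. The experiments remove individual feature components to study how much each MFCC order contributes to speaker-independent recognition of specific utterances. Extensive repeated experiments confirm that different parameter choices contribute differently to recognition, and that the contribution also differs from model to model.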
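
The component-removal experiment described in this abstract can be illustrated with a minimal sketch. The abstract does not say which coefficient is dropped, so the index used here, the function name `drop_coefficient`, and the placeholder frame values are all assumptions made for illustration.

```python
def drop_coefficient(frames, index):
    """Return a copy of `frames` with coefficient `index` removed from every frame.

    `frames` is a list of MFCC vectors (one list of floats per frame).
    """
    return [frame[:index] + frame[index + 1:] for frame in frames]


# Two hypothetical 12-dimensional MFCC frames (placeholder values, not real data).
frames = [[float(i) for i in range(12)], [0.5 * i for i in range(12)]]

# Dropping the last coefficient is an arbitrary choice for this sketch;
# the study would repeat recognition once per removed index and compare rates.
reduced = drop_coefficient(frames, 11)
```

Running the recognizer once per removed index and comparing recognition rates would give the kind of per-coefficient contribution the abstract refers to.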

2.
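Speaker recognition is one of the current research hotspots in speech recognition. This paper studies a speaker recognition system that extracts Mel-frequency cepstral coefficients (MFCC), which reflect human auditory perception of speech, as feature parameters. It also analyzes the probabilistic neural network (PNN), a neural network that performs well as a classifier. Experimental results show that the PNN classifies the training speech samples with high accuracy.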
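
As a rough illustration of how a probabilistic neural network classifies a feature vector, the sketch below implements the textbook Parzen-window form with a Gaussian kernel. The smoothing parameter `sigma`, the function name `pnn_classify`, and the toy data are assumptions; the abstract does not describe the network configuration the authors used.

```python
import math


def pnn_classify(x, training, sigma=1.0):
    """Classify vector `x` with a basic probabilistic neural network.

    `training` maps each class label to a list of training vectors.
    Each class score is the average Gaussian kernel response over that
    class's training vectors; the label with the highest score wins.
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        total = 0.0
        for sample in samples:
            dist2 = sum((a - b) ** 2 for a, b in zip(x, sample))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        score = total / len(samples)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-speaker example with 2-dimensional features.
speakers = {
    "speaker_a": [[0.0, 0.0], [0.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 5.0]],
}
predicted = pnn_classify([0.2, 0.3], speakers)
```

In practice `x` would be an MFCC-derived vector, and `sigma` would be tuned on held-out data.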

3.
徐春辉 《科技广场》2007,(5):208-210
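Based on an analysis of speech feature parameters and the basic methods of speaker recognition, the system uses linear prediction cepstral coefficients for feature extraction and a hidden Markov model for modeling, with a Sunplus (凌阳) microcontroller as the hardware platform, to implement voice control of a voice-operated lock. Experimental results show stable system performance and good recognition.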

4.
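Feature extraction is one of the key stages of speech recognition, and its accuracy determines the recognition rate. Different speech signals call for different feature extraction methods. Targeting the phonetic characteristics of Amdo Tibetan, this paper performs linear prediction analysis, passes the linear prediction residual through a perceptual weighting filter, and then re-extracts features, giving higher precision and better robustness.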

5.
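A speaker recognition method based on improved cepstral analysis of LPC parameters   Total citations: 2, self-citations: 0, citations by others: 2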
王婧  朱黎 《大众科技》2008,(8):28-29
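Linear prediction cepstral coefficients (LPCC) are widely used in speaker recognition. Starting from LPCC, this article applies a Mel transform to obtain a new feature parameter, LPMCC, uses it as the feature parameter of a speaker recognition system, and combines VQ and HMM for modeling and recognition. Experiments show that the method improves the system's recognition rate.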
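
The LPCC step that this method starts from can be sketched with the standard LPC-to-cepstrum recursion. The sketch assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k), ignores the gain term, and uses the hypothetical name `lpc_to_lpcc`; the subsequent Mel transform that yields LPMCC is not shown, because the abstract gives no details of it.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a[0..p-1] (a_1..a_p) to n_ceps cepstral coefficients.

    Assumes the predictor convention H(z) = 1 / (1 - sum_k a_k z^-k).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


# First-order check: for a single coefficient a_1 = 0.5 the cepstrum is a_1**n / n.
cepstrum = lpc_to_lpcc([0.5], 3)
```

Other texts define the predictor with the opposite sign on the coefficients, in which case the signs in the recursion change accordingly.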

6.
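... is a major language of the Tibetan people, and voiceprint recognition built on it has great research significance. In voiceprint recognition, the choice and precision of speech feature parameters directly affect recognition accuracy. To meet the needs of Tibetan voiceprint recognition, the article selects MFCC as the feature parameter and studies and implements feature extraction for Tibetan speech.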

7.
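This article introduces the basic principles of speech recognition and some principles and methods for implementing speech recognition algorithms on the DSK6713, and describes how speech recognition is implemented on a DSP. The system uses Mel cepstral coefficients (MFCC) as feature parameters and matches speech parameters with dynamic time warping (DTW), an algorithm that is relatively simple and computationally light. The DTW algorithm is first simulated in MATLAB, and the speech recognition technique is then ported to the DSP. Experimental results show fairly good recognition for speaker-dependent, small-vocabulary, isolated-word speech.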
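
The DTW template matching mentioned in this abstract can be sketched as follows. This is the basic unconstrained form with a Euclidean frame distance; the function names, the toy templates, and the absence of path constraints or normalization are assumptions, since the abstract does not specify the variant that was implemented.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best warping path between two feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: insertion, deletion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the template word whose DTW distance to `utterance` is smallest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical one-dimensional "features" for two isolated words.
templates = {"up": [[0.0], [1.0], [2.0]], "down": [[2.0], [1.0], [0.0]]}
word = recognize([[0.0], [0.0], [1.0], [2.0]], templates)
```

The quadratic cost table is why DTW suits small vocabularies and short isolated words, as the abstract notes for the DSP target.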

8.
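To support speech recognition in strongly noisy environments, this work studies a recognition algorithm based on Mel cepstral coefficient features and hidden Markov models. It presents a preliminary study of linear prediction coefficient extraction, endpoint detection, speech feature parameter extraction, and the recognition algorithm flow, and verifies a speaker recognition system in simulation.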

9.
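Speech recognition technology has made encouraging progress, and many relatively mature speech recognition products are on the market. Most speech recognition systems, however, remain limited to specific environments and are still far from truly practical. This paper carries out experiments and research aimed at improving the robustness of speech recognition systems.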

10.
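The system is built around the 16-bit digital signal processor TMS320VC5502 and uses the TLV320AIC23 audio codec chip to capture and encode the speech signal. Endpoint detection, feature parameter extraction, the DTW algorithm, and other key techniques provide speaker-dependent, small-vocabulary, isolated-word recognition; the recognized digit (0 to 9) is verified by the number of LED flashes.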
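
Endpoint detection, the first stage listed in this abstract, is often done with a short-time energy threshold. The sketch below shows that simple approach only as an assumption: the abstract does not say which detector the system uses, and the frame length, threshold, and function name are illustrative.

```python
def detect_endpoints(samples, frame_len=4, threshold=0.1):
    """Locate speech by short-time energy.

    Splits `samples` into non-overlapping frames of `frame_len` samples,
    marks frames whose mean energy exceeds `threshold`, and returns
    (start_sample, end_sample) spanning the first to the last active frame,
    or None if no frame is active.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
bounds = detect_endpoints(signal)
```

Real detectors usually add a zero-crossing-rate test and hangover frames so that weak consonants at word boundaries are not cut off.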

11.
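In noise-robust speech recognition research, models obtained with the parallel model combination (PMC) method can in theory approach the performance of models matched to the noise environment, so PMC has become an important research direction. This paper first proposes MFCC_FWD_BWD, a feature based on forward and backward difference dynamic parameters, which satisfies PMC's requirement that the feature construction matrix be invertible. On this basis, it proposes a new model for PMC, the parallel sub-state hidden Markov model (PSSHMM), in which each state contains parallel sub-states with transitions between them. Experiments show that PSSHMM achieves good recognition under various noise types and SNRs, and that its robustness is especially pronounced for non-stationary noise.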

12.
郭莉莉 《科技广场》2010,(1):150-153
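With the development of speech recognition technology, isolated-word, small-vocabulary speech recognition systems have come into wide everyday use. This paper proposes a DSP-based real-time isolated-word speech recognition system and applies dynamic time warping in the recognition algorithm. Following the characteristics of building control systems and the BACnet network protocol, the system is designed as an embedded subsystem of a BACnet device, bringing speech recognition into building control. Combining fast hardware with an efficient algorithm, it makes building control more responsive and convenient.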

13.
张旺俏 《中国科技信息》2007,28(7):124-125,127
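Mel-frequency cepstral coefficients (MFCC), which reflect human perception of speech, are used as speech feature parameters, and an MFCC-based VQ recognition method is studied. Recognition rates are compared between MFCC alone and MFCC combined with ΔMFCC. Experimental results show that, after cepstral liftering of the speaker feature parameters, the combination of MFCC and ΔMFCC distinguishes speakers better.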
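
The two feature operations named in this abstract, cepstral liftering and ΔMFCC, can be sketched with commonly used formulas. The sinusoidal lifter with L = 22, the regression window N = 2, the edge handling by frame repetition, and all function names are assumptions taken from common practice, not from the paper.

```python
import math


def lifter(frame, L=22):
    """Apply the sinusoidal lifter w_n = 1 + (L/2) sin(pi n / L) to c_1..c_N.

    Assumes `frame` holds c_1, c_2, ... (no c_0 term).
    """
    return [(1.0 + (L / 2.0) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(frame, start=1)]


def delta(frames, N=2):
    """Regression-based delta coefficients over a window of +/- N frames.

    d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2),
    with the first and last frames repeated at the edges.
    """
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    T = len(frames)
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def append_delta(frames, N=2):
    """Concatenate each static frame with its delta coefficients."""
    return [static + dyn for static, dyn in zip(frames, delta(frames, N))]


# A linear ramp has a delta of 1.0 away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
combined = append_delta(ramp)
```

The combined vectors double the feature dimension, which is the MFCC plus ΔMFCC configuration the abstract compares against MFCC alone.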

14.
In this paper we present novel ensemble classifier architectures and investigate their influence on offline cursive character recognition. Cursive characters are represented by feature sets that portray different aspects of character images for recognition purposes. The recognition accuracy can be improved by training an ensemble of classifiers on the feature sets. Given the feature sets and the base classifiers, we have developed multiple ensemble classifier compositions under four architectures. The first three architectures are based on the use of multiple feature sets whereas the fourth architecture is based on the use of a unique feature set. Type-1 architecture is composed of homogeneous base classifiers and Type-2 architecture is constructed using heterogeneous base classifiers. Type-3 architecture is based on hierarchical fusion of decisions. In Type-4 architecture a unique feature set is learned by a set of homogeneous base classifiers with different learning parameters. The experimental results demonstrate that the recognition accuracy achieved using the proposed ensemble classifier (with the best composition of base classifiers and feature sets) is better than the existing recognition accuracies for offline cursive character recognition.
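
To make the idea of combining classifiers trained on different feature sets concrete, here is a minimal sketch that fuses decisions by majority vote. Majority voting is an assumption chosen for simplicity; the abstract does not state the fusion rule, and the names `majority_vote` and `ensemble_predict` and the stand-in classifiers are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers, feature_sets):
    """Apply classifier i to feature set i and fuse the decisions by majority vote."""
    votes = [clf(features) for clf, features in zip(classifiers, feature_sets)]
    return majority_vote(votes)


# Three stand-in base classifiers, each looking at its own feature set.
classifiers = [
    lambda f: "a" if f[0] > 0.5 else "b",
    lambda f: "a" if sum(f) > 1.0 else "b",
    lambda f: "b",
]
feature_sets = [[0.9], [0.7, 0.6], [0.0]]
decision = ensemble_predict(classifiers, feature_sets)
```

A hierarchical scheme such as the Type-3 architecture could be approximated by voting within groups first and then voting over the group decisions.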

15.
Using an acoustic vector sensor (AVS), an efficient method has been presented recently for direction of arrival (DOA) estimation of multiple speech sources via the clustering of the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of the AVS-DOA method is largely dependent on the reliable extraction of the target speech dominated time–frequency points (TD-TFPs) which, however, may be degraded with the increase in the level of additive noise and room reverberation in the background. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners, namely deep neural network (DNN) and DNN cascaded with a support vector machine (DNN-SVM), for multi-source DOA estimation, where a novel feature, namely, the tandem local spectrogram block (TLSB) is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate the weighted centers of the ISDR-clusters for better DOA estimation as compared to the original center used in our previously proposed AVS-ISDR. Extensive experiments on simulated and recorded data have been presented to show the improved performance of our proposed methods over two baseline AVS-DOA methods in the presence of noise and reverberation.

16.
Language modeling (LM), providing a principled mechanism to associate quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploring either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models both in terms of perplexity and recognition error rate reductions.
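
For readers unfamiliar with the n-gram baseline and the perplexity measure mentioned in this abstract, the sketch below trains a bigram model with add-one smoothing and scores a sentence. It illustrates the baseline only, not the relevance-based framework of the paper; the smoothing choice, the function names, and the toy corpus are assumptions.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count history unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)


def perplexity(sentence, unigrams, bigrams, vocab_size):
    """Perplexity of one sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n_predictions = len(tokens) - 1
    return math.exp(-log_prob / n_predictions)


# Toy corpus: a word order seen in training should score a lower perplexity.
model = train_bigram([["a", "b"], ["a", "c"]])
seen = perplexity(["a", "b"], *model)
unseen = perplexity(["c", "b"], *model)
```

Lower perplexity means the model assigns higher probability to the test text, which is why the abstract reports it alongside recognition error rate.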

17.
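Prosody synthesis for Mandarin Chinese based on the TD-PSOLA algorithm   Total citations: 6, self-citations: 0, citations by others: 6
Taking the prosodic features of Mandarin Chinese into account, the TD-PSOLA algorithm is used to implement Mandarin prosody synthesis, and the prosodic parameters of the synthesized and original speech are compared. Experimental results show that the method effectively controls prosodic parameters and achieves relatively high-quality prosody synthesis.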

18.
Acoustic feature selection for automatic emotion recognition from speech   Total citations: 1, self-citations: 0, citations by others: 1
Emotional expression and understanding are normal instincts of human beings, but automatic emotion recognition from speech without referring to any language or linguistic information remains an open problem. The limited size of existing emotional data samples and their relatively high dimensionality have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on data preprocessing techniques that aim to extract the most effective acoustic features to improve the performance of emotion recognition. A novel algorithm is presented, which can be applied to a small data set with a large number of features. The presented algorithm integrates the advantages of a decision tree method and the random forest ensemble. Experimental results on a series of Chinese emotional speech data sets indicate that the presented algorithm can achieve improved results on emotion recognition and outperform the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods and the more recently developed ISOMap dimensionality reduction method.
