sgyuri2003
Associate II
May 4, 2012
Question

Speech Recognition

  • May 4, 2012
  • 19 replies
  • 3922 views
Posted on May 04, 2012 at 15:45

Hi all!

Do you think it is possible to build speech recognition or speech-driven applications on this controller?


    Tesla DeLorean
    Guru
    May 4, 2012
    Posted on May 04, 2012 at 16:12

    Do you think it is possible to build speech recognition or speech-driven applications on this controller?

    Possible or practical? I doubt it's practical.

    How large is your application, and how much horse power does it need to operate effectively?

    Tips, Buy me a coffee, or three.. PayPal Venmo
    Up vote any posts that you find helpful, it shows what's working..
    frankmeyer9
    Associate III
    May 4, 2012
    Posted on May 04, 2012 at 17:56

    You can probably exclude everything below a Cortex-M4F, assuming you mean spectral-based signal analysis.

    It might be possible with some constraints and optimizations, but hardly with the quality of GHz-clocked processors with a double-precision FPU.
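To make "spectral-based signal analysis" a bit more concrete, here is a minimal host-side sketch in Python, not firmware. It windows one short audio frame and computes DFT magnitudes with a naive O(N^2) loop. The sample rate, frame length, and test tone are assumptions chosen for illustration; a real M4F port would use an optimized single-precision FFT rather than this loop.

```python
import cmath
import math


def frame_spectrum(frame):
    """Return magnitudes of the first N/2 DFT bins of a Hann-windowed frame."""
    n = len(frame)
    # A Hann window reduces spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2):
        acc = 0j
        for i, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags


SAMPLE_RATE = 8000  # Hz (assumed)
FRAME_LEN = 64      # samples per frame, 8 ms at 8 kHz (assumed)
TONE_HZ = 1000      # synthetic test tone standing in for speech

frame = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
         for i in range(FRAME_LEN)]
mags = frame_spectrum(frame)

# The strongest bin should sit at the tone frequency (bin width = 125 Hz).
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(peak_bin, peak_hz)
```

A recognizer would run this kind of analysis on every frame and feed the band energies to a classifier, which is where the per-frame cycle budget gets tight.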

    sgyuri2003
    Associate II
    May 5, 2012
    Posted on May 05, 2012 at 12:08

    Thank you for your answer!

    http://www.ebv.com/fileadmin/products/Products/STM/STM32F4_Series/STM32_F-4_Series_marketing_presentation_Customer_presentation_.pdf

    Floating point Unit

    • Graphic acceleration: moves like rotations and so on...
    • Advanced algorithms: audio (voice recognition, pitch detection) or image processing
    • Direct Matlab interface: PC tools generate floating point code, directly portable on FPU. A fixed point device will require more care and adaptation.

    Is this text purely marketing? Do the DSP and FPU units not help?

    Tesla DeLorean
    Guru
    May 5, 2012
    Posted on May 05, 2012 at 13:38

    I'm sure it helps, but it is going to depend a lot on what YOUR algorithm requires in terms of speed and resources. Don't expect it to function like Siri or Dragon. Or that a 168 MHz Cortex-M4 is going to be able to grind numbers like a 3 GHz x86 with megabytes or gigabytes of RAM.

    What marketing is probably talking about is something where someone grunts ON or OFF at the device and it can detect the different intonations, not something that can decipher a stream of speech on the fly.
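As a rough illustration of that kind of small-vocabulary ON/OFF detector, here is a sketch of template matching with dynamic time warping (DTW), written in Python for readability. This is one classic technique for isolated-word recognition, not necessarily what any vendor ships. The feature sequences below are made-up numbers standing in for something like a per-frame energy envelope, and `classify` is a hypothetical helper.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[rows][cols]


def classify(utterance, templates):
    """Return the template word with the smallest DTW distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))


# Hypothetical stored templates, one feature sequence per word.
TEMPLATES = {
    "on": [0.1, 0.8, 0.9, 0.7, 0.2],
    "off": [0.1, 0.7, 0.4, 0.1, 0.6, 0.6, 0.1],
}

# A slower, time-stretched "on": DTW absorbs the difference in tempo.
spoken = [0.1, 0.1, 0.8, 0.9, 0.9, 0.7, 0.2]
print(classify(spoken, TEMPLATES))
```

The cost table grows with the product of the two sequence lengths, so a handful of short templates is cheap, while a large vocabulary or continuous speech quickly is not.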

    Andrew Neil
    Super User
    May 5, 2012
    Posted on May 05, 2012 at 15:31

    From: kovacs.janos:

    http://www.ebv.com/fileadmin/products/Products/STM/STM32F4_Series/STM32_F-4_Series_marketing_presentation_Customer_presentation_.pdf

    You're quoting a page there from EBV - so why don't you contact EBV and ask them to explain precisely what they mean by it?

    That is, after all, what technical Distributors are there for!

    ''We, as a semiconductor specialist, are being asked to provide in-depth application support, value-added services and logistics solutions to a diverse customer base. EBV Elektronik will embrace these challenges as we take advantage of our strong position within the electronic components industry and our highly skilled and motivated employees .... Our goal is to meet and exceed the needs of our customers and vendors like never before, providing access to a new level of resources in technical expertise and supply chain solutions. Don't expect anything else from us than the best in class service in semiconductor distribution.''

    Slobodan Puljarevic

    President & CEO EBV Elektronik

    http://www.ebv.com/en/about-ebv.html?ct_ref=m-6

    Go on - put them to the test!

    A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work.
    frankmeyer9
    Associate III
    May 5, 2012
    Posted on May 05, 2012 at 17:43

    I have to agree with clive and neil.andrew - it depends a lot on the algorithms you intend to use. You might find several packages in source code for PCs, but nearly all of them rely on double precision at least.

    To get some PC-based algorithms ported to the Cortex-M4, you have to put heavy effort into optimization, and you might have to sacrifice something to tune it for your application.

    On the other hand, look at the TDA7590, for instance. That part delivers 120 MIPS, with an architecture and instruction set tuned to exactly such applications. While the STM32F4 theoretically has higher performance and higher numerical resolution (32-bit single-precision float, compared to 24-bit fixed point), it lacks the highly optimized DSP engine and heavily tuned libs.
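A small Python sketch may help put the float versus fixed-point comparison in numbers. IEEE 754 single precision carries a 24-bit significand, so its per-sample precision is in the same range as a 24-bit word; the difference is the exponent, which keeps that precision for very small values too. The Q1.23 format used below (step 2^-23, range [-1, 1)) is an assumption chosen for illustration, not a statement about how any particular DSP stores its data.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


Q23_STEP = 2.0 ** -23  # assumed 24-bit fixed-point step (Q1.23)


def quantize_q23(x):
    """Quantize x to the assumed Q1.23 grid, saturating to [-1, 1)."""
    steps = round(x / Q23_STEP)
    steps = max(-(1 << 23), min((1 << 23) - 1, steps))
    return steps * Q23_STEP


# 24-bit significand: 2**24 + 1 does not fit and rounds back to 2**24.
print(to_float32(2.0 ** 24 + 1) == 2.0 ** 24)

# A tiny signal survives in float32 but falls below one fixed-point step.
tiny = 2.0 ** -40
print(to_float32(tiny) == tiny, quantize_q23(tiny))
```

In practice this is why fixed-point code needs careful scaling at every stage, while float code mostly does not.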

    With the surplus in raw MIPS, you might keep up with that DSP class.

    But, as mentioned, most important are working and efficient algorithms.

    sgyuri2003
    Associate II
    May 8, 2012
    Posted on May 08, 2012 at 07:56

    Remember the Ericsson T39m, the so-called dinosaur? Extremely weak hardware, yet efficient and effective voice control.

    And the 8051-core Sensory ''neural network'' voice control device?

    The goal is not a Dragon-class voice control device. I believe that these resources can do something on this topic.

    Amen to that. :)

    sgyuri2003
    Associate II
    May 8, 2012
    Posted on May 08, 2012 at 09:09

    Here it is:

    http://www.ti.com/lit/an/spra100/spra100.pdf

    frankmeyer9
    Associate III
    May 8, 2012
    Posted on May 08, 2012 at 10:45

    That Microchip link is a good example of what I tried to suggest. It is a specialized solution that obviously works well for the specific and narrow purpose it's designed for. As an example, it is limited to English. If your project conditions match the specs of this solution, it's okay. If not, you will have a problem with it.

    That lib obviously uses scaled integer arithmetic. It is really fast, but very inflexible. With an M4F you can have a more flexible solution - if your project budget and roadmap allow for that, and you have any use for it at all.
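For readers who have not met scaled integer arithmetic, here is a minimal sketch of the common Q15 convention, written in Python for clarity. I am assuming Q15 purely as an example; the library discussed above may use a different scaling. The helper names are hypothetical.

```python
Q15_ONE = 1 << 15  # scale factor: 1.0 would be 32768, just out of range


def to_q15(x):
    """Convert a float to a 16-bit Q15 integer, saturating to [-1, 1)."""
    return max(-32768, min(32767, int(round(x * Q15_ONE))))


def q15_mul(a, b):
    """Multiply two Q15 values: the Q30 product is rounded back to Q15."""
    product = a * b + (1 << 14)  # add half an LSB before the shift
    return max(-32768, min(32767, product >> 15))


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


half = to_q15(0.5)
quarter = to_q15(0.25)
print(from_q15(q15_mul(half, quarter)))

# The inflexible part: anything outside [-1, 1) silently saturates
# unless the programmer rescales the data by hand.
print(to_q15(2.5))
```

The speed comes from everything being integer multiplies and shifts; the inflexibility comes from the fixed range, since every gain, filter coefficient, and intermediate result has to be scaled in advance to fit.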