HEARTalk™ is a new response technology that brings the inflections of the natural human voice to computerized voices. It turns a flat, computerized voice into more natural, human-like dialogue, so that you might feel the 'HEART' in the machine, so to speak.

■ Mechanism

So what's missing from computerized voices, linguistically speaking? In a conversation between two people (a caller and a responder), the responder typically adjusts a number of voice attributes (pitch, length of sounds, loudness, and timbre, or quality of sound) to better match the tone of the caller. For example, if the caller speaks cheerfully, the responder replies in a similarly bright voice. Conversely, if the caller speaks in a sorrowful voice, the responder replies in a subdued manner. In other words, humans tend to express empathy in conversation by changing the patterns of stress and intonation of their voice, or "prosody," to reflect the mood of the caller's utterance.


With computerized voices, unnaturalness arises when the response does not match the "prosody" of the human utterance: for example, a low, dark intonation in reply to a person talking happily, or a high, strong voice in reply to a caller speaking in a low-key manner.

Yamaha's "HEARTalk™" solves this problem by analyzing the "prosody" of human speech in real time and generating a natural "prosody" suited to the response. For example, it returns short responses such as "yes" with a "prosody" that matches the person's spoken question. Because it does not analyze the "content" of the utterance and relies only on "prosody" analysis, it can work with off-the-shelf systems and needs only modest processing power.
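The page does not say how HEARTalk™ measures prosody, so the following is only a minimal sketch of what "prosody analysis without content analysis" can look like, not Yamaha's implementation. It assumes mono audio as a list of float samples, and it assumes (for illustration) that the final frame of the caller's utterance carries the cue to respond to. It estimates two of the attributes named above, pitch (via a plain autocorrelation peak search) and loudness (RMS); the names `Prosody`, `estimate_pitch`, and `analyze_ending`, and all default parameters, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Prosody:
    pitch_hz: float  # estimated fundamental frequency (F0); 0.0 means unvoiced/silent
    loudness: float  # RMS amplitude of the analyzed frame


def estimate_pitch(frame, sample_rate, fmin=70.0, fmax=400.0, voicing_threshold=0.3):
    """Estimate F0 of one frame by picking the autocorrelation peak between fmin and fmax."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0  # silent frame
    min_lag = max(1, int(sample_rate / fmax))  # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # normalized autocorrelation: how well the frame matches itself shifted by `lag`
        corr = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag == 0 or best_corr < voicing_threshold:
        return 0.0  # no clear periodicity: treat as unvoiced
    return sample_rate / best_lag


def analyze_ending(samples, sample_rate, frame_len=1024):
    """Analyze only the last frame of the utterance (hypothetical design choice)."""
    frame = samples[-frame_len:]
    if not frame:
        return Prosody(0.0, 0.0)
    loudness = math.sqrt(sum(s * s for s in frame) / len(frame))
    return Prosody(estimate_pitch(frame, sample_rate), loudness)
```

For a synthetic 200 Hz tone sampled at 16 kHz, `analyze_ending` should report a pitch close to 200 Hz; a real system would need a more robust pitch tracker, but the point is that no speech recognition is involved.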

■ Uses


1. As a supportive response system
You can build a machine that responds to your voice with natural "prosody". For example, we installed this system in 'Himitsu no Kuma-chan', a talking, moving teddy bear (supported by T-ARTS Company, Ltd.). First, record the responses you want the teddy bear to say to you, such as "yes", "okay", "umm", and "well", each in a variety of "prosodies". Load the recordings into HEARTalk™ and connect the teddy bear to it. When you speak to the teddy bear, HEARTalk™ selects the recording with the right "prosody" for its response.
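The selection step described above can be sketched as a nearest-match search over a bank of pre-recorded clips, each tagged with the pitch and loudness measured when it was recorded. This is an illustrative assumption, not HEARTalk™'s actual rule: the source does not state how the caller's prosody maps to the chosen response. Here the target simply mirrors the caller (a bright, high, loud question picks a bright clip; a low, quiet one picks a subdued clip), with pitch compared in semitones because pitch is perceived on a log scale. `Recording`, `choose_response`, `pitch_ratio`, `loudness_weight`, and the clip names are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Recording:
    name: str        # label of a pre-recorded response clip (hypothetical)
    pitch_hz: float  # pitch measured on the clip when it was recorded
    loudness: float  # RMS loudness measured on the clip


def semitone_distance(a_hz, b_hz):
    """Absolute pitch distance in semitones (log scale, as pitch is perceived)."""
    return abs(12.0 * math.log2(a_hz / b_hz))


def choose_response(recordings, caller_pitch_hz, caller_loudness,
                    pitch_ratio=1.0, loudness_weight=10.0):
    """Pick the clip whose prosody best matches a target derived from the caller.

    pitch_ratio is a hypothetical tuning knob: 1.0 mirrors the caller's pitch.
    Raises ValueError if `recordings` is empty (from min()).
    """
    def cost(rec):
        loud_cost = loudness_weight * abs(rec.loudness - caller_loudness)
        if caller_pitch_hz <= 0.0:
            return loud_cost  # unvoiced or unknown pitch: match loudness only
        target_hz = caller_pitch_hz * pitch_ratio
        return semitone_distance(rec.pitch_hz, target_hz) + loud_cost

    return min(recordings, key=cost)


# Hypothetical bank: the same word "yes" recorded in three prosodies.
bank = [
    Recording("yes_low", 110.0, 0.10),
    Recording("yes_mid", 180.0, 0.20),
    Recording("yes_high", 260.0, 0.35),
]
```

With this bank, a high, loud question (about 250 Hz, loudness 0.3) selects `yes_high`, while a low, quiet one (about 115 Hz, loudness 0.12) selects `yes_low`. Recording each word in several prosodies ahead of time keeps the runtime work to a cheap lookup, which fits the small, embedded setting of a toy.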

2. As a dialogue system
Yamaha believes that a more advanced spoken dialogue system can be constructed by linking a voice dialogue system consisting of speech recognition and speech synthesis with "HEARTalk™." To realize HEARTalk™'s full potential, the company is conducting collaborative research with FUTUREK Co., Ltd. and NTT TechnoCross Corporation. Through this collaboration, Yamaha aims to commercialize a new spoken dialogue system that recognizes human utterance "content" and responds in natural human-like "prosody".

Presentation slides for J-POP SUMMIT 2017

■ Contact Us

For more information, please contact us.