
Tips and tricks for spoken words (not rap):

BDT

Making bits and bytes sound good...
I recently released a song with spoken words, and boy, prosody for spoken words is MUCH more work than for vocals.
Link to song: https://www.youtube.com/watch?v=8vBEgy9WnG8

To save you time and work, here are a few tricks I had to figure out to get a reasonably believable performance:
  1. Duration of a word: unlike sung words, most spoken words have a lower limit on their length. If your word is shorter/faster than that, it sounds like crap, or at least not real.
  2. The same applies to single syllables.
  3. The key of the text is not that important; you won't notice much difference if you go up or down a note.
    But no matter where you put it, try to stay more or less there for the whole song so the sound stays consistent.
  4. Human ears are VERY sensitive to small pitch changes (e.g. a rise at the end of a sentence for questions, a fall for statements).
    By far the easiest way to get this right is to record your own speech, listen to it, and adjust accordingly. Your mobile phone is enough to get the job done!
    It is not uncommon for even a short word to be spread over 4 or 5 different notes; see the attached screenshot.
Screenshot of the words "no solution" (attachment: Clipboard01.jpg)
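If you want to prototype these tips before touching the editor, they boil down to two small operations: clamping short syllables to a minimum length, and keeping one base pitch with a glide on the last syllable. Here is a minimal Python sketch under loudly stated assumptions: durations are in seconds, pitches are MIDI note numbers, the 0.12 s floor is a made-up placeholder (the tips give no figure), and the 2-semitone glide over 4 pitch points is just one reading of tip 4. The names `Syllable`, `enforce_min_duration` and `sentence_contour` are hypothetical and not part of any synth's API.

```python
from dataclasses import dataclass

# Hypothetical floor for a spoken syllable, in seconds. The tips give no number;
# tune it by ear against a recording of your own voice.
MIN_SYLLABLE_SEC = 0.12


@dataclass
class Syllable:
    text: str
    duration: float  # seconds


def enforce_min_duration(syllables, min_sec=MIN_SYLLABLE_SEC):
    """Tips 1 and 2: stretch any syllable shorter than min_sec up to min_sec."""
    return [Syllable(s.text, max(s.duration, min_sec)) for s in syllables]


def sentence_contour(n_syllables, base_note, question, final_step=2.0, glide_points=4):
    """Tips 3 and 4: one list of pitch points (MIDI note numbers) per syllable.

    Every syllable sits on base_note (keep it the same for the whole song),
    except the last one, which glides up for a question or down for a
    statement by final_step semitones, spread over glide_points points.
    """
    if glide_points < 2:
        raise ValueError("glide_points must be at least 2")
    contour = [[float(base_note)] for _ in range(n_syllables)]
    if n_syllables:
        direction = 1.0 if question else -1.0
        contour[-1] = [
            base_note + direction * final_step * i / (glide_points - 1)
            for i in range(glide_points)
        ]
    return contour


if __name__ == "__main__":
    words = [Syllable("no", 0.05), Syllable("so", 0.18),
             Syllable("lu", 0.10), Syllable("tion", 0.30)]
    print([s.duration for s in enforce_min_duration(words)])  # [0.12, 0.18, 0.12, 0.3]
    print(sentence_contour(len(words), base_note=57, question=False))
```

The contour is only a starting point; as tip 4 says, the real curve should come from listening to your own recorded speech and nudging the points until they match.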

Feel free to add your own tips and tricks!
 
Thanks for these tips. I haven't had much success with spoken word yet, but from my own observations, certainly in English, we raise the pitch to emphasize words or syllables and lower it to de-emphasize them. We might also lower the pitch when we run out of energy. Pitch variations only seem to need to be a semitone or a whole tone to work as emphasizers. So finding a centre pitch for a specific voice, using it as a monotone, and then going up or down a semitone to a tone to emphasize or de-emphasize seems to be a good way to go. I haven't had much success using the Rap intonation parameter itself for this purpose, but maybe it just needs fine-tuning.

As I said, I haven't had much success with it myself, so take with a pinch of salt. :)
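The scheme above (one centre pitch used as a monotone, with emphasis as a semitone or a whole tone up or down) is simple arithmetic on note numbers. A minimal Python sketch, assuming MIDI note numbers and 12-tone equal temperament; the integer emphasis levels, the clamp to a whole tone, and the A3 centre pitch are hypothetical choices for illustration, not any synth's parameters:

```python
def midi_to_hz(note):
    """Equal temperament: A4 (MIDI 69) = 440 Hz, each semitone a ratio of 2**(1/12)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def emphasis_notes(center_note, levels, max_step=2):
    """Map per-syllable emphasis levels to notes around a monotone centre.

    Positive levels raise the pitch (emphasis); negative levels lower it
    (de-emphasis, or running out of energy). Steps are clamped to
    +/- max_step semitones, i.e. at most a whole tone by default.
    """
    return [center_note + max(-max_step, min(max_step, lvl)) for lvl in levels]


if __name__ == "__main__":
    center = 57  # hypothetical centre pitch (A3); pick one that suits the voice
    notes = emphasis_notes(center, [0, 2, 0, -1, 5])
    print(notes)  # [57, 59, 57, 56, 59]
    print([round(midi_to_hz(n), 1) for n in notes])
```

For scale: one semitone is a frequency ratio of 2^(1/12), roughly a 6% change, so the whole emphasis range here stays within a few percent either side of the centre.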
 
I haven't really tried to synthesize much talking, but one difference I've noticed between talking and singing is that people speak with far higher tension and more closed vocal folds. When people synthesize talking with singing-synth software, it always sounds like a stage-actor voice to me: very open, as though projecting to the back of the theatre. I think raising the tension would go a long way toward making it sound more natural.
 