
Reference Thread: An Exhausting List of Vocal Synth Engines

keisouP

Here's a thread where I attempt to make known many different vocal synthesis engines for the convenience of curious aspiring producers. Note that this post is as of January 2, 2024, unless I decide to edit it with new information. Much of this information comes from VocaDB and various wiki pages, but some of this I already knew myself.

One-time purchase:

VOCALOID by YAMAHA (please note that I'm counting voice packages via the wiki pages so the numbers may be a little off in some ways, and some voices have been discontinued but I'm counting them as historically available):
VOCALOID (aka VOCALOID1), 2004, 5 available voice packages (its 20th anniversary is this year!)
VOCALOID2, 2007, 22 available voice packages (incl. Kagamine Rin and Len Acts 1 and 2 as separate)
VOCALOID3, 2011, 42 available voice packages
VOCALOID4, 2014, 44 available voice packages
VOCALOID5, 2018, 10 available voice packages (counting the 4 default vocals as one package as they come together with the editor)
VOCALOID6, 2022, current engine, 6 voice packages available as of now (counting the 8 current default vocals as one package as they come with the editor)

Synthesizer V by Dreamtonics:
Synthesizer V (2018): "R1" edition, no longer sold. Its renewable year-long "evaluation" still works, which is much longer than Vocaloid's two-week to one-month trial periods. 8 available voice packages.
Synthesizer V Studio (2020): "R2" edition. Has free Basic edition with stripped-down features; "Lite" version voices are stripped down and not allowed commercial use. One free full voicebank, Yamine Renri, is available via a Google form (she might not be allowed for commercial use but is fully functioning), and Mai is available free with the purchase of the Pro editor. Currently has 60 voices available, counting Tsurumaki Maki's Japanese and English voicebanks as separate.

Emvoice One, 2019, 5 available voices that are sold individually (free tier with a 7-semitone vocal range available).

Chipspeech by Plogue: Retro-style vocal synthesizer emulating old sound chips, released 2015, 12 released voicebanks (Daisy is discontinued)

CeVIO by Techno-Speech (both speech and singing synthesis):
CeVIO Creative Studio (2013): Uses Hidden Markov Model (HMM); human-like tone for the time, but lots of engine noise.
CeVIO AI (2021): Uses deep neural network technology and is significantly more realistic than the Creative Studio version.

Cantor by VirSyn: Vocal synthesis engine released in 2004, shortly after the release of Vocaloid. Unlike Vocaloid, rather than few voicebanks sampled from human voices, there are many voices available that are artificially synthesized. Cantor 2 was released in 2007, in the same year VOCALOID2 was released. A demo version is available, but it requires the purchase of a software protection dongle. Cantor is still available to purchase but no longer being updated.

LaLaVoice by TOSHIBA: A speech and singing synthesis engine released in 2001. Its singing synthesis side, LaLaSong, uses a sheet music layout rather than a piano roll. It is known for being used in voicing Vibri in the PS1 video game Vib-Ribbon.

Virtual Singer by Myriad: An early singing synthesis engine released in 2000 by Myriad, the makers of Harmony Assistant; it is itself an add-on for Melody Assistant. I believe it is one of the first to sample the human voice (as opposed to generating a "voice" from scratch), an approach Vocaloid would later become famous for.

Realivox by Realitone: A Kontakt-based vocal synthesizer released in 2012, coming in two forms: The Ladies (a group of singers who can sing in vocables rather than full English) and Blue (a single singer who can sing preset phrases as well as entered English lyrics).

Piapro Studio (VOCALOID version) by Crypton Future Media: A vocal synthesizer using the Vocaloid API released in 2013 with Crypton's "KAITO V3" package, for use with its V3 (and later V4X) vocals; support for other Vocaloids was added a little later. Piapro Studio for V4X was released in 2015 with Megurine Luka V4X. Most vocals come with a VSTi version that is compatible with all Vocaloids up to VOCALOID4 (or VOCALOID3 if only V3 Crypton vocals are owned); Hatsune Miku V4 Chinese includes a standalone variant, but it can only be used with that vocal.

Piapro Studio NT (New Type) by Crypton Future Media: A vocal synthesizer engine released in 2020 out of Crypton's dissatisfaction with the Vocaloid engine's sound. So far only Hatsune Miku NT is available, but 5 other vocals are being tested in the SEGA rhythm game Project SEKAI: Colorful Stage ft. Hatsune Miku (known worldwide as Hatsune Miku: Colorful Stage). As of now it is only available in Japanese. It uses a resampling synthesis method involving very short samples, and is more robotic and "crunchy" sounding than Miku's original Vocaloid versions. For those who dislike the sound of NT, the older versions of Hatsune Miku are still available alongside the new version.

Subscription:

ACE Studio by Beijing Timedomain Technology, 2022 (started out free, became paid via subscription in 2023), 40 available voices and counting

Pocket Singer by ACCIDENTAL AI, mobile, 2023 (also started out free but became subscription-based, from the developers of ACE), practically unlimited "original" voices made by mixing "voice seeds"

Freemium/partially free:

VoiSona (fmr. CeVIO Pro) by Techno-Speech: Editor and one voice (Chis-A Japanese) are free, other voices are either bought through a one-time purchase or used through a paid subscription. Singing and speech versions available. 11 "song" voices available, 6 being VoiSona exclusives and the other 5 being CeVIO AI ports.

Maghni AI by VocaTone and Misbah Studios: Currently-upcoming American vocal synth engine promising an advanced AI voice synthesis model and many new features, including a promise of support of around 40 languages. 20 voices teased as of now, 2 being free test vocals while the other 18 seem to be paid. Similar to VoiSona, the editor itself and the aforementioned test vocals will be free. It was partly funded via a crowdfund; the goal has been reached, but people are still welcome to contribute.

Splash Pro: An AI music generator with a couple of available singing voices utilized. The plug-in version was discontinued, but it is still available as a website.
 
Free:

UTAU by Ameya/Ayame: Free-of-charge vocal synthesizer (optional shareware version available), available since 2008; it famously allows users to create their own voicebanks by recording and configuring their voices. There is a practically infinite number of freely available voicebanks, probably in every language one could think to make a voicebank for, with more being made every day, I'm sure. There are paid UTAU voicebanks, but most of them are discontinued and/or rare by now.

Synthesizer U by Ameya/Ayame: Successor/alternate editor for UTAU currently being developed by the software's original creator, similar in appearance (and name) to Synthesizer V. Looks to be more user-friendly than the original UTAU.

Sharpkey by Boxstar: Chinese vocal synthesizer similar in layout to VOCALOID2, released in 2016. Had an original editor and a "Galaxy" edition. Seven released characters, one of which was bilingual with a Chinese and Japanese voicebank.

DeepVocal by Boxstar: Chinese vocal synthesizer and successor to Sharpkey, released in 2019. Unlike Sharpkey, users are able to make their own voicebanks. Built for Chinese, and most other voicebanks are in languages such as Japanese and Korean, as languages like English are difficult to develop for with DeepVocal's voicebank creation process.

NEUTRINO by SHACHI: Neural-network singing voice synthesizer, released in 2020, one of the earliest popular AI-powered vocal synthesizers. Currently has 10 free voices, and 3 voices bundled with paid A.I.voice talk synthesizer voices. Currently Japanese-only with no native UI, meant to convert MusicXML files into singing; a fan-made UI is available on GitHub.

NNSVS by Ryuichi Yamamoto: Open-source neural-network singing voice synthesizer released in 2020, similar in function to NEUTRINO. Users are able to create their own AI voicebanks with the engine. Many use it via the UTAU plugin ENUNU as it does not natively have a user interface.

DiffSinger by MoonInTheRiver: Chinese AI singing voice synthesizer released in 2022, using a shallow diffusion mechanism. Users are able to train their own voicebanks for the engine, as with NNSVS. Can be used via OpenUtau.

OpenUtau by stakira: Open-source frontend for the UTAU, NNSVS/ENUNU and DiffSinger engines (possibly more engines in the future). Released in 2021, it aims to be much more user-friendly as well as developer-friendly than the original UTAU editor. Includes "phonemizers" for ease of input, which are especially valuable for phonetically complex languages like English.

UTSU by titinko: A Japanese frontend for UTAU.

CeVIO Creative Studio FREE by Techno-Speech: Original free version of CeVIO, released in 2013, 1 available vocal. Pitch bend and some other features are unsupported. No longer available on the website, but will work if downloaded, though it will tell you that this version is obsolete and that you should buy the full version.

NIAONiao, MUTA (discontinued), and AISingers: Released in 2011, 2015 and 2020, respectively. Chinese vocal synthesizers similar in function to UTAU, CeVIO, and...ok, well, the last one's pretty unique as it's a cloud platform, I guess it could be compared to NEUTRINO if anything? For the latter, a GUI called AISingers Studio, similar in look to NIAONiao, is available. NIAONiao allows users to create their own voicebanks. MUTA is no longer available, but five vocals were available for it, and three would have been upcoming before its discontinuation. AISingers currently has 32 vocals listed on its website, some being under testing.

Symphoneme by TechnoBrave: A web-based "converter" singing voice synthesis program released in 2023, with no UI but a story to go along with it. Unfortunately, it is infamous for poor-quality, glitchy vocals.

AquesTone by AQUEST: A VST-based MIDI vocal synthesizer; Japanese only, 32-bit only, uses artificially-generated vocals. Its speech-synthesis counterpart, AquesTalk, is the basis for UTAU's default vocal.

Sinsy: A web-based "converter" singing voice program using MusicXML files. Has 9 voices; five are HMM-based, and four are DNN/AI-based. It is one of the first known vocal synthesizers to utilize AI in this way.

RenoidPlayer: A web-based sequencer-style singing synthesizer using voices in the form of soundfont files. It does not have available pitch tuning, but it has a few adjustable parameters at the top, such as vibrato, portamento and humanization.

PaintVoice by Kumao.Works: A mobile singing synthesizer with a MIDI sequencer interface. It is very simple in quality, and simple (CV-style) UTAU can be imported.

Web Synthesizer V: A web-based demo of Synthesizer V released before Synthesizer V Studio as a test. It had two available voices.

Project Vogen: A free Cantonese singing synthesis program available on GitHub. I...don't know very much about this one.

SugarCape/SaltCase (discontinued): A singing voice synthesis engine for Mac.

VOCALINA (discontinued): A Korean singing synthesis engine with two vocals available: Khylin and VORA. VORA was free, and Khylin was paid, but VORA was discontinued first. It is available as abandonware from some sites.

VocalSharp: A Chinese vocal synthesizer that allows the creation of your own voicebank, similar to UTAU and DeepVocal. Released in 2021, it is one of the newest concatenative vocal synthesis engines (that I can think of anyway).

X Studio Singer: A Chinese AI vocal synthesizer by Microsoft, released in 2020. It launched with four voicebanks. Many features, including many voicebanks and the ability to export WAV files (fans have circumvented this by using a screen recorder), are locked behind the entry of a Chinese phone number.

Geji Geji: A free Chinese AI vocal synthesizer released in 2021 with a PC and mobile edition. It allows the creation of one's own voicebanks and claims to use AI technology to help users write songs.

VOCALOID Beta Studio (exclusive and temporary): Users who register for this program can be drawn from a lottery to get this engine for free. It was designed to test future features. It is a VST that uses a DAW's MIDI input; several parameters are available to adjust. Services will end in late March/early April. It includes 12 regular vocals and one multi-vocal. Cross-lingual capabilities are supported, at least between Japanese and English (I haven't seen Chinese).

Alter/Ego by Plogue: A MIDI-based vocal synth using input from your DAW or a MIDI controller. Voices are more modern-sounding than Chipspeech but not terribly realistic in general. 3 officially available voicebanks, 2 discontinued voicebanks and 1 controversial but available voicebank.

Unsure:
ELMIRAIVE VOX/Singer by Bandai Namco: An upcoming AI singing voice synthesizer by Bandai Namco. Has 2 voices announced.

VocalWriter by KAE Labs (discontinued): An early vocal synthesizer for Mac released in 1998, with the second edition released in 2005. I assume it was paid? It is available in some places as abandonware.
 
I have tried, or own, quite a few (Emvoice, ACE Studio, EastWest Hollywood Choirs and Hollywood Backup Singers, Realitone Ladies and Blue), and so far I find I can do much more, more quickly, and get much more realistic vocals with Synthesizer V.

I keep my eye on other companies in this emerging technology, and suspect someone might come out with something even better, but I haven't found anything else yet. It's good to have this list, and I hope it gets updated in the future.
 
can anyone confirm that SynthV is actually the best technology available?
It depends on what you are wanting to do. For many popular styles Synth V is great. But it doesn’t sing classical well. On the whole the women seem a bit better than the men.
 
For many popular styles Synth V is great. But it doesn’t sing classical well.
I agree that Synth V is aimed at the commercial market (not classical) for obvious economic reasons. But there is no reason why, with a bit of programming, it can't be used to make a convincing classical vocal, with a reasonable amount of work. First, you need to abandon quantization (of both the vocal and the accompaniment). Second, you need to have knowledge of, and the ability to program, classical-style embellishments, glissandos, melismas, vibrato, etc., and it probably helps to be good at Italian, Latin, German and French for that matter. Then you probably need to use RVC (voice conversion) to get the tone right, at least for the women, who don't have either the tone or the range for classical.

But if you use the new Audio-to-MIDI conversion feature on great classical singers, you can hear Synth V sound totally convincing. That, of course, would be cheating, but it is a way to study how the program can achieve a classical style. Tell me your favorite classical vocal piece and I will send you Synth V singing it, and you will be amazed.
 
I agree that Synth V is aimed at the commercial market (not classical) for obvious economic reasons. But there is no reason why, with a bit of programming, it can't be used to make a convincing classical vocal, with a reasonable amount of work.
If you have an example that you think works, I’d love to hear it. I’ve listened to tons of SynthV examples purporting to sing classical or opera, and nothing I’ve heard sustains a classical tone for longer than a note or two. None of the voices can even sing a phrase without falling out of style. Well, Asterian can sing syllables OK, but that’s it.

If you really want to try this, I’ll find a short example for you. What do you need? Score, audio example, MIDI?
 
Synth V Singing in a Classical Style

With the audio-to-MIDI converter in Synth V, in about 5 minutes Solaria sounds like this:

View attachment Solaria - Mandoline Op 58.mp3
Solaria - Mandoline Op 58 (Gabriel Fauré)

This was done automatically, so the detected phonemes probably sound unintelligible to a native speaker, but that can be tweaked. Right now it is probably gibberish, but operatic in style. Alternatively, the lyrics could be replaced with English, because this song cycle has been translated.

The point is you are hearing Solaria 100%, but her pitch curves have been modeled on Leontyne Price's. Everything you hear could theoretically be done by programming, but you would have to study many examples like this inside Synth V to see exactly what is being done in vibrato, melisma, note approach, and so forth. Here is a screen grab:
Solaria pitch curve.jpg
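
For anyone wondering what "programming" a pitch curve might look like outside the editor, here is a rough Python sketch. It is not Synthesizer V's API or file format; the function names and every number in it (glide time, vibrato rate and depth, fade-in time) are made-up assumptions for illustration only. It builds a curve for one note with an eased approach from the previous note and a delayed vibrato, then transposes it the way you would when moving the line to another voice.

```python
import math

def pitch_curve(start_note, target_note, duration_s, *, glide_s=0.12,
                vibrato_hz=5.5, vibrato_depth=0.4, vibrato_delay_s=0.3,
                vibrato_fade_s=0.2, step_s=0.005):
    """Return (time_s, pitch) points; pitch is a fractional MIDI note number.

    All default values are illustrative guesses, not measured from any singer.
    """
    points = []
    n_steps = int(round(duration_s / step_s))
    for i in range(n_steps + 1):
        t = i * step_s
        # Note approach: cosine-eased glide from the previous note to the target.
        if glide_s > 0 and t < glide_s:
            blend = 0.5 - 0.5 * math.cos(math.pi * t / glide_s)
        else:
            blend = 1.0
        base = start_note + (target_note - start_note) * blend
        # Delayed vibrato: absent until vibrato_delay_s, then fades in linearly.
        fade = min(1.0, max(0.0, (t - vibrato_delay_s) / vibrato_fade_s))
        phase = 2 * math.pi * vibrato_hz * (t - vibrato_delay_s)
        points.append((t, base + vibrato_depth * fade * math.sin(phase)))
    return points

def transpose(points, semitones):
    """Shift every pitch by a fixed number of semitones, keeping the shape."""
    return [(t, p + semitones) for t, p in points]

# One second of G4 (67) gliding up to C5 (72), then the same shape a minor third lower.
curve = pitch_curve(67, 72, 1.0)
lower = transpose(curve, -3)
```

The shape (glide plus vibrato) survives transposition unchanged, which is why a curve borrowed from one performance can be reused with a different voice or key.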

Of course, once you have the right pitch curves and vibrato, you can transpose it and change the voice. For example, here is Asterian singing the same thing:

View attachment Asterian - Mandoline(2).mp3
Asterian - Mandoline
 