Amazon Unveils 980M Parameter LLM with ‘Emergent Abilities’ in Latest Training

Amazon’s New Large Language Model (LLM) for Text-to-Speech

Introduction

Researchers at Amazon have trained a new large language model (LLM) for text-to-speech that they claim exhibits “emergent” abilities. The 980 million parameter model, called BASE TTS, is the largest text-to-speech model yet created. The researchers trained models of various sizes on up to 100,000 hours of public domain speech data to see if they would observe the same performance leaps that occur in natural language processing models once they grow past a certain scale.

Performance of the Model

  • The medium-sized 400 million parameter model showed a marked improvement in versatility and robustness on tricky test sentences.
  • The test sentences contained complex lexical, syntactic, and paralinguistic features like compound nouns, emotions, foreign words, and punctuation that normally trip up text-to-speech systems.
  • BASE TTS made significantly fewer errors in stress, intonation, and pronunciation than existing models.

Model’s Abilities

The researchers explained that the test sentences are designed to contain challenging tasks, none of which BASE TTS was explicitly trained to perform. The largest 980 million parameter version of the model, trained on 100,000 hours of audio, did not demonstrate further abilities beyond the 400 million parameter version, which suggests that the gains in performance eventually plateau.

Lightweight and Streamable Model

The model is designed to be lightweight and streamable, packaging emotional and prosodic data separately. This could allow the natural-sounding spoken audio to be transmitted across low-bandwidth connections.

Further Research

Although BASE TTS remains experimental, its creation demonstrates that these models can reach new versatility thresholds as they scale, an encouraging sign for conversational AI. The researchers plan further work to identify the optimal model size for emergent abilities.

Conclusion

Amazon’s new large language model (LLM) for text-to-speech, known as BASE TTS, demonstrates significant progress in handling complex lexical, syntactic, and paralinguistic features in text-to-speech systems. The model is designed to be lightweight and streamable, making it suitable for transmission across low-bandwidth connections. These findings are encouraging for further development in conversational AI.

Resources

The full BASE TTS paper is available on arXiv.
