Multimodal & Creative
3 concepts

Beginner
Multimodal & Creative
Multimodal AI
Multimodal AI systems process and generate multiple data types (text, images, audio, and video) within a single model, enabling cross-modal understanding and creation.

Beginner
Multimodal & Creative
Speech AI
Speech AI covers speech-to-text (STT), text-to-speech (TTS), voice cloning, and speech translation, technologies that enable natural voice interaction with AI.

Beginner
Multimodal & Creative
Text-to-Image Generation
Text-to-image generation creates images from natural language descriptions, typically using diffusion models, as in tools like Midjourney, DALL-E, and Stable Diffusion.