Multimodal & Creative

3 concepts

Multimodal AI

Multimodal AI systems process and generate multiple data types (text, images, audio, and video) within a single model, enabling cross-modal understanding and creation.

Beginner

Speech AI

Speech AI covers speech-to-text (STT), text-to-speech (TTS), voice cloning, and speech translation: the technologies that enable natural voice interaction with AI.

Beginner

Text-to-Image Generation

Text-to-image generation uses AI models, most commonly diffusion models, to create images from natural language descriptions. Well-known tools include Midjourney, DALL-E, and Stable Diffusion.