Multimodal Generative AI Demystified

Multimodal generative AI has recently seen significant advancements, enabling the creation of realistic images, videos, and audio from text or other inputs. However, because these models are complex, understanding how they work and how to apply them in practice can be challenging. In this talk, Ekaterina will shed light on the inner workings of multimodal generative AI models by discussing the key concepts and techniques used in their development. She will also explore applications and use cases of this technology. The talk is intended for anyone interested in the current state of AI and its potential to produce realistic and immersive multimedia experiences.

Biography

Ekaterina specializes in applying AI techniques, such as multimodal generative AI and large language models, to computer vision and language processing challenges. She is skilled in end-to-end AI productization, covering the entire process from development to optimized deployment, whether in the cloud or at the edge. Previously, Ekaterina was a research engineer applying deep learning to medical image analysis. She has also authored several peer-reviewed journal and conference publications on various applications of image-based 3D reconstruction, localization, and tracking. Ekaterina received her Ph.D. in Computer Science and M.Sc. in Media Informatics; she also holds a Diploma in Business Informatics.