Yonas Addisu - Software Engineer | Full Stack Developer

AI video generation is no longer a futuristic concept; it's here. In building VidGen, I explored how combining powerful LLMs like Gemini with specialized media models can create a unified content creation pipeline.

The Pipeline

Our system uses a multi-stage process:

Scripting: Gemini generates the narrative.
Audio: Deepgram and specialized TTS models handle voiceovers.
Visuals: Hugging Face models generate or process video frames.
Assembly: Inngest handles the background orchestration of these heavy tasks.
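
The four stages above can be sketched as a simple sequential runner. This is a minimal illustration, not VidGen's actual code: the stage functions are stubs standing in for calls to Gemini, the TTS models, and the Hugging Face models, and every name here (JobContext, StageFn, runPipeline, the placeholder URLs) is hypothetical. In a real deployment each stage would presumably run as a separate background step under the orchestrator rather than in one in-process loop.

```typescript
// Hypothetical types and stage names, chosen for illustration only.
type Stage = "scripting" | "audio" | "visuals" | "assembly";

interface JobContext {
  prompt: string;
  script?: string;
  audioUrl?: string;
  frameUrls?: string[];
  videoUrl?: string;
  completed: Stage[];
}

// Each stage reads the context so far and returns the fields it produced.
type StageFn = (ctx: JobContext) => Promise<Partial<JobContext>>;

// Stub stages: placeholders for the real model and service calls.
const stages: Array<[Stage, StageFn]> = [
  ["scripting", async (ctx) => ({ script: `Narration for: ${ctx.prompt}` })],
  ["audio", async (ctx) => ({ audioUrl: `stub://audio/${ctx.script!.length}` })],
  ["visuals", async () => ({ frameUrls: ["stub://frame/0", "stub://frame/1"] })],
  [
    "assembly",
    async (ctx) => ({ videoUrl: `stub://video/${ctx.frameUrls!.length}-frames` }),
  ],
];

// Run the stages in order, merging each stage's output into the context
// and recording which stages have finished.
async function runPipeline(
  prompt: string,
  table: Array<[Stage, StageFn]> = stages
): Promise<JobContext> {
  let ctx: JobContext = { prompt, completed: [] };
  for (const [name, run] of table) {
    const patch = await run(ctx);
    ctx = { ...ctx, ...patch, completed: [...ctx.completed, name] };
  }
  return ctx;
}
```

Keeping the stages in a table makes the order explicit and lets a single stage be swapped out (for example, a different TTS provider) without touching the runner.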

Lessons Learned

Handling long-running media tasks requires robust state persistence. Using Convex allowed us to maintain a real-time reactive UI while heavy processing happened in the background.
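
One way to picture this kind of state persistence is a job record that background steps update and the UI subscribes to. The sketch below is an assumption-laden illustration: InMemoryJobStore is a hypothetical stand-in for the database, not Convex's API, and the record shape, event names, and applyEvent function are invented for the example.

```typescript
// Hypothetical job record and events, for illustration only.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface JobRecord {
  id: string;
  status: JobStatus;
  stage: string | null;
  error?: string;
  updatedAt: number;
}

type JobEvent =
  | { type: "stage_started"; stage: string }
  | { type: "completed" }
  | { type: "failed"; error: string };

// Pure transition function: given a record and an event, return the next record.
function applyEvent(job: JobRecord, event: JobEvent, now: number): JobRecord {
  // Terminal states are final, so late events from retried steps are ignored.
  if (job.status === "succeeded" || job.status === "failed") return job;
  switch (event.type) {
    case "stage_started":
      return { ...job, status: "running", stage: event.stage, updatedAt: now };
    case "completed":
      return { ...job, status: "succeeded", stage: null, updatedAt: now };
    case "failed":
      return { ...job, status: "failed", error: event.error, updatedAt: now };
  }
}

// In-memory stand-in for a persistent, reactive store (an assumption, not a real API).
class InMemoryJobStore {
  private jobs = new Map<string, JobRecord>();
  private listeners = new Set<(job: JobRecord) => void>();

  create(id: string, now: number): JobRecord {
    const job: JobRecord = { id, status: "queued", stage: null, updatedAt: now };
    this.jobs.set(id, job);
    return job;
  }

  // Register a listener (for example, a UI component); returns an unsubscribe function.
  subscribe(listener: (job: JobRecord) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Apply an event, save the result, and notify listeners if anything changed.
  dispatch(id: string, event: JobEvent, now: number): JobRecord {
    const current = this.jobs.get(id);
    if (!current) throw new Error(`Unknown job: ${id}`);
    const next = applyEvent(current, event, now);
    if (next !== current) {
      this.jobs.set(id, next);
      this.listeners.forEach((listener) => listener(next));
    }
    return next;
  }
}
```

Because every change goes through the stored record, a client that reconnects mid-job can read the current stage instead of relying on in-memory progress that was lost.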

The future of content creation is collaborative: the engineer builds the "director," and users rely on it to create.

The Future of AI Video Generation