Applying Foundational AI Models: Our Journey with Gan.ai
We believe the recent flurry of attention on India’s emerging LLMs is a spark for what is already a gathering storm of energy and ideas.
Anandamoy Roychowdhary
Published June 5, 2023
A number of founders have asked us how much of a role India’s much-vaunted tech sector has played so far in global AI innovation – and what lessons can be learned from the folks who have already attempted this. We have caught a glimpse of the answer through the early chapters of Suvrat Bhooshan’s journey with his company Gan.ai.
For the last two years, the Gan.ai team has been creating generative AI tools to help brands build deeper connections with their customers by creating highly personalized videos at scale. The company’s vision is to build completely personalized videos, in many different languages with perfect voice and lip sync and pitch transfer, to help drive communication.
Video is amazing. The level of engagement that the average person has with the Internet has shot up significantly since video became dominant, be it on YouTube, Snapchat, TikTok or other platforms. Yet video is embarrassingly sterile. It is a one-size-fits-all, one-to-many form of communication that doesn’t react much to its consumer. Even basic personalization based on video metadata can have a stunning impact on engagement.
Now imagine a world where the message is personalized to you: it refers to your name, the products you wish to buy, the flights you want to take, the story you want to hear. The future is personalized video.
We met Suvrat in 2021 when he’d just come back to India from the US to launch his own startup. Generative AI at that point was of course very early, and only a handful of folks who understood its potential were excited about it. When Suvrat explained to us that his work was beating the state of the art on various benchmarks, we were interested in what that could mean. The innovations were in voice AI: perfect emotional state transfer, the ability to shift emphasis to maintain tone and meaning across spoken languages, and more. Together they made up a whole host of AI breakthroughs that could yield better personalization outcomes.
Now at this point, we had already partnered with a couple of early Generative AI startups, so we were not noobs to the potential of this space. However, the capital intensity of building with LLMs is formidable, and we had assumed that the density of talent required to build this could, at that time, only be found in the Valley. It wasn’t clear that foundational model research would happen in India. I’ll admit to being more than a little hesitant, but something in Suvrat’s assured, calm demeanor made us feel that if anyone could pull it off, it would be him. And so we partnered with a pre-seed startup when it was completely unclear what we would actually build.
Our thesis was simple: While the capital requirements for foundational AI work are formidable, clever, resourceful teams can achieve quite a bit by some smart marshalling of resources and ideas. One key takeaway: Don’t shy away from building a moat in foundational models, as they do not necessarily have to be capex intensive. We don’t all get to be OpenAI, and often that’s not even desirable. But it is possible to build very deep tech over several years if you are willing to also build a business alongside.
Now, the business part wasn’t easy either. We thought Gan.ai would have a killer use-case in dubbing and helping folks produce YouTube videos in different languages easily. And so we went to work on that. It took a few months to realize that the use-case, while very interesting from a tech perspective, is structurally very hard to implement in most voice studios because of the disruption to the workforce it represents. We learned another lesson there: To succeed, new technology must not only disrupt but also create new heroes who can champion its cause and its adoption.
Many other founders might have given up at this point, but Suvrat came back with an even more difficult proposition: to produce not only voice, but also video to go with it. Truly personalized AI-generated video that businesses can use to engage customers. Email personalization is a big market; the hypothesis that the video personalization market could be even bigger was an interesting one. A third lesson: the best founders blaze a trail to new applications and grow the market for them, and there is a bigger AI-native universe waiting to be created.
Since then, Suvrat has rolled out Myna, an AI tool that any marketer can use to create personalized videos at scale. Using a short clip from Virat Kohli, for example, Vivo created a launch campaign for its V27 smartphone that featured the cricket star calling users by their names and directing them to specific sales reps at their nearest store.
A demo is worth a thousand words, so you can take it out for a spin here. And Myna is only the first of many tools that Suvrat and his team plan to produce.
We are excited to watch the creation of what could be a legendary AI platform infrastructure company in the field of voice and video. Gan.ai’s journey illustrates that India has been a force to be reckoned with in the AI space for a while now. We believe the recent flurry of attention on India’s emerging LLMs is a spark for what is already a gathering storm of energy and ideas. We would love to hear from folks who want to build truly foundational tech in AI in the service of moving the state of the art forward.