# 06-technical-discussion
Hi all, I've been following the piece on https://caryn.ai/ and I'm curious what architecture you think they used to build the virtual influencer. The articles claim they used training data from her YouTube channel and built it on top of GPT-4. Would love to hear if anyone here knows more.
If I had to guess, they used a generative voice-cloning tool to get the speech working (all that YouTube audio is more than enough data for a good clone). Then GPT-4 drives the "brain" of the AI, deciding how to respond. So: user input -> GPT-4 response generation -> text-to-speech with the voice clone -> voice message back to the user
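The guessed pipeline above could be sketched like this. The function names (`generate_reply`, `synthesize_speech`) are stand-ins I've made up; in a real build they'd wrap a GPT-4 chat-completion call and a voice-cloning TTS service respectively.

```python
# Minimal sketch of the guessed pipeline:
# user input -> GPT-4 reply -> voice-clone TTS -> voice message.
# Both helpers are hypothetical stubs standing in for real API calls.

def generate_reply(user_message: str) -> str:
    # Stand-in for a GPT-4 chat-completion call; just echoes here.
    return f"[reply to: {user_message}]"

def synthesize_speech(text: str) -> bytes:
    # Stand-in for a voice-cloning TTS call returning audio bytes.
    return text.encode("utf-8")

def handle_user_message(user_message: str) -> bytes:
    # Chain the two stages: text reply, then audio for the voice message.
    reply_text = generate_reply(user_message)
    return synthesize_speech(reply_text)
```

The point is just that the "AI girlfriend" is plausibly two off-the-shelf services glued together, with all product logic living in the text stage.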
They claim it’s trained on her actual responses, not just her voice, and there’s a text interface as well. So maybe it’s a language model fine-tuned on all of her previous messages and transcribed speech? Or maybe they’re just injecting examples into the GPT-4 prompts. But I’d imagine people want to use things like this for dirty talk, which wouldn’t be possible using OpenAI, so maybe they’re using a custom model?
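The "injecting examples into the prompts" option might look something like this: a handful of transcribed Q/A pairs (invented here for illustration) prepended as few-shot messages so the base model imitates the persona without any fine-tuning.

```python
# Sketch of few-shot persona prompting: hypothetical example pairs are
# injected ahead of the user's message in Chat Completions-style format.

EXAMPLE_PAIRS = [
    ("What do you do for fun?", "Honestly, I basically live at the gym lol"),
    ("What's your favorite food?", "Sushi, every single time"),
]

def build_prompt(user_message: str) -> list[dict]:
    # System message sets the persona; examples demonstrate her style.
    messages = [{"role": "system",
                 "content": "You are Caryn. Answer in her voice and style."}]
    for question, answer in EXAMPLE_PAIRS:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The real user message goes last.
    messages.append({"role": "user", "content": user_message})
    return messages
```

This would be cheap to iterate on compared to fine-tuning, but it eats context window and the persona fidelity depends entirely on how representative the injected examples are.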
I'm curious how compelling the economics of this approach are. GPT-4 is pretty expensive, and the OpenAI Usage Policies have some pretty strong no-adult-chat exclusions. There's already a huge outsourcing industry of people providing performer-themed adult chat services for OnlyFans performers, presumably largely from low-wage countries. I'm not an expert in call-center operations, but my read is that the cost of a contractor in the Philippines handling half a dozen simultaneous text conversations is comparable to, or cheaper than, the GPT-4 API for low-skill, low-training tasks like this. The big difference (I think) is that this tech provides voice responses rather than text responses, but that seems like a fairly simple layer for the existing human-based service providers to add to their offerings.
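A back-of-envelope version of that comparison, with every number below being an illustrative assumption (real GPT-4 pricing, token counts, and contractor rates all vary and would need to be substituted):

```python
# Illustrative cost comparison per chat-hour. ALL constants are assumed
# placeholder values, not real quotes or published prices.

GPT4_COST_PER_1K_TOKENS = 0.06   # assumed blended $/1k tokens
TOKENS_PER_REPLY = 300           # assumed prompt + completion tokens per reply
REPLIES_PER_HOUR = 60            # assume one reply per minute per conversation

CONTRACTOR_HOURLY_RATE = 3.00    # assumed low-wage-market hourly rate, $
SIMULTANEOUS_CHATS = 6           # "half a dozen simultaneous conversations"

# API cost for one conversation running for an hour.
gpt4_cost_per_chat_hour = (GPT4_COST_PER_1K_TOKENS * TOKENS_PER_REPLY / 1000
                           * REPLIES_PER_HOUR)

# Human cost per conversation-hour, amortized across parallel chats.
human_cost_per_chat_hour = CONTRACTOR_HOURLY_RATE / SIMULTANEOUS_CHATS

print(f"GPT-4:  ${gpt4_cost_per_chat_hour:.2f} per chat-hour")
print(f"Human:  ${human_cost_per_chat_hour:.2f} per chat-hour")
```

Under these made-up numbers the human contractor comes out cheaper per chat-hour, which matches the read above, though the model scales to unlimited concurrency while the human does not.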
They must be injecting examples into the prompts, since OpenAI hasn't allowed GPT-4 fine-tuning as far as I'm aware
You'd also need Caryn herself to write 10k+ example responses to fine-tune well