# 06-technical-discussion
I made some rough estimates this morning on the cost per hour of providing an AI chat representative. The numbers are biased towards customer support or sales support scenarios where the human is having a focused, back-and-forth text chat conversation with an AI "representative", but are probably decent hand-waving estimates for a wide range of chat scenarios.

Cost per hour of chat:
- GPT-3.5-turbo: $1
- GPT-4: $7 - $430
- Call center employee in the Philippines: $2
- Call center employee in the US: $6
- Lawyer in the US: $300 - $600

Details in 🧵 for math types.

(edit added): My read on these numbers is that GPT-3.5 is already cheap enough to open up tons of business models that didn't previously make sense with humans, but that GPT-4 is currently about 10x too expensive to do the same (it's too close to the cost of appropriately educated human labor when run in a steady-state continuous mode - yes, special cases will exist, but that doesn't change the business model of "everything"). That said, the gpt-3.5-turbo release dropped the cost per token for the GPT-3.5 family by 10x. If OpenAI can pull off the same 10x cost reduction for the GPT-4 family, then pretty much everything opens up as a business model. Personally, I'm guessing there's at least a 3x cost reduction to be had for the GPT-4 family in the not-too-distant future, and I wouldn't bet against a 10x, 30x, or 100x reduction in the next few years.
💡 5
These are obviously YMMV estimates. The calculations assume 1000 tokens of prompt and an average of 500 tokens of chat history in the request (e.g. the chat history grows on average from 0 to 1000 over the course of the chat), for about 1500 tokens per chat response.

When asked for 30-50 word answers, gpt-3.5-turbo takes about 4 seconds to complete its response (shorter responses mean more chat iterations per hour, longer responses mean fewer; my read is the final cost per hour isn't super sensitive to the average response length, but YMMV). Let's assume the human takes about 6 seconds to respond back to a chat response. That means the conversational back-and-forth cycle time is 4 + 6 = 10 seconds, or one GPT-3.5 completion required every 10 seconds. Putting this all together, gpt-3.5-turbo conversations burn about 1500 tokens every 10 seconds, or about 500K tokens/hour. GPT-3.5-turbo tokens cost $0.002/1K tokens, or about $1 per hour of chat.

gpt-4 is slower (fewer tokens per hour in a conversational chat scenario) and more expensive per token. gpt-4 takes about 9 seconds to produce 40-word responses, or 9 + 6 = 15 seconds per conversational cycle, or one GPT-4 completion required every 15 seconds. This means gpt-4 conversations burn about 1500 tokens every 15 seconds, or about 360K tokens/hour.

GPT-4 pricing is more complicated than GPT-3.5 pricing. I'm going to assume the costs are dominated by the prompts rather than the responses, because that looks to be true in most of the calculations I tried. Using the smaller/cheaper gpt-4-0314 variant we get 360K tokens/hour * $0.02/1K = $7/hour of chat, and using the larger gpt-4-32k-0314 model we get 360K tokens/hour * $0.12/1K = $43/hour. That said, there's no point in using the 32k model with 1500-token completion requests. You should only use the 32k model if you are sending requests of more than 8K tokens, so we assume the request context in the 32k case is 15,000 tokens, not 1,500, and get a cost of $430/hour(!).
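For anyone who wants to poke at the numbers, here's a minimal sketch of the arithmetic above. The per-token prices, response times, and token counts are the hand-wavy assumptions from this thread, not authoritative figures:

```python
# Back-of-the-envelope cost-per-chat-hour model from the thread's assumptions.
# Prices are $/1K tokens; timings are seconds.

def cost_per_chat_hour(tokens_per_request, model_seconds, human_seconds, price_per_1k):
    """Cost of one hour of back-and-forth chat: one completion per
    conversational cycle (model response time + human reply time)."""
    cycle_seconds = model_seconds + human_seconds
    cycles_per_hour = 3600 / cycle_seconds
    tokens_per_hour = cycles_per_hour * tokens_per_request
    return tokens_per_hour * price_per_1k / 1000

# gpt-3.5-turbo: 1500 tokens/request, 4s response + 6s human, $0.002/1K
print(cost_per_chat_hour(1500, 4, 6, 0.002))   # ~ $1.08/hour
# gpt-4 (8k, prompt-dominated pricing as assumed above): 9s response, $0.02/1K
print(cost_per_chat_hour(1500, 9, 6, 0.02))    # ~ $7.20/hour
# gpt-4-32k with a 15,000-token request context, $0.12/1K
print(cost_per_chat_hour(15000, 9, 6, 0.12))   # ~ $432/hour
```

Swapping in different response lengths or human reply times shows the final cost per hour really isn't very sensitive to them; the token price and request size dominate.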
A call center employee in the Philippines makes about $6/hour ($1K/month). An average utilization factor of 3 simultaneous text conversations per employee results in $2 per chat hour. Call center employees in the US make about $18/hour, or $6 per chat hour at a utilization of 3 simultaneous text conversations.
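The same hand-wavy arithmetic for the human side (the wages and the 3-chats-at-once utilization factor are this thread's assumptions):

```python
def human_cost_per_chat_hour(hourly_wage, simultaneous_chats):
    """Effective cost of one hour of a single text chat, given an agent
    who handles several conversations at once."""
    return hourly_wage / simultaneous_chats

print(human_cost_per_chat_hour(6, 3))    # Philippines: $2 per chat hour
print(human_cost_per_chat_hour(18, 3))   # US: $6 per chat hour
```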
This is great analysis. Hot take: I don't think it will require a model the size of GPT-4 to do call centers, which means even lower costs are coming.
➕ 1
IMO it would be more suitable to compare against a paralegal instead of legal counsel.
The reason for listing the lawyer price was to point out that chatting with the 32K GPT-4 language model costs about the same per hour as chatting with a corporate lawyer. It's a cost comparison, not a quality comparison.
Interesting choice. My question, then: when a corp retains legal counsel, external or internal, the considerations usually go beyond simple information-retrieval tasks solved by companies like Everlaw or paralegal tasks solved by startups like Spellbook.legal. So why compare the cost of GPT-4 to people who are licensed to practice law and provide legal representation?
The intent is to point out how expensive chatting with 32K GPT-4 is. It's as expensive as chatting with a corporate lawyer.
🙌 1
๐Ÿ™ 1
And, to be super clear - this is looking at situations where a human is iteratively chatting with the AI, repeatedly sending messages or asking questions and getting responses. There are likely plenty of scenarios where 32K GPT-4 can give single-request answers much more cheaply than a lawyer can, but if you want to have an actual conversation with the AI then you're going to be sending lots of tokens back and forth repeatedly, which is where the costs get out of hand.
From an information retrieval perspective, GPT-4 does not have good query understanding. It has great natural language understanding, but its query understanding is subpar. Hence all the fancy prompt engineering and context injection to guide the model to generate from the desired latent space. The context length is actually not the problem here.
🤔 1