Tuesday, October 1, 2024

The Evolution of Token Pricing: A Cost Breakdown for Popular Models

As competition among language models heats up, the cost of generating text continues to drop significantly. This post explores the current pricing of three of the most cost-effective LLMs: GPT-4o Mini, Gemini 1.5 Flash, and Claude 3 Haiku, each offering a different mix of capabilities and pricing structures. We’ll also calculate how much it would cost to run a chat with 1,000 message exchanges on each of these models.

🚀 This question comes up frequently on my YouTube channel, Murat Karakaya Akademi (https://www.youtube.com/@MuratKarakayaAkademi), where I recently discussed the evolution of token pricing and how it affects the implementation of AI-driven systems. A viewer commented on one of my tutorials asking how much it would cost to run a chatbot at scale, and it was a great opportunity to explore the numbers in more detail here.


📊 Models and Their Pricing as of October 2024:

🧮 GPT-4o Mini

Input Token Cost: $0.150 / 1M tokens

Output Token Cost: $0.600 / 1M tokens

Context Size: 128K tokens

Notes: Smarter and cheaper than GPT-3.5 Turbo, with added vision capabilities.


🧮 Gemini 1.5 Flash

Input Token Cost: $0.075 / 1M tokens

Output Token Cost: $0.300 / 1M tokens

Context Size: 1M tokens (the prices above apply to prompts of up to 128K tokens)

Notes: Google’s fastest multimodal model, optimized for diverse and repetitive tasks.


🧮 Claude 3 Haiku

Input Token Cost: $0.25 / 1M tokens

Output Token Cost: $1.25 / 1M tokens

Context Size: 200K tokens

Notes: Known for its efficiency, especially with large context windows, making it ideal for longer chats or document generation.


🧮 Cost Calculation for 1,000 Chat Exchanges: Let’s assume a chat consists of 1,000 message exchanges with the following setup:

📊 Input Size per Exchange: 1,000 tokens

📊 Output Size per Exchange: 750 tokens

📊 Each new input includes all previous inputs and outputs, so the prompt grows by 1,750 tokens with every exchange.


Because every prompt resends the full history, exchange i sends 1,000 + 1,750 × (i - 1) input tokens, while each exchange returns a flat 750 output tokens. Summed over 1,000 exchanges, this results in a total of:

🚀 875,125,000 input tokens

🚀 750,000 output tokens
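
For readers who want to verify these totals, here is a minimal Python sketch of the arithmetic, assuming exactly the setup above (1,000-token inputs, 750-token outputs, full history resent on every turn); the constant and variable names are my own, used only for illustration:

```python
# Token totals for 1,000 exchanges where every prompt resends the full history.
EXCHANGES = 1_000
INPUT_PER_EXCHANGE = 1_000    # new user-message tokens added each turn
OUTPUT_PER_EXCHANGE = 750     # model-response tokens produced each turn

total_input = 0
total_output = 0
history = 0  # tokens accumulated from all previous inputs and outputs

for _ in range(EXCHANGES):
    prompt = history + INPUT_PER_EXCHANGE   # full history plus the new message
    total_input += prompt
    total_output += OUTPUT_PER_EXCHANGE
    history = prompt + OUTPUT_PER_EXCHANGE  # carry everything into the next turn

print(f"Total input tokens:  {total_input:,}")    # 875,125,000
print(f"Total output tokens: {total_output:,}")   # 750,000
```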


📊 Let’s break down the costs for each model based on this usage:

🧮 GPT-4o Mini

Input Token Cost: $131.27

Output Token Cost: $0.45

Total Cost: $131.72


🧮 Gemini 1.5 Flash

Input Token Cost: $65.63

Output Token Cost: $0.23

Total Cost: $65.86


🧮 Claude 3 Haiku

Input Token Cost: $218.78

Output Token Cost: $0.94

Total Cost: $219.72
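
If you want to reproduce these figures yourself, a short sketch that multiplies the token totals by the per-million prices listed earlier is enough. This is illustrative arithmetic only, not any provider’s billing API; Decimal with half-up rounding is used simply so the cents match the breakdown above:

```python
# Per-model cost for the token totals above, using the October 2024 prices
# quoted earlier in this post (USD per 1M tokens).
from decimal import Decimal, ROUND_HALF_UP

TOTAL_INPUT_TOKENS = Decimal(875_125_000)
TOTAL_OUTPUT_TOKENS = Decimal(750_000)
MILLION = Decimal(1_000_000)
CENT = Decimal("0.01")

PRICING = {  # model: (input $/1M tokens, output $/1M tokens)
    "GPT-4o Mini":      (Decimal("0.150"), Decimal("0.600")),
    "Gemini 1.5 Flash": (Decimal("0.075"), Decimal("0.300")),
    "Claude 3 Haiku":   (Decimal("0.25"),  Decimal("1.25")),
}

for model, (in_price, out_price) in PRICING.items():
    input_cost = (TOTAL_INPUT_TOKENS / MILLION * in_price).quantize(CENT, ROUND_HALF_UP)
    output_cost = (TOTAL_OUTPUT_TOKENS / MILLION * out_price).quantize(CENT, ROUND_HALF_UP)
    total_cost = input_cost + output_cost
    print(f"{model}: input ${input_cost} + output ${output_cost} = ${total_cost}")
```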


🚀 Why It Matters

The declining cost of LLM token generation means you can now run complex, token-heavy tasks such as chatbot conversations, document analysis, and content generation more affordably than ever. As the scenario above shows, Gemini 1.5 Flash is the cheapest of the three for this workload, at roughly $66 for the full conversation, making it an attractive option for developers running large-scale chat applications with high token throughput.


🧠 Learn More: If you’re interested in learning more about implementing cost-efficient AI solutions, check out my latest video on this topic over at Murat Karakaya Akademi.