Cost control tutorial
How to control AI API costs before they surprise you
This tutorial explains how API pricing works for model providers such as OpenAI, Anthropic (Claude) and Google (Gemini), how to calculate token cost, where hidden costs appear, and what to click before putting an AI workflow into production.
The simple cost formula
Most model APIs charge separately for input tokens and output tokens. Input is what you send to the model. Output is what the model generates back. Some providers also charge for cached input, web search, file tools, image/audio/video tokens, containers or long-context features.
Formula:
Input cost = input tokens ÷ 1,000,000 × input price
Output cost = output tokens ÷ 1,000,000 × output price
Total cost = input cost + output cost + tool costs
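The formula above can be sketched as a small helper. The prices here are placeholders, not real quotes for any model; always take the current numbers from the provider pricing page.

```python
# Worked example of the cost formula. Prices are illustrative placeholders
# expressed in dollars per million tokens.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m,
                 tool_cost=0.0):
    """Return the total cost in dollars for one request."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost + tool_cost

# 12,000 input tokens at $3/M plus 800 output tokens at $15/M:
print(request_cost(12_000, 800, 3.00, 15.00))  # about 0.048
```

Note that output tokens are often several times more expensive than input tokens, so a verbose model can dominate the bill even with short prompts.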
Concrete workflow: what to click before launch
Open the provider pricing page
Before using a model, open the official pricing page. Do not rely on screenshots or old blog posts because model names and prices change often.
Create a separate API key for this project
Go to the provider dashboard, create a new key for this project only, name it clearly, and avoid reusing the same key across every test.
Set limits and alerts
Open billing or usage settings. Set a monthly budget, a warning alert, and a low spending limit for testing before connecting the workflow to real users.
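Dashboard limits are the main defense, but you can also guard spend in your own code. A minimal sketch of a client-side budget check, assuming you estimate cost per request yourself (the class and field names are illustrative):

```python
class BudgetGuard:
    """Client-side spend cap: refuse calls once estimated spend hits the budget."""

    def __init__(self, monthly_budget_usd, warn_fraction=0.8):
        self.budget = monthly_budget_usd
        self.warn_at = monthly_budget_usd * warn_fraction
        self.spent = 0.0

    def record(self, estimated_cost):
        """Add one request's estimated cost; print a warning near the budget."""
        self.spent += estimated_cost
        if self.spent >= self.warn_at:
            print(f"warning: ${self.spent:.2f} of ${self.budget:.2f} used")

    def allow(self, estimated_cost):
        """Return True if the request fits within the remaining budget."""
        return self.spent + estimated_cost <= self.budget

guard = BudgetGuard(monthly_budget_usd=50.0)
if guard.allow(0.05):
    # ... make the API call here, then:
    guard.record(0.05)
```

This only tracks your own estimates; it does not replace the provider's hard limits, which still catch bugs your estimates miss.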
Log every request size
For each request, log the model name, estimated input tokens, output tokens, tool calls, latency and error status. This gives you evidence instead of guesswork.
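One simple way to do this is one JSON line per request appended to a file. The field names below are illustrative; in practice the token counts come from the usage data in the provider's response.

```python
import json
import time

def log_request(model, input_tokens, output_tokens,
                tool_calls, latency_ms, error=None, path="requests.jsonl"):
    """Append one structured log line per API request."""
    entry = {
        "ts": time.time(),          # when the request finished
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "tool_calls": tool_calls,   # count of tool invocations
        "latency_ms": latency_ms,
        "error": error,             # None on success, message on failure
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

A flat JSONL file like this is easy to load later for cost analysis without any logging infrastructure.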
Hidden cost checklist
Long prompts
Repeated system prompts, huge documents and pasted histories can cost more than the final answer.
Tool calls
Search, file tools, code containers and retrieval systems can add separate charges or storage costs.
Retry loops
If an agent fails and retries automatically, a small bug can multiply cost quickly.
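The cheapest protection against a runaway retry loop is a hard cap on attempts. A minimal sketch with exponential backoff (the function names are illustrative):

```python
import time

def call_with_retries(call, max_attempts=3, base_delay=1.0):
    """Run call(); retry on failure with exponential backoff.

    max_attempts bounds the worst-case cost of a failing loop;
    without it, a small bug can retry indefinitely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Combined with request logging, the retry count per request tells you immediately when a loop starts multiplying your costs.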
Official pricing links
GPUJet rule: measure before you scale
Start with one model, one workflow and one usage limit. After you have real logs, choose whether to optimize prompt size, switch models, cache context or move to another provider.
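Once the logs exist, comparing models is a few lines of aggregation. A sketch, assuming each log record carries a model name and token counts; the prices and model names are placeholders:

```python
from collections import defaultdict

# Placeholder (input, output) prices in dollars per million tokens;
# substitute the real numbers from the provider pricing page.
PRICES = {"model-a": (3.00, 15.00), "model-b": (0.50, 1.50)}

def spend_by_model(records):
    """Sum estimated spend per model from log records.

    records: iterable of dicts with model, input_tokens, output_tokens.
    """
    totals = defaultdict(float)
    for r in records:
        in_price, out_price = PRICES[r["model"]]
        totals[r["model"]] += (r["input_tokens"] / 1_000_000 * in_price
                               + r["output_tokens"] / 1_000_000 * out_price)
    return dict(totals)
```

A per-model total like this is usually enough to decide whether a cheaper model or a shorter prompt is the better first optimization.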
