Picking a free AI API in 2026 is harder than the pricing pages suggest
I compared eight inference providers that still show up in every “free LLM API” thread: Groq, Google Gemini API, Mistral, Cloudflare Workers AI, OpenRouter, Cohere, Together, and Hugging Face Inference Providers. The field is narrower than marketing suggests once you stop counting one-time promos and start asking what you can actually call this week for a side project.
The credible standing-free options today are Groq, Google Gemini API, Mistral Free mode, Cloudflare Workers AI, and OpenRouter’s free-model lane. Cohere has a real trial, but trial keys are explicitly not allowed for production or commercial use. Together has no free trial and requires at least $5 in prepaid credits. Hugging Face Inference Providers technically has a free tier, but the allowance is $0.10 of credit per month. Enough to sample. Not enough to run a backend.
My default for a typical side project is Groq. Free API key, fast open-weight models, official SDKs, and a services agreement that does not permit training on your inputs unless you explicitly allow it. I would pick Gemini when model quality matters more than latency, and accept that Google’s pricing page marks free-tier usage as used to improve Google products. I would pick Mistral when prompt privacy is the main worry. If the app already lives on Cloudflare Workers, Workers AI is the obvious third choice instead of Mistral. OpenRouter is a router and fallback layer, not the primary forever-free backend.
You cannot compare these on token price alone
Vendors bill tokens, requests, neurons, or dollar credits. Pick the wrong mental model and you will mis-budget.
| Mental model | Providers | What burns quota |
|---|---|---|
| Token buckets | Gemini, Groq, Mistral, Together | Long outputs, chat history, big contexts |
| Request/day caps | OpenRouter free | Any API call, even a tiny one |
| Neuron allocation | Cloudflare Workers AI | Model size times generation length |
| Dollar credit | Hugging Face ($0.10/mo) | Routed provider pass-through pricing |
| Call count | Cohere trial (1,000/mo) | Every endpoint hit |
Several providers now keep exact live RPM and TPM inside dashboards or response headers, not in public docs. I have marked those fields as publicly unspecified below rather than filling gaps with community screenshots.
You will not get a perfect apples-to-apples table. Use the numbers as directional, then sanity-check them against how your app actually calls models.
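One way to sanity-check the mental models is to convert each quota into the same unit, requests per day, before comparing. The sketch below does that with figures quoted in this article (50 req/day on OpenRouter free, 1,000 calls/month on the Cohere trial, 10,000 neurons/day on Workers AI). The 30-day month and the per-request neuron cost are illustrative assumptions, not vendor numbers.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """A free-tier allowance expressed in the vendor's own unit."""
    provider: str
    amount: float            # size of the allowance
    period_days: int         # 1 for daily quotas, 30 for monthly (assumed month length)
    cost_per_request: float  # units one request burns (an assumption for non-request units)


def requests_per_day(q: Quota) -> float:
    """Normalize any quota to an approximate requests-per-day ceiling."""
    return q.amount / q.period_days / q.cost_per_request


quotas = [
    Quota("OpenRouter free", 50, 1, 1),
    Quota("Cohere trial", 1_000, 30, 1),
    # HYPOTHETICAL: 40 neurons per request; measure your own model and prompt sizes.
    Quota("Workers AI", 10_000, 1, 40),
]

for q in quotas:
    print(f"{q.provider}: ~{requests_per_day(q):.0f} requests/day")
```

Swap in the neuron or token cost you actually observe for your model; the point is only that a daily cap, a monthly cap, and a neuron allocation become comparable once they share a unit.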
Free tiers at a glance
Token pricing in the paid-baseline column is input/output dollars per 1M tokens. Where a provider exposes many models, I used the cheapest clearly usable chat model on the official pricing page. Hugging Face and OpenRouter are routers, so their paid pricing is variable or pass-through.
| Provider | Free access reality | Free-tier quota | Public rate limits | Paid baseline (representative) | Training / retention | Commercial use on free | Representative models |
|---|---|---|---|---|---|---|---|
| Gemini API | Standing free tier | Free within model limits; pricing page shows free Gemini 2.5 Flash and grounding quotas such as 500 RPD for Search and 500 RPD for Maps on 2.5 Flash standard | Interactive free-tier RPM/TPM not fully published; Google says live limits are in AI Studio | Gemini 2.5 Flash: $0.30 / $2.50 per 1M in/out | Free tier: used to improve Google products (Yes). Paid tier: No | Unspecified in public docs; governed by Gemini API terms | gemini-2.5-flash and preview audio/TTS variants; menu changes over time |
| Groq | Standing free API key | Free plan exists; no numeric free-credit bucket published on docs | RPM/RPD/TPM/TPD defined; exact free-plan numbers publicly unspecified | Llama 3.1 8B Instant: $0.05 / $0.08 per 1M | Does not train on inputs/outputs unless permitted; ZDR available for eligible customers | Yes per services agreement | openai/gpt-oss-20b, openai/gpt-oss-120b, Llama 4 Scout, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B Instant |
| Cohere | Trial only | 1,000 API calls/month on trial keys | Trial chat: 20 req/min; token errors document 100,000 tokens/min | Command R7B: $0.0375 / $0.15 per 1M | Trial inputs/outputs may be used for R&D (de-identified) | No. Trial keys not permitted for production or commercial use | Command A, A Reasoning, A Translate, A Vision, Command R+, Command R, Command R7B, North Mini Code |
| Together | No true free tier | None; minimum $5 credit purchase | Dynamic per model and org; no free plan | LFM2 24B A2B: $0.03 / $0.12 per 1M | No training without opt-in; privacy settings can disable retention | N/A (no free tier) | No free-tier models |
| OpenRouter | Standing free plan, thin | 25+ free models, 4 free providers; 50 req/day and 20 RPM free; preload $10 for 1,000 free-model req/day at same 20 RPM | 20 RPM on :free models | openrouter/free: $0 / $0; paid models are provider-based plus 5.5% platform fee on pay-as-you-go | ZDR by default unless you opt into prompt logging; provider policies vary | Depends on underlying model license and provider policy | openrouter/free routes ~24 models: Owl Alpha, Nemotron 3 Ultra/Super/Nano, GPT-OSS 120B/20B, Gemma 4 31B/26B, others |
| Mistral | Standing free mode | Free mode on by default, no credit card; positioned for evaluation/prototyping | RPS, tokens/min, tokens/month enforced; exact figures in Admin > Limits only | Mistral Small 4: $0.10 / $0.30 per 1M | API data not used for model training; ZDR is Scale-only | Unclear for free mode; docs describe evaluation/prototyping use | mistral-small-latest, mistral-medium-latest, devstral-small-latest, codestral-latest, mistral-embed, subject to free-mode limits |
| Cloudflare Workers AI | Standing free allocation | 10,000 Neurons/day on Workers Free and Paid before overage | Text gen: 300 RPM default; embeddings: 3,000 RPM; summarization: 1,500 RPM | IBM Granite 4.0 H Micro: $0.017 / $0.112; Gemma 4 26B: $0.10 / $0.30; GPT-OSS 20B: $0.20 / $0.30 per 1M (neuron-backed) | Does not train on customer content without consent; stored only if paired with R2/KV | No public commercial-use ban found in reviewed docs | @cf/meta/llama-3.2-1b-instruct, @cf/meta/llama-3.1-8b-instruct-fp8-fast, @cf/openai/gpt-oss-20b, @cf/openai/gpt-oss-120b, @cf/google/gemma-4-26b-a4b-it, @cf/moonshotai/kimi-k2.5 |
| Hugging Face | Standing, tiny | $0.10/month in free credits; $2/month in credits on PRO | No global RPM/TPM table; depends on routed provider/model | Variable pass-through at provider rates, no markup | Does not store request bodies/responses when routing; debug logs up to 30 days; no HF training on user data | Generally usable; underlying model license still governs | 200+ routed models; examples: openai/gpt-oss-120b, black-forest-labs/FLUX.1-dev, microsoft/harrier-oss-v1-0.6b |
Two things to keep in mind. Normalized paid prices are for comparison, not budget forecasting. Pick a specific model ID before you estimate spend. And on router products, commercial viability is partly a function of the underlying model license, not just the platform’s own docs.
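To turn a per-1M-token price into a spend estimate for a specific model, a small helper is enough. This is a rough sketch: the prices come from the table above, while the request shape (1,500 input tokens, 500 output tokens) and the function name are hypothetical.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_m: float, out_price_per_m: float,
                     platform_fee: float = 0.0) -> float:
    """Dollar cost of one request at per-1M-token prices, plus an optional fractional fee."""
    base = (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000
    return base * (1 + platform_fee)


# Prices (input, output) per 1M tokens, taken from the table above.
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "llama-3.1-8b-instant": (0.05, 0.08),
}

# HYPOTHETICAL request shape: 1,500 tokens in, 500 tokens out.
for model, (p_in, p_out) in PRICES.items():
    per_request = request_cost_usd(1_500, 500, p_in, p_out)
    print(f"{model}: ${per_request * 10_000:.2f} per 10,000 requests")
```

At that request shape, 10,000 requests come to about $17.00 on Gemini 2.5 Flash and about $1.15 on Llama 3.1 8B Instant. For pay-as-you-go traffic through OpenRouter, passing `platform_fee=0.055` adds the 5.5% fee listed in the table.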
What the free tiers actually cost you later
Groq: fast, callable, quota opacity
Groq’s pricing page publishes unusually concrete throughput numbers. Llama 3.1 8B Instant at 840 TPS. GPT OSS 20B at 1,000 TPS. The services agreement goes further than many competitors: Groq is not permitted to use inputs or outputs for training or fine-tuning unless you explicitly allow it.
The catch is governance, not capability. Public docs confirm a Free Plan exists, but they do not publish a clean table of exact free-plan RPM/TPM by model. If you need predictable public quotas for a launch announcement, Groq is less transparent on paper than it feels in day-to-day use. For a side project where you just want a working backend today, it is still one of the best zero-cost starting points here.
Gemini: strong model quality, worse data policy on free
Google’s pricing page clearly shows a real free tier for Gemini 2.5 Flash, plus useful daily free grounding quotas. The practical downside is that Google now pushes exact live rate limits into AI Studio rather than publishing a fully static interactive limit table.
The bigger caveat is data policy. The same pricing page marks free-tier Gemini usage as used to improve Google products. Paid tier usage is marked No. For a hobby bot or internal tool, I can live with that. For customer content or sensitive prompts, it is a serious reason to move to paid or pick Mistral or Groq instead.
Mistral: privacy-first free mode, limits in the console
Mistral’s docs say data sent through the API is not used for model training. Free mode needs no credit card. That is the cleanest privacy story among the standing-free providers in this set.
The trade-off is opacity. Mistral positions free mode for evaluation and prototyping. Exact RPS, TPM, and monthly token ceilings live in Admin > Limits, not in a public table. Fine for prototyping. Slightly harder to pick as the one backend for a public side project if you expect bursts.
Cloudflare Workers AI: edge-native and well documented
Workers AI gives you 10,000 neurons per day free, and unlike several competitors it publishes task-level RPM limits in the docs. Text generation defaults to 300 RPM. Cloudflare also says it does not use your content to train models or improve services without explicit consent.
The limitation is economics. Neuron accounting is less intuitive than tokens, and the free allocation disappears quickly on larger models or longer generations. If your app already sits on Workers, the operational simplicity is excellent. If not, Groq or Mistral are usually easier to wire up.
OpenRouter: real free lane, not a primary backend
OpenRouter’s free plan is real: 25+ free models, 50 requests per day, 20 RPM, and openrouter/free as a single router across a rotating pool. Excellent for experimentation, model comparison, and rough production fallback.
It is less convincing as your primary always-free backend. Fifty requests per day is easy to outgrow. The 1,000/day free-model limit only appears once you preload at least $10. Privacy is strong on the OpenRouter side (ZDR by default unless you opt into logging), but provider logging and training policies vary by endpoint.
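Because the free lane is capped per day and per minute, it can help to refuse requests locally before the API does. The sketch below is a minimal client-side guard using the 50 req/day and 20 RPM figures above. The class name is hypothetical, and the rolling 60-second and 24-hour windows are an assumption: the reset behavior OpenRouter actually applies is not described here.

```python
import time
from collections import deque


class FreeLaneBudget:
    """Client-side guard for a per-minute and a per-day request cap."""

    def __init__(self, per_minute: int = 20, per_day: int = 50, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock       # injectable so the guard can be tested without sleeping
        self.sent = deque()      # timestamps of requests inside the rolling 24-hour window

    def try_acquire(self) -> bool:
        """Record a request and return True if both caps still allow it."""
        now = self.clock()
        # Drop timestamps that have aged out of the assumed rolling 24-hour window.
        while self.sent and now - self.sent[0] >= 86_400:
            self.sent.popleft()
        last_minute = sum(1 for t in self.sent if now - t < 60)
        if len(self.sent) >= self.per_day or last_minute >= self.per_minute:
            return False
        self.sent.append(now)
        return True
```

Call `try_acquire()` before each free-model request and route to a paid model or another provider when it returns `False`.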
Cohere: generous trial, explicit non-production clause
Cohere’s trial is generous enough to learn the platform: 1,000 calls per month, 20 req/min on chat models, access to all models and APIs per the pricing docs. Production pricing for Command R7B is excellent.
The same pricing page says trial keys are not permitted for production or commercial purposes. That single clause makes Cohere trial a non-starter for a shipped side project. Treat it as an evaluation environment that becomes useful once you switch to a production key.
Together: cheap paid, not free
Together’s billing docs are unambiguous: no free trials currently, minimum $5 credit purchase. Some published token rates are very low, and the OpenAI-compatible API is clean. Together belongs in a cheap-paid shortlist, not a free-callable shortlist.
Hugging Face: great router, useless free credit as a backend
Hugging Face Inference Providers is excellent as a control plane: one token, many providers, pass-through pricing, fastest/cheapest/preferred routing, and good privacy defaults on the Hugging Face side. The free allowance is $0.10 per month. Enough to try a few requests. Not enough to run an app people will actually use.
What to wire on Monday
The stacks I would actually start with have simple auth, stable SDKs, and minimal changes to existing app code.
Groq with the official SDK
Groq’s quickstart uses GROQ_API_KEY and the groq SDK. Lowest-friction way to get a fast working backend for chat or coding tools.

```python
import os

from groq import Groq

# Keep the key in the environment rather than in source control.
client = Groq(api_key=os.environ["GROQ_API_KEY"])

resp = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {"role": "system", "content": "You are a concise backend assistant."},
        {"role": "user", "content": "Return JSON with a title and slug for a post about side-project APIs."},
    ],
)
print(resp.choices[0].message.content)
```

Use this if you want a free default backend behind a small Node or Python API and you care about latency more than frontier closed-model quality.
OpenRouter with an OpenAI-compatible client
OpenRouter’s API is Bearer-token authenticated and OpenAI-compatible via https://openrouter.ai/api/v1. One integration while you test free models, paid models, and fallback logic.

```javascript
import OpenAI from 'openai';

// Point the OpenAI client at OpenRouter's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY
});

// Top-level await: run this as an ES module.
const completion = await client.chat.completions.create({
  model: 'openrouter/free',
  messages: [
    { role: 'user', content: 'Give me a one-line product description for a weekend SaaS.' }
  ]
});
console.log(completion.choices[0].message.content);
```

Use this if you want one abstraction layer and expect to switch providers later. Do not assume the free plan is enough for user-facing traffic without adding credits.
Cloudflare Workers AI via raw HTTP
Cloudflare’s REST path needs an API token plus account ID. Easy to automate if you already have a Workers account.

```shell
curl "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "Write a short release note for a side-project launch." }'
```

Use this if your frontend, auth, KV, R2, or edge routing already lives on Cloudflare. Watch the neuron budget.
Cohere for evaluation-only work
Cohere’s ClientV2 chat API is straightforward. The trial key is not allowed for production or commercial use. Useful for model evaluation or internal prototyping, not for shipping a public side project on the free tier.

```python
import os

import cohere

# Trial keys work here, but only for evaluation, not production traffic.
co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

res = co.chat(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "Summarize the trade-offs of free AI APIs in two sentences."}],
)
print(res.message.content[0].text)
```

If you only want one practical setup decision: Groq for the fastest route to a real app with zero spend. Mistral for a stronger privacy posture on API prompts. Cloudflare Workers AI if your stack is already Cloudflare-native. OpenRouter later as a fallback layer or experimentation surface.
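If you do add a fallback layer later, the wiring can stay provider-agnostic. The following is a minimal sketch, assuming each backend is wrapped in a plain function that takes a prompt and returns text. The function names and the stand-in backends are hypothetical; in a real app they would wrap the SDK calls shown above.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: SDK error types differ per provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# HYPOTHETICAL stand-ins that simulate a rate-limited primary and a working fallback.
def groq_backend(prompt: str) -> str:
    raise TimeoutError("rate limited")


def openrouter_backend(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback(
    "hello",
    [("groq", groq_backend), ("openrouter", openrouter_backend)],
))
```

Returning the backend name alongside the text makes it easy to log how often the fallback actually fires, which is a useful signal for when the free quota has been outgrown.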
Where each option fits best
| Need | Best fit | Why | Hidden catch |
|---|---|---|---|
| Lowest latency at zero spend | Groq | High published TPS numbers; no training on prompts by default | Exact free-plan quotas not in public docs |
| Strongest model quality while staying free | Gemini API | Real standing free tier on Gemini 2.5 Flash | Free tier used to improve Google products; live limits in AI Studio |
| Best privacy among standing-free APIs | Mistral Free mode | API data not used for training; no card required | Limits console-only; ZDR is Scale-only |
| Already deploy at the edge | Cloudflare Workers AI | Real free allocation, published RPM limits, clean Workers integration | Neuron math burns fast on large models |
| Router or fallback layer | OpenRouter | One API, many free and paid models, ZDR defaults | 50 req/day free unless you preload credits; provider policies vary |
| Evaluation-only trial | Cohere | Good trial coverage; strong low-cost production models later | Trial explicitly non-commercial and non-production |
| Cheap paid, not free | Together | Low serverless catalog pricing; OpenAI-compatible API | No free trial; $5 minimum preload |
| Unified sampler | Hugging Face Inference Providers | Multi-provider UX; no training on routed bodies/responses | Free credit is $0.10/month |
My top three for a builder who wants free, callable, and practical:
Groq ranks first. Free API key, official SDKs, OpenAI-like usage patterns, very fast inference, strong no-training default. Main weakness is quota transparency, not capability.
Gemini API free tier ranks second. If you care most about model quality while still spending zero, Gemini is hard to ignore. It falls behind Groq because Google’s free-tier data-use posture is worse for sensitive workloads and the live limit table lives in AI Studio.
Mistral Free mode ranks third. Not as frictionless as Groq, and free-mode limits are less transparent than Cloudflare’s RPM table. But the API docs say data is not used for training, free mode needs no card, and the paid upgrade path is straightforward. If you already build on Cloudflare Workers, I would swap Mistral for Workers AI as my personal number three.

```mermaid
flowchart TD
    start[Need a free inference API] --> freeHard{Free spend must stay at zero?}
    freeHard -->|No| together[Together: cheap paid from $5]
    freeHard -->|Yes| stack{Where does your app run?}
    stack -->|Cloudflare Workers| cf[Cloudflare Workers AI]
    stack -->|Anywhere else| privacy{Sensitive prompts?}
    privacy -->|Yes| mistral[Mistral Free mode]
    privacy -->|No| quality{Frontier quality over latency?}
    quality -->|Yes| gemini[Gemini API free tier]
    quality -->|No| groq[Groq free API key]
    groq --> router[Add OpenRouter later for fallback]
    gemini --> router
    mistral --> router
```

For a hobby project shipping this weekend, I would wire Groq first, add OpenRouter as a fallback once traffic is real, and move to paid Gemini or Mistral the moment prompts contain anything I would not paste into a public GitHub issue.
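The same decision path can be written as a small function, which is handy if you want it in a project README or a setup script. This is only a sketch of the flowchart's branches; the function and parameter names are hypothetical.

```python
def pick_provider(zero_spend: bool, on_cloudflare_workers: bool,
                  sensitive_prompts: bool, quality_over_latency: bool) -> str:
    """Walk the same branches as the flowchart and return the suggested provider."""
    if not zero_spend:
        return "Together"                # cheap paid, from a $5 preload
    if on_cloudflare_workers:
        return "Cloudflare Workers AI"   # stack already lives on Workers
    if sensitive_prompts:
        return "Mistral Free mode"       # API data not used for training
    if quality_over_latency:
        return "Gemini API free tier"
    return "Groq"


print(pick_provider(zero_spend=True, on_cloudflare_workers=False,
                    sensitive_prompts=False, quality_over_latency=False))
```

The order of the checks matters: the hosting question comes before the privacy question, exactly as in the flowchart.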
Gaps in the public docs
Several providers keep exact live limits inside dashboards or response headers. That is why Google, Groq, Mistral, Together, and Hugging Face have publicly unspecified fields in the table above. Where the official docs say “view in console,” I treated the console as the source of truth and left the field unspecified instead of substituting community numbers or third-party screenshots.
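When limits are only exposed at runtime, you can log whatever rate-limit headers a response carries and build your own picture over time. The sketch below filters a header mapping by name prefix. The `x-ratelimit-` prefix is a common convention and an assumption here, not something confirmed for every provider in this comparison, and the example header names and values are hypothetical.

```python
from typing import Mapping


def rate_limit_headers(headers: Mapping[str, str],
                       prefix: str = "x-ratelimit-") -> dict[str, str]:
    """Return only the headers whose name starts with prefix, compared case-insensitively."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


# HYPOTHETICAL response headers; real names and values depend on the provider.
example = {
    "Content-Type": "application/json",
    "X-RateLimit-Remaining-Requests": "49",
    "x-ratelimit-remaining-tokens": "5800",
}
print(rate_limit_headers(example))
```

Check each provider's own docs for the header names it actually sends before relying on any of them for throttling.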
Normalized paid baselines use a representative low-cost chat model, not an average across all models. That is enough to guide decisions. Not enough for budget forecasting without picking a specific model ID first.
On OpenRouter and Hugging Face, commercial viability depends partly on the underlying model license and provider policy, not just the router’s platform docs. That is why those rows are written more cautiously than Groq’s or Cohere’s.
Sources
Google Gemini
Gemini API pricing
Gemini API rate limits
Google AI Studio
Groq
Groq pricing
Groq rate limits
Groq quickstart
Groq services agreement
Cohere
Cohere pricing
Cohere SDK docs
Together
Together billing
Together pricing
OpenRouter
OpenRouter pricing
OpenRouter quickstart
OpenRouter API reference
Mistral
Mistral platform tiers
Mistral pricing
Mistral data usage opt-out
Cloudflare Workers AI
Workers AI pricing
Workers AI limits
Workers AI REST API