Picking a free AI API in 2026 is harder than the pricing pages suggest

I compared eight inference providers that still show up in every “free LLM API” thread: Groq, Google Gemini API, Mistral, Cloudflare Workers AI, OpenRouter, Cohere, Together, and Hugging Face Inference Providers. The field is narrower than marketing suggests once you stop counting one-time promos and start asking what you can actually call this week for a side project.

The credible standing-free options today are Groq, Google Gemini API, Mistral Free mode, Cloudflare Workers AI, and OpenRouter’s free-model lane. Cohere has a real trial, but trial keys are explicitly not allowed for production or commercial use. Together has no free trial and requires at least $5 in prepaid credits. Hugging Face Inference Providers technically has a free tier, but the free credit is $0.10 per month. Enough to sample. Not enough to run a backend.

My default for a typical side project is Groq. Free API key, fast open-weight models, official SDKs, and a services agreement that does not permit training on your inputs unless you explicitly allow it. I would pick Gemini when model quality matters more than latency, accepting that Google’s pricing page marks free-tier data as used to improve Google products. I would pick Mistral when prompt privacy is the main worry. If the app already lives on Cloudflare Workers, Workers AI is the obvious third choice instead of Mistral. OpenRouter is a router and fallback layer, not the primary forever-free backend.

You cannot compare these on token price alone

Vendors meter usage in tokens, requests, neurons, or dollar credits. Pick the wrong mental model and you will mis-budget.

Mental model	Providers	What burns quota
Token buckets	Gemini, Groq, Mistral, Together	Long outputs, chat history, big contexts
Request/day caps	OpenRouter free	Any API call, even a tiny one
Neuron allocation	Cloudflare Workers AI	Model size times generation length
Dollar credit	Hugging Face ($0.10/mo)	Routed provider pass-through pricing
Call count	Cohere trial (1,000/mo)	Every endpoint hit
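To make the difference concrete, here is a minimal Python sketch that charges the same two calls against a token bucket and against a request cap. The call sizes are made-up illustrative numbers, the 50/day figure is the OpenRouter free-plan cap quoted later in this piece, and nothing here talks to a real API.

```python
from dataclasses import dataclass


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


def token_cost(call: Call) -> int:
    """Token-bucket view: every input and output token counts against quota."""
    return call.input_tokens + call.output_tokens


def request_cost(call: Call) -> int:
    """Request-cap view: every call counts as one, whatever its size."""
    return 1


# Hypothetical workload: a tiny health-check prompt and a long chat turn with history.
tiny = Call(input_tokens=20, output_tokens=5)
long_chat = Call(input_tokens=6_000, output_tokens=1_500)

for label, call in [("tiny", tiny), ("long_chat", long_chat)]:
    print(label, "tokens:", token_cost(call), "requests:", request_cost(call))

# Against a 50 requests/day cap, both calls cost the same 1/50 of the day.
# Against a token bucket, the long chat costs 300x as much as the tiny call.
ratio = token_cost(long_chat) / token_cost(tiny)
print("token ratio:", ratio)
```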

Several providers now keep exact live RPM and TPM inside dashboards or response headers, not in public docs. I have marked those fields as publicly unspecified below rather than filling gaps with community screenshots.

You will not get a perfect apples-to-apples table. Use the numbers as directional, then sanity-check them against how your app actually calls models.

Free tiers at a glance

Token pricing in the paid-baseline column is input/output dollars per 1M tokens. Where a provider exposes many models, I used the cheapest clearly usable chat model on the official pricing page. Hugging Face and OpenRouter are routers, so their paid pricing is variable or pass-through.

Provider	Free access reality	Free-tier quota	Public rate limits	Paid baseline (representative)	Training / retention	Commercial use on free	Representative models
Gemini API	Standing free tier	Free within model limits; pricing page shows free Gemini 2.5 Flash and grounding quotas such as 500 RPD for Search and 500 RPD for Maps on 2.5 Flash standard	Interactive free-tier RPM/TPM not fully published; Google says live limits are in AI Studio	Gemini 2.5 Flash: $0.30 / $2.50 per 1M in/out	Free tier: used to improve Google products (Yes). Paid tier: No	Unspecified in public docs; governed by Gemini API terms	`gemini-2.5-flash` and preview audio/TTS variants; menu changes over time
Groq	Standing free API key	Free plan exists; no numeric free-credit bucket published on docs	RPM/RPD/TPM/TPD defined; exact free-plan numbers publicly unspecified	Llama 3.1 8B Instant: $0.05 / $0.08 per 1M	Does not train on inputs/outputs unless permitted; ZDR available for eligible customers	Yes per services agreement	`openai/gpt-oss-20b`, `openai/gpt-oss-120b`, Llama 4 Scout, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B Instant
Cohere	Trial only	1,000 API calls/month on trial keys	Trial chat: 20 req/min; token errors document 100,000 tokens/min	Command R7B: $0.0375 / $0.15 per 1M	Trial inputs/outputs may be used for R&D (de-identified)	No. Trial keys not permitted for production or commercial use	Command A, A Reasoning, A Translate, A Vision, Command R+, Command R, Command R7B, North Mini Code
Together	No true free tier	None; minimum $5 credit purchase	Dynamic per model and org; no free plan	LFM2 24B A2B: $0.03 / $0.12 per 1M	No training without opt-in; privacy settings can disable retention	N/A (no free tier)	No free-tier models
OpenRouter	Standing free plan, thin	25+ free models, 4 free providers; 50 req/day and 20 RPM free; preload $10 for 1,000 free-model req/day at same 20 RPM	20 RPM on `:free` models	`openrouter/free`: $0 / $0; paid models are provider-based plus 5.5% platform fee on pay-as-you-go	ZDR by default unless you opt into prompt logging; provider policies vary	Depends on underlying model license and provider policy	`openrouter/free` routes ~24 models: Owl Alpha, Nemotron 3 Ultra/Super/Nano, GPT-OSS 120B/20B, Gemma 4 31B/26B, others
Mistral	Standing free mode	Free mode on by default, no credit card; positioned for evaluation/prototyping	RPS, tokens/min, tokens/month enforced; exact figures in Admin > Limits only	Mistral Small 4: $0.10 / $0.30 per 1M	API data not used for model training; ZDR is Scale-only	Unclear for free mode; docs describe evaluation/prototyping use	`mistral-small-latest`, `mistral-medium-latest`, `devstral-small-latest`, `codestral-latest`, `mistral-embed`, subject to free-mode limits
Cloudflare Workers AI	Standing free allocation	10,000 Neurons/day on Workers Free and Paid before overage	Text gen: 300 RPM default; embeddings: 3,000 RPM; summarization: 1,500 RPM	IBM Granite 4.0 H Micro: $0.017 / $0.112; Gemma 4 26B: $0.10 / $0.30; GPT-OSS 20B: $0.20 / $0.30 per 1M (neuron-backed)	Does not train on customer content without consent; stored only if paired with R2/KV	No public commercial-use ban found in reviewed docs	`@cf/meta/llama-3.2-1b-instruct`, `@cf/meta/llama-3.1-8b-instruct-fp8-fast`, `@cf/openai/gpt-oss-20b`, `@cf/openai/gpt-oss-120b`, `@cf/google/gemma-4-26b-a4b-it`, `@cf/moonshotai/kimi-k2.5`
Hugging Face	Standing, tiny	$0.10/month free; $2/month PRO	No global RPM/TPM table; depends on routed provider/model	Variable pass-through at provider rates, no markup	Does not store request bodies/responses when routing; debug logs up to 30 days; no HF training on user data	Generally usable; underlying model license still governs	200+ routed models; examples: `openai/gpt-oss-120b`, `black-forest-labs/FLUX.1-dev`, `microsoft/harrier-oss-v1-0.6b`

Two things to keep in mind. Normalized paid prices are for comparison, not budget forecasting. Pick a specific model ID before you estimate spend. And on router products, commercial viability is partly a function of the underlying model license, not just the platform’s own docs.
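Once you have picked a model, the spend estimate is simple arithmetic: tokens per request times the per-1M price, times volume. A minimal sketch using the representative paid baselines from the table; the dictionary labels are informal names rather than API model IDs, the workload is hypothetical, and prices change, so re-check the pricing page before relying on the numbers.

```python
# Representative paid baselines from the comparison table:
# (input $ per 1M tokens, output $ per 1M tokens). Labels are informal, not API model IDs.
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Groq Llama 3.1 8B Instant": (0.05, 0.08),
    "Cohere Command R7B": (0.0375, 0.15),
    "Mistral Small 4": (0.10, 0.30),
}


def monthly_cost(model: str, requests_per_day: int, in_tokens: int,
                 out_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend in dollars for a fixed per-request token profile."""
    in_price, out_price = PRICES[model]
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: 1,000 requests/day, 800 input and 300 output tokens each.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 1_000, 800, 300):.2f}/month")
```

With this hypothetical profile, output pricing drives most of the gap: Gemini 2.5 Flash charges roughly 30x more per output token than Groq's Llama 3.1 8B Instant, so for Gemini in particular, long responses cost far more than long prompts.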

What the free tiers actually cost you later

Groq: fast, callable, quota opacity

Groq’s pricing page publishes unusually concrete throughput numbers: Llama 3.1 8B Instant at 840 tokens per second (TPS), GPT OSS 20B at 1,000 TPS. The services agreement goes further than many competitors: Groq is not permitted to use inputs or outputs for training or fine-tuning unless you explicitly allow it.
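Those throughput figures convert into a rough wait time: decode time is about output tokens divided by tokens per second. A back-of-the-envelope sketch, assuming the published TPS numbers refer to output tokens and ignoring network round trips and time-to-first-token, which those figures do not cover:

```python
# Published Groq throughput figures quoted above (assumed to be output tokens/second).
TPS = {
    "Llama 3.1 8B Instant": 840,
    "GPT OSS 20B": 1_000,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough decode time only; excludes network round trip and time-to-first-token."""
    return output_tokens / tokens_per_second


for model, tps in TPS.items():
    print(f"{model}: {generation_seconds(500, tps):.2f}s for 500 output tokens")
```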

The catch is governance, not capability. Public docs confirm a Free Plan exists, but they do not publish a clean table of exact free-plan RPM/TPM by model. If you need predictable public quotas for a launch announcement, Groq looks less transparent on paper than it feels in practice. For a side project where you just want a working backend today, it is still one of the best zero-cost starting points here.
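Without a public quota table, the practical move is to read the limits the API reports on each response. Groq's rate-limit documentation describes headers such as x-ratelimit-remaining-requests and x-ratelimit-reset-tokens, with resets formatted like 2m59.56s; verify the exact names and format against the current docs. This sketch only parses a headers mapping, so it runs without a key, and the sample values are made up.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Durations like "2m59.56s" or "7.66s"; anything else (e.g. "120ms") is rejected.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_reset(value: str) -> Optional[float]:
    """Convert a reset duration string to seconds, or None if unrecognised."""
    match = _DURATION.fullmatch(value.strip())
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


@dataclass
class RateLimitSnapshot:
    remaining_requests: Optional[int]
    remaining_tokens: Optional[int]
    reset_requests_s: Optional[float]
    reset_tokens_s: Optional[float]


def read_rate_limits(headers: Mapping[str, str]) -> RateLimitSnapshot:
    """Pull remaining quota and reset times out of response headers (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None and value.isdigit() else None

    def as_seconds(name: str) -> Optional[float]:
        value = lowered.get(name)
        return parse_reset(value) if value is not None else None

    return RateLimitSnapshot(
        remaining_requests=as_int("x-ratelimit-remaining-requests"),
        remaining_tokens=as_int("x-ratelimit-remaining-tokens"),
        reset_requests_s=as_seconds("x-ratelimit-reset-requests"),
        reset_tokens_s=as_seconds("x-ratelimit-reset-tokens"),
    )


# Made-up sample values in the shape the docs describe.
sample = {
    "x-ratelimit-remaining-requests": "14370",
    "x-ratelimit-remaining-tokens": "5892",
    "x-ratelimit-reset-requests": "2m59.56s",
    "x-ratelimit-reset-tokens": "7.66s",
}
snap = read_rate_limits(sample)
print(snap)
```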

Gemini: strong model quality, worse data policy on free

Google’s pricing page clearly shows a real free tier for Gemini 2.5 Flash, plus useful daily free grounding quotas. The practical downside is that Google now pushes exact live rate limits into AI Studio rather than publishing a fully static interactive limit table.

The bigger caveat is data policy. The same pricing page marks free-tier Gemini usage as used to improve Google products. Paid tier usage is marked No. For a hobby bot or internal tool, I can live with that. For customer content or sensitive prompts, it is a serious reason to move to paid or pick Mistral or Groq instead.

Mistral: privacy-first free mode, limits in the console

Mistral’s docs say data sent through the API is not used for model training. Free mode needs no credit card. That is the cleanest privacy story among the standing-free providers in this set.

The trade-off is opacity. Mistral positions free mode for evaluation and prototyping. Exact RPS, TPM, and monthly token ceilings live in Admin > Limits, not in a public table. Fine for prototyping. Slightly harder to pick as the one backend for a public side project if you expect bursts.
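When the exact RPS and tokens/min ceilings live only in Admin > Limits, a client-side retry is a cheap hedge against bursts. A minimal sketch of capped exponential backoff with full jitter, assuming your HTTP wrapper raises a hypothetical RateLimited exception when the API answers with HTTP 429; the delays are placeholders, not Mistral's numbers.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raise this from your HTTP wrapper when the API returns 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")


# Demo with a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


slept: List[float] = []
result = with_backoff(flaky, sleep=slept.append)
print(result, attempts["n"], len(slept))  # ok 3 2
```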

Cloudflare Workers AI: edge-native and well documented

Workers AI gives you 10,000 neurons per day free, and unlike several competitors it publishes task-level RPM limits in the docs. Text generation defaults to 300 RPM. Cloudflare also says it does not use your content to train models or improve services without explicit consent.

The limitation is economics. Neuron accounting is less intuitive than tokens, and the free allocation disappears quickly on larger models or longer generations. If your app already sits on Workers, the operational simplicity is excellent. If not, Groq or Mistral are usually easier to wire up.
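A rough way to see how far 10,000 Neurons go is to convert them to dollars and then to tokens. This sketch assumes Cloudflare's overage rate of $0.011 per 1,000 Neurons (check the current Workers AI pricing page) and uses the neuron-backed output-token prices from the table; it counts generation only and ignores input tokens, so treat the results as an upper bound.

```python
FREE_NEURONS_PER_DAY = 10_000
USD_PER_NEURON = 0.011 / 1_000  # assumption: published overage rate, verify before relying on it

# Output-token prices from the comparison table, in $ per 1M tokens.
OUTPUT_PRICE_PER_M = {
    "IBM Granite 4.0 H Micro": 0.112,
    "Gemma 4 26B": 0.30,
    "GPT-OSS 20B": 0.30,
}


def free_output_tokens_per_day(output_price_per_m: float) -> int:
    """Output tokens the daily free allocation covers if spent on generation alone."""
    free_usd = FREE_NEURONS_PER_DAY * USD_PER_NEURON
    return int(free_usd / output_price_per_m * 1_000_000)


for model, price in OUTPUT_PRICE_PER_M.items():
    print(f"{model}: ~{free_output_tokens_per_day(price):,} output tokens/day")
```

Under those assumptions the daily allocation is worth about $0.11, or roughly 366,000 output tokens on GPT-OSS 20B. Models with higher per-token prices shrink that figure proportionally.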

OpenRouter: real free lane, not a primary backend

OpenRouter’s free plan is real: 25+ free models, 50 requests per day, 20 RPM, and openrouter/free as a single router across a rotating pool. Excellent for experimentation, model comparison, and rough production fallback.

It is less convincing as your primary always-free backend. Fifty requests per day is easy to outgrow. The 1,000/day free-model limit only appears once you preload at least $10. Privacy is strong on the OpenRouter side (ZDR by default unless you opt into logging), but provider logging and training policies vary by endpoint.
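If you do lean on the free lane, a client-side guard keeps you from burning the daily cap in one burst. A minimal in-memory sketch with rolling windows; the defaults mirror the 20 RPM and 50/day figures quoted above, the rolling 24-hour window is my simplification (OpenRouter may reset its daily counter differently), and the server's 429 remains the source of truth. The demo injects a fake clock so it runs instantly.

```python
import time
from collections import deque
from typing import Callable, Deque


class FreeTierLimiter:
    """Client-side guard for a per-minute and a per-day request cap."""

    def __init__(
        self,
        per_minute: int = 20,
        per_day: int = 50,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self._minute: Deque[float] = deque()
        self._day: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if both windows have room, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of each rolling window.
        while self._minute and now - self._minute[0] >= 60:
            self._minute.popleft()
        while self._day and now - self._day[0] >= 86_400:
            self._day.popleft()
        if len(self._minute) >= self.per_minute or len(self._day) >= self.per_day:
            return False
        self._minute.append(now)
        self._day.append(now)
        return True


# Demo with a fake clock: 30 attempts in each of four consecutive minutes.
fake_now = [0.0]
limiter = FreeTierLimiter(clock=lambda: fake_now[0])
results = []
for minute in range(4):
    fake_now[0] = minute * 61.0
    results.append(sum(limiter.try_acquire() for _ in range(30)))
print(results)  # [20, 20, 10, 0]
```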

Cohere: generous trial, explicit non-production clause

Cohere’s trial is generous enough to learn the platform: 1,000 calls per month, 20 req/min on chat models, access to all models and APIs per the pricing docs. Production pricing for Command R7B is very low: $0.0375 / $0.15 per 1M input/output tokens.

The same pricing page says trial keys are not permitted for production or commercial purposes. That single clause makes Cohere trial a non-starter for a shipped side project. Treat it as an evaluation environment that becomes useful once you switch to a production key.

Together: cheap paid, not free

Together’s billing docs are unambiguous: no free trials currently, minimum $5 credit purchase. Some published token rates are very low (LFM2 24B A2B at $0.03 / $0.12 per 1M input/output tokens), and the OpenAI-compatible API is clean. Together belongs in a cheap-paid shortlist, not a free-callable shortlist.

Hugging Face: great router, useless free credit as a backend

Hugging Face Inference Providers is excellent as a control plane: one token, many providers, pass-through pricing, fastest/cheapest/preferred routing, and good privacy defaults on the Hugging Face side. The free allowance is $0.10 per month. Enough to try a few requests. Not enough to run an app people will actually use.

What to wire on Monday

The stacks I would actually start with have simple auth, stable SDKs, and minimal changes to existing app code.

Groq with the official SDK

Groq’s quickstart uses GROQ_API_KEY and the groq SDK. Lowest-friction way to get a fast working backend for chat or coding tools.

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

resp = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {"role": "system", "content": "You are a concise backend assistant."},
        {"role": "user", "content": "Return JSON with a title and slug for a post about side-project APIs."},
    ],
)

print(resp.choices[0].message.content)

Use this if you want a free default backend behind a small Node or Python API and you care about latency more than frontier closed-model quality.
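The prompt asks for JSON, but a chat model without a structured-output setting may wrap the object in a Markdown code fence or add a sentence around it. A small defensive parser you could run on resp.choices[0].message.content; it is a sketch of my own, not part of the Groq SDK.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(text: str) -> Any:
    """Extract and parse JSON from model output.

    Tries the raw text first, then a fenced block, then the outermost {...} span.
    Raises ValueError if none of them parses.
    """
    candidates = [text.strip()]
    fenced = _FENCED.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON object found in model output")


sample = ("Sure! Here you go:\n" + FENCE + "json\n"
          + '{"title": "Free AI APIs", "slug": "free-ai-apis"}' + "\n" + FENCE)
print(parse_model_json(sample)["slug"])  # free-ai-apis
```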

OpenRouter with an OpenAI-compatible client

OpenRouter’s API is Bearer-token authenticated and OpenAI-compatible via https://openrouter.ai/api/v1. One integration while you test free models, paid models, and fallback logic.

import OpenAI from 'openai';

const client = new OpenAI({
	baseURL: 'https://openrouter.ai/api/v1',
	apiKey: process.env.OPENROUTER_API_KEY
});

const completion = await client.chat.completions.create({
	model: 'openrouter/free',
	messages: [
		{ role: 'user', content: 'Give me a one-line product description for a weekend SaaS.' }
	]
});

console.log(completion.choices[0].message.content);

Use this if you want one abstraction layer and expect to switch providers later. Do not assume the free plan is enough for user-facing traffic without adding credits.
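The fallback logic mentioned above can be as simple as an ordered list of providers tried in turn. A minimal sketch: the primary and fallback functions are hypothetical stand-ins for real SDK calls (for example, the Groq client above first, then an OpenRouter call), and catching every exception is a simplification; in real code you would fall through only on rate limits, timeouts, and 5xx errors.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, text) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # simplification: narrow this to retryable errors
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")


# Hypothetical stand-ins for real SDK calls.
def primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")


def fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


name, text = complete_with_fallback("ping", [("primary", primary), ("fallback", fallback)])
print(name, "->", text)  # fallback -> fallback answer to: ping
```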

Cloudflare Workers AI via raw HTTP

Cloudflare’s REST path needs an API token plus account ID. Easy to automate if you already have a Workers account.

curl "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "Write a short release note for a side-project launch." }'

Use this if your frontend, auth, KV, R2, or edge routing already lives on Cloudflare. Watch the neuron budget.

Cohere for evaluation-only work

Cohere’s ClientV2 chat API is straightforward. The trial key is not allowed for production or commercial use. Useful for model evaluation or internal prototyping, not for shipping a public side project on the free tier.

import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

res = co.chat(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "Summarize the trade-offs of free AI APIs in two sentences."}],
)

print(res.message.content[0].text)
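Because the non-production clause is a policy rather than a technical limit, it is easy to ship a trial key by accident. One hedge is a startup check. A minimal sketch; APP_ENV and COHERE_KEY_KIND are hypothetical environment variables you would set yourself, not anything Cohere reads or exposes.

```python
import os
from typing import Mapping, Optional


def trial_key_in_production(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if a Cohere trial key would be used in a production deployment.

    APP_ENV and COHERE_KEY_KIND are hypothetical variables you set yourself;
    neither Cohere nor its SDK reads them. The key kind defaults to "trial"
    so that forgetting to set it fails safe.
    """
    env = os.environ if env is None else env
    app_env = env.get("APP_ENV", "development").strip().lower()
    key_kind = env.get("COHERE_KEY_KIND", "trial").strip().lower()
    return app_env == "production" and key_kind == "trial"


if __name__ == "__main__":
    if trial_key_in_production():
        raise SystemExit("Cohere trial keys are not permitted for production or commercial use.")
    print("Cohere key usage looks consistent with APP_ENV.")
```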

If you only want one practical setup decision: Groq for the fastest route to a real app with zero spend. Mistral for a stronger privacy posture on API prompts. Cloudflare Workers AI if your stack is already Cloudflare-native. OpenRouter later as a fallback layer or experimentation surface.

Where each option fits best

Need	Best fit	Why	Hidden catch
Lowest latency at zero spend	Groq	High published TPS numbers; no training on prompts by default	Exact free-plan quotas not in public docs
Strongest model quality while staying free	Gemini API	Real standing free tier on Gemini 2.5 Flash	Free tier used to improve Google products; live limits in AI Studio
Best privacy among standing-free APIs	Mistral Free mode	API data not used for training; no card required	Limits console-only; ZDR is Scale-only
Already deploy at the edge	Cloudflare Workers AI	Real free allocation, published RPM limits, clean Workers integration	Neuron math burns fast on large models
Router or fallback layer	OpenRouter	One API, many free and paid models, ZDR defaults	50 req/day free unless you preload credits; provider policies vary
Evaluation-only trial	Cohere	Good trial coverage; strong low-cost production models later	Trial explicitly non-commercial and non-production
Cheap paid, not free	Together	Low serverless catalog pricing; OpenAI-compatible API	No free trial; $5 minimum preload
Unified sampler	Hugging Face Inference Providers	Multi-provider UX; no training on routed bodies/responses	Free credit is $0.10/month

My top three for a builder who wants free, callable, and practical:

Groq ranks first. Free API key, official SDKs, OpenAI-like usage patterns, very fast inference, strong no-training default. Main weakness is quota transparency, not capability.

Gemini API free tier ranks second. If you care most about model quality while still spending nothing, Gemini is hard to ignore. It falls behind Groq because Google’s free-tier data-use posture is worse for sensitive workloads and the live limit table lives in AI Studio.

Mistral Free mode ranks third. Not as frictionless as Groq, and free-mode limits are less transparent than Cloudflare’s RPM table. But the API docs say data is not used for training, free mode needs no card, and the paid upgrade path is straightforward. If you already build on Cloudflare Workers, I would swap Mistral for Workers AI as my personal number three.

flowchart TD
    start[Need a free inference API] --> freeHard{Must spend stay at zero?}
    freeHard -->|No| together[Together: cheap paid from $5]
    freeHard -->|Yes| stack{Where does your app run?}
    stack -->|Cloudflare Workers| cf[Cloudflare Workers AI]
    stack -->|Anywhere else| privacy{Sensitive prompts?}
    privacy -->|Yes| mistral[Mistral Free mode]
    privacy -->|No| quality{Frontier quality over latency?}
    quality -->|Yes| gemini[Gemini API free tier]
    quality -->|No| groq[Groq free API key]
    groq --> router[Add OpenRouter later for fallback]
    gemini --> router
    mistral --> router
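The flowchart above reduces to a few nested conditionals. A sketch that mirrors it branch for branch, returning the primary backend plus the fallback the chart suggests adding later; the function and argument names are my own, and the later questions are simply ignored once an earlier branch has decided.

```python
from typing import Optional, Tuple


def pick_free_backend(
    spend_must_be_zero: bool,
    runs_on_workers: bool,
    sensitive_prompts: bool,
    quality_over_latency: bool,
) -> Tuple[str, Optional[str]]:
    """Walk the decision flowchart; return (primary backend, fallback to add later)."""
    if not spend_must_be_zero:
        return "Together", None  # cheap paid, from $5
    if runs_on_workers:
        return "Cloudflare Workers AI", None
    if sensitive_prompts:
        primary = "Mistral Free mode"
    elif quality_over_latency:
        primary = "Gemini API free tier"
    else:
        primary = "Groq"
    # The Mistral, Gemini, and Groq branches all lead to OpenRouter as a later fallback.
    return primary, "OpenRouter"


print(pick_free_backend(True, False, False, False))  # ('Groq', 'OpenRouter')
```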

For a hobby project shipping this weekend, I would wire Groq first, add OpenRouter as a fallback once traffic is real, and move to paid Gemini or Mistral the moment prompts contain anything I would not paste into a public GitHub issue.

Gaps in the public docs

Several providers keep exact live limits inside dashboards or response headers. That is why Google, Groq, Mistral, Together, and Hugging Face have publicly unspecified fields in the table above. Where the official docs say “view in console,” I treated the console as the source of truth and left the public field unspecified rather than filling it with community numbers or third-party screenshots.

Normalized paid baselines use a representative low-cost chat model, not an average across all models. That is enough to guide decisions. Not enough for budget forecasting without picking a specific model ID first.

On OpenRouter and Hugging Face, commercial viability depends partly on the underlying model license and provider policy, not just the router’s platform docs. That is why those rows are written more cautiously than Groq’s or Cohere’s.