Picking a free AI API in 2026 is harder than the pricing pages suggest

I compared eight inference providers that still show up in every “free LLM API” thread: Groq, Google Gemini API, Mistral, Cloudflare Workers AI, OpenRouter, Cohere, Together, and Hugging Face Inference Providers. The field is narrower than marketing suggests once you stop counting one-time promos and start asking what you can actually call this week for a side project.

The credible standing-free options today are Groq, Google Gemini API, Mistral Free mode, Cloudflare Workers AI, and OpenRouter’s free-model lane. Cohere has a real trial, but trial keys are explicitly not allowed for production or commercial use. Together has no free trial and requires at least $5 in prepaid credits. Hugging Face Inference Providers technically has a free tier, but it amounts to $0.10 of credit per month. Enough to sample. Not enough to run a backend.

My default for a typical side project is Groq. Free API key, fast open-weight models, official SDKs, and a services agreement that does not permit training on your inputs unless you explicitly allow it. I would pick Gemini when model quality matters more than latency, and accept that Google’s pricing page marks free-tier usage as used to improve Google products. I would pick Mistral when prompt privacy is the main worry. If the app already lives on Cloudflare Workers, Workers AI is the obvious third choice instead of Mistral. OpenRouter is a router and fallback layer, not the primary forever-free backend.

You cannot compare these on token price alone

Vendors bill tokens, requests, neurons, or dollar credits. Pick the wrong mental model and you will mis-budget.

| Mental model | Providers | What burns quota |
|---|---|---|
| Token buckets | Gemini, Groq, Mistral, Together | Long outputs, chat history, big contexts |
| Request/day caps | OpenRouter free | Any API call, even a tiny one |
| Neuron allocation | Cloudflare Workers AI | Model size times generation length |
| Dollar credit | Hugging Face ($0.10/mo) | Routed provider pass-through pricing |
| Call count | Cohere trial (1,000/mo) | Every endpoint hit |

Several providers now keep exact live RPM and TPM inside dashboards or response headers, not in public docs. I have marked those fields as publicly unspecified below rather than filling gaps with community screenshots.

You will not get a perfect apples-to-apples table. Use the numbers as directional, then sanity-check them against how your app actually calls models.
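
A rough planning sketch of the table above, assuming you can estimate requests per day and tokens per request. The `Workload` type and the helper names are mine, not any provider's API; the two caps are the figures quoted in this article and should be rechecked against current docs.

```python
from dataclasses import dataclass

# Caps quoted in this article; verify against each provider's current docs.
OPENROUTER_FREE_REQ_PER_DAY = 50
COHERE_TRIAL_CALLS_PER_MONTH = 1_000


@dataclass
class Workload:
    """Hypothetical description of how an app calls a model."""
    requests_per_day: int
    input_tokens_per_request: int
    output_tokens_per_request: int


def daily_tokens(w: Workload) -> int:
    """Token-bucket view: total input plus output tokens per day."""
    per_request = w.input_tokens_per_request + w.output_tokens_per_request
    return w.requests_per_day * per_request


def fits_openrouter_free(w: Workload) -> bool:
    """Request-cap view: token size is irrelevant, only the call count matters."""
    return w.requests_per_day <= OPENROUTER_FREE_REQ_PER_DAY


def cohere_trial_days(w: Workload) -> float:
    """Call-count view: days until the monthly trial allowance is used up."""
    if w.requests_per_day <= 0:
        raise ValueError("requests_per_day must be positive")
    return COHERE_TRIAL_CALLS_PER_MONTH / w.requests_per_day


chatbot = Workload(requests_per_day=200,
                   input_tokens_per_request=1_500,
                   output_tokens_per_request=400)
print(daily_tokens(chatbot))          # 380000
print(fits_openrouter_free(chatbot))  # False
print(cohere_trial_days(chatbot))     # 5.0
```

The same workload looks very different in each unit: 380,000 tokens a day, four times the OpenRouter free request cap, and a Cohere trial that is gone in five days.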

Free tiers at a glance

Token pricing in the paid-baseline column is input/output dollars per 1M tokens. Where a provider exposes many models, I used the cheapest clearly usable chat model on the official pricing page. Hugging Face and OpenRouter are routers, so their paid pricing is variable or pass-through.

| Provider | Free access reality | Free-tier quota | Public rate limits | Paid baseline (representative) | Training / retention | Commercial use on free | Representative models |
|---|---|---|---|---|---|---|---|
| Gemini API | Standing free tier | Free within model limits; pricing page shows free Gemini 2.5 Flash and grounding quotas such as 500 RPD for Search and 500 RPD for Maps on 2.5 Flash standard | Interactive free-tier RPM/TPM not fully published; Google says live limits are in AI Studio | Gemini 2.5 Flash: $0.30 / $2.50 per 1M in/out | Free tier: used to improve Google products (Yes). Paid tier: No | Unspecified in public docs; governed by Gemini API terms | gemini-2.5-flash and preview audio/TTS variants; menu changes over time |
| Groq | Standing free API key | Free plan exists; no numeric free-credit bucket published on docs | RPM/RPD/TPM/TPD defined; exact free-plan numbers publicly unspecified | Llama 3.1 8B Instant: $0.05 / $0.08 per 1M | Does not train on inputs/outputs unless permitted; ZDR available for eligible customers | Yes per services agreement | openai/gpt-oss-20b, openai/gpt-oss-120b, Llama 4 Scout, Qwen3 32B, Llama 3.3 70B, Llama 3.1 8B Instant |
| Cohere | Trial only | 1,000 API calls/month on trial keys | Trial chat: 20 req/min; token errors document 100,000 tokens/min | Command R7B: $0.0375 / $0.15 per 1M | Trial inputs/outputs may be used for R&D (de-identified) | No. Trial keys not permitted for production or commercial use | Command A, A Reasoning, A Translate, A Vision, Command R+, Command R, Command R7B, North Mini Code |
| Together | No true free tier | None; minimum $5 credit purchase | Dynamic per model and org; no free plan | LFM2 24B A2B: $0.03 / $0.12 per 1M | No training without opt-in; privacy settings can disable retention | N/A (no free tier) | No free-tier models |
| OpenRouter | Standing free plan, thin | 25+ free models, 4 free providers; 50 req/day and 20 RPM free; preload $10 for 1,000 free-model req/day at same 20 RPM | 20 RPM on :free models | openrouter/free: $0 / $0; paid models are provider-based plus 5.5% platform fee on pay-as-you-go | ZDR by default unless you opt into prompt logging; provider policies vary | Depends on underlying model license and provider policy | openrouter/free routes ~24 models: Owl Alpha, Nemotron 3 Ultra/Super/Nano, GPT-OSS 120B/20B, Gemma 4 31B/26B, others |
| Mistral | Standing free mode | Free mode on by default, no credit card; positioned for evaluation/prototyping | RPS, tokens/min, tokens/month enforced; exact figures in Admin > Limits only | Mistral Small 4: $0.10 / $0.30 per 1M | API data not used for model training; ZDR is Scale-only | Unclear for free mode; docs describe evaluation/prototyping use | mistral-small-latest, mistral-medium-latest, devstral-small-latest, codestral-latest, mistral-embed, subject to free-mode limits |
| Cloudflare Workers AI | Standing free allocation | 10,000 Neurons/day on Workers Free and Paid before overage | Text gen: 300 RPM default; embeddings: 3,000 RPM; summarization: 1,500 RPM | IBM Granite 4.0 H Micro: $0.017 / $0.112; Gemma 4 26B: $0.10 / $0.30; GPT-OSS 20B: $0.20 / $0.30 per 1M (neuron-backed) | Does not train on customer content without consent; stored only if paired with R2/KV | No public commercial-use ban found in reviewed docs | @cf/meta/llama-3.2-1b-instruct, @cf/meta/llama-3.1-8b-instruct-fp8-fast, @cf/openai/gpt-oss-20b, @cf/openai/gpt-oss-120b, @cf/google/gemma-4-26b-a4b-it, @cf/moonshotai/kimi-k2.5 |
| Hugging Face | Standing, tiny | $0.10/month free; $2/month PRO | No global RPM/TPM table; depends on routed provider/model | Variable pass-through at provider rates, no markup | Does not store request bodies/responses when routing; debug logs up to 30 days; no HF training on user data | Generally usable; underlying model license still governs | 200+ routed models; examples: openai/gpt-oss-120b, black-forest-labs/FLUX.1-dev, microsoft/harrier-oss-v1-0.6b |

Two things to keep in mind. Normalized paid prices are for comparison, not budget forecasting. Pick a specific model ID before you estimate spend. And on router products, commercial viability is partly a function of the underlying model license, not just the platform’s own docs.
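
To make "pick a specific model ID before you estimate spend" concrete, here is a minimal estimate using two rows from the table. The prices are the ones quoted above; the traffic figures and the `monthly_cost` helper are illustrative assumptions, not anyone's billing formula.

```python
# (input USD, output USD) per 1M tokens, copied from the table above.
PRICES_PER_1M = {
    "gemini-2.5-flash": (0.30, 2.50),
    "llama-3.1-8b-instant": (0.05, 0.08),
}


def monthly_cost(model: str, requests_per_day: int,
                 in_tokens: int, out_tokens: int, days: int = 30) -> float:
    """Estimated USD per month for one model at a steady request rate."""
    in_price, out_price = PRICES_PER_1M[model]
    total_in = requests_per_day * in_tokens * days
    total_out = requests_per_day * out_tokens * days
    return total_in / 1_000_000 * in_price + total_out / 1_000_000 * out_price


# Assumed workload: 1,000 requests/day, 1,000 input and 300 output tokens each.
for model in PRICES_PER_1M:
    print(model, round(monthly_cost(model, 1_000, 1_000, 300), 2))
```

At that assumed workload the two models land at about $31.50 and $2.22 a month. The gap comes mostly from output pricing, which is why a single "cheapest model" column cannot stand in for a forecast.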

What the free tiers actually cost you later

Groq: fast, callable, quota opacity

Groq’s pricing page publishes unusually concrete throughput numbers. Llama 3.1 8B Instant at 840 TPS. GPT OSS 20B at 1,000 TPS. The services agreement goes further than many competitors: Groq is not permitted to use inputs or outputs for training or fine-tuning unless you explicitly allow it.

The catch is governance, not capability. Public docs confirm a Free Plan exists, but they do not publish a clean table of exact free-plan RPM/TPM by model. If you need predictable public quotas for a launch announcement, Groq looks less transparent than it feels in practice. For a side project where you just want a working backend today, it is still one of the best zero-cost starting points here.
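
Since the live numbers sit in dashboards and response headers, one workaround is to log the quota headers from real responses. The sketch below only parses a header mapping. The `x-ratelimit-*` names are an assumption modeled on a common convention, and the sample values are invented placeholders, so confirm both against the provider's rate-limit docs.

```python
from typing import Mapping, Optional

# Assumed header names (x-ratelimit-* convention); confirm in the provider docs.
HEADER_NAMES = {
    "limit_requests": "x-ratelimit-limit-requests",
    "remaining_requests": "x-ratelimit-remaining-requests",
    "limit_tokens": "x-ratelimit-limit-tokens",
    "remaining_tokens": "x-ratelimit-remaining-tokens",
}


def read_quota(headers: Mapping[str, str]) -> dict[str, Optional[int]]:
    """Pull integer quota fields out of response headers.

    Header lookup is case-insensitive; missing or non-numeric values become None.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    quota: dict[str, Optional[int]] = {}
    for field, name in HEADER_NAMES.items():
        raw = lowered.get(name)
        is_number = raw is not None and raw.strip().isdigit()
        quota[field] = int(raw) if is_number else None
    return quota


# Invented placeholder values, not real Groq limits.
sample = {
    "X-RateLimit-Limit-Requests": "14400",
    "x-ratelimit-remaining-requests": "14370",
    "x-ratelimit-remaining-tokens": "5800",
}
print(read_quota(sample))
```

Logging this once per request for a day tells you more about your actual free-plan ceiling than any static table would.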

Gemini: strong model quality, worse data policy on free

Google’s pricing page clearly shows a real free tier for Gemini 2.5 Flash, plus useful daily free grounding quotas. The practical downside is that Google now pushes exact live rate limits into AI Studio rather than publishing a fully static interactive limit table.

The bigger caveat is data policy. The same pricing page marks free-tier Gemini usage as used to improve Google products. Paid tier usage is marked No. For a hobby bot or internal tool, I can live with that. For customer content or sensitive prompts, it is a serious reason to move to paid or pick Mistral or Groq instead.

Mistral: privacy-first free mode, limits in the console

Mistral’s docs say API data sent through the API is not used for model training. Free mode needs no credit card. That is the cleanest privacy story among the standing-free providers in this set.

The trade-off is opacity. Mistral positions free mode for evaluation and prototyping. Exact RPS, TPM, and monthly token ceilings live in Admin > Limits, not in a public table. Fine for prototyping. Slightly harder to pick as the one backend for a public side project if you expect bursts.
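
If you expect bursts against limits you cannot see in advance, a retry wrapper is the usual hedge. This is a generic sketch of exponential backoff with jitter, not Mistral's SDK behavior: `RateLimited` is a hypothetical exception you would map your client's HTTP 429 error onto, and the delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker; raise it when your SDK reports HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait a random time between 0 and base_delay * 2^attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


# Demo with a fake call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


print(call_with_backoff(flaky, sleep=lambda seconds: None))  # ok
```

The `sleep` parameter is injectable so the wrapper can be tested without real waits.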

Cloudflare Workers AI: edge-native and well documented

Workers AI gives you 10,000 neurons per day free, and unlike several competitors it publishes task-level RPM limits in the docs. Text generation defaults to 300 RPM. Cloudflare also says it does not use your content to train models or improve services without explicit consent.

The limitation is economics. Neuron accounting is less intuitive than tokens, and the free allocation disappears quickly on larger models or longer generations. If your app already sits on Workers, the operational simplicity is excellent. If not, Groq or Mistral are usually easier to wire up.
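
One way to make neuron math less opaque is to convert the per-token dollar prices back into neurons. This sketch rests on an assumption that is not from this article: that Cloudflare prices neurons at $0.011 per 1,000, so a dollar price implies a neuron cost. Check the current Workers AI pricing page before trusting the output.

```python
# Assumption (verify on the Workers AI pricing page): $0.011 per 1,000 neurons.
USD_PER_1K_NEURONS = 0.011
# Free allocation quoted above.
FREE_NEURONS_PER_DAY = 10_000


def neurons_per_request(in_price_per_1m: float, out_price_per_1m: float,
                        in_tokens: int, out_tokens: int) -> float:
    """Approximate neurons for one request, derived from per-1M-token USD prices."""
    usd = (in_tokens * in_price_per_1m + out_tokens * out_price_per_1m) / 1_000_000
    return usd / USD_PER_1K_NEURONS * 1_000


def free_requests_per_day(in_price_per_1m: float, out_price_per_1m: float,
                          in_tokens: int, out_tokens: int) -> int:
    """Whole requests that fit inside the daily free neuron allocation."""
    per_request = neurons_per_request(in_price_per_1m, out_price_per_1m,
                                      in_tokens, out_tokens)
    return int(FREE_NEURONS_PER_DAY / per_request)


# Prices from the table above; 500 input and 300 output tokens is an assumed request size.
print("GPT-OSS 20B:", free_requests_per_day(0.20, 0.30, 500, 300))
print("Granite 4.0 H Micro:", free_requests_per_day(0.017, 0.112, 500, 300))
```

Under those assumptions the free allocation covers roughly 580 such requests a day on GPT-OSS 20B and roughly 2,600 on Granite 4.0 H Micro, which is the "disappears quickly on larger models" effect in numbers.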

OpenRouter: real free lane, not a primary backend

OpenRouter’s free plan is real: 25+ free models, 50 requests per day, 20 RPM, and openrouter/free as a single router across a rotating pool. Excellent for experimentation, model comparison, and rough production fallback.

It is less convincing as your primary always-free backend. Fifty requests per day is easy to outgrow. The 1,000/day free-model limit only appears once you preload at least $10. Privacy is strong on the OpenRouter side (ZDR by default unless you opt into logging), but provider logging and training policies vary by endpoint.
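
A small client-side guard makes the 50 requests/day ceiling visible before the API starts rejecting calls. This is a sketch, not OpenRouter's accounting: the class name is mine, the counter lives in process memory, and the UTC day boundary is an assumption about when the quota resets.

```python
import datetime as dt
from typing import Callable


def _utc_today() -> dt.date:
    return dt.datetime.now(dt.timezone.utc).date()


class DailyRequestBudget:
    """In-memory counter that refuses calls once a per-day cap is reached."""

    def __init__(self, cap: int = 50, today: Callable[[], dt.date] = _utc_today):
        self.cap = cap
        self._today = today
        self._day = today()
        self._used = 0

    def _roll(self) -> None:
        # Reset the counter when the (assumed UTC) day changes.
        day = self._today()
        if day != self._day:
            self._day, self._used = day, 0

    def try_acquire(self) -> bool:
        """Reserve one request; return False if today's budget is spent."""
        self._roll()
        if self._used >= self.cap:
            return False
        self._used += 1
        return True

    @property
    def remaining(self) -> int:
        self._roll()
        return self.cap - self._used


budget = DailyRequestBudget(cap=50)
if budget.try_acquire():
    print("ok to call the API,", budget.remaining, "left today")
else:
    print("daily free budget spent; fall back or queue")
```

With more than one process or server you would need a shared counter instead, since each instance here tracks its own count.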

Cohere: generous trial, explicit non-production clause

Cohere’s trial is generous enough to learn the platform: 1,000 calls per month, 20 req/min on chat models, access to all models and APIs per the pricing docs. Production pricing for Command R7B is excellent.

The same pricing page says trial keys are not permitted for production or commercial purposes. That single clause makes Cohere trial a non-starter for a shipped side project. Treat it as an evaluation environment that becomes useful once you switch to a production key.

Together: cheap paid, not free

Together’s billing docs are unambiguous: no free trials currently, minimum $5 credit purchase. Some published token rates are very low, and the OpenAI-compatible API is clean. Together belongs in a cheap-paid shortlist, not a free-callable shortlist.

Hugging Face: great router, useless free credit as a backend

Hugging Face Inference Providers is excellent as a control plane: one token, many providers, pass-through pricing, fastest/cheapest/preferred routing, and good privacy defaults on the Hugging Face side. The free allowance is $0.10 per month. Enough to try a few requests. Not enough to run an app people will actually use.

What to wire on Monday

The stacks I would actually start with have simple auth, stable SDKs, and minimal changes to existing app code.

Groq with the official SDK

Groq’s quickstart uses GROQ_API_KEY and the groq SDK. Lowest-friction way to get a fast working backend for chat or coding tools.

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

resp = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {"role": "system", "content": "You are a concise backend assistant."},
        {"role": "user", "content": "Return JSON with a title and slug for a post about side-project APIs."},
    ],
)

print(resp.choices[0].message.content)

Use this if you want a free default backend behind a small Node or Python API and you care about latency more than frontier closed-model quality.

OpenRouter with an OpenAI-compatible client

OpenRouter’s API is Bearer-token authenticated and OpenAI-compatible via https://openrouter.ai/api/v1. One integration while you test free models, paid models, and fallback logic.

import OpenAI from 'openai';

const client = new OpenAI({
	baseURL: 'https://openrouter.ai/api/v1',
	apiKey: process.env.OPENROUTER_API_KEY
});

const completion = await client.chat.completions.create({
	model: 'openrouter/free',
	messages: [
		{ role: 'user', content: 'Give me a one-line product description for a weekend SaaS.' }
	]
});

console.log(completion.choices[0].message.content);

Use this if you want one abstraction layer and expect to switch providers later. Do not assume the free plan is enough for user-facing traffic without adding credits.

Cloudflare Workers AI via raw HTTP

Cloudflare’s REST path needs an API token plus account ID. Easy to automate if you already have a Workers account.

curl "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "Write a short release note for a side-project launch." }'

Use this if your frontend, auth, KV, R2, or edge routing already lives on Cloudflare. Watch the neuron budget.

Cohere for evaluation-only work

Cohere’s ClientV2 chat API is straightforward. The trial key is not allowed for production or commercial use. Useful for model evaluation or internal prototyping, not for shipping a public side project on the free tier.

import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

res = co.chat(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "Summarize the trade-offs of free AI APIs in two sentences."}],
)

print(res.message.content[0].text)

If you only want one practical setup decision: Groq for the fastest route to a real app with zero spend. Mistral for a stronger privacy posture on API prompts. Cloudflare Workers AI if your stack is already Cloudflare-native. OpenRouter later as a fallback layer or experimentation surface.

Where each option fits best

| Need | Best fit | Why | Hidden catch |
|---|---|---|---|
| Lowest latency at zero spend | Groq | High published TPS numbers; no training on prompts by default | Exact free-plan quotas not in public docs |
| Strongest model quality while staying free | Gemini API | Real standing free tier on Gemini 2.5 Flash | Free tier used to improve Google products; live limits in AI Studio |
| Best privacy among standing-free APIs | Mistral Free mode | API data not used for training; no card required | Limits console-only; ZDR is Scale-only |
| Already deploy at the edge | Cloudflare Workers AI | Real free allocation, published RPM limits, clean Workers integration | Neuron math burns fast on large models |
| Router or fallback layer | OpenRouter | One API, many free and paid models, ZDR defaults | 50 req/day free unless you preload credits; provider policies vary |
| Evaluation-only trial | Cohere | Good trial coverage; strong low-cost production models later | Trial explicitly non-commercial and non-production |
| Cheap paid, not free | Together | Low serverless catalog pricing; OpenAI-compatible API | No free trial; $5 minimum preload |
| Unified sampler | Hugging Face Inference Providers | Multi-provider UX; no training on routed bodies/responses | Free credit is $0.10/month |

My top three for a builder who wants free, callable, and practical:

Groq ranks first. Free API key, official SDKs, OpenAI-like usage patterns, very fast inference, strong no-training default. Main weakness is quota transparency, not capability.

Gemini API free tier ranks second. If you care most about model quality while still spending zero, Gemini is hard to ignore. It falls behind Groq because Google’s free-tier data-use posture is worse for sensitive workloads and the exact live limits are only visible in AI Studio.

Mistral Free mode ranks third. Not as frictionless as Groq, and free-mode limits are less transparent than Cloudflare’s RPM table. But the API docs say data is not used for training, free mode needs no card, and the paid upgrade path is straightforward. If you already build on Cloudflare Workers, I would swap Mistral for Workers AI as my personal number three.

flowchart TD
    start[Need a free inference API] --> freeHard{Spend must stay at zero?}
    freeHard -->|No| together[Together: cheap paid from $5]
    freeHard -->|Yes| stack{Where does your app run?}
    stack -->|Cloudflare Workers| cf[Cloudflare Workers AI]
    stack -->|Anywhere else| privacy{Sensitive prompts?}
    privacy -->|Yes| mistral[Mistral Free mode]
    privacy -->|No| quality{Frontier quality over latency?}
    quality -->|Yes| gemini[Gemini API free tier]
    quality -->|No| groq[Groq free API key]
    groq --> router[Add OpenRouter later for fallback]
    gemini --> router
    mistral --> router

For a hobby project shipping this weekend, I would wire Groq first, add OpenRouter as a fallback once traffic is real, and move to paid Gemini or Mistral the moment prompts contain anything I would not paste into a public GitHub issue.
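
The "primary first, router as fallback" idea can be sketched without committing to any SDK. Here each backend is a plain function from prompt to text, and the two stubs stand in for real Groq and OpenRouter calls; the function and stub names are hypothetical.

```python
from typing import Callable, Sequence

Backend = Callable[[str], str]  # takes a prompt, returns completion text


def complete_with_fallback(
    prompt: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each (name, backend) in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stubs standing in for real provider calls.
def groq_stub(prompt: str) -> str:
    raise TimeoutError("simulated quota exhaustion")


def openrouter_stub(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("hi", [("groq", groq_stub),
                                    ("openrouter", openrouter_stub)]))
# ('openrouter', 'echo: hi')
```

Returning the backend name alongside the text makes it easy to log how often the fallback actually fires, which is the signal for when to add credits or move to a paid tier.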

Gaps in the public docs

Several providers keep exact live limits inside dashboards or response headers. That is why Google, Groq, Mistral, Together, and Hugging Face have publicly unspecified fields in the table above. Where the official docs say “view in console,” I treated the console as the source of truth and left the field unspecified rather than quoting community numbers or third-party screenshots.

Normalized paid baselines use a representative low-cost chat model, not an average across all models. That is enough to guide decisions. Not enough for budget forecasting without picking a specific model ID first.

On OpenRouter and Hugging Face, commercial viability depends partly on the underlying model license and provider policy, not just the router’s platform docs. That is why those rows are written more cautiously than Groq’s or Cohere’s.

Sources

Google Gemini

Gemini API pricing
Gemini API rate limits
Google AI Studio

Groq

Groq pricing
Groq rate limits
Groq quickstart
Groq services agreement

Cohere

Cohere pricing
Cohere SDK docs

Together

Together billing
Together pricing

OpenRouter

OpenRouter pricing
OpenRouter quickstart
OpenRouter API reference

Mistral

Mistral platform tiers
Mistral pricing
Mistral data usage opt-out

Cloudflare Workers AI

Workers AI pricing
Workers AI limits
Workers AI REST API

Hugging Face

Inference Providers overview
Inference Providers pricing