Cloudflare AI
Live from the Edge

Real inference, running on Cloudflare Workers AI right now. Type a prompt, pick a model, and see the response stream in from the nearest GPU.
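The response streams in token by token over Server-Sent Events. A minimal sketch of how a client could extract tokens from the stream; the `data: {"response": "..."}` payload shape and the `[DONE]` sentinel are assumptions about the Workers AI event format, not this demo's actual source:

```typescript
// Hypothetical SSE token extractor. Assumes each event carries a JSON
// payload of the form {"response": "<token>"} and that the stream ends
// with a literal "[DONE]" sentinel.
function extractTokens(sseChunk: string): string[] {
  const tokens: string[] = [];
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break; // assumed end-of-stream marker
    try {
      const parsed = JSON.parse(payload) as { response?: string };
      if (parsed.response) tokens.push(parsed.response);
    } catch {
      // ignore partial or non-JSON frames
    }
  }
  return tokens;
}
```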


Input


Response


Texts to Embed


Enter 2–10 text snippets. We'll generate embeddings and show how similar they are to each other.
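The scores shown are cosine similarity between the embedding vectors. A minimal sketch of the standard computation (not the demo's actual source):

```typescript
// Cosine similarity between two embedding vectors: dot product
// divided by the product of their magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pairwise similarity matrix over all snippets' embeddings.
function similarityMatrix(vectors: number[][]): number[][] {
  return vectors.map((v) => vectors.map((w) => cosine(v, w)));
}
```

Identical directions score 1.0, orthogonal vectors score 0, so the matrix diagonal is always 1.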

Similarity Matrix


What Just Happened

Your prompt was sent to a Cloudflare Worker running on the nearest edge node. The Worker called Workers AI, which routed the request to a GPU in one of 200+ cities. The response streamed back via Server-Sent Events. No server to manage, no GPU to provision. Cost: fractions of a cent.
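A Worker like the one described above fits in a few lines. This is a hedged sketch, not this demo's source: the model name and the shape of the `env.AI` binding are assumptions based on Cloudflare's published Workers AI API.

```typescript
// Hypothetical Worker: forward the prompt to Workers AI and stream the
// model's tokens back to the browser as Server-Sent Events.
interface Env {
  AI: {
    run(model: string, inputs: Record<string, unknown>): Promise<ReadableStream>;
  };
}

export async function handleRequest(request: Request, env: Env): Promise<Response> {
  const { prompt } = (await request.json()) as { prompt: string };

  // stream: true asks Workers AI for an SSE-formatted ReadableStream
  // instead of a buffered completion. Model name is an assumption.
  const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt,
    stream: true,
  });

  return new Response(stream, {
    headers: { "content-type": "text/event-stream" },
  });
}

export default { fetch: handleRequest };
```

Because the Worker just pipes the stream through, the browser sees tokens as soon as the GPU emits them.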

Zero Cold Start

Workers AI models are always warm: no container spin-up, no Lambda-style cold start. The first token arrives in milliseconds.

Global by Default

Your request was handled by the nearest edge location. Users in Johannesburg hit Johannesburg GPUs; users in London hit London GPUs.

Pay per Neuron

$0.011 per 1,000 neurons. No minimums, no reservations. 10,000 free neurons/day on the free tier.
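Using the numbers above, a day's bill is simple arithmetic. A sketch (real billing granularity may differ; only the rate and free tier come from the text):

```typescript
// Rate and free tier as stated above.
const RATE_PER_1K_NEURONS = 0.011; // USD per 1,000 neurons
const FREE_NEURONS_PER_DAY = 10_000;

// Daily cost in USD: everything past the free tier is billed at the rate.
function dailyCostUSD(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * RATE_PER_1K_NEURONS;
}
```

A day that burns 50,000 neurons costs 40 × $0.011 = $0.44; a day under 10,000 neurons costs nothing.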
