Build a Retrieval-Augmented Generation (RAG) app on Cloud.gov: Quick start with architecture guide

May 20, 2026 · 4 min read

Want to see vector search and RAG in action? Deploy our demo app to your free Cloud.gov sandbox and start asking questions about your documents.

What you get

Document Q&A powered by pgvector and Flan-T5
Runs in the free 1 GB sandbox
Semantic search over Markdown files
No Kubernetes or separate vector database needed

What are vector databases and RAG?

Vector databases store embeddings—numerical representations that capture meaning—instead of just text. This enables semantic search: finding "machine learning systems" when you search for "AI applications" without requiring exact keyword matches. pgvector adds this ability to PostgreSQL, so you don't need a separate vector database service.
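To make "similar meaning" concrete, here is a minimal sketch of cosine similarity, the measure behind the cosine distance used in the search described later in this post. The three-dimensional vectors are made up for illustration; a real embedding model produces hundreds of dimensions, and the function name is hypothetical rather than taken from the demo code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings"; the values are assumptions chosen for illustration.
embeddings = {
    "machine learning systems": [0.9, 0.1, 0.2],
    "AI applications": [0.8, 0.2, 0.3],
    "parking permit renewal": [0.1, 0.9, 0.1],
}

query = embeddings["AI applications"]

# Rank the other phrases by similarity to the query, most similar first.
ranked = sorted(
    (text for text in embeddings if text != "AI applications"),
    key=lambda text: cosine_similarity(query, embeddings[text]),
    reverse=True,
)
print(ranked)
```

With these toy values, "machine learning systems" ranks ahead of "parking permit renewal" even though it shares no keywords with the query.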

RAG (Retrieval-Augmented Generation) combines search with AI generation. Instead of asking a language model to answer from memory, RAG first searches your documents for relevant information. It then generates answers based on that specific context. This reduces hallucinations and grounds responses in your actual data.

Deploy it now

# Clone and deploy
git clone https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git
cd rag-demo
cf login -a api.fr.cloud.gov --sso
cf create-service aws-rds micro-psql my-rag-db
cf push

# Get your URL
cf app rag-demo

First deployment takes 2-3 minutes. Visit your app URL, upload Markdown files, and ask questions.

How it works

Here's what happens when you ask a question:

User submits a question through the web interface
Flask app generates an embedding for the query using sentence-transformers
pgvector searches for similar documents using cosine distance
Top matching documents become context for the language model
Embedded Flan-T5 generates an answer (or GSA USAi if configured)
User receives response with source document references
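The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the demo's actual code: `embed` is a toy word-count stand-in for a sentence-transformers model, the in-memory list stands in for the pgvector table, and every function name is hypothetical. The generation step is left out; the sketch stops at the prompt that would be handed to the language model.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)

def retrieve(question, documents, k=2):
    """Return the k documents closest to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine_distance(q, embed(d["content"])))[:k]

def build_prompt(question, context_docs):
    """Put the retrieved text in front of the question for the language model."""
    context = "\n\n".join(d["content"] for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical documents standing in for uploaded Markdown files.
documents = [
    {"source": "sandbox.md", "content": "the sandbox memory limit is 1 GB"},
    {"source": "deploy.md", "content": "run cf push to deploy the app"},
    {"source": "db.md", "content": "pgvector adds vector search to PostgreSQL"},
]

question = "what is the sandbox memory limit"
top = retrieve(question, documents, k=1)
prompt = build_prompt(question, top)
sources = [d["source"] for d in top]  # returned to the user as references
print(sources)
print(prompt)
```

Keeping retrieval and prompt building as separate functions mirrors the flow: the search layer can stay the same while the model that consumes the prompt changes.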

By default, everything stays within Cloud.gov, which gives you complete data sovereignty. Sending queries to GSA USAi is optional; the next section shows how to configure it.

Upgrade to GSA USAi

Want faster, better answers? Switch to GSA USAi with three commands:

cf set-env rag-demo LLM_PROVIDER gsa_usai
cf set-env rag-demo LLM_API_KEY your-api-key
cf restage rag-demo
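On the application side, a switch like this usually comes down to reading environment variables at startup. The sketch below is an assumption about how such a check could look, not the demo's actual code: only the names `LLM_PROVIDER`, `LLM_API_KEY`, and the value `gsa_usai` come from the commands above, while the function name and the `local` default label are hypothetical.

```python
import os

def select_llm(env=None):
    """Pick an LLM backend from environment-style settings."""
    env = os.environ if env is None else env
    # "local" as the default label is an assumption for this sketch.
    provider = env.get("LLM_PROVIDER", "local")
    if provider == "gsa_usai":
        api_key = env.get("LLM_API_KEY")
        if not api_key:
            # Fail early rather than at the first user question.
            raise RuntimeError("LLM_PROVIDER is gsa_usai but LLM_API_KEY is not set")
        return {"provider": "gsa_usai", "api_key": api_key}
    return {"provider": "local", "api_key": None}

# Example with explicit settings instead of the real environment.
print(select_llm({"LLM_PROVIDER": "gsa_usai", "LLM_API_KEY": "your-api-key"})["provider"])
print(select_llm({})["provider"])
```

Because `cf set-env` changes only take effect for newly staged or restarted instances, reading the variables once at startup is enough.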

Why switch to GSA USAi?

Characteristics:

Sub-second response times
Significantly better answer quality
Lower memory footprint (under 512 MB)
Queries leave the Cloud.gov boundary (they are sent to GSA's API)
Per-request API costs
Perfect for: Public-facing apps, non-sensitive data, better user experience

This gives you Claude 4.5 Opus quality while keeping the same pgvector search infrastructure.

Add pgvector to your own app

Already have a Cloud.gov PostgreSQL database? Enable vector search.

To connect to your database locally and run psql statements, follow the instructions at https://docs.cloud.gov/platform/services/relational-database/#using-cf-connect-to-service-to-open-a-tunnel

Once connected:

CREATE EXTENSION IF NOT EXISTS vector;

Create a table with vector columns:

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,            -- original document text
  embedding vector(384)    -- dimension must match your embedding model's output size
);
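Once the table exists, a similarity search from Python might look roughly like the sketch below. It only builds the SQL and its parameters, so it runs without a database. `<=>` is pgvector's cosine distance operator and `[x,y,...]` is its text input format; the helper names are hypothetical, and the `%s` placeholders assume a driver such as psycopg.

```python
EMBEDDING_DIM = 384  # matches vector(384) in the table definition

def to_vector_literal(values):
    """Format floats in pgvector's text input form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Smallest cosine distance first, so the best matches come back at the top.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def search_params(query_embedding, limit=3):
    """Validate the query embedding and return parameters for SEARCH_SQL."""
    if len(query_embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_embedding)}"
        )
    return (to_vector_literal(query_embedding), limit)

# Placeholder embedding; a real one would come from your embedding model.
params = search_params([0.0] * EMBEDDING_DIM, limit=3)

# With an open database cursor (assumed, not shown):
#   cursor.execute(SEARCH_SQL, params)
#   rows = cursor.fetchall()
print(params[0][:20], "...")
```

Passing the embedding as a parameter rather than formatting it into the SQL string keeps the query safe to reuse with untrusted input.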

Check out the demo code for embedding and search patterns: https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git

Next steps

Get a free sandbox: https://cloud.gov/sign-up/
Deploy the demo and experiment
Adapt the code for your documents
Need production resources? Contact inquiries@cloud.gov for an Inter-Agency Agreement

The 1 GB sandbox limit works great for testing. Scale up when you're ready.
