An official website of the United States government

Build a Retrieval-Augmented Generation (RAG) app on Cloud.gov: Quick start with architecture guide

· 4 min read

Want to see vector search and RAG in action? Deploy our demo app to your free Cloud.gov sandbox and start asking questions about your documents.

What you get

  • Document Q&A powered by pgvector and Flan-T5
  • Runs in the free 1 GB sandbox
  • Semantic search over Markdown files
  • No Kubernetes or separate vector database needed

What are vector databases and RAG?

Vector databases store embeddings—numerical representations that capture meaning—instead of just text. This enables semantic search: finding "machine learning systems" when you search for "AI applications" without requiring exact keyword matches. pgvector adds this ability to PostgreSQL, so you don't need a separate vector database service.
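To make the idea concrete, here is a minimal sketch of ranking by cosine distance in plain Python. The three-dimensional vectors and document names are made up for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (illustrative values, not model output).
DOCS = {
    "machine learning systems": [0.9, 0.8, 0.1],
    "cafeteria lunch menu": [0.1, 0.0, 0.9],
}
QUERY = [0.8, 0.9, 0.2]  # stands in for the query "AI applications"


def rank(query, docs):
    """Return document names ordered from closest to farthest."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))


if __name__ == "__main__":
    for name in rank(QUERY, DOCS):
        print(name, round(cosine_distance(QUERY, DOCS[name]), 3))
```

The query shares no keywords with either document, yet the vector pointing in a similar direction ranks first. That is the behavior pgvector provides inside PostgreSQL.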

RAG (Retrieval-Augmented Generation) combines search with AI generation. Instead of asking a language model to answer from memory, RAG first searches your documents for relevant information. It then generates answers based on that specific context. This reduces hallucinations and grounds responses in your actual data.

Deploy it now

# Clone and deploy
git clone https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git
cd rag-demo
cf login -a api.fr.cloud.gov --sso
cf create-service aws-rds micro-psql my-rag-db
cf push

# Get your URL
cf app rag-demo

First deployment takes 2-3 minutes. Visit your app URL, upload Markdown files, and ask questions.
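If you are curious how an app can find the database after deployment: Cloud Foundry exposes bound services to the app through the `VCAP_SERVICES` environment variable. The sketch below shows one way to read a connection URI from it. It assumes the service is bound under the name `my-rag-db` and that its credentials include a `uri` field; the demo's actual lookup code may differ.

```python
import json
import os


def database_uri(service_name="my-rag-db", env=None):
    """Return the connection URI for a bound service, or None if not found.

    Assumes the credentials block of the bound service contains a "uri" key.
    """
    env = os.environ if env is None else env
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    # VCAP_SERVICES maps each service label to a list of bound instances.
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance["credentials"]["uri"]
    return None


if __name__ == "__main__":
    print(database_uri())
```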

How it works

Here's what happens when you ask a question:

  1. User submits a question through the web interface
  2. Flask app generates an embedding for the query using sentence-transformers
  3. pgvector searches for similar documents using cosine distance
  4. Top matching documents become context for the language model
  5. Embedded Flan-T5 generates an answer (or GSA USAi if configured)
  6. User receives response with source document references
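The steps above can be sketched as a single function. This is an illustrative outline, not the demo's code: `answer_question` and the stub helpers are hypothetical names, and the stubs stand in for sentence-transformers, the pgvector query, and the language model so the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[str, str]  # (source document name, matching text)


def answer_question(
    question: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], List[Hit]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> Dict[str, object]:
    query_vector = embed(question)                    # step 2: embed the query
    hits = search(query_vector, top_k)                # step 3: similarity search
    context = "\n\n".join(text for _, text in hits)   # step 4: build context
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    answer = generate(prompt)                         # step 5: generate answer
    sources = [name for name, _ in hits]              # step 6: cite sources
    return {"answer": answer, "sources": sources}


# Stand-ins so the sketch is runnable without a model or a database.
def stub_embed(text: str) -> Vector:
    return [float(len(text))]


def stub_search(vector: Vector, top_k: int) -> List[Hit]:
    corpus = [("faq.md", "Sandboxes are free."), ("limits.md", "Memory quota is 1 GB.")]
    return corpus[:top_k]


def stub_generate(prompt: str) -> str:
    return "stub answer"


if __name__ == "__main__":
    print(answer_question("How much memory do I get?", stub_embed, stub_search, stub_generate))
```

Passing the three stages in as functions keeps the retrieval step identical no matter which language model produces the answer.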

By default, everything stays within Cloud.gov, so you keep complete data sovereignty. The call to GSA USAi is optional; the next section shows how to configure it.

Upgrade to GSA USAi

Want faster, better answers? Switch to GSA USAi with three commands:

cf set-env rag-demo LLM_PROVIDER gsa_usai
cf set-env rag-demo LLM_API_KEY your-api-key
cf restage rag-demo
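On the application side, a switch like this usually comes down to reading those two environment variables at startup. The sketch below shows one way to do that. The function name `load_llm_config` and the default value `local` are assumptions for illustration; only `LLM_PROVIDER`, `LLM_API_KEY`, and `gsa_usai` come from the commands above.

```python
import os


def load_llm_config(env=None):
    """Pick the LLM backend from environment variables.

    "local" as the fallback provider name is an assumed placeholder for
    the embedded model, not a documented value.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "local")
    if provider == "gsa_usai":
        api_key = env.get("LLM_API_KEY")
        if not api_key:
            # Fail fast at startup rather than on the first user question.
            raise RuntimeError("LLM_PROVIDER=gsa_usai requires LLM_API_KEY")
        return {"provider": provider, "api_key": api_key}
    return {"provider": provider, "api_key": None}


if __name__ == "__main__":
    print(load_llm_config()["provider"])
```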

Why switch to GSA USAi?

Characteristics:

  • Sub-second response times
  • Significantly better answer quality
  • Lower memory footprint (under 512 MB)
  • Queries leave the Cloud.gov boundary (sent to GSA's API)
  • Per-request API costs
  • Perfect for: Public-facing apps, non-sensitive data, better user experience

This gives you Claude 4.5 Opus quality while keeping the same pgvector search infrastructure.

Add pgvector to your own app

Already have a Cloud.gov PostgreSQL database? Enable vector search.

To connect to your database from your local machine and run SQL statements in psql, follow the instructions at https://docs.cloud.gov/platform/services/relational-database/#using-cf-connect-to-service-to-open-a-tunnel

Once connected:

CREATE EXTENSION vector;

Create a table with vector columns:

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384)  -- must match your embedding model's output dimension
);
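With the table in place, a similarity search is an ordinary SQL query that uses pgvector's `<=>` cosine distance operator. The sketch below is a minimal example, assuming a DB-API cursor that uses `%s` placeholders (as psycopg does); the helper names are hypothetical, and the demo code may structure this differently.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Store a document together with its embedding.
INSERT_SQL = "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)"

# Return the closest documents; <=> is pgvector's cosine distance operator.
SEARCH_SQL = """
SELECT id, content, embedding <=> %s::vector AS distance
FROM documents
ORDER BY distance
LIMIT %s
"""


def add_document(cursor, content, embedding):
    cursor.execute(INSERT_SQL, (content, to_vector_literal(embedding)))


def search_documents(cursor, query_embedding, top_k=3):
    """Run the search on an open cursor and return (id, content, distance) rows."""
    cursor.execute(SEARCH_SQL, (to_vector_literal(query_embedding), top_k))
    return cursor.fetchall()
```

Passing the embedding as a parameter and casting it with `::vector` keeps the query parameterized instead of pasting numbers into the SQL string.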

Check out the demo code for embedding and search patterns: https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git

Next steps

The 1 GB sandbox limit works great for testing. Scale up when you're ready.

GSA.gov

An official website of the U.S. General Services Administration
