Build a Retrieval-Augmented Generation (RAG) app on Cloud.gov: Quick start with architecture guide
Want to see vector search and RAG in action? Deploy our demo app to your free Cloud.gov sandbox and start asking questions about your documents.
What you get
- Document Q&A powered by pgvector and Flan-T5
- Runs in the free 1 GB sandbox
- Semantic search over Markdown files
- No Kubernetes or separate vector database needed
What are vector databases and RAG?
Vector databases store embeddings—numerical representations that capture meaning—instead of just text. This enables semantic search: finding "machine learning systems" when you search for "AI applications" without requiring exact keyword matches. pgvector adds this ability to PostgreSQL, so you don't need a separate vector database service.
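To make "capturing meaning" concrete, here is a minimal sketch of how similarity between embeddings can be measured. The three-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds of dimensions, and `cosine_similarity` is a helper written for this example rather than part of pgvector.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (assumption: values chosen only to illustrate).
toy_embeddings = {
    "machine learning systems": [0.9, 0.8, 0.1],
    "AI applications": [0.8, 0.9, 0.2],
    "cafeteria lunch menu": [0.1, 0.2, 0.9],
}

query = toy_embeddings["AI applications"]

# Rank the other phrases by similarity to the query, most similar first.
ranked = sorted(
    (text for text in toy_embeddings if text != "AI applications"),
    key=lambda text: cosine_similarity(query, toy_embeddings[text]),
    reverse=True,
)
print(ranked)  # ['machine learning systems', 'cafeteria lunch menu']
```

The two phrases share no keywords, yet their vectors point in nearly the same direction, which is the property semantic search relies on.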
RAG (Retrieval-Augmented Generation) combines search with AI generation. Instead of asking a language model to answer from memory, RAG first searches your documents for relevant information. It then generates answers based on that specific context. This reduces hallucinations and grounds responses in your actual data.
Deploy it now
# Clone and deploy
git clone https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git
cd rag-demo
cf login -a api.fr.cloud.gov --sso
cf create-service aws-rds micro-psql my-rag-db
cf push
# Get your URL
cf app rag-demo
First deployment takes 2-3 minutes. Visit your app URL, upload Markdown files, and ask questions.
How it works
Here's what happens when you ask a question:
- User submits a question through the web interface
- Flask app generates an embedding for the query using sentence-transformers
- pgvector searches for similar documents using cosine distance
- Top matching documents become context for the language model
- Embedded Flan-T5 generates an answer (or GSA USAi if configured)
- User receives response with source document references
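The flow above can be sketched as a single function. This is an illustrative outline, not the demo's actual code: `answer_question`, `build_prompt`, and the three injected callables are hypothetical names, and the stand-in functions at the bottom replace the real embedding model, pgvector query, and language model so the sketch runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

# (source name, passage text) pairs returned by the search step.
Passage = Tuple[str, str]


def build_prompt(question: str, passages: Sequence[Passage]) -> str:
    """Combine retrieved passages and the question into one grounded prompt."""
    context = "\n\n".join(f"[{name}] {text}" for name, text in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    embed: Callable[[str], List[float]],
    search: Callable[[List[float], int], List[Passage]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> dict:
    """Run the retrieve-then-generate flow and report which sources were used."""
    query_vector = embed(question)           # embed the query
    passages = search(query_vector, top_k)   # nearest documents become context
    answer = generate(build_prompt(question, passages))  # model writes the answer
    return {"answer": answer, "sources": [name for name, _ in passages]}


# Stand-ins (assumptions) so the sketch runs without a model or a database.
def fake_embed(text: str) -> List[float]:
    return [float(len(text)), 1.0]


def fake_search(vector: List[float], k: int) -> List[Passage]:
    docs = [
        ("setup.md", "Run cf push to deploy."),
        ("faq.md", "The sandbox has 1 GB of memory."),
    ]
    return docs[:k]


def fake_generate(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"


result = answer_question("How do I deploy?", fake_embed, fake_search, fake_generate, top_k=1)
print(result["sources"])  # ['setup.md']
```

Passing the three steps in as callables keeps the flow identical whether generation is handled by the embedded model or by an external API.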
By default, everything stays within Cloud.gov, so your documents and queries never leave the platform. The connection to GSA USAi is optional; the next section shows how to configure it.
Upgrade to GSA USAi
Want faster, better answers? Switch to GSA USAi with three commands:
cf set-env rag-demo LLM_PROVIDER gsa_usai
cf set-env rag-demo LLM_API_KEY your-api-key
cf restage rag-demo
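On the application side, a switch like this usually comes down to reading the two environment variables at startup. The sketch below shows one way that could look; `select_llm_provider` and the `"embedded"` default are assumptions made for illustration, and the demo's own code may handle this differently.

```python
import os
from typing import Mapping


def select_llm_provider(env: Mapping[str, str] = os.environ) -> dict:
    """Pick the LLM backend from LLM_PROVIDER and LLM_API_KEY.

    Any value other than "gsa_usai" falls back to the embedded model
    (hypothetical behavior chosen for this sketch).
    """
    provider = env.get("LLM_PROVIDER", "embedded").strip().lower()
    if provider == "gsa_usai":
        api_key = env.get("LLM_API_KEY", "").strip()
        if not api_key:
            # Fail at startup instead of on the first user question.
            raise RuntimeError("LLM_PROVIDER is gsa_usai but LLM_API_KEY is not set")
        return {"provider": "gsa_usai", "api_key": api_key}
    return {"provider": "embedded", "api_key": None}


print(select_llm_provider({"LLM_PROVIDER": "gsa_usai", "LLM_API_KEY": "example"})["provider"])
```

Environment variables set with `cf set-env` only reach the running app after a restage or restart, which is why the last command matters.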
Why switch to GSA USAi?
Benefits:
- Sub-second response times
- Significantly better answer quality
- Lower memory footprint (under 512 MB)
Trade-offs:
- Queries leave the Cloud.gov boundary (sent to GSA's API)
- Per-request API costs
Best for: public-facing apps, non-sensitive data, and a better user experience
This gives you Claude Opus 4.5 quality while keeping the same pgvector search infrastructure.
Add pgvector to your own app
Already have a Cloud.gov PostgreSQL database? Enable vector search.
To connect to your database locally and run psql statements, follow the instructions at https://docs.cloud.gov/platform/services/relational-database/#using-cf-connect-to-service-to-open-a-tunnel
Once connected:
CREATE EXTENSION vector;
Create a table with vector columns:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(384)
);
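With that table in place, storing and searching embeddings takes two SQL statements. The following is a minimal sketch assuming a DB-API cursor that uses `%s` placeholders, such as psycopg2's; the helper names are hypothetical, and every embedding must have 384 values to match the `vector(384)` column.

```python
from typing import Sequence

INSERT_SQL = "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)"

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
    SELECT id, content, embedding <=> %s::vector AS distance
    FROM documents
    ORDER BY distance
    LIMIT %s
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def insert_document(cursor, content: str, embedding: Sequence[float]) -> None:
    """Store one document alongside its embedding."""
    cursor.execute(INSERT_SQL, (content, to_vector_literal(embedding)))


def search_documents(cursor, query_embedding: Sequence[float], limit: int = 3):
    """Return the `limit` rows closest to the query embedding."""
    cursor.execute(SEARCH_SQL, (to_vector_literal(query_embedding), limit))
    return cursor.fetchall()


print(to_vector_literal([0.1, 0.2, 3]))  # [0.1,0.2,3.0]
```

Sending the vector as text and casting it with `::vector` avoids needing a driver-specific type adapter, at the cost of a little string formatting.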
Check out the demo code for embedding and search patterns: https://workshop.cloud.gov/cloud-gov/platform/rag-demo.git
Next steps
- Get a free sandbox: https://cloud.gov/sign-up/
- Deploy the demo and experiment
- Adapt the code for your documents
- Need production resources? Contact inquiries@cloud.gov for an Inter-Agency Agreement
The 1 GB sandbox limit works great for testing. Scale up when you're ready.