How to Build RAG with Supabase pgvector and Next.js
Retrieval-Augmented Generation (RAG) is transforming how we build AI applications. Instead of relying solely on an LLM's training data, RAG systems retrieve relevant information from your own data sources and inject it into the prompt. This gives you accurate, up-to-date answers without the hallucinations that plague pure LLMs.
In this guide, I'll show you how to build a production-ready RAG system using Supabase pgvector and Next.js. We'll cover everything from database setup to streaming chat interfaces, with real code you can ship today.
What is RAG and Why Use Supabase?
RAG stands for Retrieval-Augmented Generation. Here's the simplified flow:
- User asks a question
- Your app converts the question into a vector embedding
- You search your database for similar content using vector similarity
- You inject the relevant content into the LLM's context
- The LLM generates an answer based on your actual data
Traditional approaches use specialized vector databases like Pinecone or Weaviate. But for most projects, Supabase pgvector is a better choice:
- One database for everything: Store vectors alongside your relational data
- Lower costs: No separate vector database subscription
- Familiar SQL: Write queries you already understand
- Real-time subscriptions: Built-in WebSocket support
- Row-level security: Protect user data at the database level
I've used this pattern for client documentation search, customer support chatbots, and internal knowledge bases. For projects under 1 million vectors, Supabase pgvector performs beautifully.
Architecture Overview
Here's what we're building:
User → Next.js API Route (embed query) → Supabase pgvector (find similar docs) → OpenAI (generate answer)
Component breakdown:
- Embedding table: Stores document chunks with their vector embeddings
- API routes: Handle embedding generation and similarity search
- Chat interface: Streams responses to the user
- Background jobs: Process and embed new content
The beauty of this architecture is that each piece is independently testable and replaceable. Need a different LLM? Swap OpenAI for Anthropic. Want semantic caching? Add Redis. The core pattern stays the same.
Step 1: Setting Up Supabase pgvector
First, enable the pgvector extension in your Supabase project:
```sql
-- Run in Supabase SQL Editor
create extension if not exists vector;

-- Create embeddings table
create table embeddings (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  embedding vector(1536), -- OpenAI ada-002 dimension
  metadata jsonb default '{}'::jsonb,
  created_at timestamp with time zone default now()
);

-- Create index for similarity search
create index on embeddings using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

-- Enable Row Level Security
alter table embeddings enable row level security;

-- Policy: everyone can read embeddings
create policy "Public embeddings are viewable by everyone"
  on embeddings for select
  using (true);
```
The ivfflat index is crucial for performance. It uses approximate nearest neighbor search, which is 10-100x faster than exact search for large datasets. The lists parameter controls the trade-off between speed and accuracy. Start with 100 and increase if you have millions of vectors.
Save this as a Supabase migration in supabase/migrations/.
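As the table grows, you may need to retune the index rather than accept degraded recall or latency. pgvector gives you two knobs: the lists setting on the index and the ivfflat.probes setting at query time. Here's a rough sketch of adjusting both; the numbers are illustrative, not recommendations, and the index name assumes Postgres's default auto-generated name:

```sql
-- Query-time trade-off: more probes = better recall, slower searches.
-- This only affects the current session (or can be set per transaction).
set ivfflat.probes = 10;

-- If the table has grown substantially, rebuild the index with more lists.
-- A common rule of thumb is roughly rows / 1000 for larger tables.
-- "embeddings_embedding_idx" assumes the default auto-generated name;
-- check pg_indexes if you named the index explicitly.
drop index if exists embeddings_embedding_idx;
create index embeddings_embedding_idx
  on embeddings using ivfflat (embedding vector_cosine_ops)
  with (lists = 1000);
```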
Step 2: Generating Embeddings in Next.js
Create an API route to process documents and generate embeddings:
```typescript
// app/api/embeddings/generate/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { Configuration, OpenAIApi } from 'openai';
import { createClient } from '@supabase/supabase-js';

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Split text into chunks (simple implementation)
function chunkText(text: string, maxLength: number = 1000): string[] {
  const chunks: string[] = [];
  const paragraphs = text.split('\n\n');
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    if ((currentChunk + paragraph).length > maxLength) {
      if (currentChunk) chunks.push(currentChunk.trim());
      currentChunk = paragraph;
    } else {
      currentChunk += '\n\n' + paragraph;
    }
  }
  if (currentChunk) chunks.push(currentChunk.trim());

  return chunks;
}

export async function POST(req: NextRequest) {
  const { content, metadata } = await req.json();

  // Split content into chunks
  const chunks = chunkText(content);

  // Generate embeddings for all chunks
  const embeddingsData = await Promise.all(
    chunks.map(async (chunk) => {
      const response = await openai.createEmbedding({
        model: 'text-embedding-ada-002',
        input: chunk,
      });

      return {
        content: chunk,
        embedding: response.data.data[0].embedding,
        metadata,
      };
    })
  );

  // Insert into Supabase
  const { error } = await supabase
    .from('embeddings')
    .insert(embeddingsData);

  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }

  return NextResponse.json({
    success: true,
    chunksProcessed: chunks.length,
  });
}
```
Pro tip: OpenAI's API has rate limits. For large document sets, implement a queue system like BullMQ or use background jobs with Vercel Cron. I cover resilient API patterns in my post on building a circuit breaker pattern.
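If you don't want to stand up a full queue yet, a lighter-weight stopgap is to batch inputs and retry on rate-limit errors. Here's a rough sketch using the openai client defined above; the embeddings endpoint accepts an array of inputs per request. The batch size and backoff numbers are arbitrary choices, not recommendations:

```typescript
// Sketch: embed chunks in batches, with exponential backoff on failures.
async function embedWithRetry(
  chunks: string[],
  batchSize = 100,
  maxRetries = 5
): Promise<number[][]> {
  const embeddings: number[][] = [];

  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);

    for (let attempt = 0; ; attempt++) {
      try {
        // text-embedding-ada-002 accepts an array of inputs in a single request
        const response = await openai.createEmbedding({
          model: 'text-embedding-ada-002',
          input: batch,
        });
        embeddings.push(...response.data.data.map((d) => d.embedding));
        break;
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // Back off: 1s, 2s, 4s, 8s, ...
        await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
      }
    }
  }

  return embeddings;
}
```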
Step 3: Similarity Search Implementation
Now create the search endpoint:
```typescript
// app/api/embeddings/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { Configuration, OpenAIApi } from 'openai';
import { createClient } from '@supabase/supabase-js';

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

export async function POST(req: NextRequest) {
  const { query, matchCount = 5, threshold = 0.5 } = await req.json();

  // Generate embedding for user query
  const embeddingResponse = await openai.createEmbedding({
    model: 'text-embedding-ada-002',
    input: query,
  });
  const queryEmbedding = embeddingResponse.data.data[0].embedding;

  // Similarity search using pgvector
  const { data: matches, error } = await supabase.rpc('match_embeddings', {
    query_embedding: queryEmbedding,
    match_count: matchCount,
    match_threshold: threshold,
  });

  if (error) {
    return NextResponse.json({ error: error.message }, { status: 500 });
  }

  return NextResponse.json({ matches });
}
```
Create the matching function in Supabase:
```sql
-- Run in Supabase SQL Editor
create or replace function match_embeddings(
  query_embedding vector(1536),
  match_count int default 5,
  match_threshold float default 0.5
)
returns table (
  id uuid,
  content text,
  metadata jsonb,
  similarity float
)
language sql stable
as $$
  select
    embeddings.id,
    embeddings.content,
    embeddings.metadata,
    1 - (embeddings.embedding <=> query_embedding) as similarity
  from embeddings
  where 1 - (embeddings.embedding <=> query_embedding) > match_threshold
  order by embeddings.embedding <=> query_embedding
  limit match_count;
$$;
```
The <=> operator computes cosine distance, so we subtract it from 1 to get a similarity score (higher is better).
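A quick sanity check you can run in the SQL editor, using tiny 3-dimensional vectors purely for illustration:

```sql
-- Identical vectors: cosine distance 0, similarity 1
select 1 - ('[1,0,0]'::vector <=> '[1,0,0]'::vector);

-- Orthogonal vectors: cosine distance 1, similarity 0
select 1 - ('[1,0,0]'::vector <=> '[0,1,0]'::vector);
```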
Performance tips:
- Cache embeddings for common queries using Redis (sketched below)
- Adjust match_threshold to balance relevance vs. retrieval count
- Monitor query times and increase index lists if searches slow down
- Consider hybrid search (keyword + vector) for better results
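For the first tip, here's a minimal sketch of caching query embeddings in Redis. It assumes the ioredis client, a REDIS_URL environment variable, and the openai client from the search route above; the key scheme and one-hour TTL are my own choices, not a standard:

```typescript
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// Return a cached embedding for this query if we've seen it before,
// otherwise call OpenAI and cache the result for an hour.
async function getQueryEmbedding(query: string): Promise<number[]> {
  const cacheKey = `embedding:${query.toLowerCase().trim()}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const response = await openai.createEmbedding({
    model: 'text-embedding-ada-002',
    input: query,
  });
  const embedding = response.data.data[0].embedding;

  await redis.set(cacheKey, JSON.stringify(embedding), 'EX', 3600);
  return embedding;
}
```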
Step 4: Building the Chat Interface
Now tie it together with a streaming chat interface using Vercel AI SDK:
```typescript
// app/api/chat/route.ts
import { OpenAIStream, StreamingTextResponse } from 'ai';
// Note: use the fetch-based openai-edge client here; OpenAIStream expects the
// streaming Response it returns, not the axios-based 'openai' v3 client.
import { Configuration, OpenAIApi } from 'openai-edge';

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

export async function POST(req: Request) {
  const { messages } = await req.json();
  const lastMessage = messages[messages.length - 1].content;

  // Get relevant context from embeddings
  const searchResponse = await fetch(
    `${process.env.NEXT_PUBLIC_APP_URL}/api/embeddings/search`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: lastMessage }),
    }
  );
  const { matches } = await searchResponse.json();

  // Build context from matches
  const context = matches
    .map((match: any) => match.content)
    .join('\n\n---\n\n');

  // Inject context into system message
  const systemMessage = {
    role: 'system' as const,
    content: `You are a helpful assistant. Use the following context to answer the user's question. If the answer isn't in the context, say so.

Context:
${context}`,
  };

  // Generate streaming response
  const response = await openai.createChatCompletion({
    model: 'gpt-4',
    messages: [systemMessage, ...messages],
    stream: true,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
Frontend component:
```tsx
// app/components/ChatInterface.tsx
'use client';

import { useChat } from 'ai/react';

export default function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: '/api/chat',
    });

  return (
    <div className="flex flex-col h-screen max-w-4xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`p-4 rounded-lg ${
              message.role === 'user'
                ? 'bg-blue-100 ml-auto max-w-md'
                : 'bg-gray-100 mr-auto max-w-2xl'
            }`}
          >
            <p className="text-gray-800">{message.content}</p>
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
          disabled={isLoading}
          className="flex-1 p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500"
        />
        <button
          type="submit"
          disabled={isLoading}
          className="px-6 py-3 bg-emerald-600 text-white rounded-lg hover:bg-emerald-700 disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}
```
The Vercel AI SDK handles streaming, loading states, and message history automatically. You get a ChatGPT-like experience with minimal code.
Production Considerations
Before launching your RAG system, consider these factors:
Cost estimation:
- OpenAI embeddings: $0.0001 per 1K tokens
- Storage: ~6KB per embedding (1536 dimensions × 4 bytes)
- For 100K documents at 500 tokens each: ~$5 embedding cost, ~600MB storage
Monitoring:
- Track search latency (target: <200ms)
- Monitor relevance scores (avg similarity >0.7 indicates good matches)
- Log failed embeddings for reprocessing
- Set up alerts for API rate limits
Optimization:
- Cache common queries in Redis (10x faster)
- Pre-compute embeddings for frequently accessed content
- Use batching for bulk uploads (100-200 documents per batch)
- Implement request deduplication to prevent duplicate embeddings
Scaling:
- At 1M+ vectors, consider dedicated vector databases
- Use read replicas for high-traffic search endpoints
- Implement semantic caching to reduce OpenAI costs
- Consider fine-tuning embeddings for domain-specific content
Need help building a production RAG system? I specialize in AI-powered applications with Next.js and Supabase. Check out my services or view my portfolio to see similar projects I've shipped.
Next Steps
Once you have basic RAG working, here are advanced patterns to explore:
Hybrid search: Combine vector similarity with full-text search for better results. Use Supabase's built-in FTS alongside pgvector.
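As a starting point, here's an untested sketch of a hybrid matching function that blends both signals. The 0.7/0.3 weights are arbitrary, ts_rank and cosine similarity aren't on the same scale, and a real implementation would store a tsvector column with a GIN index instead of computing it per row:

```sql
create or replace function hybrid_match_embeddings(
  query_text text,
  query_embedding vector(1536),
  match_count int default 5
)
returns table (id uuid, content text, score float)
language sql stable
as $$
  with vector_results as (
    select id, content, 1 - (embedding <=> query_embedding) as vector_score
    from embeddings
    order by embedding <=> query_embedding
    limit match_count * 2
  ),
  keyword_results as (
    select id, content,
           ts_rank(to_tsvector('english', content),
                   websearch_to_tsquery('english', query_text)) as keyword_score
    from embeddings
    where to_tsvector('english', content) @@ websearch_to_tsquery('english', query_text)
    order by keyword_score desc
    limit match_count * 2
  )
  select id,
         coalesce(v.content, k.content) as content,
         coalesce(v.vector_score, 0) * 0.7 + coalesce(k.keyword_score, 0) * 0.3 as score
  from vector_results v
  full outer join keyword_results k using (id)
  order by score desc
  limit match_count;
$$;
```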
Multi-tenant RAG: Add tenant isolation using Row Level Security policies. Each customer's embeddings stay private.
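A minimal sketch of what that isolation could look like, assuming you add a user_id column (the column name and policy wording are illustrative):

```sql
-- Add an owner column and lock reads down to the owning user.
-- This replaces the permissive "viewable by everyone" policy from Step 1.
alter table embeddings add column user_id uuid references auth.users(id);

drop policy "Public embeddings are viewable by everyone" on embeddings;

create policy "Users can read their own embeddings"
  on embeddings for select
  using (auth.uid() = user_id);
```

One caveat: the API routes above use the service-role key, which bypasses RLS, so for true per-user isolation you'd either query with the user's session token or filter explicitly by user_id in the search function.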
Reranking: Use a reranking model like Cohere to improve result quality after initial retrieval.
Metadata filtering: Add WHERE clauses to your similarity queries to filter by document type, date, or user permissions.
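One way to do this is to extend the matching function with an optional jsonb filter. A sketch, assuming you pass the filter as a jsonb object like {"type": "docs"}:

```sql
create or replace function match_embeddings_filtered(
  query_embedding vector(1536),
  filter jsonb default '{}'::jsonb,
  match_count int default 5,
  match_threshold float default 0.5
)
returns table (id uuid, content text, metadata jsonb, similarity float)
language sql stable
as $$
  select
    embeddings.id,
    embeddings.content,
    embeddings.metadata,
    1 - (embeddings.embedding <=> query_embedding) as similarity
  from embeddings
  where embeddings.metadata @> filter
    and 1 - (embeddings.embedding <=> query_embedding) > match_threshold
  order by embeddings.embedding <=> query_embedding
  limit match_count;
$$;
```

Call it the same way via supabase.rpc, passing something like { filter: { type: 'docs' } } to restrict matches to documentation chunks.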
Streaming updates: Use Supabase real-time subscriptions to update embeddings when content changes.
RAG is one of the most practical AI patterns you can implement today. It gives you the power of LLMs without the hallucination risks, and Supabase makes it surprisingly simple to build.
If you're building a RAG system and want expert guidance, get in touch. I help teams ship AI features that actually work in production.