Sai Umesh

Learning RAG Systems: My Journey from Beginner to Real Understanding


When I first started exploring AI, I kept seeing words like RAG, embeddings, and vector databases. People threw these terms around as if everyone magically understood them. The tutorials I found would show a simple code snippet, maybe a quick chatbot demo, and call it a day.

But no one really explained how RAG systems actually work under the hood.

I didn’t want to just use AI. I wanted to actually understand it. And if you’re anything like me, you probably feel the same way.

Why Most AI Articles Felt Empty

Most AI articles today either:

  • Show one tiny example without explaining how the pieces connect
  • Talk in buzzwords that don’t actually teach you anything real

I realized very quickly that understanding AI (especially RAG) means going deeper. Not just copying code.

So I decided to slow down and ask simple but real questions:

  • What is chunking?
  • What is an embedding?
  • How do vector databases actually work?
  • How do all these fit together to make a RAG system?

What I Learned About RAG (Retrieval-Augmented Generation)

Here’s the real flow, now that I understand it:

Knowledge Base

Start with your documents (Markdown files in my case). The snippets below talk to a locally running Ollama server (http://localhost:11434) for both embeddings and chat.

import fs from 'fs/promises';
import path from 'path';

const EMBED_MODEL = 'nomic-embed-text';
const CHAT_MODEL = 'mistral';
const DOCS_DIR = './docs';

async function loadDocs() {
  const files = await fs.readdir(DOCS_DIR);
  const db: { chunk: string; embedding: number[] }[] = [];

  for (const file of files) {
    const content = await fs.readFile(path.join(DOCS_DIR, file), 'utf-8');
    const chunks = chunkText(content, 500);

    // Embed every chunk and keep it next to its vector
    for (const chunk of chunks) {
      const embedding = await getEmbedding(chunk);
      db.push({ chunk, embedding });
    }
  }
  console.log(`Embedded ${db.length} text chunks.`);
  return db;
}

Chunking

Break those documents into small, meaningful parts.

export function chunkText(text: string, size = 500) {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
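
This fixed-size split can cut a sentence in half right at a chunk boundary. A common refinement, which I haven’t wired in yet, is to let neighbouring chunks overlap slightly so the text near a boundary shows up in both. A minimal sketch (the overlap parameter is my own addition):

// Sketch: fixed-size chunks with a small overlap so text near a boundary
// appears in two neighbouring chunks. `overlap` is a hypothetical parameter.
export function chunkTextWithOverlap(text: string, size = 500, overlap = 50) {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // the last chunk already reached the end
  }
  return chunks;
}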

Embeddings

Use an embedding model to turn each chunk into a vector (a list of numbers that captures the meaning).

import axios from 'axios';

// Calls Ollama's local embeddings endpoint with the nomic-embed-text model
async function getEmbedding(text: string): Promise<number[]> {
  const res = await axios.post('http://localhost:11434/api/embeddings', {
    model: EMBED_MODEL,
    prompt: text,
  });
  return res.data.embedding;
}
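
As a quick sanity check (my own addition, not part of the original flow), you can embed a short string and look at what comes back; nomic-embed-text returns a long list of numbers (768 dimensions, if I recall correctly):

// Illustrative check: embed one string and inspect the resulting vector.
const vector = await getEmbedding('Reset password');
console.log(vector.length);      // number of dimensions (e.g. 768 for nomic-embed-text)
console.log(vector.slice(0, 5)); // first few values of the vector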

How Embedding Works

An embedding model reads a piece of text and converts it into a list of numbers that capture the meaning of that text. In simple terms:

| Text | Vector (example numbers) |
| --- | --- |
| Reset password | [0.24, 0.51, 0.13, …] |
| Recover account password | [0.23, 0.50, 0.15, …] |
| Launch a rocket | [-0.64, 0.19, 0.93, …] |

  • Similar meanings (like “Reset password” and “Recover account password”) produce similar vectors.
  • Different meanings (like “Launch a rocket”) produce different vectors.

This way, we can find meaning-based matches, not just exact word matches.

Store Embeddings

Save these vectors into a vector database (like ChromaDB or even a local JSON file for now). In my case, I’m storing the embeddings in memory.
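
If you want the vectors to survive a restart without a real vector database, the in-memory array can simply be written to disk. A rough sketch of the local-JSON-file option (the file name is just an example):

import fs from 'fs/promises';

// Rough sketch: persist the in-memory store as JSON and load it back later.
// 'vector-store.json' is an example file name, not something from the pipeline above.
async function saveDb(db: { chunk: string; embedding: number[] }[]) {
  await fs.writeFile('vector-store.json', JSON.stringify(db));
}

async function loadSavedDb(): Promise<{ chunk: string; embedding: number[] }[]> {
  const raw = await fs.readFile('vector-store.json', 'utf-8');
  return JSON.parse(raw);
}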

Querying

Now, we can search for similar chunks based on the vectors.

async function findRelevantChunks(db: { chunk: string; embedding: number[] }[], query: string) {
  const queryEmbedding = await getEmbedding(query);
  const scored = db.map(entry => ({
    chunk: entry.chunk,
    score: cosineSimilarity(queryEmbedding, entry.embedding),
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, 3).map(x => x.chunk);
}

User Question

When a user asks something, embed the question too.


const context = await findRelevantChunks(db, question);

findRelevantChunks embeds the question, scores it against every stored vector with cosine similarity, and keeps the most similar chunks from the store (the in-memory array in my case).

function cosineSimilarity(vecA: number[], vecB: number[]): number {
  const dot = vecA.reduce((sum, a, idx) => sum + a * vecB[idx], 0);
  const normA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const normB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dot / (normA * normB);
}
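
To make the “similar meanings, similar vectors” idea concrete, here is the same function applied to the toy vectors from the table earlier (the numbers are made up; the pattern is what matters):

// Illustration only: the toy three-number vectors from the table above.
const resetPassword   = [0.24, 0.51, 0.13];
const recoverPassword = [0.23, 0.50, 0.15];
const launchRocket    = [-0.64, 0.19, 0.93];

console.log(cosineSimilarity(resetPassword, recoverPassword)); // close to 1 (very similar)
console.log(cosineSimilarity(resetPassword, launchRocket));    // much lower (unrelated)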

In this simple code, we’re manually calculating the cosine similarity between the query and every stored vector to find the best matches. Conceptually, this is what a real vector database like ChromaDB, Pinecone, or Milvus does internally; it just uses optimized indexes instead of a brute-force loop. The only difference is:

| What we’re doing manually | What a vector database does |
| --- | --- |
| Calculate cosine similarity between vectors | The same, but optimized and much faster |
| Sort by score and pick the top results | The same |
| Works for small projects | Scales to millions of vectors |

In short:

cosineSimilarity = Brain of vector search.
Manual search = Fine for small apps.
Vector DB = Needed for big, fast production apps.

Answer

Finally, pass the retrieved chunks and the user’s question to the chat model and let it answer from that context.

async function generateAnswer(context: string[], question: string): Promise<string> {
  const prompt = `Use the following context to answer:\n${context.join('\n')}\n\nQuestion: ${question}`;
  const res = await axios.post('http://localhost:11434/api/generate', {
    model: CHAT_MODEL, // 'mistral'
    prompt,
    stream: false,
  });
  return res.data.response.trim();
}

const answer = await generateAnswer(context, question);

In simple words

Markdown Docs --> Chunking --> Embedding --> Save to Vector DB

User asks Question --> Embed Question --> Search DB --> Retrieve Chunks --> Give to Model --> Get Answer
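
Wired together, the whole pipeline is only a few calls. A small end-to-end sketch (the question string is just an example):

// End-to-end sketch: build the store once, then answer a question against it.
async function main() {
  const db = await loadDocs();                             // chunk + embed the Markdown docs
  const question = 'How do I reset my password?';          // example question
  const context = await findRelevantChunks(db, question);  // retrieve the top matching chunks
  const answer = await generateAnswer(context, question);  // let the model answer from them
  console.log(answer);
}

main();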

This flow is now so clear to me that it’s honestly shocking how badly most tutorials explain it.

Why Chunking and Embeddings Matter More Than People Think

What I also realized:

  • If your chunking is bad (random cuts, huge blocks), your retrieval will be bad.

  • If your embeddings are low quality (bad models), even good chunks won’t match properly.

Garbage chunks = Garbage retrieval = Garbage answers.

  • Good chunking + good embeddings = sharp, precise answers, even without a fancy model.

  • Most of the real “magic” in RAG systems isn’t in the model. It’s in how you prepare and retrieve the right knowledge.

Where I’m Heading Next

Now that I understand the pieces properly, I’m focusing on building:

  • Smart document RAG systems using Markdown files as the knowledge base.
  • Clean chunking strategies (splitting by sections, logical grouping); a rough sketch of the section-splitting idea follows this list.
  • Embedding everything into a fast vector database like ChromaDB.
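
For the section-based chunking mentioned above, the rough idea is to split Markdown at its headings so each chunk stays a coherent section instead of an arbitrary 500-character slice. A sketch of what I have in mind (exploratory, not part of the pipeline above):

// Sketch: split a Markdown document at its headings so each chunk is one section.
export function chunkMarkdownBySections(markdown: string): string[] {
  const sections: string[] = [];
  let current: string[] = [];

  for (const line of markdown.split('\n')) {
    // A new heading starts a new section (unless we are still at the very top).
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      sections.push(current.join('\n').trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) sections.push(current.join('\n').trim());

  return sections.filter(s => s.length > 0);
}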

I’m also working on a simple tool around all of this; more on that soon.

Final Thoughts

If you’re learning AI today, my advice is simple:

Slow down. Understand the basics first.
Don't get distracted by shiny new tools.
Understand what **chunking** and **embedding** really mean.
Build small systems, validate your learning, and THEN scale.

(This was a personal journey into RAG systems. I’ll be writing more soon about the small real-world apps I’m building around this knowledge.)


Sai Umesh

I’m Sai Umesh, a software engineer based in India, currently working as a DevOps engineer.