Learning RAG Systems: My Journey from Beginner to Real Understanding
When I first started exploring AI, I kept seeing terms like RAG, embeddings, and vector databases. Everyone seemed to throw them around as if we all magically understood them. The tutorials I found would show a simple code snippet, maybe a quick chatbot demo, and call it a day.
But no one really explained how RAG systems actually work under the hood.
I didn’t want to just use AI. I wanted to actually understand it. And if you’re anything like me, you probably feel the same way.
Why Most AI Articles Felt Empty
Most AI articles today either:
- Show one tiny example without explaining how the pieces connect
- Talk in buzzwords that don’t actually teach you anything real
I realized very quickly that understanding AI (especially RAG) means going deeper. Not just copying code.
So I decided to slow down and ask simple but real questions:
- What is chunking?
- What is an embedding?
- How do vector databases actually work?
- How do all these fit together to make a RAG system?
What I Learned About RAG (Retrieval-Augmented Generation)
Here’s the real flow, now that I understand it:
Knowledge Base
Start with your documents (Markdown files in my case).
import fs from 'fs/promises';
import path from 'path';
import axios from 'axios';

// chunkText and getEmbedding are the helpers shown later in this post
const EMBED_MODEL = 'nomic-embed-text';
const CHAT_MODEL = 'mistral';
const DOCS_DIR = './docs';
async function loadDocs() {
  const files = await fs.readdir(DOCS_DIR);
  const db: { chunk: string; embedding: number[] }[] = [];

  for (const file of files) {
    const content = await fs.readFile(path.join(DOCS_DIR, file), 'utf-8');

    // Split each document into small chunks before embedding
    const chunks = chunkText(content, 500);

    for (const chunk of chunks) {
      // Turn each chunk into a vector and keep both together
      const embedding = await getEmbedding(chunk);
      db.push({ chunk, embedding });
    }
  }

  console.log(`Embedded ${db.length} text chunks.`);
  return db;
}
Chunking
Break those documents into small, meaningful parts.
export function chunkText(text: string, size = 500): string[] {
  const chunks: string[] = [];
  // Simple fixed-size chunking: slice the text every `size` characters
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
Embeddings
Use an embedding model to turn each chunk into a vector (a list of numbers that captures the meaning).
async function getEmbedding(text: string): Promise<number[]> {
  // Ask the local Ollama server to embed the text with the embedding model
  const res = await axios.post('http://localhost:11434/api/embeddings', {
    model: EMBED_MODEL,
    prompt: text,
  });
  return res.data.embedding;
}
How Embedding Works
An embedding model reads a piece of text and converts it into a list of numbers that capture the meaning of that text. In simple terms:
| Text | Vector (Example Numbers) |
| --- | --- |
| Reset password | [0.24, 0.51, 0.13, …] |
| Recover account password | [0.23, 0.50, 0.15, …] |
| Launch a rocket | [-0.64, 0.19, 0.93, …] |
- Similar meanings (like “Reset password” and “Recover account password”) produce similar vectors.
- Different meanings (like “Launch a rocket”) produce different vectors.
This way, we can find meaning-based matches, not just exact word matches.
Store Embeddings
Save these vectors into a vector database (like ChromaDB or even a local JSON file for now). In my case, I’m storing the embeddings in memory.
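If you want something slightly more durable than an in-memory array, a minimal sketch is to dump that same array to a JSON file and read it back on startup (the file name and shape here are my own choices, not a requirement):

// Minimal "vector store": persist the in-memory array as JSON (a sketch, not production code)
import fs from 'fs/promises';

type DocEntry = { chunk: string; embedding: number[] };

async function saveDb(db: DocEntry[], file = './db.json') {
  await fs.writeFile(file, JSON.stringify(db), 'utf-8');
}

async function loadDb(file = './db.json'): Promise<DocEntry[]> {
  return JSON.parse(await fs.readFile(file, 'utf-8'));
}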
Querying
Now, we can search for similar chunks based on the vectors.
async function findRelevantChunks(db: { chunk: string; embedding: number[] }[], query: string) {
  // Embed the user's question with the same embedding model
  const queryEmbedding = await getEmbedding(query);

  // Score every stored chunk against the question
  const scored = db.map(entry => ({
    chunk: entry.chunk,
    score: cosineSimilarity(queryEmbedding, entry.embedding),
  }));

  // Highest score first, keep the top 3 chunks
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, 3).map(x => x.chunk);
}
User Question
When a user asks something, embed the question too, find the closest chunks, and pass them to the chat model.
async function askMistral(context: string[], question: string) {
  // Stuff the retrieved chunks into the prompt as context for the model
  const prompt = `Use the following context to answer:\n\n${context.join('\n\n')}\n\nQuestion: ${question}`;
  const res = await axios.post('http://localhost:11434/api/generate', {
    model: CHAT_MODEL,
    prompt: prompt,
    stream: false,
  });
  return res.data.response.trim();
}
const context = await findRelevantChunks(db, question);
const answer = await askMistral(context, question);
Similarity Search
Find the most similar chunks from our little vector store (the in-memory array in my case).
function cosineSimilarity(vecA: number[], vecB: number[]): number {
  // Dot product of the two vectors
  const dot = vecA.reduce((sum, a, idx) => sum + a * vecB[idx], 0);
  // Length (magnitude) of each vector
  const normA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const normB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  // 1 = same direction (same meaning), 0 = unrelated, -1 = opposite
  return dot / (normA * normB);
}
Important Insight About Search:
In this simple code, we’re manually calculating the cosine similarity between vectors to find the best matches. This is exactly what a real vector database like ChromaDB, Pinecone, or Milvus would do internally. The only difference is:
| What we’re doing manually | What a vector database does |
| --- | --- |
| Calculate cosine similarity between vectors | Same (but optimized, super fast) |
| Sort by score and pick top results | Same |
| Works for small projects | Scales to millions of vectors |
In short:
- cosineSimilarity = the brain of vector search.
- Manual search = fine for small apps.
- A vector DB = needed for big, fast production apps.
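For comparison, here is roughly what the same store-and-query step looks like with the chromadb JavaScript client, reusing the db array and getEmbedding from above (a sketch from memory; check the current ChromaDB docs, since method names and options change between versions):

import { ChromaClient } from 'chromadb';

// Sketch: let ChromaDB hold the vectors and do the similarity search for us
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({ name: 'docs' });

// Store chunks with their precomputed embeddings (ids are required)
await collection.add({
  ids: db.map((_, i) => `chunk-${i}`),
  documents: db.map(entry => entry.chunk),
  embeddings: db.map(entry => entry.embedding),
});

// Query: embed the question, let Chroma return the top 3 closest chunks
const results = await collection.query({
  queryEmbeddings: [await getEmbedding(question)],
  nResults: 3,
});
console.log(results.documents[0]); // top 3 chunks for this question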
Answer
Finally, hand the retrieved chunks and the original question to the chat model to generate the answer.
async function generateAnswer(context: string[], question: string): Promise<string> {
  const prompt = `Use the following context to answer:\n${context.join('\n')}\n\nQuestion: ${question}`;
  const res = await axios.post('http://localhost:11434/api/generate', {
    model: CHAT_MODEL,
    prompt,
    stream: false,
  });
  return res.data.response.trim();
}
In simple words
Markdown Docs --> Chunking --> Embedding --> Save to Vector DB
User asks Question --> Embed Question --> Search DB --> Retrieve Chunks --> Give to Model --> Get Answer
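Wiring the pieces above together, the whole flow fits in a few lines (a sketch of how I glue the functions from this post together; the question is hard-coded just for illustration):

// End-to-end sketch: load + embed docs once, then answer a question
async function main() {
  const db = await loadDocs();                             // Markdown docs → chunks → embeddings
  const question = 'How do I reset my password?';          // example question
  const context = await findRelevantChunks(db, question);  // embed question + similarity search
  const answer = await generateAnswer(context, question);  // hand top chunks to the model
  console.log(answer);
}

main();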
This flow is now so clear to me that it’s honestly shocking how badly most tutorials explain it.
Why Chunking and Embeddings Matter More Than People Think
What I also realized:
- If your chunking is bad (random cuts, huge blocks), your retrieval will be bad.
- If your embeddings are low quality (bad models), even good chunks won’t match properly.
- Garbage chunks = garbage retrieval = garbage answers.
- Good chunking + good embeddings = sharp, precise answers, even without a fancy model.
Most of the real “magic” in RAG systems isn’t in the model. It’s in how you prepare and retrieve the right knowledge.
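As a taste of what cleaner chunking can look like, here is a sketch of a heading-aware splitter for Markdown, instead of the blind 500-character slices above (the heading regex and size limit are my own choices):

// Sketch: split Markdown on headings first, then fall back to fixed-size slices
export function chunkMarkdown(text: string, maxSize = 500): string[] {
  // Split before every line that starts with one or more '#' characters
  const sections = text.split(/(?=^#{1,6}\s)/m);
  const chunks: string[] = [];
  for (const section of sections) {
    const trimmed = section.trim();
    if (!trimmed) continue;
    if (trimmed.length <= maxSize) {
      chunks.push(trimmed);
    } else {
      // Oversized sections still get the simple fixed-size treatment
      for (let i = 0; i < trimmed.length; i += maxSize) {
        chunks.push(trimmed.slice(i, i + maxSize));
      }
    }
  }
  return chunks;
}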
Where I’m Heading Next
Now that I understand the pieces properly, I’m focusing on building:
- Smart document RAG systems using Markdown files as the knowledge base.
- Clean chunking strategies (splitting by sections, logical grouping).
- Embedding everything into a fast vector database like ChromaDB.
I’m also working on a simple tool to help you with all of this.
Final Thoughts
If you’re learning AI today, my advice is simple:
- Slow down. Understand the basics first.
- Don’t get distracted by shiny new tools.
- Understand what **chunking** and **embedding** really mean.
- Build small systems, validate your learning, and THEN scale.
(This was a personal journey into RAG systems. I’ll be writing more soon about the small real-world apps I’m building around this knowledge.)