RAG Teacher

A RAG system designed to give teachers a better understanding of what their students a using LLMs for

Inspiration

    While preparing a presentation on Chroma, a vector database, for my Advanced Databases class, I discovered retrieval-augmented generation (RAG). That same week, I was using an LLM to study for a final, but the responses I received weren’t specific enough to my class material. This sparked the idea for a project: create an educational tool powered by RAG to help students study with more tailored responses, provide teachers with insights into student queries, and display the referenced documents for added transparency.

How I built this project

    The first step was to build the RAG model itself. Using LangChain, I connected the Chroma vector database and OpenAI's LLMs. Fine-tuning the system message to ensure the model behaved as intended was particularly rewarding.
    Choosing the front-end framework was simple: I wanted to learn Svelte, which consistently ranks as one of the most loved frameworks in developer surveys. To speed up development, I used shadcn-Svelte components for a polished and functional design. The initial task was to build a file upload page, where students could upload class materials. This process involved several steps: selecting the file displaying the file in an upload table for user review, providing a button to start the upload, uploading the file to the server, passing the file to Chroma, where it was parsed into embeddingsn and then confirming success and clearing the table for the next upload. This was when I realized the project’s scale. Each feature required a lot of moving parts to function seamlessly.
    To store uploaded documents, I integrated an AWS S3 bucket for the first time. The S3 bucket served as a reliable storage solution, allowing students to access their uploaded files alongside the LLM-generated responses.
    I wanted to provide teachers with summaries of what students were struggling with. To achieve this, I stored conversations with the LLM and used the LLM itself to summarize common areas of confusion. While testing showed these summaries were useful, I became intrigued by the idea of clustering student queries using a vector database for more advanced insights—a feature I might explore in the future.

Conclusion

    I learned so many, genuinely cool technologies from this project from LangChain, to OpenAi's API, to S3 buckets, to SvelteKit, and so much more. I'm not sure what my long term plan is for this project, the MVP is complete and yet I hardly feel done with it. I hope that there is some use for it in the future, but if not, it was still well worth my time.

Technologies

  • Svelte

  • SvelteKit

  • Node.js

  • S3 bucket

  • LangChain

  • Chroma

  • OpenAi API

  • shadcn-Svelte

  • TailwindCSS

  • Python