the-full-stack/ask-fsdl

🥞🦜 askFSDL 🦜🥞

askFSDL is a demonstration of a retrieval-augmented question-answering application.

You can try it out via the Discord bot frontend in the Full Stack Discord!

We use our educational materials as a corpus: the Full Stack LLM Bootcamp, the Full Stack Deep Learning course, and the Opinionated LLM++ Lit Review.

So the resulting application is great at answering questions like

  • Which is cheaper: running experiments on cheap, slower GPUs or fast, more expensive GPUs?
  • How do I build an ML team?
  • What's a data flywheel?
  • Should I use a dedicated vector store for my embeddings?
  • What is zero-shot chain-of-thought reasoning?

EXPERIMENTAL: run it yourself

This project is under rapid development, so expect sharp edges while setting it up in your environment.

Thanks to community contributions, we can share a best-effort guide to running the application yourself here.

Note that this application uses cloud services. For most of these services, regular usage of the app will fall under the free tier. However, OpenAI API calls can easily become expensive, so make sure to set usage limits to prevent surprise bills.

Stack

We use langchain to organize our LLM invocations and prompt magic.

We stood up a MongoDB instance on Atlas to store our cleaned and organized document corpus. See the Running ETL to Build the Document Corpus notebook for details.

For fast search of relevant documents to insert into our prompt, we use a FAISS index.
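
To make the retrieval step concrete, here is a minimal sketch of the idea: rank documents by similarity to the question, then paste the top hits into the prompt. It is not the project's code. The bag-of-words `embed` function, the brute-force `top_k` search, the `build_prompt` template, and the tiny `CORPUS` are all assumptions made for illustration; the real application would use a learned embedding model, and the FAISS index plays the role of `top_k` at much larger scale.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (an assumption, standing in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbor search; a vector index replaces this loop."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, sources: list[str]) -> str:
    """Insert the retrieved sources into a hypothetical prompt template."""
    context = "\n".join(f"- {s}" for s in sources)
    return (
        "Answer using only these sources:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical three-document corpus, for illustration only.
CORPUS = [
    "A data flywheel uses user feedback to improve the model over time.",
    "Gradio builds simple user interfaces in pure Python.",
    "Vector indexes speed up nearest neighbor search over embeddings.",
]

if __name__ == "__main__":
    question = "what is a data flywheel"
    print(build_prompt(question, top_k(question, CORPUS, k=1)))
```

The brute-force loop compares the query against every document, which is fine for a handful of texts but scales linearly with corpus size; that cost is what a dedicated index is meant to avoid.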

We host the application backend on Modal, which provides serverless execution and scaling. That's also where we execute batch jobs, like writing to the document store and refreshing the vector index.

For creating a simple user interface in pure Python, we use Gradio. This UI is great for quick tests without deploying a full frontend but with a better developer experience than curl-ing from the command line.

We host the Discord bot on Modal as well, relying on Discord's interactions endpoints to run the bot serverlessly.
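
The interactions-endpoint pattern can be sketched as a plain function: Discord POSTs a JSON payload to a URL, and the handler returns a JSON response, so no long-lived gateway connection is needed. The sketch below is an illustration rather than the project's handler. The `handle_interaction` and `answer_fn` names are hypothetical, and the sketch deliberately omits the Ed25519 signature check (on the `X-Signature-Ed25519` and `X-Signature-Timestamp` headers) that Discord requires of real endpoints.

```python
import json
from typing import Callable

# Interaction and response type codes from Discord's interactions API.
PING = 1
APPLICATION_COMMAND = 2
PONG = 1
CHANNEL_MESSAGE_WITH_SOURCE = 4


def handle_interaction(body: str, answer_fn: Callable[[str], str]) -> dict:
    """Turn one interaction payload into a response dict.

    `answer_fn` is a hypothetical stand-in for the question-answering backend.
    Signature verification is intentionally left out of this sketch.
    """
    payload = json.loads(body)

    # Discord sends a PING to validate the endpoint; it must be answered with PONG.
    if payload["type"] == PING:
        return {"type": PONG}

    # A slash command: assume the question is the first option's value.
    if payload["type"] == APPLICATION_COMMAND:
        options = payload.get("data", {}).get("options", [])
        question = options[0]["value"] if options else ""
        return {
            "type": CHANNEL_MESSAGE_WITH_SOURCE,
            "data": {"content": answer_fn(question)},
        }

    raise ValueError(f"unsupported interaction type: {payload['type']}")


if __name__ == "__main__":
    command = json.dumps(
        {"type": 2, "data": {"options": [{"name": "question", "value": "hi"}]}}
    )
    print(handle_interaction(command, lambda q: f"You asked: {q}"))
```

Because an LLM call can be slow, a production handler would more likely acknowledge the command with a deferred response and send the answer as a follow-up message, rather than answering inline as this sketch does.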

We use Gantry to monitor model behavior in production and collect feedback from users.