Slide Sage

typescript react node.js redis postgresql docker rag

As a student with classic ADHD symptoms, I learn better independently than during drawn-out lectures. To help others study in a way that suits their learning style, I built Slide Sage, which leverages multimodal language models to tailor interactive, context-aware slide shows to each student.

The Features

  • Dynamic Context Retrieval and RAG: Slide Sage lets users ask questions about a given slide. So that answers aren’t generated in a vacuum, I embed each student query and efficiently retrieve relevant context from a Pinecone-hosted vector database, letting users reference previous slides without overloading the LLM with irrelevant context (a sketch of this retrieval step follows the list).
  • Background Task Queue: I didn’t want to process summaries for all slides upfront, since a student might not reach the end of their lecture, but I also wanted to avoid making students wait for a summary on every slide. To balance these concerns, I implemented a background task queue using Redis that processes slide summaries for a fixed number of slides ahead of the student’s current position (sketched below).
  • Markdown and LaTeX Rendering: LLMs deliver text in a variety of formats, including Markdown and LaTeX, and I ensured that this text renders cleanly for students as it streams in (see the rendering sketch below).
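
Here is a minimal sketch of the retrieval step, assuming an OpenAI embedding model and the Pinecone TypeScript client; the index name, metadata shape, and model choices are illustrative rather than the exact ones Slide Sage uses.

```typescript
// Sketch: embed the student's question, then pull the most relevant earlier
// slides from Pinecone before calling the LLM. Index name and metadata fields
// are illustrative assumptions.
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI();        // reads OPENAI_API_KEY from the environment
const pinecone = new Pinecone();    // reads PINECONE_API_KEY from the environment
const slideIndex = pinecone.index("lecture-slides");

export async function retrieveContext(question: string, lectureId: string, topK = 4) {
  // Embed the question with the same model used to embed the slide content.
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });

  // Query only within the current lecture so unrelated decks don't leak in.
  const result = await slideIndex.query({
    vector: embedding.data[0].embedding,
    topK,
    filter: { lectureId },
    includeMetadata: true,
  });

  // Return slide text to prepend to the chat prompt as grounding context.
  return result.matches.map((m) => String(m.metadata?.text ?? ""));
}
```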
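
The lookahead queue could look something like the following, sketched here with BullMQ on top of Redis; the queue name, lookahead size, and summarizeSlide helper are assumptions for illustration.

```typescript
// Sketch: pre-summarize a fixed number of slides ahead of the student's
// current position using a Redis-backed queue (BullMQ here).
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };
const summaryQueue = new Queue("slide-summaries", { connection });
const LOOKAHEAD = 3; // how many slides ahead of the student to pre-summarize

// Called whenever the student navigates to a new slide.
export async function enqueueUpcomingSummaries(lectureId: string, current: number) {
  for (let i = current + 1; i <= current + LOOKAHEAD; i++) {
    // A stable jobId makes enqueueing idempotent: re-adding the same slide is a no-op.
    await summaryQueue.add(
      "summarize",
      { lectureId, slideNumber: i },
      { jobId: `${lectureId}:${i}` }
    );
  }
}

// Placeholder for the actual LLM call plus cache/database write.
async function summarizeSlide(lectureId: string, slideNumber: number): Promise<void> {
  // ... generate the summary and persist it ...
}

// A worker process pulls jobs off Redis and generates summaries in the background.
new Worker(
  "slide-summaries",
  async (job) => {
    const { lectureId, slideNumber } = job.data;
    await summarizeSlide(lectureId, slideNumber);
  },
  { connection }
);
```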
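
For the rendering piece, react-markdown with remark-math and rehype-katex is one common way to handle both formats as tokens stream in; the component below is a sketch, not necessarily the exact setup in the app.

```tsx
// Sketch: render streamed LLM output with Markdown and LaTeX support.
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";
import remarkMath from "remark-math";
import rehypeKatex from "rehype-katex";
import "katex/dist/katex.min.css";

// `text` is the partial response accumulated so far from the streaming API;
// re-rendering on every chunk keeps formatting live as tokens arrive.
export function SlideAnswer({ text }: { text: string }) {
  return (
    <ReactMarkdown remarkPlugins={[remarkGfm, remarkMath]} rehypePlugins={[rehypeKatex]}>
      {text}
    </ReactMarkdown>
  );
}
```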

The Future

An area of LLM research I’ve been exploring is adjusting a model’s responses based on user interactions with an app, leveraging these interactions as a form of implicit feedback. I’m building on the concept of coactive learning introduced in this paper to use students’ questions as feedback, allowing me to tailor lecture summaries to their taste over time. However, to avoid fine-tuning models for individual users, I am applying the technique of verbal reinforcement learning through prompt engineering described here.
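
As a rough sketch of what this could look like without fine-tuning, the snippet below distills a student’s recent questions into short natural-language notes and feeds them back into the next summarization prompt; every name and prompt here is hypothetical.

```typescript
// Sketch of verbal reinforcement via prompting: turn a student's follow-up
// questions into short "preference notes" and prepend them to future
// summarization prompts, instead of fine-tuning a per-user model.
import OpenAI from "openai";

const openai = new OpenAI();

// Distill recent questions into a one-line lesson about what summaries missed.
export async function distillPreference(questions: string[]): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "A student asked these follow-up questions after reading a slide summary. " +
          "In one sentence, state what future summaries should cover or emphasize.",
      },
      { role: "user", content: questions.join("\n") },
    ],
  });
  return response.choices[0].message.content ?? "";
}

// Inject accumulated notes into the summarization prompt as verbal feedback.
export function buildSummaryPrompt(slideText: string, preferenceNotes: string[]): string {
  return [
    "Summarize the following lecture slide for the student.",
    preferenceNotes.length
      ? `Lessons from this student's past questions:\n- ${preferenceNotes.join("\n- ")}`
      : "",
    `Slide:\n${slideText}`,
  ].join("\n\n");
}
```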

In addition to tailoring responses to individual students, I am working to fine-tune an open-source model for the more general task of lecture summarization to reduce my dependency on third-party APIs.