So far in this project, I’ve been building a system that can scan all my blog posts, documents, and notes, extract the useful stuff, and make it searchable via natural language. The aim is to get something that works like a real-time assistant — answering questions using my own content as the source.
We’re now at Stage 4. Here’s what’s happened so far, and what comes next.
What’s Working
There are two sides to this system:
- One piece of code handles data ingestion. It scans my files, pulls out the text, and stores it in Pinecone (a vector database).
- The other piece lets me query that data using natural language.
The ingestion script (PopulateChatSystemDataRepository.py) currently runs manually, mostly because I’m trying to avoid hitting API rate limits. Eventually, it’ll move to Google Cloud Run so it runs continuously without needing my laptop open.
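To make the ingestion side concrete, here is a minimal sketch of a scan-and-store loop that pauses between calls to stay under a rate limit. It is not the actual PopulateChatSystemDataRepository.py. The function names, the file suffixes, the one-second delay, and the `store` callback (which stands in for the embed-and-write-to-Pinecone step) are all assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator


def iter_text_files(root: Path, suffixes: Iterable[str] = (".txt", ".md")) -> Iterator[Path]:
    """Yield files under root whose suffix is in suffixes, in sorted order."""
    wanted = {s.lower() for s in suffixes}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


def ingest(
    root: Path,
    store: Callable[[str, str], None],      # hypothetical: embeds the text and writes it to the vector store
    delay_seconds: float = 1.0,             # assumed pause; tune to the real API limits
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read each file and pass (id, text) to store, pausing between calls."""
    count = 0
    for path in iter_text_files(root):
        text = path.read_text(encoding="utf-8", errors="replace")
        if count:
            # Throttle every call after the first one.
            sleep(delay_seconds)
        store(str(path.relative_to(root)), text)
        count += 1
    return count
```

Passing `sleep` in as a parameter keeps the throttle easy to test, and the same loop could run unchanged on a laptop or in a hosted job.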
On the querying side, I started with basic keyword search. It was fine, but not great — too brittle. Now I’m using embedding-based retrieval with Pinecone, which is far better at handling fuzzier, more conversational queries.
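For a feel of why embeddings cope better with fuzzy queries, here is a toy version of the idea: every document and every query becomes a vector, and results are ranked by cosine similarity instead of exact word matches. In the real system Pinecone does this search on its side and the vectors come from an embedding model. The tiny hand-made vectors, document ids, and function names below are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: in practice these vectors would have hundreds of dimensions.
toy_index = {
    "ppc-ads": [0.9, 0.1, 0.0],
    "seo-basics": [0.1, 0.9, 0.0],
    "recipes": [0.0, 0.0, 1.0],
}
best_id, best_score = top_k([1.0, 0.2, 0.0], toy_index, k=1)[0]
print(best_id)  # → ppc-ads
```

A query does not need to share any exact words with a document to score well; it only needs to land near it in vector space.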
The current setup includes a FastAPI service deployed on Cloud Run. It accepts queries via a simple URL. For example:
/search/?query=what%20does%20a%20good%20PPC%20text%20advert%20look%20like?
Type that into a browser, and it returns a relevant result from my content. It’s rough around the edges, but it works.
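If you would rather build that URL from code than type it by hand, the standard library can do the encoding. This is a small sketch: the `/search/` path and `query` parameter come from the example above, while the base URL (`https://example.invalid`) and the function name are placeholders, not the real service address. Note that it also encodes the trailing question mark as `%3F`, which is the safer form inside a query string.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_search_url(base_url: str, question: str) -> str:
    """Build a /search/ URL with the question percent-encoded in the query parameter."""
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
    query_string = urlencode({"query": question}, quote_via=quote)
    return f"{base_url.rstrip('/')}/search/?{query_string}"


url = build_search_url(
    "https://example.invalid",  # placeholder host, not the real deployment
    "what does a good PPC text advert look like?",
)
print(url)
# → https://example.invalid/search/?query=what%20does%20a%20good%20PPC%20text%20advert%20look%20like%3F
```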
Why Speech-to-Text Is in the Mix
You might notice I’ve already wired in Google’s Speech-to-Text API, even though I’m still in the text-only phase. The voice side is for later. Eventually, I want this system to handle real-time conversations: voice in, answers out. But for now, I’m keeping things simple.
What’s Next: A Text Interface
This is the next step. I want to build a simple text interface — something that lets me talk to the system like an old-school text adventure game. No need for a fancy UI yet. Just a clean loop where I type a question, the system replies, and I can keep the conversation going.
Why this? Because before I worry about polish, I want to know the core experience works — the retrieval is accurate, the flow makes sense, and I can actually use it.
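The loop itself can be very small. Here is a minimal sketch of a type-a-question, get-an-answer interface of the kind described above. The `answer` callback is a stand-in for whatever calls the search service, and the function name, the `> ` prompt, and the `quit`/`exit` commands are assumptions, not part of the existing code.

```python
from typing import Callable


def chat_loop(
    answer: Callable[[str], str],                 # hypothetical: takes a question, returns the reply text
    read: Callable[[str], str] = input,           # swapped out in tests; defaults to the keyboard
    write: Callable[[str], None] = print,
) -> int:
    """Keep asking for questions and printing answers until the user quits.

    Returns the number of questions answered.
    """
    turns = 0
    while True:
        try:
            question = read("> ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D / Ctrl-C ends the session quietly.
            break
        if not question:
            continue  # ignore empty lines
        if question.lower() in {"quit", "exit"}:
            break
        write(answer(question))
        turns += 1
    return turns
```

To try it, call `chat_loop` with any function that maps a question to a reply, for example one that sends the question to the `/search/` endpoint and returns the response text. Keeping `read` and `write` as parameters means the same loop can be driven by a script now and by a different front end later.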
The checklist:
- Add all my content to the system (done)
- Build a basic interface for interaction (next)
That’s Stage 4. Getting the interface up and running is the next focus — and from there, it gets a lot more interesting.