Smooth on the Surface: Building a Retrieval Engine in 5 Minutes
By Rich Scudellari
I haven’t written code in 15 years, but I just built a retrieval engine in five minutes, thanks to Ducky.
In college, I studied engineering, but like any good engineer, I went into investment banking after graduating (hopefully, you can feel the sarcasm oozing through the screen). So it’s been a minute since I last wrote even a basic Python script. But the urge to build has been overwhelming since LLMs exploded onto the scene a few years ago. I see so many opportunities to integrate emerging tools into our workflows at Penny Jar to make us more effective and efficient. So, with Replit as my canvas and GPT-4o as my paintbrush, I started to tinker.
It’s been a fun challenge. I’ve built several tools, each time getting a little more sophisticated. But one element has been more challenging than fun: retrieval. A massive part of the promise of LLMs rests on our ability to feed them contextually relevant data to help answer a question or complete a task. To do that, you need to retrieve the data effectively, and that was a problem. Let me digress briefly with an abridged version of the tedious journey to building an adequate retrieval system.
When you want to search or retrieve something (let’s assume it’s just text), you have two choices: keyword search and semantic search. Keyword search is exactly what it sounds like: it ranks documents by how often (and how distinctively) the query’s terms appear in them. Keyword search has been around for a while, and there are many well-understood, well-established open-source libraries you can leverage (with some work).
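To make that concrete, here is a toy sketch of the core idea: score each document by how often the query’s terms appear in it, then rank. (Production systems use algorithms like BM25, which also down-weight common terms and normalize for document length; the documents and query here are made up for illustration.)

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def keyword_scores(query: str, docs: list[str]) -> list[int]:
    """Score each document by raw frequency of the query's terms."""
    terms = tokenize(query)
    scores = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        scores.append(sum(counts[t] for t in terms))
    return scores

docs = [
    "venture capital funds early stage startups",
    "retrieval systems rank documents for search",
    "search engines use keyword search and ranking",
]
scores = keyword_scores("keyword search", docs)
best = max(range(len(docs)), key=lambda i: scores[i])  # index of top-ranked doc
```

Even this crude version surfaces the right document; the "some work" comes from all the refinements (stemming, term weighting, length normalization) that real libraries handle.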
Semantic search, by contrast, matches on meaning. It turns a collection of words (the query) into a numerical representation, an embedding, and finds the closest (or nearest) matches in a database of document embeddings. There are hundreds of choices to be made here: which embedding model, how to chunk your documents, which distance metric, which index. Each must be tested, adjusted, and tested again, which means days, if not weeks, of experimentation to find the ideal setup. And it can all change if your data changes.
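The mechanics boil down to nearest-neighbor lookup over vectors. In this toy sketch the hand-made 3-d vectors stand in for what an embedding model would produce (one of the many choices above), and cosine similarity plays the role of the distance metric:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings; a real system would get these from a model.
doc_vectors = {
    "doc_pets":    [0.9, 0.1, 0.0],   # e.g. "cats and dogs"
    "doc_finance": [0.0, 0.2, 0.9],   # e.g. "quarterly fund returns"
}
query_vec = [0.8, 0.3, 0.1]           # pretend embedding of "animals"

best = max(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]))
```

Note that "animals" matches the pets document without sharing a single keyword with it; that is the whole appeal of semantic search, and also why every one of those setup choices affects result quality.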
Unfortunately, the best approach is hybrid search: running both in parallel and then re-ranking (comparing the results and taking the best of both). All of this work just so you can start building the application that actually leverages your data. Hopefully, you can appreciate the challenges of building, tweaking, and maintaining a production-level retrieval system. And I’m not alone: swaths of tenured engineers are struggling with the same challenges, and this is where Ducky comes in.
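One common way to merge the two result lists is reciprocal rank fusion (RRF): each document earns 1/(k + rank) from every list it appears in, and the summed score re-ranks the union. This is just one of several re-ranking strategies (I don’t know which one Ducky uses internally); the document names below are placeholders.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists via reciprocal rank fusion.

    k=60 is the conventional constant from the original RRF paper;
    it damps the influence of top-ranked outliers from any one list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["doc_a", "doc_b", "doc_c"]   # from keyword search
semantic_hits = ["doc_b", "doc_d", "doc_a"]   # from semantic search
fused = rrf([keyword_hits, semantic_hits])
```

Documents that show up high in both lists (like doc_b here) float to the top, which is exactly the "best of both" behavior hybrid search is after.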
You may think I’m exaggerating, but it took only five minutes to build an incredibly powerful retrieval engine with Ducky. I wrote a short script to move my data from MongoDB to Ducky, put a simple front end on it in Flask, and was querying (with exceptional results) in minutes. It was a relief because all of the complexity was abstracted away. It just worked. Importantly, Ducky handles the hard data-preprocessing questions for you and includes hybrid search and re-ranking out of the box. They make it easy to build a powerful retrieval system that can be the foundation of whatever tool you’re building.
Ducky simplifies the foundational blocking and tackling required to leverage LLMs effectively. In the future, when LLMs are used ubiquitously, Ducky will be a critical piece of infrastructure that makes life easy for engineers of any caliber.