Extending Code Llama
I've been diggin' Code Llama lately. It's free, easy to set up, and using Code Llama via Ollama can be a lot of fun. Code Llama has already proved itself useful out of the box; nevertheless, I was curious whether I could boost my productivity with retrieval-augmented generation (or RAG for short). I decided to point Code Llama directly at specific code for question answering. I wanted to see if Code Llama could help me understand an underlying code base and help me develop new features.
If you're willing to roll up your sleeves, this is a fairly easy process! All you have to do is write a little Python code.
What's RAG?
Briefly, RAG is a technique that combines something specific (like code, documents, images, etc.) with a more general LLM. In this way, as Kim Martineau explains, RAG improves the:
quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.
For instance, if I asked an LLM, such as OpenAI's ChatGPT, to summarize everything I've recently written about leadership, I'd be out of luck. ChatGPT has no knowledge of my recent blog entries as the model was trained before I wrote my articles. Nevertheless, I could provide the model with some relevant information, such as everything amazing I've recently written and then ask the LLM to summarize my leadership articles. Depending on the quality and quantity of the specific information provided, the LLM could surprise you!
In essence, via RAG, you provide the LLM with some specific information, and then the corresponding answers to your questions are more informed through the use of that specific data.
Code Llama and RAG
Leveraging RAG with Code Llama takes about 60 lines of Python code. What's more, I leveraged Ollama along with LangChain. You'll need to ensure that you've both pulled and run the Code Llama LLM via Ollama beforehand. The steps to create a RAG setup with Code Llama are:
- Create an instance of an LLM.
- Create a prompt (i.e. context for the LLM).
- Load a directory full of code.
- Split the various files from step 3 into chunks.
- Store those chunks in a vector database.
- Create a chain from all the above which generates answers to questions.
I created a simple class named ChatCode that has three methods: an initializer (i.e. Python's __init__), ingest, and ask. The __init__ method creates an instance of ChatOllama using the Code Llama LLM and creates a prompt context.
The key aspect of __init__ is the creation of a PromptTemplate, which seeds the LLM with specific context (i.e. the LLM should act as an expert programmer). The provided context frames the interaction between the LLM and a user. As Harpreet Sahota eloquently states, LangChain's PromptTemplate class enables
the refined language through which we converse with sophisticated AI systems, ensuring precision, clarity, and adaptability in every interaction.
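To make that concrete, here's a minimal sketch of what the initializer might look like. The prompt wording, the model name passed to Ollama, and the import paths are assumptions on my part (they vary across LangChain releases); the shape follows the description above.

from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import PromptTemplate

class ChatCode:
    def __init__(self):
        # Use the Code Llama model already pulled and served via Ollama.
        self.model = ChatOllama(model="codellama")
        # Seed the LLM with context: act as an expert programmer and answer
        # questions using the retrieved code snippets.
        self.prompt = PromptTemplate.from_template(
            """You are an expert programmer. Use the following pieces of
            retrieved code to answer the question. If you don't know the
            answer, just say that you don't know.
            Question: {question}
            Context: {context}
            Answer:"""
        )
        self.vector_store = None
        self.retriever = None
        self.chain = None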
The magic of RAG happens in the ingest method, where a number of important operations occur. First, a file system path is provided and all Ruby files beneath it are loaded into memory via the GenericLoader class. Those Ruby files are then split into chunks for efficient indexing into a vector store. The logic here is obviously specific to Ruby; you can easily do the same for Python, Go, Java, or a handful of other languages.
The vector store in this case is Chroma, which by default stores vectors (i.e. documents) in memory. A LangChain Retriever instance is created that returns results from Chroma given a specific query. In this case, similarity score threshold retrieval is used, which returns a maximum of three documents scoring above a threshold of 0.5. Finally, a LangChain chain is created that links everything together. This final step is where RAG's power is leveraged, as an existing model, like Code Llama, is combined with specific context, such as Ruby code.
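Here's a sketch of ingest along those lines, continuing the ChatCode class. The glob pattern, chunk sizes, and the choice of FastEmbedEmbeddings for embeddings are assumptions on my part, and the import paths again depend on your LangChain version.

from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

    # (continuing the ChatCode class)
    def ingest(self, path: str):
        # Load every Ruby file found beneath the given path.
        loader = GenericLoader.from_filesystem(
            path, glob="**/*", suffixes=[".rb"], parser=LanguageParser()
        )
        documents = loader.load()
        # Split the files into chunks for efficient indexing.
        splitter = RecursiveCharacterTextSplitter.from_language(
            language=Language.RUBY, chunk_size=1024, chunk_overlap=128
        )
        chunks = splitter.split_documents(documents)
        # Index the chunks in an in-memory Chroma vector store.
        self.vector_store = Chroma.from_documents(chunks, FastEmbedEmbeddings())
        # Return at most three documents scoring above a 0.5 threshold.
        self.retriever = self.vector_store.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={"k": 3, "score_threshold": 0.5},
        )
        # Chain the retriever, the prompt, and Code Llama together.
        self.chain = (
            {"context": self.retriever, "question": RunnablePassthrough()}
            | self.prompt
            | self.model
            | StrOutputParser()
        )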
With RAG set up, asking a question of a code base couldn't be any easier! The ask method simply invokes the aforementioned chain like so:
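    # A sketch of ask, continuing the ChatCode class; the guard clause and
    # its message are my own additions.
    def ask(self, query: str):
        if not self.chain:
            return "Please ingest a code base first."
        # Pass the question through the retriever, the prompt, and Code Llama.
        return self.chain.invoke(query)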
As you can see, there's not a lot of complex code. It's quite simple! Specific context is loaded into memory and combined with an LLM for a more precise, context-aware answer. In this case, questions are answered based on what's found in the underlying indexed code.
Show me the money, baby!
As I have a few different Rails applications on hand, I fired up the Python interpreter on my local machine and ingested a simple dictionary application named Locution.
>>> from chat_code import ChatCode
>>> code = ChatCode()
>>> code.ingest("/Users/ajglover/Development/projects/github/locution")
The ingest method might take a few moments depending on how many files are loaded. Once ingest returns, you can then invoke the ask method. I decided to start simple and give ChatCode a softball question:
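>>> question = "What does this application do?"  # an illustrative stand-in
>>> print(code.ask(question))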
While the question is an easy one, it also demonstrates that ChatCode is focused on my Locution code base!
With ChatCode providing a specific answer, I'm confident that my RAG setup with Code Llama is working correctly. Next, I'll ask a more relevant question for what would be a typical use case for a question-and-answer style application pointed at a code base. In my case, I want to know about the underlying database. ChatCode dutifully tells me that two tables are defined (did I mention this is a simple Rails application?).
The generated response is quite helpful as it describes the details of the two tables, words and definitions. But I'm not quite satisfied, as I'd like to see the DDL. It's no problem though – all I have to do is ask!
ChatCode gives me what appears to be a cogent answer:
I was immediately suspicious of a few odd aspects of the answer, however. While there are indeed two tables, word_id isn't the primary key of the definitions table, nor is word the primary key of the words table. Skepticism towards what an LLM will confidently tell you is a key skill! The good news is that with code, you can usually spot any issues quickly.
Perhaps asking low-level database questions isn't necessarily the right level of abstraction at this point. After all, it's a Rails application, so let me see if I can figure out how to use application code to find a specific word:
My RAG-ified Code Llama application produced a detailed answer to the above question, containing some great examples and explanations of the corresponding code.
This type of detailed response with code examples tailored to my specific code base is incredibly helpful. It's akin to reading a tutorial specific to an application. I've easily spent a few hours asking ChatCode various questions about my simple Rails application, and I'd say that 80% of the generated answers are helpful. I suspect that I can extract better answers by providing better questions!
I found that asking questions like "how do I create a controller method that removes a word" and "how do I add a new attribute to the word model" yielded fairly competent responses with detailed, elaborate steps. In this way, I can see how an AI coding assistant could be helpful when becoming acclimated to a new code base. Rather than spending a lot of time context switching between my IDE and Googling for answers, this workflow is intuitively more productive.
The legacy of code as a use case
Throughout much of my career, I've had to work on ancient code. This sort of code is often labeled legacy code. It's frequently mystifying and definitely frustrating when getting started. The above use case of leveraging RAG to combine a specific code base with an LLM, like Code Llama, could certainly make the experience of familiarizing yourself with legacy code far easier. What's more, what I've demonstrated here is a command line interface, which is a less polished experience than a snazzy UI or an integrated IDE experience (like GitHub's Copilot). Nevertheless, both UI and IDE integration options are entirely possible!
AI-enhanced productivity is here
Throughout my career, I've often heard complaints about new technologies. Java was too slow in 1997. Mobile devices weren't powerful enough to do anything interesting in the early 2000s. And AI isn't sophisticated enough – it hallucinates, frequently making mistakes (as I've shown above). But the history of technology has clearly shown that useful technologies improve. Today's LLMs are the worst they'll ever be. LLMs can only get better as we learn how to both leverage them and, importantly, improve them.
The skeptics of today are right to point out deficiencies! But remember that the deficiencies of today will be solved tomorrow provided the underlying technology is promising. Don't overlook AI because of its shortcomings. It's here to stay and will only become more pervasive and pragmatic!
Want to learn more?
I created a GitHub repository, aptly named windbag, which contains the above code. Moreover, Duy Huynh's article on Hackernoon was extremely helpful in demonstrating the various aspects of RAG; it certainly helped me get started. Furthermore, while I used Code Llama, you could just as easily use OpenAI's ChatGPT LLM via LangChain's API. It would be interesting to compare and contrast the two!
Can you dig it?