Pairing with my good friend Code Llama
Because it’s my bag, I’ve been diving deep into the incredibly exciting and rapidly evolving world of AI. The innovations in the open source arena are astonishing; indeed, you can now run a diverse set of LLMs from the convenience of your laptop. If you are willing to roll up your sleeves, there are plenty of OSS models and frameworks that cost you only the time it takes to set things up.
One popular tool for running open source LLMs locally is Ollama. I started using it recently and I’ve been blown away by how useful, productive, and enlightening common workflows can become if you take the time to explore, understand, and ultimately leverage AI. With Ollama, you can run various LLMs such as Meta's Llama 2, Mistral, LLaVA, and my new favorite, Meta's Code Llama.
Meta's Code Llama
Code Llama is an LLM that can use text prompts to generate and discuss code. With Code Llama, you can generate Python, Java, TypeScript, and more. As Meta's documentation for Code Llama states:
It has the potential to make workflows faster and more efficient for developers and lower the barrier to entry for people who are learning to code.
Indeed, the evolution of AI-enhanced productivity will come in the form of boosted workflows: workflows with fewer speed bumps from context switching and, accordingly, greater productive capacity.
I saw AI's productivity-boosting potential firsthand recently. It’s been quite a long time since I’ve coded in Python (circa 2005!), so in an effort to refresh my memory on the finer aspects of the language, I decided to play some ping-pong – but in this case, with Code Llama as my coding partner.
After installing Ollama, you need to download the Code Llama LLM like so:
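```
# pulls the model on first use, then drops you into a chat session
ollama run codellama
```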
After downloading the LLM, which could take a bit, you’ll be presented with an in-terminal chat-like interface. You prompt Code Llama and it'll magically respond with (mostly) cogent answers.
Ping-pong pair programming
In ping-pong, your partner writes a failing test case. Your job is to write the least amount of code that makes that failing test case pass. Once you’ve done that, you write a failing test case of your own, and your partner in turn writes the least amount of code to make it pass. You go back and forth, and in the process of playing ping-pong, you'll undoubtedly learn new techniques, styles, and ways of approaching problem solving. Over the years, I've learned valuable lessons when pairing with folks! Ping-pong has certainly made me a better programmer. I've even employed it in the context of interviewing.
I figured I’d get right into AI pair programming by building a simple stack implementation in Python. Stacks commonly have `push` and `pop` methods, which follow a last in, first out (LIFO) discipline: `push` adds items onto the stack and `pop` removes the last item added. I was also aware that, out of the box, Python comes with a unit testing framework dubbed unittest (akin to Java's JUnit).
Kicking things off, I began with a prompt along these lines:
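```
Write a unit test for a stack in Python using the unittest framework.
```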
As requested, Code Llama promptly returned some code using Python’s unittest framework. Responses vary from run to run, but the shape was something like this:
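```python
import unittest
from collections import deque

# representative of what Code Llama generated; your output will differ
class TestStack(unittest.TestCase):
    def setUp(self):
        # a deque doubles as a stack: append() pushes, pop() pops (LIFO)
        self.stack = deque()

    def test_push(self):
        self.stack.append(1)
        self.stack.append(2)
        self.assertEqual(len(self.stack), 2)

    def test_pop(self):
        # the stack is empty at this point, so this first assert errors out
        self.assertEqual(self.stack.pop(), 2)
        self.stack.append(3)
        self.assertEqual(self.stack.pop(), 3)
        # the empty case itself is verified correctly here at the end
        self.assertRaises(IndexError, self.stack.pop)
```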
Code Llama additionally described the usage of unittest’s `setUp` method as well as the usage of `assertEqual` and `assertRaises`.
Of course, I was provided with way more implementation code than I really wanted. Nevertheless, the example provided was enough to get me started, as it quickly got me up to speed on unittest and also informed me that Python has an implementation of a stack-like data structure via the `deque` object in the `collections` module.
I copied the code into a file dubbed `test_stack.py` and then ran it. Unfortunately, nothing happened. Then I remembered from my Python days of yore that I needed the equivalent of a `main` method, but I had forgotten the syntax. No problem, I just asked Code Llama and it told me to insert this:
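```python
if __name__ == "__main__":
    unittest.main()
```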
I added the above code and reran the test. Unsurprisingly, the test errors out, as the first assert of `self.stack.pop()` raises an `IndexError` because the underlying deque is initially empty. You can see that the test case, in fact, verifies this case correctly in the last part of the `test_pop` method.
Testing, testing, 1, 2, 3
As it turns out, AI (inadvertently) wrote me a failing test! Now all I needed to do was make the test pass with as little code as possible. I figured that I would trim down the test case and simply test for an empty `pop` yielding `None`.
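My trimmed-down test file looked something like this:

```python
import unittest
from collections import deque

class TestStack(unittest.TestCase):
    def setUp(self):
        self.stack = deque()

    def test_pop(self):
        # popping an empty stack should simply yield None...
        # ...which still fails here, since deque raises IndexError instead
        self.assertIsNone(self.stack.pop())

if __name__ == "__main__":
    unittest.main()
```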
Next, I wanted to write my own stack implementation and remove `deque`. Looking at the `TestStack` class, I remembered some important aspects of coding in Python: methods are defined with `def`, whitespace is a delimiter, and don't forget about colons. But I couldn't remember exactly how imports work. No problem! All I had to do was pose a question like this to Code Llama:
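```
How do I import a class from another file in Python?
```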
I was informed via a nice description that I could create a file, named using snake case, and define a class inside of it. Then, in another file, I can import that class using the `from file import class` syntax. Excellent! Next, I needed to define a minimal stack – ideally, with only a `pop` method. Might as well ask my groovy friend Code Llama how to define a custom stack in Python. A prompt in this vein sufficed:
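```
How do I define a custom stack class in Python?
```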
Code Llama dutifully spat out code roughly like the following:
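```python
# representative Code Llama output: a list-backed stack
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        if not self.items:
            raise IndexError("pop from an empty stack")
        return self.items.pop()

    def peek(self):
        if not self.items:
            raise IndexError("peek at an empty stack")
        return self.items[-1]

    def size(self):
        return len(self.items)
```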
Not only did I get what appear to be working `pop` and `push` methods, but I also got `peek` and `size` methods to boot! This is obviously more code than I needed; accordingly, I simplified the code with the intent of making the aforementioned test pass. Once the test's `setUp` swapped its `deque` for my soon-to-be-written class (imported via `from stack import Stack`), this was all it took:
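```python
# stack.py -- deliberately minimal: just enough for the trimmed test
class Stack:
    def pop(self):
        return None
```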
When I rerun the test case, things pass! Of course, I've merely written the least amount of code necessary to make the test pass. It's now my friend Code Llama's turn to write a failing test case. This wasn't too hard of a prompt to fashion; something like this does the trick:
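```
Using Python's unittest, write a test case for a stack that pushes a value and then asserts that pop returns that same value.
```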
Impressively, Code Llama did exactly what I wanted. It generated a simple test case that will force a failure, something to this effect:
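```python
# slots into the existing TestStack class
def test_push_pop(self):
    # fails for now: Stack has no push() and pop() always returns None
    self.stack.push("llama")
    self.assertEqual(self.stack.pop(), "llama")
```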
I added another test case to my existing `TestStack`, ran it, and sure enough, I've got a failing test. It's time to make it pass. My brutally simple `pop` method needs to do some real logic as opposed to returning `None`.
Can't pop without a push
Satisfying my friend Code Llama's new test case means I need to flesh out a `push` method so I can test popping a value.
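My updated `stack.py` wound up looking roughly like so:

```python
class Stack:
    def __init__(self):
        self.stack = []

    def push(self, item):
        # add an item to the top of the internal list
        self.stack.append(item)

    def pop(self):
        # return the top item, or None if the stack is empty
        if not self.stack:
            return None
        return self.stack.pop()
```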
My `Stack` now has a `push` method that adds an item to an internal list (`self.stack`) and the `pop` method simply returns the top item or `None` if the list is empty.
I've refactored my test case slightly and have two passing test cases now.
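Give or take a name or two, the full test file now reads:

```python
# test_stack.py
import unittest
from stack import Stack

class TestStack(unittest.TestCase):
    def setUp(self):
        self.stack = Stack()

    def test_pop_empty(self):
        # an empty pop yields None rather than raising
        self.assertIsNone(self.stack.pop())

    def test_push_pop(self):
        self.stack.push("llama")
        self.assertEqual(self.stack.pop(), "llama")

if __name__ == "__main__":
    unittest.main()
```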
After a few back-and-forth interactions with Code Llama, my Python confidence has increased. I can see how this technique would work equally well with a language I'm not familiar with, too.
What a match!
The game of ping-pong need not go on forever. In fact, a few rounds are often enough to get into the flow of things and learn a little from your partner. In my case, I had a lot of fun getting to know how Code Llama approaches problem solving, and along the way, it helped reacquaint me with Python.
I dig ping-pong pairing with an AI and will continue to do it going forward. As with all things AI-related, prompting is a crucial skill, as is a keen eye for code correctness: LLMs will confidently give you an incorrect answer (i.e. code that doesn't work).
In my case, I was using a terminal; nevertheless, there are in-IDE options ranging from GitHub Copilot to other innovative tools. You can also take advantage of Ollama and Code Llama via CodeGPT and have yourself some sweet VS Code integration.
Can you dig it, man?