Navigating (large) codebases while keeping your sanity

Navigating codebases

I joined Netflix at the end of September last year. Yes, that was the now unforgettable year of our lives, 2020. This year, 2021, I complete 10 years of working as a software engineer, and I thought I would combine these two events to write about the merits and perils of navigating existing systems when joining a team. It’s a subject that is not sufficiently explored and one that we will all have to face in our careers, considering that as soon as software comes into being the needs it addresses change, and 70% of the development budget goes into software maintenance. So how do you effectively navigate a new codebase? These are the strategies that I use and the questions I ask myself in the process.

First, let’s set some context. Depending on the role you take, the team, and the business the company is in, the task of navigating an existing codebase will vary widely. When I joined Upwave (formerly Survata), the startup I worked at until last year, my very first task was to fix a long-standing bug in a Grails backend codebase. That story is not part of this post, but I’ll use it as a contrast to my task when joining the Device Identity Systems team at Netflix (we are hiring!!), which was taking over existing services and rebuilding them in a new system. You can see that those two tasks differ widely in goals, focus and outcome. This is the place for the proverbial YMMV (your mileage may vary); adjust accordingly.

Identifying your goals to not get lost (too much)

Trusting your knowledge

Process

Technical process

Exploring new code fundamentally works in two ways. I refer to them as top-down and bottom-up, although you can always go the random way and open whatever file catches your attention.

In a top-down approach I try to find the closest interface the service exposes to clients or callers. Whatever that might be: the interface of a REST or gRPC service, the API documentation, or a web page. It is the first thing that gets called in the service, the outermost interface, and I look at the system from above, going down. From there you follow the calls deeper and deeper down the call stack until you reach a point where the system completes a subtask, like sending a message, calling an external system or saving data to a data store.

Top down approach to code navigation
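To make the top-down walk a bit more concrete, here is a minimal sketch in Python. It assumes you have already jotted down a small call graph by hand; every name in it (the `POST /orders` endpoint, `OrderController`, `OrderService`, `OrderDao`, `Notifier`) is hypothetical and only there for illustration. It starts at the outermost interface and follows the calls down until it reaches functions that call nothing else, which is where a subtask completes.

```python
# Hypothetical call graph, written down by hand while reading the code.
# Each key is a function (or endpoint); each value lists what it calls.
CALLS = {
    "POST /orders": ["OrderController.create"],
    "OrderController.create": ["OrderService.place"],
    "OrderService.place": ["OrderDao.save", "Notifier.send"],
    "OrderDao.save": [],   # boundary: writes to a data store
    "Notifier.send": [],   # boundary: sends a message
}


def walk_down(graph, entry):
    """Yield (depth, name) pairs depth-first, starting at the entry point."""
    stack = [(0, entry)]
    seen = set()
    while stack:
        depth, name = stack.pop()
        if name in seen:
            continue  # protects against cycles and repeated visits
        seen.add(name)
        yield depth, name
        # Push callees in reverse so they are visited in their listed order.
        for callee in reversed(graph.get(name, [])):
            stack.append((depth + 1, callee))


def boundaries(graph, entry):
    """Return the functions reachable from entry that call nothing further."""
    return [name for _, name in walk_down(graph, entry) if not graph.get(name)]


# Print the call tree, indented by depth, as you would sketch it on paper.
for depth, name in walk_down(CALLS, "POST /orders"):
    print("  " * depth + name)
```

In a real codebase the graph comes from reading the code or from your IDE’s “go to definition”; the point of the sketch is only the direction of travel, from the interface down to the boundaries.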

In the bottom-up approach you start by identifying where your calls or your data end, the boundaries of your service: usually some kind of data storage, another service, a message being sent, or a remote procedure call being initiated. From there, start finding out what calls that code, going up the stack. For instance, let’s say you are working with a Java codebase for an enterprise web application; you will focus on finding the module(s), files or classes that deal with storage access (e.g. the DAO pattern) or with sending messages, and from there work upward by finding usages of those functions. This has the added advantage that you get to see how the data flows in the application or service.

Bottom up approach to code navigation
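The bottom-up walk can be sketched the same way, under the same caveat: the call graph below is a hand-written, hypothetical example (`OrderDao.save`, `NightlyImportJob.run` and friends are invented names). The sketch inverts the graph so that each function knows its callers, then climbs from a storage boundary until it reaches code that nobody else calls, which is roughly what repeated “find usages” does for you in an IDE.

```python
from collections import defaultdict, deque

# Hypothetical call graph: caller -> functions it calls.
CALLS = {
    "POST /orders": ["OrderService.place"],
    "NightlyImportJob.run": ["OrderService.place"],
    "OrderService.place": ["OrderDao.save"],
    "OrderDao.save": [],
}


def invert(graph):
    """Build a callee -> callers map, the 'find usages' view of the graph."""
    callers = defaultdict(list)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].append(caller)
    return callers


def entry_points(graph, boundary):
    """Climb from a boundary function to every caller that has no callers."""
    callers = invert(graph)
    found = []
    seen = {boundary}
    queue = deque([boundary])
    while queue:
        name = queue.popleft()
        parents = callers.get(name, [])
        if not parents:
            found.append(name)  # nothing calls this: an outermost entry point
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


print(entry_points(CALLS, "OrderDao.save"))
```

Here the climb surfaces two different ways into the same write path, a request handler and a batch job, which is the kind of thing that is easy to miss when you only read from the top.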

I don’t particularly endorse one approach over the other. I think it depends on the task itself, on how you see systems (do you see the trees first or the forest?) and on your learning style. I bet you weren’t expecting self-awareness tips in this post, so you may be a little unsure how we got here. Bear with me.

Knowing these two things, how you see systems and your primary ways of learning, can help in navigating codebases, at least by providing the starting points with the least frustrating path. Once you have your approach, the hunt begins. You are chasing the trail closer and closer to the code you need to find. Ideally you also find the upstream and downstream calling functions, services and clients; you had better figure out early what is going to be affected by your changes, if any. The hunt is the messy middle: you are in the wilderness, not knowing how long this will take or what tigers you will find on the way. Embrace the messy middle.

Once you find “the code”, make sure you understand its clients and callers well enough to know the effect that part of the code has on them. And, for all that is precious in your life, make sure to run the existing tests (I hope there are some) and to add new ones to validate the behavior and your understanding of the system. Additionally, it’s always helpful to run the service, class or application and see its behavior at runtime, ideally in an environment that’s not production.

Another tip is to keep notes on the parts that you find intriguing or that don’t seem to make sense where they are, so you can come back to them later. I keep a folder of project notes that I can search when they become relevant. Some things don’t make sense until we have gathered a deeper understanding of the system.
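One way to add tests that validate your understanding is to pin down what the code does today before you touch it, sometimes called a characterization test. The sketch below uses Python’s standard `unittest` module; `legacy_shipping_fee` is a made-up stand-in for whatever code your hunt led you to, and the expected values are simply what that stand-in returns, not a statement of what the business rule should be.

```python
import unittest


def legacy_shipping_fee(weight_kg):
    """Hypothetical stand-in for the existing code you want to understand."""
    if weight_kg <= 0:
        return 0
    if weight_kg < 5:
        return 4.0
    return 4.0 + (weight_kg - 5) * 0.5


class ShippingFeeCharacterization(unittest.TestCase):
    """Records current behavior so later changes cannot alter it silently."""

    def test_pins_observed_behavior(self):
        # Values observed by calling the function, not taken from a spec.
        observed = {0: 0, 1: 4.0, 5: 4.0, 7: 5.0}
        for weight, fee in observed.items():
            with self.subTest(weight=weight):
                self.assertEqual(legacy_shipping_fee(weight), fee)
```

Saved in a file, this runs with `python -m unittest`. If one of the pinned values surprises you, that is a good candidate for the notes folder.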

Making changes

Some additional suggestions:

  • Do several passes over the code; chances are that the first time you look at it, it won’t make all the sense in the world.
  • On the first pass, come to an understanding of the code change at a high level.
  • On the subsequent passes, pay more attention to semantic details.

Emotional side of things

Tools

What if you are not given any specific task, and you have more unstructured work ahead? I mentioned at the beginning that my first big project at Netflix was a long-term one and rather unstructured. For others it might be to improve the codebase, refactor for performance, find bottlenecks or find ways to improve the system even if it’s working as expected today. These are different beasts altogether, as you are not looking for specific blocks of code. You need to understand how the system works as a whole, how it communicates with other systems and what dependencies it uses, as well as details in the code such as language idioms, frequently used APIs, patterns, and the sorting algorithms or list implementations being used. In those cases, I create diagrams of the services and how they interact with each other, and I try to understand how data flows through the system. I go back to basics, pen and paper or a whiteboard, and I look at it like a conversation: which service is talking to which other service, and so on, down to the function level for the parts I’m interested in.
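If you prefer to keep the “who talks to whom” picture next to your notes rather than on a whiteboard, a plain list of interactions is enough to start with. This is only a sketch: the service names and messages below are invented, and rendering to Graphviz DOT text is my assumption about a convenient format, not something the pen-and-paper approach requires.

```python
# Hypothetical interactions noted while reading the code:
# (caller, callee, what they say to each other).
INTERACTIONS = [
    ("web-app", "orders-service", "place order"),
    ("orders-service", "orders-db", "save order"),
    ("orders-service", "email-queue", "publish confirmation"),
]


def to_dot(interactions, name="services"):
    """Render the interactions as Graphviz DOT text for a directed graph."""
    lines = [f"digraph {name} {{"]
    for caller, callee, what in interactions:
        lines.append(f'  "{caller}" -> "{callee}" [label="{what}"];')
    lines.append("}")
    return "\n".join(lines)


def talks_to(interactions, service):
    """List the services a given service calls, sorted by name."""
    return sorted({callee for caller, callee, _ in interactions if caller == service})


print(to_dot(INTERACTIONS))
print(talks_to(INTERACTIONS, "orders-service"))
```

The DOT text can be pasted into any Graphviz viewer to get a drawing, and the list itself stays easy to search and extend as your understanding of the system grows.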

Bon voyage

Resources: “new codebase, who dis” is a great article on the whole process of joining a new team; it covers aspects not covered here, like specific tools and git commands to use in the process. “General Guide For Exploring Large Open Source Codebases” covers even more commands and tools, specific to open source projects.

PS: We are indeed hiring in my org at Netflix, to build next-generation systems at huge scale for the service that all your family and friends know and use. My friends have asked for free subscriptions, to be called for roles if they need a latino bold man who used to be a Math professor, and for episodes of telenovelas. Get in touch if you are interested; after all, you’ll get to work with me.

Engineer, sometimes poet. Diversity advocate. VenusIT and VoiceFirst Weekly founder.