How We Optimize Development Workflows for Quick Delivery
If you've worked as part of a software team (or even as an individual contributor on a project), you know that keeping issues organized and prioritized can be very challenging. Over the last 3 years of working on cloudtamer.io, we've seen the project grow from a proof of concept running as a monolithic application to a highly available, enterprise-scale cloud application consisting of more than 20 microservices. Our team has likewise grown in both size and complexity as we hire new developers and fill entirely new roles to continue delivering a cutting-edge product. In this post, I'll share how we've optimized our development workflows to meet delivery objectives.
Moving Beyond a Rough Idea
When we first started working on our product, we had a rough idea of what we wanted to do and how we would do it, but we were not as bound by time and customer demands. With growth, we've had an increasing need to add features while at the same time supporting our existing customers and enhancing our underlying application. This need for forward progress, in tandem with product sustainment, has led to our current development methodology.
Trust the Process
Over these 3 years, we've iterated toward a good process for managing and prioritizing our tasks. Our methodology has evolved a lot, and we've learned from our mistakes. I hope that some insight into our current process will help others on their path to a workable process of their own.
Like many agile-aspiring teams, we work in 2-week sprints. In our case, we have 3 flavors of sprints: "Planning", "Implementation", and "QA". The building blocks of these sprints are "Issues" that range in size from a small text change/bug fix to an entire feature. To help understand the scope of tickets, these issues are assigned a weight. These weight values are integers that represent the approximate number of days needed to complete the issue. Small bugs receive a weight of one and help to offset the scope creep we may see in other issues. By reducing the precision of these values we also avoid some of the compounding errors we might see if we allowed down-to-the-hour estimates.
By adding weights to issues and tracking how long milestones take, we get a sense of our capacity for work and how much we can get done during a given milestone. We also added finer-grained time tracking: comparing the actual time spent against the weights shows us how accurate (or inaccurate!) our estimates are, which in turn drives the tasking for future sprints. More interestingly, this time tracking also gives insight into the efficiency of our development operations. If developers are only able to spend a few hours a day working on their issues, then there may be too many other demands on their time.
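To make the comparison between weights and tracked time concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a description of any real tooling: it assumes one weight point corresponds to an 8-hour working day, and the `Issue` fields, the function names, and the sample issues are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: one weight point is one 8-hour working day.
HOURS_PER_DAY = 8


@dataclass
class Issue:
    title: str
    weight: int         # estimated effort in whole days
    hours_spent: float  # time actually tracked against the issue


def estimate_ratio(issue: Issue) -> float:
    """Actual effort divided by estimated effort; 1.0 means the weight was accurate."""
    return issue.hours_spent / (issue.weight * HOURS_PER_DAY)


def sprint_summary(issues: list[Issue], developers: int, working_days: int) -> dict[str, float]:
    """Summarize estimate accuracy and tracked hours per developer per day."""
    total_weight = sum(issue.weight for issue in issues)
    total_hours = sum(issue.hours_spent for issue in issues)
    return {
        # Above 1.0: the sprint took longer than the weights predicted.
        "overall_ratio": total_hours / (total_weight * HOURS_PER_DAY),
        # Low values suggest other demands are eating into development time.
        "focus_hours_per_dev_day": total_hours / (developers * working_days),
    }


if __name__ == "__main__":
    # Hypothetical sample data for a 2-week sprint with two developers.
    issues = [
        Issue("Fix login typo", weight=1, hours_spent=4.0),
        Issue("Add audit log export", weight=5, hours_spent=52.0),
        Issue("Migrate billing service", weight=3, hours_spent=24.0),
    ]
    for issue in issues:
        print(f"{issue.title}: {estimate_ratio(issue):.2f}")
    print(sprint_summary(issues, developers=2, working_days=10))
```

A per-issue ratio well above 1.0 points to an underestimated weight, while a low figure for tracked hours per developer per day is the signal that developers may have too many other demands on their time.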
Once a week, the project management team gets together for backlog grooming to discuss newly created issues. These may be bugs found by our QA team, feature changes/requests from our current customers, or new features requested by sales that will help grow our position in the market. During this meeting, we discuss each issue to make sure everyone is on the same page and that we have weighted it properly. We then add it to a planning sprint, an implementation sprint, or our backlog.
How We Sprint
During a planning sprint, feature requirements are fleshed out. For small tickets, this will simply be in the form of some functional criteria that can be used to validate the ticket's completion. In the case of larger issues, we create a PR/FAQ (a Press Release & Frequently Asked Questions document) to fully document the feature details, the value proposition, and the benefit to our customers. The requirements are then discussed in a "3 Amigos" meeting where we bring together members of the development team, the project management team, and the delivery and support team. The goal here is to make sure that the feature is designed in a way that addresses customer needs, supports our project timeline, and is deemed technically feasible by engineers.
After an issue has gone through a planning sprint, it is moved into an implementation sprint. This is when the engineers build out the feature and merge it into our code base. We attempt to break features down into issues small enough to be completed in a single sprint. Once an issue is completed, it goes through a peer review process to ensure the code we release is architected properly, and then the code is merged into master to be included in a later release.
Every 2 sprints (4 weeks) we cut a release candidate from master and begin a QA sprint to evaluate the release, conduct additional testing, and confirm that the feature set is complete and conforms to requirements. Once this is complete a new version of cloudtamer.io is released. We do an internal demo so that all employees are aware of product progress.
For every one of our sprints (planning, implementation, and QA) we always conduct a retrospective. This allows us to reflect on the preceding weeks and identify things that worked and those that could be improved. It is through this process of reflection that we have iterated to where we are now, and reflection will continue to drive further improvements to this process.
Making Our Lives Easier
A big part of our mission at cloudtamer.io is to make people's lives easier in the cloud through innovation. This effort definitely starts in-house, where we constantly try to improve and simplify our internal processes and tooling. While the sprint planning/execution steps discussed earlier are largely manual, they tie into a number of automations that provide some leverage internally.
We use GitLab to manage our issues/milestones because of how well it integrates with other parts of our development process. It allows us to track our code evolution, tie those changes back to our project issues, and then test and deploy our code via a CI/CD pipeline.
Not a Pipe[line] Dream
Developers' time is best spent developing. I say this as a developer, but also as someone who doesn't enjoy fiddling around with infrastructure issues. Nine times out of ten a professional DevOps engineer is going to do a better job at these types of tasks than someone writing the underlying code. GitLab's pipeline feature is how we let people focus on the work they do best.
We have pipelines in a number of our internal repositories, but the two I want to touch on here relate to how we release cloudtamer.io to our customers quickly and reliably.
Move Fast and Don't Break Things
We put out one to two releases per week. Our goal is to get features out to customers quickly, and get any bug fixes out even faster. To keep this from taking time away from development, the release process must be as simple and reliable as possible. The first half of our pipeline (up to and including testing) will automatically run for any commits made to release, and the second half (all the way out to deploying to our internal environments) will be run as part of the release process.
During testing, we ensure the application builds properly and that both our front-end and back-end codebases pass lint checks. We then have a collection of tests that ensure our database migrations are safe, that our back-end unit/integration tests all pass, and that our front-end e2e test passes. Finally, we perform various security and vulnerability scans on our code and containers to make sure they are free of known vulnerabilities.
Once we deem the version "release ready" (which includes some manual QA testing not discussed here), we push the code into a couple of our test environments (most importantly QA and Staging) to make sure the deployment process goes smoothly and that everything is working as expected. From there, we move on to our acceptance stages, where our deployment is actually uploaded to S3 to be available to customers. With this completed, we then deploy to the internal environments that we use to manage our own cloud accounts.
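The two-part flow described above can be modeled in a few lines of Python. This is only a sketch of the idea, not the actual GitLab pipeline configuration: the stage names, the `Trigger` enum, and both functions are hypothetical, and it assumes that a release run goes through the verification stages again before the deployment stages.

```python
from collections.abc import Callable
from enum import Enum


class Trigger(Enum):
    COMMIT = "commit"    # any commit made to the release branch
    RELEASE = "release"  # a deliberate run of the release process


# Illustrative stage names, in the order the text describes them.
VERIFY_STAGES = ["build", "lint", "test", "security_scan"]
RELEASE_STAGES = ["deploy_test_envs", "acceptance_upload", "deploy_internal"]


def stages_for(trigger: Trigger) -> list[str]:
    """Commits run only the first half; a release run continues through deployment."""
    if trigger is Trigger.COMMIT:
        return list(VERIFY_STAGES)
    return VERIFY_STAGES + RELEASE_STAGES


def run_pipeline(trigger: Trigger, runners: dict[str, Callable[[], bool]]) -> list[str]:
    """Run the stages in order, stopping at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for stage in stages_for(trigger):
        if not runners[stage]():
            break  # a failed stage blocks everything after it
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # Stand-in runners that always succeed; real stages would invoke real jobs.
    runners = {name: (lambda: True) for name in stages_for(Trigger.RELEASE)}
    print(run_pipeline(Trigger.COMMIT, runners))
    print(run_pipeline(Trigger.RELEASE, runners))
```

The point of the ordering is that nothing reaches a deployment stage unless every earlier check has passed, which is what keeps a frequent release cadence from becoming risky.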
Documentation Pipe as Megaphone
While our release pipeline handles versioning of our code, updates are not pushed to customers until we run our documentation pipeline. This ensures documentation gets updated with every release of the product. It also has the added benefit of letting us cut a release without making it public, in case we want to hold it in reserve for a specific announcement date (key feature announcement, marketing event, etc.).
This pipeline is dramatically less exciting but no less important. When triggered, it will compile all of our updated application documentation as well as our release notes and upload them to our support portal hosted in Zendesk. Within Zendesk, customers have the ability to subscribe to our release notes topics to be notified as soon as a new release is made available.
The process detailed above is far from an exact science, but it has been refined through a lot of iteration. A big problem it has helped us address is endlessly rolling issues: tasks that continue from one sprint to another, never getting done. These types of tasks are incredibly inefficient and also tend to drain morale. By having manageable sprints, the team is able to pull together for a common goal and try to hit or come close to 100% completion. Since adopting this process, sprints have gone much more smoothly. Our issue burndown is now tracking closely with the idealized path for the sprint, and tickets are staying more up to date.
David leads software engineering at cloudtamer.io.