Tue. May 12th, 2026

How a Decade of Open-Source Contribution Prepared Me to Create My Own SDK

iStock 1039072216


iStock 1039072216
iStock 1039072216

Years before I created MultiCloudJ, an open-source Java SDK for multi-cloud development, I spent a long apprenticeship submitting patches to projects I didn’t own.

My first patch to a major open-source database took a few days to design and several weeks to merge. Three maintainers reviewed it. Two asked me to rethink the approach to play safer with existing deployments. I fixed Apache HBase’s WAL replication pipeline where ChainWALEntryFilter was passing through empty entries after all cells had been filtered out, causing useless data to replicate across clusters. Reviewers pushed back on my initial approach of adding a boolean flag to the existing core filter. They pointed out the change would require rebuilding HBase to toggle, and more fundamentally, that changing a shared class’s default behavior could silently break existing replication setups. The design evolved over multiple iterations across a 19-day review cycle, ultimately landing as a separate opt-in filter class (ChainWALEmptyEntryFilter)  that operators could configure per replication peer without touching core code.

The Discipline of Writing Code

That experience set the tone for everything that followed. Over several years, I contributed to projects across the Apache ecosystem and other open-source efforts, including distributed databases, query engines, data processing frameworks, and cloud abstraction libraries. What I learned in those years had less to do with algorithms and more to do with the discipline of writing code that thousands of people depend on, most of whom will never file a bug report but simply just stop using it.

Backwards compatibility is the most underrated constraint in software engineering. At a company, you can coordinate a breaking change across teams with an email and a migration guide. In open source, you can’t. Large Apache projects classify APIs by audience and stability level and require that stable interfaces survive across entire major releases. An API deprecated in one version can’t be removed until the next major release lands. That kind of discipline forced me to stop asking what the cleanest API was and start asking what API I could live with for the next five years. This hit home during my first major contribution to Google’s go-cloud, adding atomic writes to the docstore interface. Because I was touching a core abstraction, I came to the review with three different proposals for the write semantics. Once locked in, those semantics would be nearly impossible to refactor without breaking every downstream consumer. I worked through the tradeoffs with the project’s creator, and we converged on the approach that optimized for long-term sustainability over short-term elegance.

Open-source code review operates on a different frequency than corporate review. Inside a company, reviewers focus on correctness, style, and whether the change meets the immediate requirement. Open-source committers evaluate how a change interacts with features planned for the next two releases. They ask about edge cases in deployment topologies you’ve never encountered. Research on failures in large-scale distributed systems has documented how upgrade failures, partial network partitions, and crash recovery bugs cause some of the most severe outages. Open-source reviewers think in those failure modes by default. They flag what could break two releases from now, not just what fails today. One review on an HBase pull request of mine taught me that lesson directly. I had peeked twice from a queue assuming no side effects, which was true for the implementation at the time. A reviewer pointed out that the assumption was load bearing on internal behavior I didn’t control. If a future contributor changed the queue implementation, my code would silently break. The fix was small. The shift in how I think about implicit contracts has stayed with me.

The ASF ‘lazy consensus’ approach

The governance model of these communities also reshaped how I approach technical decisions. The Apache Software Foundation operates on lazy consensus: a few positive votes with no vetoes is enough to proceed, but a negative vote must include a concrete alternative. There’s no manager to escalate to. You have to persuade people who work at different companies, in different time zones, with different priorities. That habit has stayed with me; when I disagree with a technical direction on any team, I write up what I would do instead and why, before raising the objection.

Engineers who want to contribute to a major project often ask where to start. My advice: read the issue tracker for a month before writing any code. Watch how committers review patches. The Apache Incubator’s guidance on community governance lays out how these communities function: decisions happen on the mailing list, merit is earned through sustained contribution, and vendor neutrality is enforced deliberately. Understanding that culture before you submit your first patch saves months of frustration.

When you do start, pick a problem you’ve actually encountered. My most productive contributions came from bugs I hit while running distributed databases at scale. I understood the failure path because I’d traced it through production logs. That context gave my patches credibility that a cold contribution wouldn’t have had. One production issue I still think about was a set of MapReduce jobs handling HBase data migration were running out of memory on critical business operations, and the retries were failing too. The cause was unnecessary memory usage in the job pipeline and the fix shipped as HBASE-24859. Another came out of debugging Spark jobs that were silently deadlocking, timing out, and getting retried, which was adding millions of dollars in cloud spend before anyone noticed, and that fix shipped as SPARK-39283.

A decade of contributing to open-source infrastructure projects taught me how to think about ambiguity, interface longevity, and systems that fail gracefully when the assumptions they were built on turn out to be wrong. Those years also gave me the foundation to start MultiCloudJ, the open-source Java SDK for multi-cloud development I now help maintain. Designing portable APIs across AWS, GCP, and other providers required every habit I had picked up from years of review cycles, lazy consensus debates, and arguments about backwards compatibility. The contribution discipline came first. The SDK was what it built toward. You earn those skills one rejected patch at a time.

By uttu

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *