Code as Infrastructure
June 8, 2021
A few months ago, I was asked if there were any older technologies other than COBOL where we were in serious danger of running out of talent. They wanted me to talk about Fortran, but I didn’t take the bait. I don’t think there will be a critical shortage of Fortran programmers now or at any time in the future. But there’s a bigger question lurking behind Fortran and COBOL: what are the ingredients of a technology shortage? Why is running out of COBOL programmers a problem?
The answer, I think, is fairly simple. We always hear about the millions (if not billions) of lines of COBOL code running financial and government institutions, in many cases code that was written in the 1960s or 70s and hasn’t been touched since. That means that COBOL code is infrastructure we rely on, like roads and bridges. If a bridge collapses, or an interstate highway falls into disrepair, that’s a big problem. The same is true of the software running banks.
Fortran isn’t the same. Yes, the language was invented in 1957, two years earlier than COBOL. Yes, millions of lines of code have been written in it. (Probably billions, maybe even trillions.) However, Fortran and COBOL are used in fundamentally different ways. While Fortran was used to create infrastructure, software written in Fortran isn’t itself infrastructure. (There are some exceptions, but not at the scale of COBOL.) Fortran is used to solve specific problems in engineering and science. Nobody cares anymore about the Fortran code written in the 60s, 70s, and 80s to design new bridges and cars. Fortran is still heavily used in engineering, but that old code has been retired; those older tools have been reworked and replaced. Some Fortran code does remain important: linear algebra libraries such as LAPACK are still widely used, some modeling applications are still in use (NEC4, used to design antennas), and Fortran even sits underneath important libraries used primarily from other languages (the Python machine learning library scikit-learn calls both NumPy and SciPy, which in turn call LAPACK and other low-level mathematical libraries written in Fortran and C). But if all the world’s Fortran programmers were to magically disappear, these libraries and applications could be rebuilt fairly quickly in modern languages, many of which already have excellent libraries for linear algebra and machine learning. The continued maintenance of Fortran libraries that are used primarily by Fortran programmers is, almost by definition, not a problem.
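To make that dependency chain a little more concrete, here is a minimal sketch, assuming only that NumPy is installed. NumPy’s documentation for numpy.linalg.solve states that it computes the solution using the LAPACK routine _gesv, so an ordinary one-line Python call ends up in LAPACK code. Which LAPACK build actually runs (the Fortran reference implementation or a compatible build bundled with, say, OpenBLAS) depends on how NumPy was installed. The matrix values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A small linear system a @ x = b; the values are arbitrary, for illustration only.
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# numpy.linalg.solve delegates to LAPACK's _gesv routine (per NumPy's docs).
# The Python programmer never sees the underlying Fortran-era library.
x = np.linalg.solve(a, b)

# Verify that the computed x actually satisfies the system.
print(x, np.allclose(a @ x, b))
```

The exact solution here is x = [2, 3]; the point isn’t the arithmetic, but that code many Python programmers rely on every day bottoms out in a library whose maintenance they never think about.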
If shortages of COBOL programmers are a problem because COBOL code is infrastructure, and if we don’t expect shortages of Fortran talent to be a problem because Fortran code isn’t infrastructure, where should we expect to find future crises? What other shortages might occur?
When you look at the problem this way, it’s a no-brainer. For the past 15 years or so, we’ve been using the slogan “infrastructure as code.” So what’s the code that creates the infrastructure? Some of it is written in languages like Python and Perl. I don’t think that’s where shortages will appear. But what about the configuration files for the systems that manage our complex distributed applications? Those configuration files are code, too, and should be managed as such.
Right now, companies are moving applications to the cloud en masse. In addition to simple lift-and-shift migrations, they’re refactoring monolithic applications into systems of microservices, frequently orchestrated by Kubernetes. Microservices in some form will probably be the dominant architectural style for the foreseeable future (where “foreseeable” means at least 3 years, but probably not 20). The microservices themselves will be written in Java, Python, C++, Rust, whatever; these languages all have a lot of life left in them.
But it’s a safe bet that many of these systems will still be running 20 or 30 years from now; they’re the next generation’s “legacy apps.” Much of the infrastructure they run on is managed by Kubernetes, which may well be replaced by something simpler (or just more stylish). And that’s where I see the potential for a shortage: not now, but 10 or 20 years from now. Kubernetes configuration is complex, a distinct specialty in its own right. If Kubernetes is replaced by something simpler (which I think is inevitable), who will maintain the infrastructure that already relies on it? What happens when learning Kubernetes isn’t the ticket to the next job or promotion? The YAML files that configure Kubernetes aren’t written in a Turing-complete programming language like Python, but they are code. The number of people who understand how to work with that code will inevitably dwindle; those who remain may eventually become a “dying breed.” When that happens, who will maintain the infrastructure? Programming languages have lifetimes measured in decades; popular infrastructure tools don’t stick around that long.
It’s not my intent to prophesy disaster or gloom. Nor is it my intention to critique Kubernetes; it’s just one example of a tool that has become critical infrastructure, and if we want to understand where talent shortages might arise, we should look at critical infrastructure. Who’s maintaining the software we can’t afford not to run? If it’s not Kubernetes, it’s likely to be something else. Who maintains the CI/CD pipelines? What happens when Jenkins, CircleCI, and their relatives have been superseded? Who maintains the source archives? What happens when git is a legacy technology?
Infrastructure as code: that’s a great way to build systems. It reflects a lot of hard lessons from the 1980s and 90s about how to build, deploy, and operate mission-critical software. But it’s also a warning: know where your infrastructure is, and ensure that you have the talent to maintain it.