Tuesday, July 28, 2009

Making Big Problems Smaller

Software engineers, like all other mortals, operate along a continuum between ignorance and certainty. Good engineers are aware of this, and are careful about what they claim to know for sure. They test their hypotheses by designing experiments that yield results they can act on.

...

As a first example, let's say that nearly every day around lunch-time, performance on your web site slows to a crawl. With only that information, there is a lot you don't know. Good engineers react by first identifying what they do know, then asking questions about what else they think they need to know and can find out. Were there any alerts from any of the affected systems? What were network traffic, CPU usage, and disk usage on the system when the problems started happening? What about in the few minutes before? Are there any system errors or messages? What do the logs say?

This is a big problem because it is poorly understood. But notice that just asking "What happened?" doesn't tell you anything. You need to ask questions with answers that provide new information. Such questions transform the initial monolithic, intractable problem into a series of smaller, tractable problems. And each smaller problem in turn yields new results which help you build a hypothesis explaining the big problem.

In the easy case, one or more of these initial questions reveals a smoking gun. But even in the less easy case, where our first investigations don't point to a cause, we already know about a bunch of things that didn't cause the problem. Hopefully there are some well-understood additional questions we can investigate. But even if there are not, we can backpedal. "How can we find out more from the system when this happens next time?" "Can we reproduce this somewhere? Once? Repeatably?" Hard problems may take longer to solve, but there are always more questions you can ask.
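To make that "find out more next time" idea concrete, here is a minimal sketch of a script that records the kinds of numbers the questions above ask about -- my own illustration, not part of the scenario, and it assumes the third-party psutil package along with an arbitrary log file name and interval.

```python
# A minimal sketch: snapshot CPU, memory, disk, and network numbers once a
# minute and append them to a log, so the next lunch-time slowdown leaves a
# trail. The log path and interval are arbitrary, illustrative choices.
import time

import psutil  # third-party package, assumed to be installed

LOG_PATH = "lunchtime_metrics.log"   # hypothetical location
INTERVAL_SECONDS = 60

def snapshot():
    """Collect the numbers we wished we had: CPU, memory, disk, network."""
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_bytes": disk.read_bytes + disk.write_bytes,
        "net_bytes": net.bytes_sent + net.bytes_recv,
    }

if __name__ == "__main__":
    while True:
        with open(LOG_PATH, "a") as log:
            log.write(repr(snapshot()) + "\n")
        time.sleep(INTERVAL_SECONDS)
```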

...

Designing software systems benefits from a different style of problem decomposition. At the start of the process you do not know everything about how a system should be built. But you know something. You likely know, for example, how users will access your program. In a browser? Using a dedicated program on the iPhone? As you work on the system, you continually confront all that you don't know -- after all, you are building what doesn't yet exist. This can cause anxiety, or worse (in some cases, actual conniption fits), but it doesn't have to.

Let's imagine a web site that lets customers check the weather in their area. A first stab at a model of this system might look like this:

[Diagram: three boxes-and-arrows chunks -- a browser, a web application, and a data cylinder labeled "Weather Data".]

This picture is ridiculously simplistic, so much so that it may seem useless. Yet it still meets both of our decomposition criteria. It breaks up the monolithic initial problem into three chunks, and it yields results you can act on, in this case some questions you can begin investigating answers to. For example, how will the system know a user's location to show them weather results in their area? What are some data elements about weather it will need to store? Hey, what about data describing users? The current model labels the data cylinder "Weather Data" -- looks like you need to update the model.
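To give those three chunks something concrete to point at, here is a minimal sketch of the model in code. It is my own illustration: the in-memory dictionary standing in for the "Weather Data" cylinder, the zip codes, and the assumption that the browser tells us the user's location are all hypothetical.

```python
# A minimal sketch of the three chunks in the model: the browser's request,
# the web application that handles it, and the "Weather Data" store.

WEATHER_DATA = {  # stands in for the data cylinder labeled "Weather Data"
    "10001": {"condition": "sunny", "high_f": 82, "low_f": 68},
    "94103": {"condition": "fog", "high_f": 64, "low_f": 55},
}

def handle_request(zip_code):
    """The 'web application' box: take what the browser sent, return a page."""
    # One of the open questions from the model shows up immediately: how do
    # we actually learn the user's location? Here we just assume the browser
    # sent a zip code.
    report = WEATHER_DATA.get(zip_code)
    if report is None:
        return "Sorry, no weather for your area yet."
    return "Today: {condition}, high {high_f}, low {low_f}.".format(**report)

print(handle_request("10001"))  # what the browser would render for the user
```

Even this toy version immediately raises the same questions the picture does -- where does the zip code come from, and what else belongs in the data store?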

You could ask these questions in a vacuum. But as soon as you have a representation of some part of the system, however simple, you can look at that part of the system in greater detail and, crucially, with a clearer sense of "what that thing is." Guided by representations of the system, by metaphors essentially, you can drill down from more abstract to more concretely detailed and back up again without becoming disoriented. Successive decomposition brings structure to what was amorphous -- a solution literally "takes shape" as you work.

So then in the design domain you have a different continuum. Looking at the code, you have local certainty about a particular part of the application, but you are ignorant of the system as a whole. On the other hand, a picture with boxes and arrows offers a global view, at the price of total uncertainty about local behavior in any part of the system. Software engineers move between these poles as they work -- high abstraction/low detail to low abstraction/high detail, and back again.

...

It should be evident at this point that the only way to go about this business of making software is a little at a time. In the beginning, you only know enough to draw a picture and ask some questions. As you answer the questions, partially and bit by bit, you come to know enough to write some code. Back and forth and so on. More code, more questions, more answers, more code.

The current phrase we use to describe this truism is Agile software development, a method of working which assumes that each project starts out confronting the mother (and father) of all monolithic and unknown problems: "What are we building? And how are we going to build it?" Agile decomposes big problems into problems small enough to code answers to, all the while generating new problems and questions which are decomposed in turn.

The truth is that "What are we building and how" is never completely answered. The answer is evolving over time, so all you ever have is the current best possible answer. This may sound bleak to fans of certainty, but in fact the agile process benefits those paying for and charting the course of a software project. First, every time the project pauses to deliver its current best answer, stakeholders have something real to respond to. They might decide to halt the project. Or release all or part of it to users as is. Or request a series of adjustments to make between now and the next delivery of "current best answer." Again the same pattern -- break a big unsolvable problem into a series of reasonable and smaller ones, each of which yields results. Those results in turn lead to decisions about the next set of problems to be solved, questions to ask, steps to take.

...

We've now come to the portion of our show where we ask what these insights mean to those parts of our lives not governed by 0's and 1's. Well, first, notice that productivity systems like Getting Things Done are essentially advocating this same simple pattern. Take the big problem of being overloaded and unorganized, and the big question of "What should I be doing now?" Decompose it by creating categories and rules for how to categorize and prioritize and act on things.

The pattern applies just as well with less formality. Don't know what you need to get done? Make a list. (God did this in response to the question of "How should a person live their life?" You surely do not have any problems to decompose as hard as that one.) You can always start to transform the unknown into a series of more manageable, better understood steps.

Finally, we can learn from Agile. The first insight is that we must accept uncertainty; you will almost never be certain of what to do, but you will still need to decide. Fortunately Agile also suggests a path toward learning to make better decisions. Identify any actions you can take and questions you can ask. Act on what you can and pursue answers to the questions. This uncovers new actions to take and new questions to answer. Repeat. Repeat again.

"Live and learn," my mom used to say. To which I add, learn to live.

Thursday, June 18, 2009

What is Architecture?

At my job, my title is Architect. Of course in my case this means "thinks about how software and computer systems should work." My uncle, the kind of architect who designs buildings and ballparks, isn't too happy about this dilution of his noble profession's good name.

Until recently, I didn't have much of an answer to his challenge, because I didn't have a clear idea myself of what I meant by "architect" and "architecture" in the context of software applications and the computers that run them.

In the last few months I've been researching different approaches to data storage and retrieval, and that focus has given me my first partial answers to both questions.

Businesses have been building data warehouses for more than 20 years. The first mainstream book on the subject is Bill Inmon's "Building the Data Warehouse" from 1992, and the second standard text, Ralph Kimball's "The Data Warehouse Toolkit," dates to 1996. Kimball's approach has had a strong hold on the industry since then. So this is not a new problem.

Kimball advocates a series of design choices and implementation techniques that try to make traditional database systems do things that don't come easily to them with very large amounts of data, such as dealing with records related to each other in a hierarchy (e.g., bosses, middle managers, and the peons actually doing the work) and performing computations such as totals and averages over very large numbers of records. It has turned out that while you can do a lot to improve the capacity of this approach to handle more and more data, you can't do enough. That is especially true now that the amounts of data, and the variety of forms it takes, have grown so much since 1996, due in large part (all together now) to the fact that the Internet essentially emerged after Kimball wrote his book.

So, the need for what data warehouses do is more pressing than ever. But the ability to wring more performance out of the current approach is waning. Which means (all together now) new approaches have emerged in the last few years -- ways to achieve the same end results using different tools and different techniques.

One approach getting lots of buzz of late is called MapReduce. The idea here is to forgo a traditional database altogether, divide a potentially complex series of processing steps into many small steps, identify groups of steps that can run at the same time, and spread those simultaneous steps over many computers.
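As a rough sketch of that idea (mine, not Hadoop's): split the records into chunks, run the chunks at the same time on separate workers, then combine the partial results. A real MapReduce system spreads the workers over many machines; this illustration uses processes on one machine as a stand-in, and the per-city temperature averaging is an invented example.

```python
# A toy version of the MapReduce pattern described above: a "map" step that
# turns raw records into partial per-city sums, and a "reduce" step that
# combines the partial results into averages. Chunks are processed at the
# same time by a pool of worker processes.
from collections import defaultdict
from multiprocessing import Pool

def map_chunk(records):
    """Map: turn one chunk of (city, temperature) records into partial sums."""
    partial = defaultdict(lambda: [0.0, 0])   # city -> [total, count]
    for city, temperature in records:
        partial[city][0] += temperature
        partial[city][1] += 1
    return dict(partial)                      # plain dict so it can be sent back

def reduce_partials(partials):
    """Reduce: combine the partial sums from every chunk into averages."""
    combined = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for city, (total, count) in partial.items():
            combined[city][0] += total
            combined[city][1] += count
    return {city: total / count for city, (total, count) in combined.items()}

if __name__ == "__main__":
    records = [("Boston", 71.0), ("Boston", 75.0), ("Austin", 98.0), ("Austin", 102.0)]
    chunks = [records[:2], records[2:]]       # divide the work into small pieces
    with Pool(processes=2) as pool:           # run the pieces at the same time
        partials = pool.map(map_chunk, chunks)
    print(reduce_partials(partials))          # {'Boston': 73.0, 'Austin': 100.0}
```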

Another approach (called a column-store database) is to implement something that looks like a traditional database to other systems but stores data in a different arrangement, one better suited to the kinds of record selections and computations that data warehouse users perform.
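Here is a minimal sketch of that difference -- again my own illustration, not how any particular product lays out bytes on disk. The same three records are stored both ways, and a total over one column only has to touch that column's values in the column-oriented arrangement.

```python
# The same three sales records stored two ways. Row-oriented keeps each
# record together, the way a traditional database does; column-oriented
# keeps each column together.
rows = [
    {"date": "2009-06-01", "region": "East", "amount": 120.0},
    {"date": "2009-06-01", "region": "West", "amount": 80.0},
    {"date": "2009-06-02", "region": "East", "amount": 200.0},
]

columns = {
    "date":   ["2009-06-01", "2009-06-01", "2009-06-02"],
    "region": ["East", "West", "East"],
    "amount": [120.0, 80.0, 200.0],
}

# Total sales, the kind of computation a data warehouse user asks for.
total_from_rows = sum(record["amount"] for record in rows)  # reads every record in full
total_from_columns = sum(columns["amount"])                 # reads only the "amount" column
print(total_from_rows, total_from_columns)                  # 400.0 400.0
```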

So then this is one thing that software and systems "architecture" is -- an approach to solving a problem. Column stores and Hadoop (the MapReduce implementation that currently has the most traction with businesses) start with the same business problem, and provide users the same end results. But each uses very different tools, and builds a very different "machine" with those tools. And each approach is quite different from using a traditional database and building a traditional data warehouse using that database.

Notice that this leads to two more implications. First, architectural "approach" here is very much about a real working system. Architecture is applied. One makes a few fundamental decisions about how a system should make use of software and hardware, and then one applies those decisions to build something that works. Second, the "machine" one builds is a software system, software that uses hardware. At this highest level, it doesn't make sense to separate software architecture and hardware architecture. Because we are talking about approaches to implementing a working system, we need a holistic definition.

One last crucial point is that sometimes the way to solve a hard engineering challenge is to solve a different engineering challenge. People have spent years trying to improve the performance of data warehouses running on traditional databases. It appears there is a fairly firm limit on how much this approach can yield. So then the way to get past the limit is to play a different game entirely -- use a different approach to build something that works the same way for users. Architecture is about how the system is built, not what it does.

Which leads us to a description of what an architect is. If architecture is an approach to solving a problem, then architects come up with approaches, define how a system using that approach should be built, and guide a team that builds that system.

Although this blog tries to be relevant to technical and non-technical readers, much of the above discussion is pretty technical. So, are there ideas here that apply more generally? Well (all together now) sure there are.

First, it's useful to be reminded once again that theory is theoretical and practice practical. An approach to solving problems proves its value only when it is actually used to solve something. Second, what matters is solving the human problem. We build machines to help us do something. If the machine works, we don't care how it works. This gives us license to try completely different approaches as long as they yield the same results. Sometimes to win you have to play a different game.

Wednesday, June 17, 2009

New Blog

I blogged last year for about six months. I burned out. This time I'm not going to try that hard. When I have something to say, I'll say it, briefly. My hope is that I'll have ideas now and then that are worth your time to read.

One goal of the blog (the one that motivated me to start) is to speak about technical issues in a manner that is useful to technical professionals but also relevant and accessible to others. Technology is all around us and affects all of us profoundly. And it is the way of engineering to advance, relentlessly, so things will keep changing. It is my hope that this blog will help you make sense of some of these changes in some small way.