At my job, my title is Architect. Of course, in my case this means "thinks about how software and computer systems should work." My uncle, the kind of architect who designs buildings and ballparks, isn't too happy about this dilution of his noble profession's good name.
Until recently, I didn't have much of an answer to his challenge, because I didn't have a clear idea myself of what I meant by "architect" and "architecture" in the context of software applications and the computers that run them.
In the last few months I've been researching different approaches to data storage and retrieval, and that focus has given me my first partial answers to both questions.
Businesses have been building data warehouses for more than 20 years. The first mainstream book on the subject is Bill Inmon's "Building the Data Warehouse" from 1992, and the second standard text on the subject, Ralph Kimball's "The Data Warehouse Toolkit," dates to 1996. Kimball's approach has had a strong hold on the industry since then. So this is not a new problem.
Kimball advocates a series of design choices and implementation techniques that try to make traditional database systems do things that don't come easily to them when the amount of data is very large, such as dealing with records related to each other in a hierarchy (e.g., bosses, middle managers, and the peons actually doing the work) and computing totals and averages over very large numbers of records. It has turned out that while you can do a lot to improve this approach's capacity to handle more and more data, you can't do enough, especially as the amount of data and the variety of forms it takes have grown so much since 1996, due in large part (all together now) to the fact that the Internet essentially came of age after Kimball wrote his book.
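To make the hierarchy problem a little more concrete, here is a small sketch using Python's built-in `sqlite3` module. The org chart, the `employee` and `sale` tables, and the `team_total` helper are all hypothetical. It rolls up sales for a manager and everyone below them using a recursive SQL query; that is one common way to walk a hierarchy in a traditional database, not necessarily the specific technique Kimball recommends. Each level of the walk is, roughly, another join, which hints at why this gets expensive on very large tables.

```python
import sqlite3

# Hypothetical org chart: each employee row points at its boss (NULL for the top).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, boss_id INTEGER);
CREATE TABLE sale (employee_id INTEGER, amount REAL);
INSERT INTO employee VALUES
  (1, 'boss', NULL),
  (2, 'manager_a', 1),
  (3, 'manager_b', 1),
  (4, 'peon_1', 2),
  (5, 'peon_2', 2),
  (6, 'peon_3', 3);
INSERT INTO sale VALUES (4, 100), (5, 50), (6, 75), (2, 10);
""")

def team_total(manager_id):
    """Total sales for one employee plus everyone below them in the hierarchy."""
    query = """
    WITH RECURSIVE team(id) AS (
        SELECT id FROM employee WHERE id = ?          -- start at the manager
        UNION ALL
        SELECT e.id FROM employee e                   -- then add direct reports,
        JOIN team t ON e.boss_id = t.id               -- level by level
    )
    SELECT COALESCE(SUM(s.amount), 0)
    FROM sale s JOIN team t ON s.employee_id = t.id
    """
    (total,) = conn.execute(query, (manager_id,)).fetchone()
    return total

print(team_total(1))  # → 235.0
print(team_total(2))  # → 160.0
```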
So the need for what data warehouses do is more pressing than ever, but the ability to wring more performance out of the current approach is waning. As a result (all together now), new approaches have emerged in the last few years: ways to achieve the same end results using different tools and different techniques.
One approach getting lots of buzz of late is called MapReduce. The idea here is to forgo a traditional database altogether: divide a potentially complex series of processing steps into many small steps, identify groups of steps that can run at the same time, and spread those simultaneous steps over many computers.
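Here is a minimal, single-machine sketch of the MapReduce idea in Python. It is not Hadoop's API; the sample data and the function names (`map_chunk`, `shuffle`, `reduce_group`, `map_reduce`) are hypothetical, and threads stand in for separate computers. It computes sales totals per region, the kind of summary a data warehouse produces, by splitting the work into independent map steps, grouping their output by key, and then running independent reduce steps.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw input: "region,amount" lines split into chunks,
# as if each chunk lived on a different machine.
chunks = [
    ["east,100", "west,40"],
    ["east,25", "north,60"],
    ["west,10", "north,5", "east,5"],
]

def map_chunk(lines):
    """Map step: parse one chunk into (key, value) pairs, independently of other chunks."""
    pairs = []
    for line in lines:
        region, amount = line.split(",")
        pairs.append((region, int(amount)))
    return pairs

def shuffle(mapped_outputs):
    """Shuffle step: gather every value emitted for the same key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_group(item):
    """Reduce step: combine all values for one key into a single result."""
    key, values = item
    return key, sum(values)

def map_reduce(chunks, workers=3):
    # Map calls don't depend on each other, and neither do reduce calls,
    # so each phase can run in parallel; here threads simulate the machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
        groups = shuffle(mapped)
        return dict(pool.map(reduce_group, groups.items()))

print(map_reduce(chunks))  # → {'east': 130, 'west': 50, 'north': 65}
```

The only step that needs to see data from every chunk is the shuffle; everything else can be farmed out, which is what lets a real system spread the work across many computers.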
Another approach (called a column-store database) is to implement something that looks like a traditional database to other systems but stores data using a different arrangement better suited to the kinds of selections of records and computations that users of data warehouses perform.
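A toy Python sketch of why the column arrangement suits data-warehouse queries. The sales table and helper names are hypothetical. The same records are held once as rows and once as one list per column; computing an average or a filtered total from the column layout touches only the columns the query actually needs. In a real column store the payoff comes from on-disk layout and compression, which Python lists only loosely imitate.

```python
# Hypothetical sales table in row layout: one record per order.
rows = [
    {"order_id": 1, "region": "east", "customer": "Acme", "amount": 100.0},
    {"order_id": 2, "region": "west", "customer": "Globex", "amount": 40.0},
    {"order_id": 3, "region": "east", "customer": "Initech", "amount": 25.0},
]

def to_columns(rows):
    """Pivot row-oriented records into column layout: one list per field."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def average_amount_rows(rows):
    # Row layout: every whole record is visited to read a single field.
    return sum(row["amount"] for row in rows) / len(rows)

def average_amount_columns(columns):
    # Column layout: only the "amount" values are read.
    amounts = columns["amount"]
    return sum(amounts) / len(amounts)

def total_for_region(columns, region):
    # Reads just the "region" and "amount" columns; the others are never touched.
    return sum(a for r, a in zip(columns["region"], columns["amount"]) if r == region)

columns = to_columns(rows)
print(average_amount_rows(rows), average_amount_columns(columns))  # → 55.0 55.0
print(total_for_region(columns, "east"))  # → 125.0
```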
So then this is one thing that software and systems "architecture" is -- an approach to solving a problem. Column stores and Hadoop (the MapReduce implementation that currently has the most traction with businesses) start with the same business problem, and provide users the same end results. But each uses very different tools, and builds a very different "machine" with those tools. And each approach is quite different from using a traditional database and building a traditional data warehouse using that database.
Notice that this has two further implications. First, architectural "approach" here is very much about a real working system. Architecture is applied. One makes a few fundamental decisions about how a system should make use of software and hardware, and then one applies those decisions to build something that works. Second, the "machine" one builds is a software system, software that uses hardware. At this highest level, it doesn't make sense to separate software architecture from hardware architecture. Because we are talking about approaches to implementing a working system, we need a holistic definition.
One last crucial point is that sometimes the way to solve a hard engineering challenge is to solve a different engineering challenge. People have spent years trying to improve the performance of data warehouses running on traditional databases. It appears there is a fairly firm limit on how much this approach can yield. So then the way to get past the limit is to play a different game entirely -- use a different approach to build something that works the same way for users. Architecture is about how the system is built, not what it does.
Which leads us to a description of what an architect is. If architecture is an approach to solving a problem, then architects come up with approaches, define how a system using that approach should be built, and guide a team that builds that system.
Although this blog tries to be relevant to technical and non-technical readers, much of the above discussion is pretty technical. So, are there ideas here that apply more generally? Well (all together now) sure there are.
First, it's useful to be reminded once again that theory is theoretical and practice practical. An approach to solving problems proves its value only when it is actually used to solve something. Second, what matters is solving the human problem. We build machines to help us do something. If the machine works, we don't care how it works. This gives us license to try completely different approaches as long as they yield the same results. Sometimes to win you have to play a different game.