The world of the microprocessor is one buzzing with innovation, right? Well, maybe, but a few days at the Microprocessor Forum at the end of last year could have had you feeling that the industry was in something of a rut. Faster clock speeds and increased degrees of parallelism are the staples of the chip makers: evolution, rather than revolution. So what about the prospects for something really different? There is optical computing of course, and Intel Corp and Hewlett-Packard Co’s collaboration, which is thought to be based upon Very Long Instruction Words, but that seems about it. However, while the rest of the world cranks up clock speeds further and further, a few groups are experimenting with processors with no clocks at all. These asynchronous processors do away with the idea of having a single central clock keeping the chip’s functional units in step. Instead, each part of the processor – the arithmetic units, the branch units and so on – works at its own pace, negotiating with the others whenever data needs to be passed between them.
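To get a feel for what that ‘negotiating’ means, the toy Python sketch below models two units exchanging results over a request/acknowledge channel, with the producer taking a variable, data-dependent amount of time over each item. Everything in it – the thread structure, the queue, the delays – is an invented software analogue for illustration, not a description of how Amulet or any real asynchronous chip is wired.

```python
# Toy software analogue of request/acknowledge signalling between two
# 'functional units'. Purely illustrative: names, delays and structure
# are assumptions, not a model of any real asynchronous processor.
import queue
import random
import threading
import time

def producer(channel: queue.Queue, values):
    """A unit that emits results whenever it happens to finish each one."""
    for v in values:
        time.sleep(random.uniform(0.0, 0.01))  # work takes as long as it takes
        channel.put(v)                         # 'request': a result is ready
        channel.join()                         # wait for the 'acknowledge'

def consumer(channel: queue.Queue, n: int):
    """A unit that accepts results at its own pace and acknowledges each one."""
    for _ in range(n):
        v = channel.get()                      # wait for a request
        print("received", v)
        channel.task_done()                    # 'acknowledge': ready for more

if __name__ == "__main__":
    ch = queue.Queue(maxsize=1)
    vals = [1, 2, 3, 4]
    t = threading.Thread(target=consumer, args=(ch, len(vals)))
    t.start()
    producer(ch, vals)
    t.join()
```

The point is simply that nothing global tells the two sides when to move: each transfer happens only when both ends are ready.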
Deterministic
Today’s computers are clock-based for good reasons – for a start, it makes them easier to build. A central clock means that all the components crank together and form a deterministic whole. So it is relatively easy to verify a synchronous design, and consequently to ensure that the chip will work under all circumstances. By comparison, verifying an asynchronous design, with each part working at its own pace, is difficult. But clocks too are proving to have their problems. As the clocks get faster, the chips get bigger and the wires get finer, it becomes increasingly difficult to ensure that all parts of the processor are ticking along in step with each other. Even though the electrical clock pulses are travelling at a substantial fraction of the speed of light, the delays in getting from one side of a small piece of silicon to the other can be enough to throw out the chip’s operation.

Steve Furber, ICL Professor of Computer Engineering at the University of Manchester in the UK, points out various other shortcomings. One of the most interesting is the realisation that the whole of a conventionally clocked chip has to be slowed right down so that the most sluggish function of the most sluggish functional unit does not get left behind. To take a simple example: how long does it take an arithmetic unit to add two numbers? The answer is that it depends on what the two numbers are. In rare cases additions can slow down quite dramatically because of the pattern of bits generated and the way the carry bits have to be handled. Faced with this problem, the conventional designer has two choices. She can throw in some extra circuitry to try to speed up these slow special cases, or alternatively just shrug and slow everything down to take account of the lowest common denominator. Either way, the result is that resources are wasted or the chip’s speed is determined by an instruction that may hardly ever be executed. It would be much more sensible if the chip only became more sluggish when a tricky operation was actually encountered.

In addition, conventional processors are becoming increasingly power-hungry. Today’s highest-end chips, like Digital Equipment Corp’s Alpha and the PowerPC 620, give off around 20 to 30 watts in normal operation. Furber extrapolates current trends to show that by the turn of the millennium, if we were to continue to use 5V supplies, we could expect to see a 0.1 micron processor dissipating 2kW. Of course, we will not be using 5V at 0.1 micron, but because the dynamic power of CMOS scales roughly with the square of the supply voltage, reducing the supply to 3V, and then 2V, only brings the 2kW down to around 660W and then 330W, which are still very high dissipation levels. He also argues that one of the shortcomings of the conventional clocked processor is that many of the logic gates in the chip are forced to switch states simply because they are being driven by the clock, and not because they are doing any useful work. CMOS gates only consume power when they switch, so removing the clock removes unnecessary power consumption.
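The adder example is easy to make concrete. In a simple ripple-carry adder the sum is not stable until the longest run of carries has propagated, so the time taken genuinely depends on the operands. The sketch below – an illustrative function over an assumed 32-bit width, not a model of the ARM or Amulet datapath – counts that carry chain: a clocked design must budget for the worst case on every cycle, whereas an asynchronous one could signal completion as soon as its own carries have settled.

```python
# Rough sketch of why addition latency is data-dependent in a ripple-carry
# adder: the longest run of consecutive carries determines how long the
# result takes to settle. The 32-bit width is an illustrative assumption.
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Return the length of the longest run of consecutive carries in a + b."""
    carry, longest, run = 0, 0, 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        new_carry = (x & y) | (x & carry) | (y & carry)
        run = run + 1 if new_carry else 0   # extend or reset the current run
        longest = max(longest, run)
        carry = new_carry
    return longest

if __name__ == "__main__":
    print(longest_carry_chain(1, 2))           # bits never overlap: 0 carries
    print(longest_carry_chain(1, 0xFFFFFFFF))  # carry ripples through all 32 bits
```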
By Chris Rose
Last March, as we saw in CI No 2,570, a team led by Furber succeeded in building a completely asynchronous version of Advanced RISC Machines Ltd’s ARM processor – the chip that powers Apple Computer Inc’s Newton. The result is claimed to be the world’s first asynchronous implementation of a commercial processor – it runs standard ARM binaries. In building the chip, the team had to overcome some formidable problems.

Number one was the lack of suitable design tools. Today’s chip-makers rely heavily on computer-aided design to help them lay out their silicon. These tools can indicate whether a design will function in the way intended and can perform quite complex tasks such as spotting and deleting redundant circuitry. Unfortunately, they screw up when applied to asynchronous processors, where the criteria of ‘good design’ are different. Take, for example, the problem of ‘race hazards’ and what happens when a signal gets ever so slightly delayed in a chip. Ironically, timing can be more critical within an asynchronous chip than in a conventionally clocked one. Imagine a logic element on a chip with two inputs and one output. If the data at one of the inputs arrives infinitesimally later than the data at the other, the output will be momentarily wrong. On a conventional chip this does not matter: as long as the glitch has gone by the time the next clock cycle comes around, all will be well. But on an asynchronous chip… oh dear – no clock, and nothing to differentiate between glitch and data. The problem can be overcome by adding a couple of extra transistors to the gate design, but, you guessed it, conventional integrated circuit design tools will try to remove them as redundant.

At a higher level, there are other timing problems. It is quite possible for a badly designed processor to deadlock, with one functional unit waiting for data from another, unaware that it will never come. Conceptually, it is the same kind of problem that bedevils anyone writing parallel software for multiprocessor machines. Guaranteeing that the processor never dies in this way is tricky. Although you can run simulations, they only establish that the chip fails to freeze under the simulated conditions, not that it is totally freeze-free. The upside is that deadlocks make it easy to spot exactly when and where in the processor a problem occurred. Furber points out that when a clocked design hits similar problems it will simply grind on to the end of its run, spitting out an incorrect answer, so that identifying exactly where things went wrong is even harder.

The resultant asynchronous ARM chip, dubbed Amulet1, is still a little rough around the edges: compared with a conventional ARM6 it is bigger, uses more transistors and draws more power. Bearing in mind that the ARM6 is the fourth iteration of a chip based upon well-known RISC principles, that is hardly surprising. The design of Amulet2 is virtually complete, with silicon expected towards the middle of the year. Meanwhile, the existing chip shows some novel properties, in particular an ability to degrade gracefully in performance terms as the voltage is decreased and the temperature changes – the test chips worked correctly between -50°C and 120°C, running more slowly as the temperature was increased. So, will asynchronous processors become the next big thing? Even their best friends are cautious.
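The ‘couple of extra transistors’ fix described above is not named in the article. One common asynchronous building block that behaves in the required way is the Muller C-element, whose output only changes when both inputs agree and otherwise holds its previous value, so a brief glitch on one input never reaches the output. The Python sketch below simulates that behaviour; treating it as the specific circuit Furber’s team used is an assumption.

```python
# Behavioural model of a Muller C-element, a standard asynchronous logic
# primitive: the output follows the inputs only when they agree, and holds
# its previous value otherwise, filtering out brief glitches on one input.
def c_element(trace):
    """Simulate a C-element over `trace`, a list of (a, b) input pairs."""
    out = 0
    outputs = []
    for a, b in trace:
        if a == b:
            out = a   # both inputs agree: output follows them
        # otherwise: hold the old value, so a glitch on one input is ignored
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    # The second input briefly glitches high and drops back; the output stays
    # low until both inputs are genuinely high.
    glitchy = [(0, 0), (0, 1), (0, 0), (1, 0), (1, 1), (1, 1)]
    print(c_element(glitchy))   # -> [0, 0, 0, 0, 1, 1]
```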
Mix’n’match chips
Furber’s best guess is that asynchronous techniques will find their way into certain niches: in particular he cites embedded applications such as compact disk error correctors, where the work required is extremely bursty, or something like the Apple Newton, where power-saving requirements make the approach attractive. It is also possible that we will see mix ’n’ match chips, mainly clocked but with some asynchronous parts. This is the approach that Advanced RISC Machines itself is considering, with the company’s engineering director suggesting that complete functional blocks, such as a multiplier, may possibly go asynchronous in a couple more product generations. Meanwhile Philips Research Labs in the Netherlands is looking at the technology for signal processing, and Ivan Sutherland, the doyen of asynchronous technology, has reportedly described an asynchronous implementation of the Sparc architecture. Moreover, as the returns to be gained from tweaking existing RISC designs begin to diminish, processor designers will no doubt keep revisiting all kinds of esoteric technology in search of that elusive competitive edge. Who’s to say that they will not someday decide to throw away their clocks?