By Timothy Prickett Morgan
More details are emerging about the Blue Gene supercomputer project that IBM announced last week. It seems that Blue Gene will be a radical departure from the RS/6000 SP parallel supercomputers that IBM has been selling since 1993. Blue Gene is such a different design that it will not use Power or PowerPC processor designs, will not use SP switch interconnections, and will not run IBM’s AIX Unix variant. The reasons for this are complex, but they all come down to one thing: with a million processors in a single server, IBM has to do everything it can to simplify the components in Blue Gene and use the cheapest components rather than the latest and greatest technology. If IBM didn’t take this approach, it wouldn’t take very long for its engineers at the TJ Watson Research Center in Yorktown Heights, where Blue Gene will be located, to burn through $100m.
Whenever anyone designs a new computer, the first thing people want to look at is the specs of the processor used in the machine. The Blue Gene processors, says Monty Denneau, the chief architect of the project, use what he calls an ultra-minimalist approach. Denneau was the architect of the GF11 supercomputer, which was designed in the late 1980s and used at Watson for research that led to the RS/6000 SP line. The GF11 had 566 custom processors and was capable of performing at about 12 gigaflops, which was pretty powerful for the time. Denneau worked with fellow IBMer Peter Hochschild to create the high-speed switching interconnect that makes the RS/6000 SPs possible.
The basic idea behind Blue Gene is to strip all but the bare essential instructions out of a RISC processor and to simplify the operating environment that makes use of that machine, so that the maximum amount of processing can be squeezed out of the copper-silicon wafers that the Blue Gene processors will be etched on. Denneau says that the Blue Gene processors will have only 57 instructions rather than the several hundred that are included in IBM’s Power3 and PowerPC processors. After running some tests, IBM found that many of the complex instructions that eat up chip real estate are never even used by either commercial or technical applications, so these were removed. The memory architecture in Blue Gene is also simpler: main memory is integrated onto each cluster of 32 processors on a single wafer, and there is no L2 cache memory and hence no need for sophisticated controller circuits for it. For many workloads, IBM has found that the L2 cache that chip makers can integrate into modern processors is so small relative to the datasets used in technical workloads that it in fact inhibits good performance.
IBM believes the shared memory architecture of Blue Gene, which is roughly akin to non-uniform memory access (NUMA) clustering in Unix and NT servers today, will be a big improvement over the memory approach used in many parallel supercomputers today. The way that instructions are pushed through the processing elements is also more leisurely than with today’s RISC processors, which have complex data and instruction prefetching algorithms and superscalar circuits that allow them to process three, four or even six instructions per clock cycle. One Blue Gene processing element will have eight processing threads, but each thread will take an average of four cycles to complete an instruction, which works out to an average of two instructions per cycle across the eight threads.
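The throughput arithmetic in that last sentence can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the eight threads and four cycles per instruction are the figures Denneau gives, and the 500MHz clock is the speed quoted for the Blue Gene chips. It assumes the threads never stall each other, which the article does not claim.

```python
def instructions_per_cycle(threads: int, cycles_per_instruction: float) -> float:
    """Aggregate instructions completed per clock cycle when each of
    `threads` independent threads finishes one instruction every
    `cycles_per_instruction` cycles (assumes no contention between threads)."""
    return threads / cycles_per_instruction


def instructions_per_second(threads: int, cycles_per_instruction: float,
                            clock_hz: float) -> float:
    """Scale the per-cycle rate by the clock frequency."""
    return instructions_per_cycle(threads, cycles_per_instruction) * clock_hz


# Figures quoted in the article: 8 threads, 4 cycles per instruction, 500MHz.
ipc = instructions_per_cycle(8, 4)
ips = instructions_per_second(8, 4, 500e6)
print(f"{ipc:.1f} instructions per cycle, {ips:.2e} instructions per second")
```

On those assumptions a single processing element sustains two instructions per cycle, or a billion instructions per second at 500MHz, without any superscalar scheduling logic.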
While the 500MHz Blue Gene chips appear to be running at quarter speed, the idea is to have such an overwhelming over-capacity of processing threads available in the whole box (Blue Gene will have over 1 million processors) that sophisticated instruction scheduling electronics won’t be necessary. On balance, by putting more processing elements on a silicon wafer, IBM expects to get more work done even at such a lazy pace, because it crams 32 processors and 16Mb of memory onto a single 20mm by 20mm die. IBM says that the die size could go up or down by 1mm, and that it is pushing the limits in terms of size. If it needs to add more processors, it will have to cut back on memory. Conversely (and much more likely), if it needs to add memory, IBM will have to scale back the number of processors on a Blue Gene wafer.
The Blue Gene chips are expected to be implemented in IBM’s current CMOS-7SF process, which IBM is using to create processors like the RS/6000 S80’s Pulsar PowerPC as well as embedded DRAM. The idea is that three years from now, the CMOS-7SF process will be so mature that the Blue Gene chips will be very inexpensive to make. Denneau says that depending on how IBM’s silicon-on-insulator CMOS-8SF process ramps up, Blue Gene could use that instead. It all depends on cost. In addition to the processing elements and memory, each Blue Gene 32-way chip will contain integrated communications switches and 12 switching channels capable of 12Gbps in and 12Gbps out simultaneously. The resulting Blue Gene 32-way processing chips will be stacked up in a 4x4x4 cube; eight of these cubes will be put into a single rack, and 64 racks will make up the whole machine. The resulting computer will have 500 times the power of the current ASCI Blue Pacific supercomputer, which is based on SP technology, but at 2,000 square feet will occupy about one quarter the floor space. It will be water cooled, and the TJ Watson facility already has the capacity to deal with the 1 million watts of heat Blue Gene will crank out. The interconnect between processing modules is not fiber optic, by the way, but cheap flexible copper cabling.
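The packaging figures above multiply out to the "over 1 million processors" IBM is talking about. The short Python sketch below does the multiplication using only numbers quoted in this article; the constant and function names are invented for illustration, and the per-chip heat figure simply divides the 1 million watts evenly across chips, which ignores switches, memory and cooling overhead.

```python
PROCESSORS_PER_CHIP = 32      # 32-way chip
CHIPS_PER_CUBE = 4 * 4 * 4    # chips stacked in a 4x4x4 cube
CUBES_PER_RACK = 8
RACKS = 64
TOTAL_HEAT_WATTS = 1_000_000  # heat the machine is expected to give off


def machine_totals() -> dict:
    """Multiply out the packaging hierarchy described in the article."""
    chips = CHIPS_PER_CUBE * CUBES_PER_RACK * RACKS
    processors = chips * PROCESSORS_PER_CHIP
    return {
        "chips": chips,
        "processors": processors,
        # Naive even split of the heat budget; a rough upper bound per chip.
        "watts_per_chip": TOTAL_HEAT_WATTS / chips,
    }


totals = machine_totals()
print(totals["chips"], "chips,", totals["processors"], "processors,",
      round(totals["watts_per_chip"], 1), "watts per chip")
```

That comes to 32,768 chips and 1,048,576 processors, with roughly 30 watts of heat per chip if the load were spread evenly.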
With such a large die size, IBM is not going to be able to produce wafers that are perfect. So Blue Gene will include testing and processor allocation algorithms that will route processing requests around bad processors in the original wafers and around processors as they begin to fail within the running machine. Remember, it will take a year to run the protein folding job that IBM has Blue Gene pegged for, so it cannot crash. That is why Blue Gene will have self-healing algorithms built into its operating environment. Although Denneau claims to be unaware of Hewlett-Packard’s Teramac defect-tolerant computer, the self-healing ideas behind Blue Gene are similar (CI No 3,469). Denneau hedges about calling Blue Gene’s operating environment a full operating system. Blue Gene will not run AIX, although he says that for some time IBM was considering running the Linux kernel on it. Denneau says that GNU C and Fortran class libraries will be ported to the box, and that a basic kernel plus communications to external Netfinity servers (which will act as disk arrays for each of Blue Gene’s 64 racks and which will likely hold terabytes of data each) will make up the entire operating environment.
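IBM has not said how its allocation algorithms will work, so the following Python is purely a hypothetical sketch of the general idea of routing work around bad processors, not IBM's method. The `ProcessorPool` class, its methods and the round-robin policy are all assumptions made for illustration: processors known to be bad from wafer testing are skipped at allocation time, and tasks on a processor that fails later are moved to the surviving ones.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorPool:
    """Hypothetical pool of processors on one chip, some of them defective."""
    size: int
    bad: set = field(default_factory=set)            # known-bad processor ids
    assignments: dict = field(default_factory=dict)  # task id -> processor id

    def healthy(self) -> list:
        """Processor ids not currently marked as bad, in ascending order."""
        return [p for p in range(self.size) if p not in self.bad]

    def assign(self, tasks) -> None:
        """Spread tasks round-robin over healthy processors only."""
        healthy = self.healthy()
        if not healthy:
            raise RuntimeError("no healthy processors left")
        for i, task in enumerate(tasks):
            self.assignments[task] = healthy[i % len(healthy)]

    def mark_failed(self, processor: int) -> list:
        """Record a runtime failure and move its tasks elsewhere.
        Returns the ids of the tasks that had to be moved."""
        self.bad.add(processor)
        orphaned = [t for t, p in self.assignments.items() if p == processor]
        self.assign(orphaned)
        return orphaned


# A 32-processor chip with two processors found bad at wafer test.
pool = ProcessorPool(size=32, bad={3, 17})
pool.assign(range(64))
moved = pool.mark_failed(5)  # processor 5 dies while the job is running
print("tasks moved off processor 5:", moved)
```

A real machine would also have to restore the state of the moved tasks and update routing in the interconnect, neither of which this sketch attempts.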