Vector processing is a concept usually associated with Cray Research Inc and the expensive world of scientific supercomputing. But this stereotype may not hold for much longer. Not if companies such as Meiko Scientific and Teradata Corp have their way. These are companies determined to bring the message of parallel processing to the commercial market, and to promote the cost-effectiveness of converting database management systems for parallel architectures. As Active Memory Technology, the ICL spin-off devoted to developing systems based on the Distributed Array Processor, points out, however, the UK market in particular is slow to adapt to change, so the attack on the commercial market may be a laborious one.

But what is parallel processing, and what makes it suitable for commercial applications? There are two basic strains of parallelism: small-scale parallel architectures, as employed by Cray, and massively parallel architectures, as used by companies like Active Memory Technology and Meiko Scientific, a spin-off from Transputer developer Inmos. The latter type of parallelism can be divided again, into single instruction multiple data, or SIMD, architectures, as used by Thinking Machines and Maspar in the US and Active Memory Technology in the UK, and multiple instruction multiple data, or MIMD, architectures, as used by nCube in the US and Meiko Scientific Ltd here in the UK.
Vector processing
The different brands of parallelism have their own advantages and disadvantages, and have therefore found different applications to which they are best suited. Although a believer in parallel processing, Cray Research swears first and foremost by vector processing, something that IBM has latched onto, stealing market share from Cray in the scientific market with its Vector Facility for 3090 mainframes: according to Cray, a six-processor IBM 3090 with vector capabilities is the rough power equivalent of the dual-processor Cray Y-MP/2. Vector processing, according to Cray, is what differentiates a supercomputer from a mainframe system; vector architectures, says the company, can yield processing speeds up to 10 times faster than those of scalar architectures. Mainframes, like workstations and personal computers, are built around scalar architectures, where one instruction is processed at a time to produce one result. With Cray’s 64-bit vector architecture, one instruction can generate more than one result, turning a compute-bound problem into one bound only by input-output capacity.

With scalar processing, all the components of a calculation must be completed sequentially before the next calculation can be started. Vector processors, by contrast, have a pipelined structure. Each Cray vector processor has four independent ports: two feed arithmetic function units, enabling an addition and a multiplication to operate simultaneously, one accesses memory and the last is for input-output. Once one functional unit has operated on one set of operands, the next set moves into its place, without waiting for the entire calculation to be completed. In a Cray Y-MP vector system, two results are turned out per clock tick. If the vector processors are then arranged in parallel, the number of results produced per clock becomes a multiple of two: a high-end Cray Y-MP 8, which has eight processors in parallel, produces 16 results per cycle. This yields a peak system performance of 2,600m floating-point operations per second, or 2.6 GFLOPS, but only when there are no dependencies and all the pipelines can be kept full.
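To put numbers on that peak figure: the Y-MP’s clock period is roughly 6 nanoseconds (a widely quoted figure, not one given by Cray here), so two results per clock works out at about 333 MFLOPS per processor, and eight processors give the 2.6 GFLOPS quoted above; the same per-processor rate is consistent with the 667 MFLOPS claimed for the two-processor Y-MP 2E below. The sketch that follows, written in C purely for illustration and not taken from any Cray code, shows the kind of loop a vector machine thrives on: one multiply and one add per element, so the two arithmetic units can chain and stream results once the pipeline is full.

    /* saxpy.c: a minimal sketch of a vectorisable loop, one multiply
     * and one add per element.  On a scalar machine each iteration
     * must finish before the next starts; on a vector machine the
     * elements stream through the chained multiply and add units,
     * retiring results every clock once the pipeline is full. */
    #include <stdio.h>

    #define N 1000000

    static double x[N], y[N];

    static void saxpy(double a, const double *x, double *y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* one multiply, one add */
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }
        saxpy(3.0, x, y, N);
        printf("y[0] = %f\n", y[0]); /* expect 5.000000 */
        return 0;
    }

A vectorising Fortran compiler on a Cray would treat the equivalent DO loop in much the same way; the point of the example is only the shape of the computation, not the language.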
By Susan Norris
The problem with vector processing is that it is highly expensive, to which Cray supercomputers are testimony: the lower-end two-processor Y-MP 2E, which peaks at 667 MFLOPS, starts at $3m, while the high-end Y-MP 8 costs between $15m and $20m. Cray has always maintained that a modest number of powerful processors is preferable to a larger number of modest processors, because it is easier to write software for a small number of processors that share the same central memory than to try to write a program that will co-ordinate large numbers of distributed-memory processors. This is fair enough, since Cray systems are designed for the science and engineering markets, where they are used to solve a wide range of large, computationally intensive problems.

Massive parallelism, on the other hand, is less suited to such general purpose power computing. The benefits of massive parallelism lie chiefly in the domains of scalability, price, and the ability to manipulate very large amounts of data simultaneously, at extremely high speeds. The main areas of application are text retrieval, image and signal processing, and computational fluid dynamics, where there are large amounts of data involved, but the data manipulation required is simple and doesn’t require much communication between processors.

Single instruction multiple data massive parallelism, while easier to program for than multiple instruction, is the least flexible of the parallel architectures: although the memory resource is distributed across the many processors, each scalar processor is allocated only a segment of that memory, and all processors are restricted to executing the same instruction at the same time, hence the single instruction multiple data label. SIMD processing is cheap: Active Memory Technology sells systems, based on the Distributed Array Processor, or DAP, that are typically a fifth of the cost of a Cray. Universities are Active Memory’s main customers, and the company admits that it hasn’t been very successful in industry. Whereas Active Memory’s systems are suited to the computational fluid dynamics applications required by car manufacturers and designers of next-generation aeroplanes, Cray systems are a better all-round solution for these industrial customers, because they can also cope with the complexity of the numerical calculations required for crash analysis by simulation. Since the DAP can be miniaturised and embedded, military contractors are also among the customers for Active Memory’s DAP systems, using them for signal processing within surveillance systems.

The simple nature of text retrieval makes database applications ideally suited to massively parallel architectures, and Dow Jones in the US is currently using a single instruction multiple data machine from Thinking Machines to increase the speed of access to its historical information system by several hundred times. Active Memory is currently working on a similar application with a customer in Europe.
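The lock-step nature of SIMD working is easier to see in code than in prose. The toy C program below is not DAP or Connection Machine code (those machines have their own data-parallel languages); it simply models an array of processing elements, each holding its own slice of memory, all applying the identical text-match instruction at the same step, which is essentially what makes massively parallel text retrieval fast.

    /* A toy model of SIMD text retrieval.  Each "processing element"
     * owns one private slice of memory; a single broadcast instruction
     * (the match test) is applied by every element to its own data.
     * The sequential loop here stands in for what the real hardware
     * does simultaneously across thousands of elements. */
    #include <stdio.h>
    #include <string.h>

    #define PES   8     /* processing elements in the toy array */
    #define SLICE 16    /* bytes of local memory per element    */

    int main(void)
    {
        char memory[PES][SLICE] = {  /* one row per element's local memory */
            "red car", "blue boat", "red hat", "green tea",
            "red fox", "old news", "red ink", "big city"
        };
        int hit[PES] = {0};
        const char *pattern = "red";

        /* The single instruction: every element runs the same test,
         * and none may do anything different during this step. */
        for (int pe = 0; pe < PES; pe++)
            hit[pe] = (strstr(memory[pe], pattern) != NULL);

        for (int pe = 0; pe < PES; pe++)
            if (hit[pe])
                printf("element %d matched: %s\n", pe, memory[pe]);
        return 0;
    }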
Oracle implemented
In a multiple instruction multiple data, or distributed-memory, architecture, each of the many processors has its own individual memory and can issue instructions in its own right, so the different processors can compute different parts of a problem simultaneously. Meiko Scientific is a firm believer in this type of architecture, which it employs in its Computing Surface systems, and says that multiple instruction is more generally applicable than single instruction. One of the key features of the Transputer-based Computing Surface systems, says the company, is that the distributed-memory processors can be reconfigured for different applications. The systems are therefore scalable, so that additional processing power can be added when needed. This scalability is important for companies that might otherwise feel they can’t justify the cost of dedicated high-performance computing: the average cost of a Meiko Computing Surface system is around UKP250,000.

Unlike single instruction systems, says Meiko, multiple instruction multiple data architectures are capable of running an operating system, and the company has licensed Sun Microsystems’ SunOS operating system to run on the Computing Surface. Meiko hopes that this, plus the fact that Oracle’s database has also been implemented for the Computing Surface, will steer the systems towards the commercial market, for database and network management applications. As for the difficulties of programming for multiple instruction architectures, says Meiko, the problem is being addressed and there are dedicated compilers, or MIMD-isers, available that convert code written in C or Fortran for multiple instruction architectures.
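For contrast with the SIMD sketch earlier, the toy C program below models the MIMD, distributed-memory style: each worker is a separate process with its own address space, executes its own code path, and communicates only by passing messages, here over Unix pipes. It is purely illustrative and makes no claim to resemble Meiko’s own tools; a real Computing Surface application would use the company’s message-passing software over Transputer links.

    /* A toy model of MIMD, distributed-memory working: two worker
     * processes run different code on their own copies of the data
     * (fork() stands in for distributed memory) and report back by
     * message passing over pipes. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static long sum_part(const long *v, int n)   /* worker 0's job */
    {
        long s = 0;
        for (int i = 0; i < n; i++) s += v[i];
        return s;
    }

    static long max_part(const long *v, int n)   /* worker 1's job */
    {
        long m = v[0];
        for (int i = 1; i < n; i++) if (v[i] > m) m = v[i];
        return m;
    }

    int main(void)
    {
        long data[8] = {4, 9, 1, 7, 3, 8, 2, 6};
        int pipes[2][2];

        for (int w = 0; w < 2; w++) {
            if (pipe(pipes[w]) != 0) return 1;
            if (fork() == 0) {                   /* worker: own memory, own code path */
                long result = (w == 0) ? sum_part(data, 8)
                                       : max_part(data, 8);
                write(pipes[w][1], &result, sizeof result);
                _exit(0);
            }
        }

        long sum = 0, max = 0;
        read(pipes[0][0], &sum, sizeof sum);     /* messages back in */
        read(pipes[1][0], &max, sizeof max);
        while (wait(NULL) > 0) ;                 /* reap the workers */
        printf("sum = %ld, max = %ld\n", sum, max);
        return 0;
    }

The two workers deliberately run different routines, which is exactly what a SIMD array cannot do in a single step; that, rather than the pipe plumbing, is the point of the example.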