Use the word cybernetics, Norbert, because nobody knows what it means. This will always put you at an advantage in arguments. ~Claude Shannon
CELL PROCESSORS
Shounak Acharya (4th Yr-IT)
Overview
Power redefined! Ever wondered how the heart of the most powerful gaming console, the Sony PlayStation 3, works? Its launch was delayed because experts claimed that it is so powerful that terrorist agencies could easily use it to plan and execute their dreadful intentions. It is so powerful that just one of its processing elements could run an Apple G5. The processor was developed by Peter Hofstee and his team. Here is a brief description of its components and how they work.
Introduction
Sony designed the PlayStation 3 to be more than just a video game console. It supports all kinds of digital entertainment and is basically a home-entertainment computer. This computer sports a specially designed CPU called the Cell processor. Sony, Toshiba and IBM worked together to develop the Cell processor. It's their answer to the growing trend toward multi-core processing, in which manufacturers place as many processors as possible onto one chip. The Cell processor is scalable for different performance needs. The one used in the PlayStation 3 crams 234 million transistors onto a single die.
Architecture
The setup of the Cell processor is like having a team of processors all working together on one chip to handle the large computational workload needed to run next-generation video games. In order to understand how the Cell processor works, it helps to look at each of the major parts that comprise this processor.
The "Power Processing Element" (PPE) of the Cell is a 3.2-GHz PowerPC core equipped with 512 KB of L2 cache. The PowerPC core is a type of microprocessor similar to the one you would find running the Apple G5. It's a powerful processor on its own and could easily run a computer by itself, but in the Cell the PowerPC core is not the sole processor. Instead, it's more of a "managing processor." It delegates processing to the eight other processors on the chip, the Synergistic Processing Elements (SPEs).
The computational workload comes in through the PowerPC core. The core then assesses the work that needs to be done, looks at what the SPEs are currently processing and decides how to best dole out the workload to achieve maximum efficiency.
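The delegation described above can be sketched as a simple least-loaded dispatcher. This is only an illustration of the idea, not the PPE's actual scheduling logic: the function name dispatch, the task cost figures and the greedy policy are all assumptions made for the example.

```python
import heapq

NUM_SPES = 7  # the article says seven of the eight SPEs handle processing


def dispatch(task_costs, num_spes=NUM_SPES):
    """Assign each task to the SPE with the least queued work so far.

    task_costs: estimated cost of each task (arbitrary, hypothetical units).
    Returns one list of task indices per SPE.
    """
    # Min-heap of (queued_work, spe_id): the least-loaded SPE is on top.
    heap = [(0, spe_id) for spe_id in range(num_spes)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(num_spes)]
    for task_id, cost in enumerate(task_costs):
        load, spe_id = heapq.heappop(heap)
        assignments[spe_id].append(task_id)
        heapq.heappush(heap, (load + cost, spe_id))
    return assignments


if __name__ == "__main__":
    for spe_id, tasks in enumerate(dispatch([5, 3, 8, 1, 2, 7, 4, 6, 9])):
        print(f"SPE {spe_id}: tasks {tasks}")
```

A greedy policy like this keeps the example short; a real system would also have to account for data movement and dependencies between tasks.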
The SPEs used in the Cell processor are each SIMD (Single Instruction, Multiple Data), 128-bit vector processors. Vector processors are designed to process several pieces of data at once. They were commonly used in the 1980s in large, powerful scientific supercomputers and were created as a faster alternative to the more common scalar processor. Scalar processors can only work on one data element at a time. Despite this limitation, advances in scalar design and performance have made vector processors very rare in most computers these days. However, because of the vector processor's ability to handle several data elements at once, IBM resurrected this design for the Cell. There are eight SPEs on the chip, but only seven of them handle processing. The eighth SPE is built in as redundancy in case one of the other seven fails.
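To make the scalar versus vector contrast concrete, here is a small sketch that models a 128-bit register as four 32-bit floating point lanes. It only imitates the idea in software. The helper names (pack_register, simd_add and the two counting functions) are made up for this example and are not part of any Cell toolkit.

```python
import struct

LANES = 4  # a 128-bit register holds four 32-bit values


def pack_register(values):
    """Pack four 32-bit floats into 16 bytes, like one 128-bit register."""
    return struct.pack("<4f", *values)


def simd_add(reg_a, reg_b):
    """Model one vector instruction: add all four lanes of two registers."""
    a = struct.unpack("<4f", reg_a)
    b = struct.unpack("<4f", reg_b)
    return struct.pack("<4f", *(x + y for x, y in zip(a, b)))


def scalar_instruction_count(n):
    """A scalar unit needs one add per data element."""
    return n


def vector_instruction_count(n):
    """A 4-lane vector unit needs one add per group of four elements."""
    return -(-n // LANES)  # ceiling division


if __name__ == "__main__":
    total = simd_add(pack_register([1, 2, 3, 4]), pack_register([10, 20, 30, 40]))
    print(struct.unpack("<4f", total))
    print(scalar_instruction_count(1000), vector_instruction_count(1000))
```

Under this simplified model, adding 1000 pairs of numbers takes 1000 scalar additions but only 250 vector additions.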
The SPEs each come loaded with 256 KB of SRAM. This high-speed memory helps each SPE crunch numbers quickly. The SPE memory is also visible to the Power Processing Element. This allows the PowerPC core to utilize the resources of each SPE in the most efficient way possible. All of this amounts to unprecedented power for a piece of consumer electronics.
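Because each SPE has only 256 KB of its own memory, a large data set has to be fed to it in pieces. The sketch below shows one way to cut a buffer into chunks that fit. The 64 KB reserved for program code and stack is a made-up figure for illustration, and split_for_local_store is a hypothetical helper, not a real Cell API.

```python
LOCAL_STORE_BYTES = 256 * 1024  # 256 KB of SRAM per SPE, as stated above


def split_for_local_store(total_bytes, reserved_bytes=64 * 1024):
    """Split a buffer into (offset, length) chunks that fit in one SPE.

    reserved_bytes is an assumed allowance for code and stack; the real
    figure depends on the program being run.
    """
    usable = LOCAL_STORE_BYTES - reserved_bytes
    if usable <= 0:
        raise ValueError("reserved_bytes leaves no room for data")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(usable, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    for offset, length in split_for_local_store(1024 * 1024):
        print(f"offset {offset:>8}  length {length}")
```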
To achieve the high performance needed for mathematically intensive tasks, such as decoding and encoding MPEG streams, generating or transforming three-dimensional data, or performing Fourier analysis, the Cell processor links the SPEs and the PPE through the Element Interconnect Bus (EIB), which gives them access to main memory and to other external data storage. The PPE, which is capable of running a conventional operating system, has control over the SPEs and can start, stop, interrupt, and schedule processes running on them. To this end the PPE has additional instructions for controlling the SPEs. Despite having Turing-complete architectures, the SPEs are not fully autonomous and require the PPE to prime them before they can do any useful work. However, most of the "horsepower" of the system comes from the Synergistic Processing Elements.
The PPE and bus architecture includes various modes of operation giving different levels of memory protection, allowing areas of memory to be protected from access by specific processes running on the SPEs or the PPE.
Both the PPE and SPE are RISC architectures with a fixed-width 32-bit instruction format. The PPE contains a 64-bit general purpose register set (GPR), a 64-bit floating point register set (FPR), and a 128-bit Altivec register set. The SPE contains 128-bit registers only. These can be used for scalar data types ranging from 8 bits to 128 bits in size or for SIMD computations on a variety of integer and floating point formats. System memory addresses for both the PPE and SPE are expressed as 64-bit values, for a theoretical address range of 2^64 bytes (16,777,216 terabytes). In practice, not all of these bits are implemented in hardware; the address space is extremely large nevertheless. Local store addresses internal to the SPU processor are expressed as a 32-bit word. In documentation relating to Cell, a word is always taken to mean 32 bits, a doubleword means 64 bits, and a quadword means 128 bits.
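The address-range figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes binary terabytes (2^40 bytes each), and the function names are chosen only for this example. It also shows how few of the 32 local store address bits a 256 KB local store actually needs.

```python
WORD_BITS = 32         # "word" in Cell documentation
DOUBLEWORD_BITS = 64   # "doubleword"
QUADWORD_BITS = 128    # "quadword", the width of one SPE register

BYTES_PER_TERABYTE = 2 ** 40  # binary terabyte, assumed for this check


def address_space_terabytes(address_bits):
    """Size, in terabytes, of the range covered by address_bits bits."""
    return 2 ** address_bits // BYTES_PER_TERABYTE


def bits_needed(size_bytes):
    """Number of address bits needed to reach every byte of size_bytes."""
    return (size_bytes - 1).bit_length()


if __name__ == "__main__":
    print(address_space_terabytes(DOUBLEWORD_BITS))  # 64-bit system addresses
    print(bits_needed(256 * 1024))                   # 256 KB local store
    print(QUADWORD_BITS // WORD_BITS)                # words per quadword
```

With 64 address bits the first line prints 16777216, matching the figure in the text, and a 256 KB local store needs only 18 address bits.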
An Application other than Gaming
Supercomputing: IBM's planned supercomputer, IBM Roadrunner, will be a hybrid of general-purpose CISC processors and Cell processors. It is reported that this combination will produce the first computer to run at petaflop speeds. It will use an updated version of the Cell processor, manufactured using 65 nm technology, with enhanced SPUs that can handle double-precision calculations in the 128-bit registers, reaching 100 GFLOPS in double precision.
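To get a feel for the scale involved, the following back-of-the-envelope sketch asks how many 100 GFLOPS chips it would take to reach one petaflop on paper. It uses peak figures only and ignores the general-purpose processors and real-world efficiency, so it is not a description of Roadrunner's actual configuration. The function name is hypothetical.

```python
PETAFLOP = 10 ** 15               # floating point operations per second
CELL_DP_FLOPS = 100 * 10 ** 9     # 100 GFLOPS double precision, as quoted


def chips_for_target(target_flops, per_chip_flops):
    """Smallest number of chips whose combined peak reaches target_flops."""
    return -(-target_flops // per_chip_flops)  # ceiling division


if __name__ == "__main__":
    print(chips_for_target(PETAFLOP, CELL_DP_FLOPS))
```

On these assumptions the answer is 10,000 chips.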