Chapter 1. Introducing the SGI Origin 3000 Series

This chapter introduces you to the SGI Origin 3000 series of server products in the following sections:

  • “Product Description”

  • “Bricks”

  • “Racks”

  • “SGI Origin 3000 Series Models”

  • “Server System Features”

Product Description

The SGI Origin 3000 series is a family of modular computer server systems. The internal components and functions of the SGI Origin 3000 servers are divided into separate units called “bricks,” which make it easy to customize a system to meet your computing needs. These bricks are housed in short or tall rack enclosures (depending on the model you choose) like those shown in Figure 1-1.

Figure 1-1. SGI Origin 3000 Series Servers


Bricks

SGI's third generation of ccNUMA architecture is known as NUMA 3 and is integral to the design of the SGI Origin 3000 series of servers. The NUMA 3 architecture is the basis for building a server that is highly flexible and resilient to failure.

Modular building blocks representing separate functional parts of the server are used to configure a server that matches your application environment. These individual ccNUMA building blocks are referred to as “bricks,” and are covered individually in this section.

Table 1-1 lists the various bricks available with an SGI Origin 3000 series server. Each brick has a dedicated chapter in this guide that explains the brick's function in detail.

Table 1-1. Bricks

Brick                  Provided Function                              Described in
C–brick                Processing and memory                          Chapter 4, “C–brick”
I–brick                Base system I/O                                Chapter 5, “I–brick”
P–brick                PCI bus interfaces                             Chapter 6, “P–brick”
X–brick                XIO interface                                  Chapter 7, “X–brick”
D–brick                Storage modules                                Chapter 8, “D–brick”
R–brick                Interconnect fabric (routers)                  Chapter 9, “R–brick”
G–brick and V–brick    Graphics subsystems[a]                         SGI Onyx 3000 Series Graphics System Hardware Owner's Guide
N–brick                Cost- and space-saving alternative to other    SGI Onyx 3000 Series Graphics System Hardware Owner's Guide
                       I/O bricks for connecting a C–brick with a
                       G–brick
Power bay              Power supplies and control                     Chapter 10, “System Power”

[a] You can use either a G–brick or a V–brick, but not both in the same graphics system.


Interactions Among Bricks

The C–brick performs the computing function for the server system; it contains two or four processors (each with 4 or 8 MB of private secondary cache), memory, and a crossbar switch that serves as the memory controller. This switch acts as a channel between the brick's processors and its local memory. The R–brick (router) connects C–bricks to one another. The I/O interface bricks (I–brick, P–brick, and X–brick) connect to individual C–bricks.

A D–brick can optionally be added to provide storage modules for your server.

A G–brick or a V–brick (but not both in the same server system) can also be installed to provide your system with sophisticated graphics capabilities. The N–brick can be used in systems with a G–brick to replace up to four I– or X–bricks, connecting C–bricks to the InfiniteReality graphics pipes on the G–brick. See the SGI Onyx 3000 Series Graphics System Hardware Owner's Guide for details on these bricks.
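
The brick relationships just described can be summarized in a small model. The following Python sketch is purely illustrative; it encodes only the connection rules stated in this section (real systems allow additional connections) and is not SGI software:

# Illustrative model of the brick relationships described above.
# It encodes only the rules stated in this section; real systems
# allow additional connections (for example, between routers).
from dataclasses import dataclass, field

@dataclass
class Brick:
    name: str                       # e.g., "C0", "R0", "I0"
    kind: str                       # "C", "R", "I", "P", or "X"
    links: list = field(default_factory=list)

def connect(a: Brick, b: Brick) -> None:
    """Record a connection, allowing C-to-R and C-to-I/P/X links."""
    kinds = {a.kind, b.kind}
    allowed = kinds == {"C", "R"} or bool("C" in kinds and kinds & {"I", "P", "X"})
    if not allowed:
        raise ValueError(f"{a.kind}-brick to {b.kind}-brick is not a rule stated here")
    a.links.append(b.name)
    b.links.append(a.name)

# Two C-bricks joined through an R-brick, with an I-brick for base system I/O.
c0, c1, r0, i0 = Brick("C0", "C"), Brick("C1", "C"), Brick("R0", "R"), Brick("I0", "I")
connect(c0, r0)
connect(c1, r0)
connect(c0, i0)
print(c0.links)                     # ['R0', 'I0']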

See Figure 1-2 for an illustration of how the bricks are interconnected to create your server system. (For illustrations of how these various bricks are cabled together, see Chapter 4, “C–brick”.)

Figure 1-2. Interaction between SGI Origin 3000 Server Series Bricks


Brick Cooling and Fans

The bricks are air-cooled devices; airflow is from the front of the brick to the rear. All bricks have three fans at the front, except for the R–brick, which has two fans.

The fans run at variable speeds; the speed is controlled by the brick's L1 controller, which monitors and controls the brick's operating temperature. The R–brick fans are smaller than other brick fans and run at a single speed.

All fans are N+1 redundant and can be hot-swapped by a qualified SGI system support engineer (SSE). If a fan fails, the remaining functional fans run at higher speeds to compensate, and error messages are issued.
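
The fan behavior described above can be expressed as simple pseudocode. The following Python sketch is illustrative only; the temperature thresholds and speed names are invented for the example, and the real policy is implemented by the L1 controller firmware:

# Illustrative sketch of the fan policy described above: speed follows
# the brick's operating temperature, and if a fan fails the remaining
# fans run faster and an error message is issued. Thresholds are invented.
def fan_speed(temperature_c: float, fans_ok: int, fans_total: int) -> str:
    if fans_ok < fans_total:
        print(f"ERROR: {fans_total - fans_ok} fan(s) failed; service required")
        return "high"                      # remaining fans compensate
    if temperature_c > 35.0:               # invented threshold
        return "high"
    if temperature_c > 28.0:               # invented threshold
        return "medium"
    return "low"

print(fan_speed(25.0, 3, 3))   # low
print(fan_speed(30.0, 2, 3))   # high, after reporting the failed fan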

Bricks and Controllers

The C–brick, R–brick, I–brick, P–brick, and X–brick have L1 controllers that monitor the activities of their brick. The L1 controller generates status and error messages for the brick that are displayed on the L1 controller display located on the brick's front panel.

An L2 controller comes with every tall rack that contains C–bricks (the L2 controller is optional for short racks). In tall racks, the L2 controller has a touch display unit located on the front door of the rack. The L2 controller displays system controller status and error messages, including the status and error messages generated by the L1 controllers.

L1 controller messages can also be displayed on a system console connected to your server system. You can also enter L1 and L2 commands at the console to control activity in your server system.

For details about the L1 and L2 controllers, see Chapter 3, “System Control”. For a list of L1 and L2 commands, see Appendix B, “System Controller Commands”.

The D–brick has an ESI/ops panel module with a microcontroller for monitoring and controlling all elements of the D–brick.

See SGI Onyx 3000 Series Graphics System Hardware Owner's Guide for details on monitoring and controlling the activity of the G–brick, V–brick, and N–brick.

Racks

Two rack sizes are used in the SGI Origin 3000 server series. The short rack (17U), shown in Figure 1-3, is used for the SGI Origin 3200 server model, and the tall rack (39U), shown in Figure 1-4, is used for both the SGI Origin 3400 and SGI Origin 3800 server models.

Short Rack (17U)

The short rack (shown in Figure 1-3) has the following features and components:

  • Front door and rear door. Both doors have keylocks that prevent unauthorized access to the system.

  • L1 controller display and brick LEDs visible with the doors closed.

  • Cable entry/exit area at the bottom of the rack. Cables are attached at the rear of the rack. The rack is mounted on four casters; the rear two casters swivel. The base of the rack has leveling pads, a ground strap, and seismic tie-downs.

  • Power distribution strip (PDS). The PDS has six outlet connectors to connect to the power bay, one inlet connector, and a circuit breaker switch.

  • L2 controller (optional). Used to display system controller status and error messages. The L2 controller displays the individual brick's status and error messages generated by each brick's L1 controller. Although short racks do not have an L2 controller touch display, the L2 controller can be monitored and managed on a local workstation connected to the L2 controller.

  • Single power bay with three power supplies.


    Note: The short rack is used to house the SGI Origin 3200 server model.


    Figure 1-3. Front View of the Short Rack


Tall Rack (39U)

The tall rack shown in Figure 1-4 has the following features and components:

  • Front door and rear door. Both doors have keylocks that prevent unauthorized access to the system.

  • L1 controller display and brick LEDs visible with the doors closed.

  • Cable entry/exit area at the bottom rear of the rack. Cables are attached at the rear of the rack. The rack is mounted on four casters; the two rear casters swivel. The base of the rack has four M12 weldnuts for seismic tie-downs.

    The tall rack also has cable entry/exit areas at the top, bottom, and sides of the rack. I/O and power cables pass through the bottom of the rack. NUMAlink cables pass through the top and sides of the rack. Cable management occurs in the rear of the rack.

  • L2 controller. Used to display system controller status and error messages. The L2 controller can display the individual brick's status and error messages generated by each brick's L1 controller. Each tall rack with C–bricks comes with an L2 controller and an L2 controller touch display located on the front door of the system.

  • One or two power bays, depending on your computing needs. Each power bay on the tall racks has four power supplies.

  • One or two Power Distribution Units (PDUs) per rack, depending on the number of power bays. (Each power bay requires four connections, one for each power supply.) The PDU can be single-phase or three-phase. The single-phase PDU, which supports one power bay, has one opening with six cables to connect to the power bay. This PDU has two input power-plug cables, a single outlet connector, and a circuit breaker switch.

    The three-phase PDU, which supports two power bays, has two openings, and each of these has six cables to connect to the two power bays. This PDU has one input power-plug cable, a single outlet connector, and a circuit breaker switch.


    Note: The tall racks house a combination of bricks that compose SGI Origin 3400 servers and SGI Origin 3800 servers.


    Figure 1-4. Front View of the Tall Rack (with Side Covers Removed)


Measuring Racks and Bricks

The racks are measured in EIA standard units; one standard unit (SU or U) is equal to 1.75 in. (4.45 cm).

Figure 1-5 illustrates the size in standard units of each brick type available with an SGI Origin 3000 series server, except for the G–brick, V–brick, and N–brick. The G–brick is 18 units high, the V–brick is 4 units high, and the N–brick is 2 units high. If you have a G–brick or a V–brick, or a G–brick with an N–brick, see SGI Onyx 3000 Series Graphics System Hardware Owner's Guide for details.
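
As a quick check of these measurements, the arithmetic can be written out directly. The brick and rack heights below are the ones given in this section; the centimeter values are derived from 1U = 1.75 in.:

# Standard-unit arithmetic: 1U = 1.75 in.
U_INCHES = 1.75
U_CM = U_INCHES * 2.54              # about 4.45 cm

# Heights in standard units taken from the paragraph above.
brick_height_u = {"G-brick": 18, "V-brick": 4, "N-brick": 2}
rack_height_u = {"short rack": 17, "tall rack": 39}

for name, units in brick_height_u.items():
    print(f"{name}: {units}U = {units * U_INCHES:.2f} in ({units * U_CM:.1f} cm)")

# An 18U G-brick does not fit in a 17U short rack:
print(brick_height_u["G-brick"] <= rack_height_u["short rack"])   # False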

Figure 1-5. Rack and Brick Measurements


Bay (Unit) Numbering

Bays in the racks are numbered using standard units, as shown in Figure 1-5. The illustration also identifies the size (in units) of the different brick types.

Because bricks require multiple standard units to fit in a rack, brick locations within a rack are identified by the bottom unit (U) in which the brick resides. For example, in a tall 39U rack, the C–brick positioned in U10, U11, and U12 is identified as C10. In a short 17U rack, an I–brick positioned in U10, U11, U12, and U13 is identified as I10.

These identifiers or bay locations are represented as decimal numbers in the hardware graph path. The hardware graph path has the following form:

/hw/module/rrrTuu

where rrr = rack number, T = brick type, and uu = bay location (lowest unit) in the rack.

Below is an example of hinv (IRIX hardware inventory command) output identifying the rack and bay numbers:

$ hinv 
/hw/module/001c10 
/hw/module/002r19
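
Given the rrrTuu form described above, a module path such as /hw/module/001c10 can be taken apart mechanically. The following Python sketch is for illustration only; it is not an IRIX utility:

import re

def parse_module_path(path: str):
    """Split an rrrTuu module path into rack number, brick type, and bay."""
    m = re.fullmatch(r"/hw/module/(\d{3})([a-z])(\d{2})", path)
    if m is None:
        raise ValueError(f"not an rrrTuu module path: {path}")
    rack, brick_type, bay = m.groups()
    return int(rack), brick_type, int(bay)

print(parse_module_path("/hw/module/001c10"))   # (1, 'c', 10): C-brick in rack 001, bay U10
print(parse_module_path("/hw/module/002r19"))   # (2, 'r', 19): R-brick in rack 002, bay U19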

Rack Numbering

A rack is numbered with a three-digit number. A compute rack (designated for C–bricks) is numbered sequentially beginning with 001. An I/O rack (designated for I–bricks, P–bricks, X–bricks, and D–bricks) is numbered sequentially (also beginning with 001) and by the physical quadrant in which the I/O rack resides.

In a single-rack configuration (for all 3200 server configurations and some 3400 server configurations), the compute rack is numbered 001 even if it contains an I/O brick or D–brick.

Figure 1-6 shows the rack numbering scheme for multiple-rack server systems.

Figure 1-6. Rack Numbering


Rack numbers are represented as decimal numbers in the hardware graph path, for example:

$ hinv 
/hw/module/001c10

SGI Origin 3000 Series Models

The C–brick contains the processors (two or four processors per C–brick) for the server system. The number of processors and the combination of functional bricks in your server system determine the SGI Origin 3000 server model. The following models are available: the SGI Origin 3200 server, the SGI Origin 3400 server, and the SGI Origin 3800 server.

SGI Origin 3200 Server

The SGI Origin 3200 server system has up to 8 processors, a minimum of one I/O brick (an I–brick), no routers, and a single power bay. The system is housed in a short 17U rack enclosure with a single power distribution strip (PDS). The L2 controller is optional with the SGI Origin 3200 server; the L2 controller touch display is not available.

An additional I–, P–, or X–brick can be added to the SGI Origin 3200 server between the existing I–brick and the topmost C–brick.

Also, additional racks containing D–bricks can be added to your SGI Origin 3200 server system.

Figure 1-7 shows an example of one possible SGI Origin 3200 server system configuration.

Figure 1-7. SGI Origin 3200 Server System


SGI Origin 3400 Server

The SGI Origin 3400 server system has up to 32 processors, a minimum of one I/O brick (an I–brick), two 6-port routers, and at least one power bay. The system needs at least one tall 39U rack enclosure, which has at least a single power bay with one single-phase power distribution unit (PDU). The SGI Origin 3400 is also offered with a three-phase PDU, which supports two power bays. (The single-phase PDU has one opening with six cables to connect to the power bay. The three-phase PDU has two openings with six cables from each opening to connect to the power bays.)

Each tall rack enclosure containing C–bricks comes with an L2 controller and an L2 controller touch display.

Figure 1-8 shows an example of one possible SGI Origin 3400 server system configuration.

The system can be expanded with a second tall rack to add D–bricks and I/O bricks (I–bricks, P–bricks, and X–bricks) to your server system.

Figure 1-8. SGI Origin 3400 Server System


SGI Origin 3800 Server

The SGI Origin 3800 server system has a minimum of four C–bricks, a minimum of 16 and a maximum of 512 processors, a minimum of one I–brick, one P–brick, and two 8-port routers. The system needs a minimum of two tall rack enclosures, with at least one power bay per rack (one for the compute rack and another for the I/O rack), and one single-phase PDU per rack. (The single-phase PDU has one opening, which has six cables to connect to the power bay. The SGI Origin 3800 is also offered with a three-phase PDU, which has two openings, each with six cables to connect to the power bays.)

Each tall rack enclosure containing C–bricks comes with an L2 controller and an L2 controller touch display.

Figure 1-9 shows an example of a possible SGI Origin 3800 server configuration.

Additional racks containing C–bricks, R–bricks, D–bricks, and I/O bricks (I–bricks, P–bricks, and X–bricks) can be added to your server system.

Figure 1-9. SGI Origin 3800 Server System


Server System Features

The following sections introduce the main features of the SGI Origin 3000 series server systems:

  • “Modularity and Scalability”

  • “Distributed Shared Memory (DSM)”

  • “Distributed Shared I/O”

  • “Cache-coherent Non–uniform Memory Access (ccNUMA) Architecture”

  • “Reliability, Availability, and Serviceability (RAS)”

More than 3600 third-party software applications are available for the SGI Origin 3000 server series systems. For a current list of applications, see the following URL:

http://www.sgi.com/Products/appsdirectory.html

Modularity and Scalability

The SGI Origin 3000 server series systems are scalable systems with the ability to independently scale processors and memory, I/O bandwidth, and storage. Furthermore, the SGI Origin 3000 server series systems can be clustered to increase the number of processors beyond 512 into the thousands.

The SGI Origin 3000 server series system's functions (such as computing, I/O, and storage) are divided into separate components housed in building blocks called bricks. These bricks can be independently added to a system to achieve the desired system configuration. As bricks are added to a system, the bandwidth and performance scale in a manner that is almost linear, without significantly affecting system latencies.

Distributed Shared Memory (DSM)

In the SGI Origin 3000 server series, memory is physically distributed among the C–bricks (compute nodes); however, it is accessible to and shared by all C–bricks. Memory that is physically located on the same C–brick as a processor is referred to as that C–brick's local memory. The total memory within the system is referred to as global memory.

When processors access memory located in other C–bricks, the memory is referred to as remote memory.

The memory latency, which is the amount of time it takes for a processor to retrieve data from memory, is lowest when a processor accesses memory that is local to its C–brick.

Distributed Shared I/O

As with memory, I/O devices are distributed among the C–bricks (each C–brick has an I/O port that can connect to an I/O brick) and are accessible by all C–bricks through the NUMAlink interconnect fabric.

Cache-coherent Non–uniform Memory Access (ccNUMA) Architecture

As the name implies, the ccNUMA architecture has two parts: cache coherency and nonuniform memory access.

Cache Coherency

The SGI Origin 3000 server series uses caches to reduce memory latency. Although data exists in local or remote memory, copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent.

To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol. In this protocol, each block of memory (128 bytes) has an entry in a table referred to as a directory. Like the blocks of memory that they represent, the directories are distributed among the C–bricks. A block of memory is also referred to as a cache line.

Each directory entry indicates the state of the memory block that it represents. For example, when the block is not cached, it is in an unowned state. When only one processor has a copy of the memory block, it is in an exclusive state. And when more than one processor has a copy of the block, it is in a shared state; a bit vector indicates which caches contain a copy.

When a processor modifies a block of data, the processors that have the same block of data in their caches must be notified of the modification. The SGI Origin 3000 server series uses an invalidation method to maintain cache coherence. The invalidation method purges all unmodified copies of the block of data, and the processor that wants to modify the block receives exclusive ownership of it.
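
The directory states and the invalidation step described above can be made concrete with a small model. The following Python sketch is a simplified illustration, not the actual Origin 3000 coherence implementation:

# Simplified model of a directory entry for one 128-byte memory block.
# States follow the text: unowned, exclusive (one cached copy), or
# shared (several cached copies, tracked by a set of sharers).
class DirectoryEntry:
    def __init__(self):
        self.state = "unowned"
        self.sharers = set()            # processors holding a cached copy

    def read(self, cpu: int) -> None:
        self.sharers.add(cpu)
        self.state = "exclusive" if len(self.sharers) == 1 else "shared"

    def write(self, cpu: int) -> None:
        # Invalidation: purge every other cached copy, then grant the
        # writing processor exclusive ownership of the block.
        for other in self.sharers - {cpu}:
            print(f"invalidate the copy in CPU {other}'s cache")
        self.sharers = {cpu}
        self.state = "exclusive"

entry = DirectoryEntry()
entry.read(0)
entry.read(1)                           # two readers: shared state
entry.write(1)                          # invalidates CPU 0's copy
print(entry.state, entry.sharers)       # exclusive {1}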

Non–uniform Memory Access (NUMA)

In DSM systems, memory is physically located at various distances from the processors. As a result, memory access times (latencies) are different or “non–uniform.” For example, it takes less time for a processor to reference its local memory than it does to reference remote memory.

In a NUMA system, program performance is based on proper placement of important data structures. In general, data should be located close to the processor that will access it. IRIX provides a service to enable applications to achieve this.
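
The effect of placement can be seen with a toy latency model. The latency values below are invented for illustration and are not measured Origin 3000 figures:

# Toy model of non-uniform memory access: a reference to local memory
# costs less than a reference to remote memory. Values are invented.
LOCAL_NS, REMOTE_NS = 200, 600

def average_latency(accesses):
    """accesses: list of (cpu_node, data_node) pairs."""
    total = sum(LOCAL_NS if cpu == data else REMOTE_NS for cpu, data in accesses)
    return total / len(accesses)

print(average_latency([(0, 0)] * 100))   # 200.0 ns: data placed on the accessing node
print(average_latency([(0, 1)] * 100))   # 600.0 ns: data placed on a remote node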

Reliability, Availability, and Serviceability (RAS)

The SGI Origin 3000 server series components have the following features to increase the reliability, availability, and serviceability (RAS) of the systems.

  • Power and cooling:

    • Power supplies are redundant and can be hot-swapped by your SGI system support engineer (SSE).

    • Bricks have overcurrent protection.

    • Fans are redundant and can be hot-swapped by your SGI system support engineer (SSE).

    • Fans run at multiple speeds: in all bricks except the R–brick, speed increases automatically when temperature increases or when a single fan fails.

  • System monitoring:

    • System controllers monitor the internal power and temperature of the bricks, and automatically shut down bricks to prevent overheating.

    • Memory and secondary cache are protected by single-bit error correction and double-bit error detection (SECDED).

    • The NUMAlink3 interconnect network is protected by cyclic redundancy check (CRC).

    • The primary cache is protected by parity.

    • Each brick has failure LEDs that indicate where in the PROM code the system stopped when booting. If IRIX is up and running, these LEDs are CPU usage indicators. These LEDs are readable via the system controllers.

    • Systems support Embedded Support Partner (ESP), a tool that monitors the system; when a condition occurs that may cause a failure, ESP notifies the appropriate SGI personnel.

    • Systems support remote console and maintenance activities.

  • Power-on and boot:

    • Automatic testing occurs after you power on the system (power-on self-tests or POST; these tests are also referred to as power-on diagnostics or POD).

    • Processors and memory are automatically deallocated when a self-test failure occurs.

    • Boot times are minimized.

  • Further RAS features:

    • Systems support partitioning.

    • PCI cards and disk drive modules can be added to the system without powering off the brick (hot-pluggable).

    • IRIX has enhanced reliability.

    • Systems have a local field-replaceable unit (FRU) analyzer.

    • All system faults are logged in files.

    • Memory can be scrubbed when a single-bit error occurs.