Chapter 1. Introduction to the X Window System

This chapter introduces many of the most important concepts on which the X Window System is based, and describes the environment in which the X Toolkit operates. This chapter assumes that you are new to programming the X Window System. If you already have some experience programming the X Window System, you may wish to skim this chapter for a brief review or even begin with Chapter 2.

The X Window System (or simply X)[2] is a hardware- and operating system-independent windowing system. It was developed jointly by MIT and Digital Equipment Corporation, and has been adopted by the computer industry as a standard for graphics applications.

X controls a “bit-mapped” display in which each pixel on the screen is individually controllable. This allows applications to draw pictures as well as text. Until recently, individual control of screen pixels was widely available only on personal computers (PCs) and high-priced technical workstations. Most general-purpose machines were limited to output on text-only terminals. X brings a consistent world of graphic output to both PCs and more powerful machines. Figure 1-1 compares an X application to an application running on a traditional text terminal.

Figure 1-1. An X application, and an application on a traditional text terminal

Like other windowing systems, X divides the screen into multiple input and output areas called windows. With a terminal emulator, a window can act as a “virtual terminal,” running ordinary text-based applications. However, as shown in Figure 1-1, windows can also run applications designed to take advantage of the graphic power of the bit-mapped display.

X takes user input from a pointer. The pointer is usually a mouse but could just as well be a trackball or a tablet. The pointer allows the user to control a program without using the keyboard, by pointing at objects drawn on the screen such as menus and command buttons. This method of using programs is often easier to learn than traditional keyboard control because it is more intuitive. Figure 1-2 shows an application with a typical three-button pointer being used to select a menu item.

Figure 1-2. A three-button mouse directing the pointer to select a menu item

Of course, X also handles keyboard input. The pointer directs keyboard input from window to window. Only one window at a time can receive keyboard input.

In X, as in many other window systems, each application need not (and usually does not) consist of a single window. Any part of an application can have its own separate subwindow, which simplifies the management of input and output within the application code. Such child windows are visible only within the confines of their parent window.

Windows are rectangular and oriented along the same axes as the edges of the display.[3] Each window has its own coordinate system, with the origin in the upper-left corner of the window inside its border. The application or the user can change the dimensions of windows. Figure 1-3 shows a typical screen with several virtual terminals running. The screen also shows some applications, such as xterm, oclock, and xcalc, that run in their own windows.
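
Because every window has its own origin, a position inside a nested subwindow has to be re-expressed in each ancestor's coordinates before it means anything on the screen as a whole. The following is a minimal sketch of that arithmetic. The Win structure and to_root_coords() are hypothetical names invented for this illustration and are not part of Xlib or Xt. The sketch assumes that a window's position is given by the outer corner of its border in its parent's coordinates, and that its own origin lies just inside that border.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window record; a real X server keeps far more state. */
typedef struct Win {
    const struct Win *parent;  /* NULL for the root window */
    int x, y;                  /* outer corner of the border, in the parent's coordinates */
    int border_width;          /* the window's own origin lies inside this border */
} Win;

/* Convert a point given in w's coordinate system to root-window coordinates. */
void to_root_coords(const Win *w, int x, int y, int *root_x, int *root_y)
{
    assert(w != NULL && root_x != NULL && root_y != NULL);

    /* Each pass re-expresses the point in the parent's coordinate system. */
    while (w->parent != NULL) {
        x += w->x + w->border_width;
        y += w->y + w->border_width;
        w = w->parent;
    }
    *root_x = x;
    *root_y = y;
}
```

For a subwindow placed at (10,20) with a 1-pixel border, inside a main window placed at (100,50) with a 2-pixel border, the point (5,5) in the subwindow works out to (118,78) on the screen.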

Figure 1-3. Screen layout of a typical user's X Window System


X supports both color and black-and-white displays.

Many of the above characteristics are also true of several other window systems. What is unusual about X is that it is based on a network protocol instead of on system-specific procedure and system calls. This network protocol enables X to be ported to different computer architectures and operating systems; it also allows programs to run on one architecture or operating system while displaying on another. Because of its unique design, X can make a network of different computers cooperate. For example, a computationally intensive application might run on a supercomputer, but take input from and display output on a workstation connected across a local area network. To the user, the application would simply appear to be running on the workstation.

The Server and Client

To allow programs to be run on one machine and display on another, X was designed as a network protocol--a predefined set of requests and replies--between two processes. One of these processes is an application program called a client, and the other, the server, controls the display hardware, keyboard, and pointer.

The user sits at the machine running the server. At first, this use of the word “server” may seem a little odd, since file and print servers normally are remote machines, but the usage is consistent. The local display is accessible to other systems across the network, and for those systems the X server does act like other types of server.

The X server acts as an intermediary between user programs (called clients or applications) and the resources of the local system such as the keyboard and screen. It contains all device-specific code, and insulates applications from differences among display hardware. The server (without extensions) performs the following tasks:

  • Allows access to the display by multiple clients. The server may deny access from clients running on certain machines.

  • Interprets network messages from clients and acts on them. These messages are known as requests. Some requests command the server to move windows and do two-dimensional drawing, while others ask the server for information. Protocol requests are generated by client calls to Xlib, either directly or through Xt and other function libraries.

  • Passes user input to clients by sending network messages known as events, which represent key or button presses, pointer motion, and so forth. Events are generated asynchronously, and events from different devices may be intermingled. The server must pass the appropriate events to each client. The client must be prepared to handle any event it has selected at any time.

  • Maintains complex data structures, including windows and fonts, so that the server can perform its tasks efficiently. Clients refer to these abstractions by ID numbers. Server-maintained abstractions reduce the amount of data that has to be maintained by each client and the amount of data that has to be transferred over the network.
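
The idea that the server passes each event only to the clients that selected it can be pictured with a small simulation. This is only a sketch under simplifying assumptions: the SEL_ constants, the Selection structure, and deliver_event() are hypothetical names, no X library is involved, and the sketch ignores the fact that a real server tracks selections per window as well as per client. (Xlib clients make the real selection with XSelectInput.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event classes, one bit each, so they can be combined with |. */
#define SEL_KEY_PRESS       (1u << 0)
#define SEL_BUTTON_PRESS    (1u << 1)
#define SEL_POINTER_MOTION  (1u << 2)

/* What the simulated server remembers about one client. */
typedef struct {
    int      client_id;
    unsigned selected;   /* event classes this client asked to receive */
    int      received;   /* number of events delivered so far */
} Selection;

/* Deliver one event of the given class to every client that selected it.
 * Returns the number of clients that received the event. */
size_t deliver_event(Selection *clients, size_t count, unsigned event_class)
{
    size_t i, delivered = 0;

    assert(clients != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (clients[i].selected & event_class) {
            clients[i].received++;
            delivered++;
        }
    }
    return delivered;
}
```

A client that selected nothing receives nothing, which is why a client only has to be prepared for the events it asked for.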

In X, the term display is often used as a synonym for server, as is the combined term display server. However, the terms display and screen are not synonymous. A screen is the actual hardware on which the graphics are drawn. A server may control more than one screen. For example, a single server might control both a color screen and a monochrome screen, allowing users to debug an application on both types of screen without leaving their seat.
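
The distinction between display and screen is visible in the display name that users conventionally give a client to tell it where to draw, which has the form host:display.screen (for example, bigiron:0.1 for the second screen of the first server on host bigiron). The sketch below takes such a name apart. The DisplayName structure and parse_display_name() are hypothetical helpers written for this illustration, not Xlib routines, and the sketch handles only the single-colon form of the name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure: the parts of a display name such as "bigiron:0.1". */
typedef struct {
    char host[256];   /* empty string means the local machine */
    int  display;     /* which server on that host */
    int  screen;      /* which screen of that server; 0 if omitted */
} DisplayName;

/* Parse "host:display[.screen]". Returns 0 on success, -1 on malformed input. */
int parse_display_name(const char *name, DisplayName *out)
{
    const char *colon;
    char *end;
    size_t host_len;

    if (name == NULL || out == NULL)
        return -1;

    colon = strrchr(name, ':');
    if (colon == NULL)
        return -1;                       /* no server number given */

    host_len = (size_t)(colon - name);
    if (host_len >= sizeof out->host)
        return -1;                       /* host name too long for this sketch */
    memcpy(out->host, name, host_len);
    out->host[host_len] = '\0';

    if (!isdigit((unsigned char)colon[1]))
        return -1;                       /* server number must be numeric */
    out->display = (int)strtol(colon + 1, &end, 10);

    out->screen = 0;                     /* screen defaults to the first one */
    if (*end == '.') {
        if (!isdigit((unsigned char)end[1]))
            return -1;
        out->screen = (int)strtol(end + 1, &end, 10);
    }
    return (*end == '\0') ? 0 : -1;      /* reject trailing junk */
}
```

Note that the host and server number together identify the connection to open, while the screen number merely picks a default among the screens that one server controls.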

The user programs displaying on screens managed by a server are called its clients. There may be several clients connected to a single server. Clients may run on the same machine as the server if that machine supports multitasking, or clients may run on other machines in the network. In either case, the X Protocol is used by the client to send requests to draw graphics or to query the server for information, and is used by the server to send user input and replies to information requests back to the client.[4] All communication between client and server uses the X Protocol. The communication path between a client and the server is called a connection.

It is common for a user to have programs running on several different hosts in the network, all invoked from and displaying their windows on a single screen (see Figure 1-4). Clients running remotely can be started from the remote machine or from the local machine using the network utilities rlogin or rsh.

Figure 1-4. Applications can run on any system across the network


This use of the network is known as distributed processing. It allows graphic output for powerful systems that don't have their own built-in graphics facilities. Distributed processing can also help solve the problem of unbalanced system loads. When one host machine is overloaded, users running clients on that machine can arrange for some of their clients to run on other hosts. Eventually there may be automatic load-balancing applications, but currently such remote execution is performed manually. It is not unusual to see users in the X environment having several xload load monitor applications running on various systems throughout the network but displaying on their screen, so that they can see the balance of loads throughout the network.

Before leaving the subject of servers and clients, we should mention PC servers and X terminals. Software is available that allows various types of PCs to operate as X servers.[5] X terminals are special-purpose devices designed to run just an X server, and to connect to remote systems over a local area network. PC servers and X terminals are the least expensive way to provide an X screen for a user. Since most PCs use single-tasking operating systems, they can't run any clients at the same time as the server. Therefore, they too require a network adapter to connect to another system where clients are run.

X terminals and PC servers both demonstrate the strength of X's client-server model. Even though PCs and X terminals aren't able to do multitasking on their own, they give the user the effect of multitasking workstations, because they can interact simultaneously with several clients running on remote multitasking systems.

The Software Hierarchy

This book is about writing client applications for the X Window System, in C, using the Xt Intrinsics library and a set of widgets. This is only one of the many ways to write X applications, since X is not restricted to a single language or operating system. The only requirement of an X application is that it generate and receive X protocol messages according to the X Consortium Protocol specification.[6] However, using the Xt Intrinsics and a widget set is, and is expected to be, the most common way of writing applications for several reasons:

  • It is quite powerful.

  • It results in applications that cooperate well with other X applications.

  • It supports several popular user-interface conventions.

  • The C Language is widely available.

Figure 1-5 shows the layering of software in an application that uses the Xt Intrinsics and a widget set. Notice that the Intrinsics are based upon Xlib, the lowest-level C-Language interface to X. Xlib provides full access to the capabilities of the X Protocol, but does little to make programming easier. It handles the interface between an application and the network, and includes some optimizations that encourage efficient network usage.

Figure 1-5. The software architecture of Xt Intrinsics-based applications


Xt is built upon Xlib. The purpose of Xt is to provide an object-oriented layer that supports the user-interface abstraction called a widget. A widget is a reusable, configurable piece of code that operates independently of the application except through prearranged interactions. A widget set is a collection of widgets that provide commonly used user-interface components tied together with a consistent appearance and user interface (also called look and feel). Several different widget sets designed to work with Xt are available from various vendors. The use of widgets separates application code from user-interface code and provides ready-to-use user-interface components such as buttons and scrollbars. Xt, widgets, and widget sets are described in much more detail in Chapter 2, Introduction to the X Toolkit.

In this book, we'll refer to the combination of the Xt Intrinsics and one widget set as the X Toolkit or just the Toolkit. When referring to the Xt Intrinsics layer alone, we'll use Xt, or the Intrinsics.

Applications often need to call Xlib directly to accomplish certain tasks such as drawing. Xt does not provide its own graphics calls, nor does it provide access to every X protocol feature. This book describes the features of Xlib that you may need from an Xt application, but it will not repeat the detailed description of Xlib programming found in Volume One, Xlib Programming Manual. You will find Volume One and Volume Two, Xlib Reference Manual, invaluable when you need to make Xlib calls.

Xlib, Xt, and several widget sets are available in MIT's public software distribution. The Motif and OPEN LOOK widget sets are not in the Release 4 or 5 distributions from MIT, but they are available for minimal cost from the vendors themselves (OSF for Motif; AT&T or Sun for OPEN LOOK). The darkly shaded areas of Figure 1-5 indicate interfaces that are exclusive standards of the X Consortium. That Xlib is an exclusive standard means that computer manufacturers wishing to comply with the X Consortium standard must offer Xlib and cannot offer any other low-level X interface in C. The lightly shaded areas (such as the Xt Intrinsics) are nonexclusive standards--vendors are required to provide Xt but are also allowed to provide other toolkit-level layers for the C Language. For example, Sun and AT&T offer Xt, but they also offer XView as an alternate C-Language toolkit-level layer. XView was originally designed for porting existing SunView™ applications to X, but it can also be used for writing new applications. Volume Seven, XView Programming Manual, describes programming with XView.

X software is unlike that of many other window systems in that it was designed to provide mechanism without mandating any certain style of user interface. In the words of its designers, X provides “mechanism without policy.” The Xlib and Xt layers are standard because they can support any kind of interface. It is the widget set that actually imposes user-interface conventions, and it is this layer for which no standard has yet been considered by the X Consortium. However, because there is a strong need in the market for one or two standard widget sets that provide consistent appearance and user-interface conventions, it is likely that one or two widget sets will emerge as de facto standards in the near future.

It is important to note that the X Consortium standards for Xlib and Xt define the programming interface to each library (often referred to as the Application Programmer's Interface, or API), not the underlying code. This means that vendors are allowed to modify or rewrite the code to gain the best performance from their particular system, as long as they keep the programming interface the same. To you, the application writer and user of the Intrinsics, this means that you must always rely on documented behavior if you want your application to run on different systems. You must avoid accessing private structures, because they may be different in another vendor's release of the library, or they may be changed in a future release of X.

Event-driven Programming

Programming a graphically based window system is fundamentally different from standard procedural programming. In traditional character-based interfaces, once the application starts, it is always in control. It knows exactly what kind of input it will allow, and may define exclusive modes to limit that input. For example, the application might ask the user for input with a menu, and use the reply to go down a level to a new menu, where the actions that were possible at the previous level are no longer available. Or a text editor may operate in one mode in which keyboard input is interpreted as editor commands, and another in which it is interpreted as data to be stored in an editor buffer. In any case, only keyboard input is expected.

In a window system, by contrast, multiple graphic applications may be running simultaneously. In addition to the keyboard, the user can use the pointer to select data, click on buttons or scrollbars, or change the keyboard focus from one application to another. Except in special cases (for example, where a “dialog box” will not relinquish control until the user provides some necessary information), applications are modeless--the user can suddenly switch from the keyboard to the mouse, or from one application area to another. Furthermore, as the user moves and resizes windows on the screen, application windows may be obscured or redisplayed. The application must be prepared to respond to any one of many different events at any time.

An X event is a data structure sent by the server that describes something that just happened that may be of interest to the application. There are two major categories of events: user input and window system side effects. For example, the user pressing a keyboard key or clicking a mouse button generates an event; a window being moved on the screen also generates events--possibly in other applications as well if the movement changes the visible portions of their windows. It is the server's job to distribute events to the various windows on the screen.

Event-driven window programming reduces modes to a minimum, so that the user does not need to navigate a deep menu structure and can perform any action at any time. The user, not the application, is in control. The application simply performs some setup and then goes into a loop from which application functions may be invoked in any order as events arrive.
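
The "setup, then loop" shape of an event-driven program can be sketched without any X library at all. In the sketch below, every name (Event, App, app_on(), app_main_loop(), and the two example handlers) is hypothetical and invented for illustration, and the events come from a plain array rather than from a server connection. In a real Toolkit application the loop is supplied by Xt (XtAppMainLoop) and the events arrive from the server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types; a real server defines many more. */
typedef enum { EV_KEY_PRESS, EV_BUTTON_PRESS, EV_EXPOSE, EV_TYPE_COUNT } EventType;

typedef struct {
    EventType type;
    int window;   /* ID of the window the event belongs to */
    int detail;   /* key code or button number; unused for EV_EXPOSE */
} Event;

typedef void (*Handler)(const Event *ev, void *app_data);

typedef struct {
    Handler handlers[EV_TYPE_COUNT];  /* NULL means "not interested" */
    void   *app_data;                 /* passed to every handler */
} App;

/* Setup phase: start with no handlers, then register the ones we want. */
void app_init(App *app, void *app_data)
{
    size_t i;
    for (i = 0; i < EV_TYPE_COUNT; i++)
        app->handlers[i] = NULL;
    app->app_data = app_data;
}

void app_on(App *app, EventType type, Handler h)
{
    assert((int)type >= 0 && type < EV_TYPE_COUNT);
    app->handlers[type] = h;
}

/* The loop: handlers run in whatever order the events happen to arrive.
 * Returns the number of events that had a handler. */
size_t app_main_loop(App *app, const Event *queue, size_t count)
{
    size_t i, dispatched = 0;
    for (i = 0; i < count; i++) {
        const Event *ev = &queue[i];
        Handler h = NULL;
        if ((int)ev->type >= 0 && ev->type < EV_TYPE_COUNT)
            h = app->handlers[ev->type];
        if (h != NULL) {
            h(ev, app->app_data);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example application state and handlers. */
typedef struct { int keys; int redraws; } Counters;

void on_key(const Event *ev, void *data)
{
    (void)ev;
    ((Counters *)data)->keys++;
}

void on_expose(const Event *ev, void *data)
{
    (void)ev;
    ((Counters *)data)->redraws++;
}
```

The application code never decides what happens next. It only states which function handles which kind of event, and the order of calls is determined entirely by what the user does.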

The Window Manager

Because multiple applications can be running simultaneously, rules must exist for arbitrating conflicting demands for input. For example, does keyboard input automatically go to whichever window the pointer is in, or must the user explicitly select a window? How does the user move or resize windows?

Unlike most window systems, X itself makes no rules about this kind of thing. Instead, there is a special client called the window manager that manages the positions and sizes of the main windows of applications on a server's display. In Motif, this client is mwm. The window manager is just another client, but by convention it is given special responsibility to mediate competing demands for the physical resources of a display, including screen space, color resources, and the keyboard. The window manager allows the user to move windows around on the screen, resize them, and usually start new applications. The window manager also defines much of the visible behavior of the window system, such as whether windows are allowed to overlap or are forced to tile (side by side), and whether the keyboard focus simply follows the pointer from one window to the next window, or whether the user must click a pointer button in a window to change the keyboard focus.

Applications are required to give the window manager certain information to help it mediate competing demands for screen space or other resources. For example, an application specifies its preferred size and size increments. These are known as window manager hints because the window manager is not required to honor them. The Toolkit provides an easy way for applications to set window manager hints.

The conventions for interaction with the window manager and with other clients have been standardized by the X Consortium in a manual called the Inter-Client Communication Conventions Manual (ICCCM for short). The ICCCM defines basic policy intentionally omitted from X itself, such as the rules for transferring selections of data between applications, for transferring keyboard focus, for installing colormaps, and so on.

As long as applications and window managers follow the conventions set out in the ICCCM, applications created with different toolkits will be able to coexist and work together on the same server. Toolkit applications should be immune to the effects of changes from earlier conventions because the conventions are implemented by code hidden in a standard widget called Shell. However, you should be aware that some older applications and window managers do not play by the current rules.

Extensions to X

X is also extensible. The code includes a defined mechanism for incorporating extensions, so that vendors aren't forced to modify the existing system in incompatible ways when adding features. An extension requires an additional piece of software on the server side and an additional library at the same level as Xlib on the client side. After an initial query to see whether the server portion of the extension software is installed, extension routines are used just like Xlib routines and perform at the same level.

As time goes on, some extensions will become a basic part of what is called “X,” and will become X Consortium standards themselves. For example, as of Release 5 the X Consortium has standardized three extensions: the non-rectangular window Shape extension, the X Input extension for supporting input devices other than the keyboard and mouse, and PEX for 3-D graphics. The only one of these extensions that is widely available on X servers, and is commonly used in conjunction with Xt, is the Shape extension. The C programming library used to access the Shape extension is libXext, linked with -lXext.



[2] The name “X Windows” is frowned upon by the developers of X.

[3] Note, however, that there is a standard extension, Shape, that supports non-rectangular windows.

[4] The X Protocol runs on top of any lower-level network protocol that provides bidirectional communication, and delivers bytes unduplicated and in sequence. TCP/IP and DECnet are the most common low-level network protocols currently supported by X servers.

[5] Companies such as Graphics Software Systems, Interactive Systems, and Locus Computing offer server implementations for IBM-compatible PCs. White Pine Software offers an X server that runs under Multifinder on the Macintosh. An Amiga server is available from GfxBase/Boing. X terminals are available from Visual Technology, NCR, Network Computing Devices (NCD), Tektronix, Graphon Corp, and other companies. The number of X products on the market is growing rapidly.

[6] Volume Zero, X Protocol Reference Manual, provides a conceptual discussion of the X Protocol and its detailed specification.