Chapter 6. Archive Logging

Performance monitoring and management in complex systems demands the ability to accurately capture performance characteristics for subsequent review, analysis, and comparison. Performance Co-Pilot (PCP) provides extensive support for the creation and management of archive logs that capture a user-specified profile of performance information to support retrospective performance analysis.

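The following major sections are included in this chapter:

  • Introduction to Archive Logging

  • Using Archive Logs with Performance Visualization Tools

  • Cookbook for Archive Logging

  • Other Archive Logging Features and Services

  • Archive Logging Troubleshooting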

Introduction to Archive Logging

Within the PCP, the pmlogger utility may be configured to collect archives of performance metrics. The archive creation process is easy and very flexible, incorporating the following features:

  • Archive log creation at a PCP collector (typically a server), a PCP monitor system (typically a workstation), or a designated PCP archive logger host.

  • Concurrent independent logging, both local and remote. The performance analyst can activate a private pmlogger instance to collect only the metrics of interest for the problem at hand, independent of other logging on the workstation or remote host.

  • Record mode in various GUI monitoring tools to create archives as needed from the current visualization.

  • Independent determination of logging frequency for individual metrics or metric instances. For example, you could log the “5 minute” load average every half hour, the write I/O rate on the DBMS log spindle every 10 seconds, and aggregate I/O rates on the other disks every minute.

  • Dynamic adjustment of what is to be logged, and how frequently, via pmlc. This feature may be used to disable logging or to increase the sample interval during periods of low activity or chronic high activity (to minimize logging overhead and intrusion). A local pmlc may interrogate and control a remote pmlogger, subject to the access control restrictions implemented by pmlogger.

  • Self-contained logs that include all system configuration and metadata required to interpret the values in the log. These logs can be stored as discrete, autonomous files and analyzed much later, possibly on a remote system and after the hardware or software has been reconfigured.

  • Archive folios as a convenient aggregation of multiple archive logs. Archive folios may be created with the mkaf utility and processed with the pmafm tool.
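The per-metric logging frequencies described above can be pictured with a small model. The following Python sketch is illustrative only and is not part of PCP: it represents a hypothetical logging plan as (label, interval) pairs mirroring the load-average and disk examples, then computes which entries are due at a given elapsed time and how many samples each contributes. The labels are descriptive stand-ins, not PCP metric names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRule:
    label: str        # descriptive label, not a real PCP metric name
    interval_s: int   # logging interval in seconds


# Hypothetical plan mirroring the example in the text.
PLAN = [
    LogRule("load average (5 minute)", 30 * 60),
    LogRule("DBMS log spindle write rate", 10),
    LogRule("aggregate I/O, other disks", 60),
]


def due(plan: List[LogRule], elapsed_s: int) -> List[str]:
    """Labels of the rules due at elapsed_s (every rule is due at t=0)."""
    return [r.label for r in plan if elapsed_s % r.interval_s == 0]


def sample_counts(plan: List[LogRule], duration_s: int) -> Dict[str, int]:
    """Samples each rule contributes over duration_s, counting the one at t=0."""
    return {r.label: duration_s // r.interval_s + 1 for r in plan}
```

Over one hour this model logs the load average 3 times but the log spindle 361 times, which is the point of per-metric intervals: fine detail where it matters without paying that cost everywhere.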

Archive Logs and the PMAPI

Critical to the success of the PCP archive logging scheme is the fact that the library routines providing access to real-time feeds of performance metrics also provide access to the archive logs.

Live (real-time) sources of performance metrics and archives are literally interchangeable, because a single Performance Metrics Application Programming Interface (PMAPI) preserves the same semantics for both styles of metric source. In this way, applications and tools developed against the PMAPI can automatically process either live or historical performance data.

The only restriction is that live and historical data cannot both be monitored by the same invocation of a visualization tool.
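The PMAPI itself is a C library; the following Python sketch only models the design idea behind it. Tool code is written once against a single interface and works unchanged whether the source is live or an archive. The class and method names (LiveSource, ArchiveSource, fetch) are hypothetical and are not PMAPI calls.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Protocol


class MetricSource(Protocol):
    """The one interface a tool is written against (hypothetical)."""
    def fetch(self, metric: str) -> float: ...


class LiveSource:
    """Hypothetical live source: asks a reader callback for the current value."""
    def __init__(self, reader: Callable[[str], float]) -> None:
        self._reader = reader

    def fetch(self, metric: str) -> float:
        return self._reader(metric)


class ArchiveSource:
    """Hypothetical archive source: replays recorded values in time order."""
    def __init__(self, records: Dict[str, List[float]]) -> None:
        self._iters = {m: iter(v) for m, v in records.items()}

    def fetch(self, metric: str) -> float:
        return next(self._iters[metric])


def report(source: MetricSource, metric: str, n: int) -> List[float]:
    """Tool logic: identical for live and historical sources."""
    return [source.fetch(metric) for _ in range(n)]
```

Because `report` only depends on `fetch`, switching a tool from live to historical data is a matter of constructing a different source, which is the property the text attributes to the PMAPI.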

Retrospective Analysis Using Archive Logs

One of the most important applications of archive logging services provided by PCP is in the area of retrospective analysis. In many cases, understanding today's performance problems can be assisted by side-by-side comparisons with yesterday's performance. With routine creation of performance archive logs, you can concurrently replay pictures of system performance for two or more periods in the past.

Archive logs are also an invaluable source of intelligence when trying to diagnose what went wrong, as in a performance postmortem. Because the PCP archive logs are entirely self-contained, this analysis can be performed off-site if necessary.

Each archive log contains metric values from only one host. However, many PCP tools can simultaneously visualize values from multiple archives collected from different hosts.

Because the inference engine, pmie, is itself an application that uses the PMAPI, archives can be replayed through it. This allows you to automate the regular, first-level analysis of system performance.

Such analysis can be performed by constructing suitable expressions to capture the essence of common resource saturation problems, then periodically creating an archive and playing it against the expressions. For example, you may wish to create a daily performance audit, run by cron, to detect performance regressions.
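pmie has its own rule language (see Chapter 5), so the following Python sketch is not pmie syntax. It only illustrates the idea of an automated audit: scan archived (timestamp, value) samples and flag periods where a value stays above a threshold for several consecutive samples. The threshold, run length, and data are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, metric value)


def sustained_breaches(samples: Iterable[Sample], threshold: float,
                       min_run: int) -> List[Tuple[int, int]]:
    """Return (start, end) timestamps of runs of at least min_run
    consecutive samples whose value exceeds threshold."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last: Optional[int] = None
    count = 0
    for ts, value in samples:
        if value > threshold:
            if count == 0:
                start = ts          # a new run begins
            count += 1
            last = ts
        else:
            if count >= min_run:
                runs.append((start, last))
            count = 0
    if count >= min_run:            # a run may extend to the end of the data
        runs.append((start, last))
    return runs
```

Requiring a sustained run, rather than a single high sample, is one simple way to keep a daily audit from reporting momentary spikes.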

For more about pmie, see Chapter 5, “Performance Metrics Inference Engine”.

Using Archive Logs for Capacity Planning

By collecting performance archives with relatively long sampling periods, or by reducing the daily archives to produce summary logs, the capacity planner can collect the base data required for forward projections, and can estimate resource demands and explore “what if” scenarios by replaying data using visualization tools and the inference engine.
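Reducing detailed data to a summary amounts to grouping samples into longer periods. The following Python sketch is a hypothetical illustration, not a PCP tool: it buckets (timestamp, value) samples into fixed-width periods and keeps the mean and maximum of each, which is the kind of base data a capacity planner might project forward.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def summarize(samples: Iterable[Tuple[int, float]],
              bucket_s: int) -> Dict[int, Tuple[float, float]]:
    """Group (timestamp, value) samples into buckets of bucket_s seconds
    and return {bucket_start: (mean, max)} in time order."""
    groups: DefaultDict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        groups[ts - ts % bucket_s].append(value)
    return {start: (sum(v) / len(v), max(v))
            for start, v in sorted(groups.items())}
```

Keeping the maximum alongside the mean preserves peak demand, which an average alone would hide.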

Using Archive Logs with Performance Visualization Tools

Most PCP tools default to real-time display of current values for performance metrics from PCP collector host(s). However, most PCP tools also have the capability to display values for performance metrics retrieved from PCP archive log(s). The following sections describe plans, steps, and general issues involving archive logs and the PCP tools.

Coordination between pmlogger and PCP tools

Most commonly, a PCP tool is invoked with the -a option to process an archive log some time after pmlogger has finished creating it. However, a tool such as oview that uses a Time Control dialog (see “Time Duration and Control” in Chapter 3) stops when it reaches the end of the archive, and can resume if more data is written to the archive log.


Note: pmlogger uses buffered I/O to write the archive log, so the end of the archive may be aligned with an I/O buffer boundary rather than with a logical archive log record. If such an archive is read by a PCP tool, it appears truncated and may confuse the tool. These problems may be avoided by sending pmlogger a SIGUSR1 signal, or by using the flush command of pmlc, to force pmlogger to flush its output buffers.
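To see why a partially written archive looks truncated, the following Python sketch reads a hypothetical length-prefixed record format (a 4-byte big-endian length followed by that many payload bytes). This layout is an assumption for illustration, not the actual PCP archive format. The reader returns only complete records and reports how many trailing bytes were left over, which is what a reader sees when the writer has not yet flushed.

```python
import struct
from typing import List, Tuple


def encode(payloads: List[bytes]) -> bytes:
    """Encode payloads in the hypothetical length-prefixed format."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


def read_records(data: bytes) -> Tuple[List[bytes], int]:
    """Return (complete records, count of trailing bytes that do not form
    a complete record, e.g. because the writer has not flushed yet)."""
    records: List[bytes] = []
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        end = offset + 4 + length
        if end > len(data):
            break  # header present but payload incomplete
        records.append(data[offset + 4:end])
        offset = end
    return records, len(data) - offset
```

A careful reader stops at the last complete record instead of misinterpreting a partial one; flushing pmlogger's buffers avoids the partial record in the first place.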


Archive Log File Management

PCP archive log files can occupy a great deal of disk space, and management of archive logs can be a large task in itself. The following sections provide information to assist you in PCP archive log file management.

Basename Conventions

When a PCP archive is created by pmlogger, an archive basename must be specified and several physical files are created, as shown in Table 6-1.

Table 6-1. Filenames for PCP Archive Log Components (archive.*)

Filename        Contents
archive.index   Temporal index for rapid access to archive contents.
archive.meta    Metadata descriptions for performance metrics and instance domains appearing in the archive.
archive.N       Volumes of performance metric values, for N = 0, 1, 2, ...
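Given a basename, the component files in Table 6-1 can be checked mechanically. The following Python sketch is a hypothetical helper, not a PCP utility: it reports whether the .index and .meta files exist, which numeric volumes are present, and whether the archive meets the requirement stated under “Log Volumes” in this chapter (the metadata file plus at least one volume).

```python
import re
from pathlib import Path
from typing import Dict, List, Union


def archive_components(basename: str) -> Dict[str, Union[bool, List[int]]]:
    """Report which components of the archive `basename` exist on disk."""
    base = Path(basename)
    directory, prefix = base.parent, base.name
    volume_re = re.compile(re.escape(prefix) + r"\.(\d+)$")
    volumes: List[int] = []
    if directory.is_dir():
        for entry in directory.iterdir():
            m = volume_re.match(entry.name)
            if m and entry.is_file():
                volumes.append(int(m.group(1)))
    has_meta = (directory / (prefix + ".meta")).is_file()
    return {
        "index": (directory / (prefix + ".index")).is_file(),
        "meta": has_meta,
        "volumes": sorted(volumes),
        # Metadata file and at least one volume are needed to read the archive.
        "usable": has_meta and bool(volumes),
    }
```

This gives the same information as listing basename.* by hand, in a form a housekeeping script could act on.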


Basenames for Managed Archive Log Files

The PCP archive management tools support a consistent scheme for selecting the basenames for the files in a collection of archives and for mapping these files to a suitable directory hierarchy.

Once configured, the PCP tools that manage archive logs employ a consistent scheme for selecting the basename for an archive each time pmlogger is launched, namely the current date and time in the format YYYYMMDD.HH.MM. Typically, at the end of each day, all archives for a particular host on that day would be merged to produce a single archive with a basename constructed from the date, namely YYYYMMDD. The pmlogger_daily script performs this action and a number of other routine housekeeping chores.
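The naming scheme can be expressed with standard date formatting. The following Python sketch models the convention described above; it is not how pmlogger_daily is implemented. It builds the per-session basename (YYYYMMDD.HH.MM), the daily basename (YYYYMMDD), and selects which session archives a daily merge for a given day would combine.

```python
from datetime import datetime
from typing import Iterable, List


def session_basename(start: datetime) -> str:
    """Basename for a pmlogger session started at `start`: YYYYMMDD.HH.MM."""
    return start.strftime("%Y%m%d.%H.%M")


def daily_basename(day: datetime) -> str:
    """Basename for the merged archive of one day: YYYYMMDD."""
    return day.strftime("%Y%m%d")


def sessions_for_day(basenames: Iterable[str], day: datetime) -> List[str]:
    """Per-session basenames that belong to `day`, in time order."""
    prefix = daily_basename(day) + "."
    return sorted(b for b in basenames if b.startswith(prefix))
```

Because the fields are zero-padded and ordered from most to least significant, sorting the basenames as strings also sorts them chronologically.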

Directory Organization for Archive Log Files

If you are using a deployment of PCP tools and daemons to collect metrics from a variety of hosts and storing them all at a central location, you should develop an organized strategy for storing and naming your log files.


Note: There are many possible configurations of pmlogger. The directory organization described in this section is recommended for any system on which pmlogger is configured for permanent execution (as opposed to short-term executions, for example, as launched from pmchart to record some performance data of current interest).

Typically, the IRIX filesystem hierarchy can be used to reflect the hosts for which pmlogger instances are expected to run locally, avoiding the need for lengthy and cumbersome filenames. It makes considerable sense to place all logs for a particular host in a separate directory named after that host. Because each instance of pmlogger can only log metrics fetched from a single host, this also simplifies some archive log management and administration tasks.

For example, consider the filesystem and naming structure shown in Figure 6-1.

Figure 6-1. Archive Log Directory Structure


The specification of where to place the archive log files for particular pmlogger instances is encoded in the configuration file /var/pcp/config/pmlogger/control, and this file should be customized on each host running an instance of pmlogger.
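The control file is line oriented. As a sketch, assuming each non-comment line holds whitespace-separated fields in the order host, a y/n primary-logger flag, a second y/n flag, the archive directory, and the remaining pmlogger arguments (the layout of the control lines in the Cookbook section), a parser might look like this. The ControlEntry type is hypothetical, the second flag is treated as opaque, and quoted arguments containing spaces are not handled; see the comments in the control file itself for the authoritative format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlEntry:
    host: str
    primary: bool
    second_flag: bool   # meaning treated as opaque in this sketch
    directory: str
    args: List[str]


def parse_control(text: str) -> List[ControlEntry]:
    """Parse control-file lines, skipping blank lines and '#' comments."""
    entries: List[ControlEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed control line: {raw!r}")
        host, primary, flag, directory, *args = fields
        entries.append(ControlEntry(host, primary == "y", flag == "y",
                                    directory, args))
    return entries
```

A parser like this makes it easy to check, for example, that exactly one line is marked as the primary logger.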

If many archives are being created, and the associated PCP collector systems form peer classes based upon service type (for example, Web servers, DBMS servers, NFS servers, and so on), then it may be appropriate to introduce another layer into the directory structure, or use symbolic links to group together hosts providing similar service types.

Log Volumes

A single PCP archive may be partitioned into a number of volumes. These volumes may expedite management of the archive; however, the metadata file and at least one volume must be present before a PCP tool can process the archive.

You can control the size of an archive log volume by using the -v command line option to pmlogger. This option specifies how large a volume should become before pmlogger starts a new volume. Archive log volumes retain the same base filename as other files in the archive log, and are differentiated by a numeric suffix that is incremented with each volume change. For example, you might have a log volume sequence that looks like this:

netserver.log.0
netserver.log.1
netserver.log.2

You can also force pmlogger to close the current volume and open a new one, without interrupting pmlogger operation, by sending it a SIGHUP signal or by using the new volume command of pmlc. Complete information on log volumes is found in the pmlogger(1) man page.
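The volume numbering above can be modeled as a size-triggered rollover. The following Python sketch assumes a simple byte threshold checked before each record is written (so a volume may slightly exceed the limit); pmlogger's actual switching rule and the units accepted by -v are described in pmlogger(1), and the function names here are hypothetical.

```python
from typing import List


def assign_volumes(record_sizes: List[int], volume_limit: int) -> List[int]:
    """Volume number for each record: start a new volume once the current
    one has reached volume_limit bytes (checked before each write)."""
    volumes: List[int] = []
    volume, used = 0, 0
    for size in record_sizes:
        if used >= volume_limit:
            volume, used = volume + 1, 0
        volumes.append(volume)
        used += size
    return volumes


def volume_name(basename: str, volume: int) -> str:
    """File name of a volume: the basename plus a numeric suffix."""
    return f"{basename}.{volume}"
```

For example, with a basename of netserver.log, volume 2 is the file netserver.log.2.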

Configuration of pmlogger

The configuration files used by pmlogger describe which metrics are to be logged. Different groups of metrics may be logged at different intervals. Two states, mandatory and advisory, also apply to each group of metrics; together they determine whether the metrics are definitely logged, definitely not logged, or left for a later advisory definition to decide.

The mandatory state takes precedence if it is on or off, causing any subsequent request for a change in advisory state to have no effect. If the mandatory state is maybe, then the advisory state determines if logging is enabled or not.

The mandatory states are on, off, and maybe. The advisory states, which only affect metrics that are mandatory maybe, are on and off. Therefore, a metric that is mandatory maybe in one definition and advisory on in another definition would be logged at the advisory interval. Metrics that are not specified in the pmlogger configuration file are mandatory maybe and advisory off by default and are not logged.
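The precedence rules above can be captured in a few lines. The following Python sketch models them as described in this section; it is not pmlogger's implementation, and the MetricState class is hypothetical. A mandatory request always applies, an advisory request is ignored unless the mandatory state is maybe, and the metric is logged only if mandatory is on, or mandatory is maybe and advisory is on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricState:
    mandatory: str = "maybe"                 # "on", "off" or "maybe"
    advisory: str = "off"                    # "on" or "off"
    mandatory_interval: Optional[int] = None
    advisory_interval: Optional[int] = None

    def request(self, kind: str, setting: str,
                interval_s: Optional[int] = None) -> bool:
        """Apply a 'log mandatory|advisory ...' request; False if ignored."""
        if kind == "mandatory":
            if setting not in ("on", "off", "maybe"):
                raise ValueError(setting)
            self.mandatory = setting
            if setting == "on":
                self.mandatory_interval = interval_s
            return True
        if kind == "advisory":
            if setting not in ("on", "off"):
                raise ValueError(setting)
            if self.mandatory != "maybe":
                return False                 # mandatory on/off takes precedence
            self.advisory = setting
            if setting == "on":
                self.advisory_interval = interval_s
            return True
        raise ValueError(kind)

    def logging_interval(self) -> Optional[int]:
        """Seconds between samples, or None if the metric is not logged."""
        if self.mandatory == "on":
            return self.mandatory_interval
        if self.mandatory == "maybe" and self.advisory == "on":
            return self.advisory_interval
        return None
```

The default state (mandatory maybe, advisory off) yields None, matching the statement that unspecified metrics are not logged.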

A complete description of the pmlogger configuration format can be found on the pmlogger(1) man page.

PCP Archive Contents

Once a PCP archive log has been created, the pmdumplog utility may be used to display various information about the contents of the archive. For example, start with the following command:

pmdumplog -l /var/adm/pcplog/www.sgi.com/960731

It might produce the following output:

Log Label (Log Format Version 1)
Performance metrics from host www.sgi.com
     commencing Wed Jul 31 00:16:34.941 1996
     ending     Thu Aug  1 00:18:01.468 1996
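The label's commencing and ending times give the span an archive covers. The following Python sketch, a hypothetical helper, parses output shaped like the example above and returns the span in seconds; it assumes English day and month names (the C locale) and the timestamp layout shown.

```python
import re
from datetime import datetime
from typing import Dict

LABEL_TIME = "%a %b %d %H:%M:%S.%f %Y"   # e.g. "Wed Jul 31 00:16:34.941 1996"


def label_span(text: str) -> float:
    """Seconds between the 'commencing' and 'ending' times in
    pmdumplog -l output formatted like the example above."""
    times: Dict[str, datetime] = {}
    for line in text.splitlines():
        m = re.match(r"\s*(commencing|ending)\s+(.*\S)", line)
        if m:
            times[m.group(1)] = datetime.strptime(m.group(2), LABEL_TIME)
    return (times["ending"] - times["commencing"]).total_seconds()
```

For the www.sgi.com archive above, the span is just over 24 hours, consistent with one day's merged archive.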

The simplest way to discover what performance metrics are contained within an archive is to use pminfo as shown in Example 6-1:

Example 6-1. Using pminfo to Obtain Archive Information

pminfo -a /var/adm/pcplog/www.sgi.com/960731 network.mbuf
network.mbuf.alloc 
network.mbuf.typealloc
network.mbuf.clustalloc
network.mbuf.clustfree
network.mbuf.failed
network.mbuf.waited
network.mbuf.drained


Other Archive Logging Features and Services

Other archive logging features and services include PCP archive folios, manipulating archive logs, primary logger, and using pmlc.

PCP Archive Folios

A collection of one or more PCP archive logs may be combined with a control file to produce a PCP archive folio. Archive folios are created using either mkaf or the interactive record mode services of various PCP GUI monitoring tools. The pmafm tool then processes a folio, providing services such as the following:

  • Checking the integrity of the archives in the folio.

  • Displaying information about the component archives.

  • Executing PCP tools with their source of performance metrics assigned concurrently to all of the component archives (where the tool supports this), or serially executing the PCP tool once per component archive.

  • If the folio was created by a single PCP monitoring tool, replaying all of the archives in the folio with that monitoring tool.

  • Restricting the processing to particular archives, or the archives associated with particular hosts.

Using pmlc

You may tailor pmlogger dynamically with the pmlc command. Normally, the pmlogger configuration is read at startup. If you modify the configuration file to change the parameters under which pmlogger operates, you must stop and restart the program for your changes to take effect. Alternatively, you may change parameters whenever required by using the pmlc interface.

To run the pmlc tool, enter:

pmlc

By default, pmlc acts on the primary instance of pmlogger on the current host. See the pmlc(1) man page for a description of command line options. When it is invoked, pmlc presents you with a prompt:

pmlc> 

You may obtain a listing of the available commands by entering a question mark (?) and pressing Enter. You see output similar to that in Example 6-2:

Example 6-2. Listing Available Commands

     show loggers [@<host>]           display <pid>s of running pmloggers
     connect _logger_id [@<host>]     connect to designated pmlogger
     status                           information about connected pmlogger
     query metric-list                show logging state of metrics
     new volume                       start a new log volume
     flush                            flush the log buffers to disk
     log { mandatory | advisory } on <interval> _metric-list
     log { mandatory | advisory } off _metric-list
     log mandatory maybe _metric-list
     timezone local|logger|'<timezone>' change reporting timezone
     help                               print this help message
     quit                               exit from pmlc
     _logger_id   is  primary | <pid> | port <n>
     _metric-list is  _metric-spec | { _metric-spec ... }
     _metric-spec is  <metric-name> | <metric-name> [ <instance> ... ]
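The log command grammar in Example 6-2 is small enough to parse directly. The following Python sketch is a hypothetical parser, not part of pmlc: it turns a log command into its state, interval in seconds, and list of (metric, instances) pairs. The unit table is an assumption covering the units used in this chapter plus a few obvious variants; pmlc(1) gives the real list.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Braces/brackets, double-quoted instance names, or bare words.
TOKEN = re.compile(r'[{}\[\]]|"[^"]*"|[^\s{}\[\]"]+')
# Assumed interval units, in seconds per unit.
UNITS = {"msec": 0.001, "msecs": 0.001, "sec": 1, "secs": 1,
         "min": 60, "mins": 60, "hour": 3600, "hours": 3600}


@dataclass
class LogRequest:
    kind: str                    # "mandatory" or "advisory"
    setting: str                 # "on", "off" or "maybe"
    interval_s: Optional[float]  # only set for "on"
    specs: List[Tuple[str, List[str]]] = field(default_factory=list)


def parse_log_command(line: str) -> LogRequest:
    tokens = TOKEN.findall(line)
    if len(tokens) < 4 or tokens[0] != "log":
        raise ValueError("expected: log {mandatory|advisory} ... metric-list")
    kind, setting, rest = tokens[1], tokens[2], tokens[3:]
    if kind not in ("mandatory", "advisory") or setting not in ("on", "off", "maybe"):
        raise ValueError(f"bad state: {kind} {setting}")
    if setting == "maybe" and kind != "mandatory":
        raise ValueError("only mandatory logging can be 'maybe'")
    req = LogRequest(kind, setting, None)
    if setting == "on":
        if len(rest) < 3:
            raise ValueError("expected: on <interval> metric-list")
        req.interval_s = float(rest[0]) * UNITS[rest[1]]
        rest = rest[2:]
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok in ("{", "}"):
            i += 1                   # grouping braces carry no data
        elif tok == "[":
            if not req.specs:
                raise ValueError("instance list without a metric name")
            end = rest.index("]", i)
            req.specs[-1][1].extend(t.strip('"') for t in rest[i + 1:end])
            i = end + 1
        else:
            req.specs.append((tok, []))
            i += 1
    return req
```

For instance, `log advisory on 2 secs disk.dev.read` parses to an advisory-on request with a 2-second interval for the single metric disk.dev.read.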

Here is an example:

pmlc
pmlc> show loggers @babylon
The following pmloggers are running on babylon:
       primary (1892)
pmlc> connect 1892 @babylon
pmlc> log advisory on 2 secs disk.dev.read
pmlc> query disk.dev
disk.dev.read
       adv  on  nl       5 min  [131073 or "dks0d1"]
       adv  on  nl       5 min  [131074 or "dks0d2"]
pmlc> quit



Note: Any changes to the set of logged metrics made via pmlc are not saved, and are lost the next time pmlogger is started with the same configuration file. Permanent changes are made by modifying the pmlogger configuration file(s).

Refer to the pmlc(1) and pmlogger(1) man pages for complete details.

Cookbook for Archive Logging

The following sections present a checklist of tasks that may be performed to enable PCP archive logging with minimal effort. For a complete explanation, refer to the other sections in this chapter and the man pages for pmlogger and related tools.

Primary Logger

Assume you wish to activate primary archive logging on the PCP collector host pluto. Execute all of the following tasks while logged into pluto as the superuser (root).

  1. Create the directory to hold the archive logs:

    mkdir /var/adm/pcplog/pluto

  2. Choose a suitable pmlogger configuration file. Here are some examples:

    • The default configuration: /var/pcp/config/pmlogger/config.default.

    • A broad summary configuration, sufficient to be used with dkvis, mpvis, nfsvis, and pmkstat: /var/pcp/config/pmlogger/config.Summary.

    • One of the other config.* files in the /var/pcp/config/pmlogger directory, tailored for an application, a PCP add-on product, a pmchart view, or a PCP monitor tool.

      Copy the chosen configuration file to /var/adm/pcplog/pluto/config.default (possibly after some customization).

  3. Edit /var/pcp/config/pmlogger/control. Using the line for the “local primary logger” as a template, add the following line to the file:

    pluto  y  n  /var/adm/pcplog/pluto  -c config.default 

  4. Make sure PMCD and pmlogger are enabled and running:

    chkconfig pmcd on
    chkconfig pmlogger on
    /etc/init.d/pcp start
    Performance Co-Pilot PMCD started (logfile is .... /pmcd.log)
    Performance Co-Pilot Primary Logger started

  5. Verify that the primary pmlogger instance is running:

    pmlc
    pmlc> connect primary
    pmlc> status
    pmlogger [primary] on host pluto is logging metrics from host pluto
    log started      Thu Aug  8 14:33:01 1996 (times in local time)
    last log entry   Thu Aug  8 14:34:11 1996
    current time     Thu Aug  8 14:36:54 1996
    log volume       0
    log size         284

  6. Verify that the archive files are being created in the correct place:

    ls /var/adm/pcplog/pluto
    960808.14.33.0 
    960808.14.33.index
    960808.14.33.meta
    Latest
    pmlogger.log

Other Logger Configurations

Assume you wish to create archive logs on the local host for performance metrics collected from the remote host bert. Execute all of the following tasks while logged into the local host as the superuser (root).

Procedure 6-1. Creating Archive Logs

  1. Create the directory to hold the archive logs:

    mkdir /var/adm/pcplog/bert 

  2. Choose a suitable pmlogger configuration file. Here are three examples:

    • The default configuration: /var/pcp/config/pmlogger/config.default.

    • A broad summary configuration, sufficient to be used with dkvis, mpvis, nfsvis, and pmkstat: /var/pcp/config/pmlogger/config.Summary.

    • One of the other config.* files in the /var/pcp/config/pmlogger directory, tailored for an application, a PCP add-on product, a pmchart view, or a PCP monitor tool.

      Copy the chosen configuration file to /var/adm/pcplog/bert/config.default (possibly after some customization).

  3. Edit /var/pcp/config/pmlogger/control. Using the line for remote as a template, add the following line to the file:

    bert  n  n  /var/adm/pcplog/bert  -c ./config.default

  4. Start pmlogger:

    /usr/pcp/bin/pmlogger_check
    Restarting pmlogger for host "bert" ..... done

  5. Verify that the pmlogger instance is running:

    pmlc
    pmlc> show loggers
    The following pmloggers are running on bert:
            primary (19144)
    pmlc> connect 19144
    pmlc> status
    pmlogger [19144] on host ernie is logging metrics from host bert
    log started      Thu Aug  8 10:10:10 1996 (times in local time)
    last log entry   Thu Aug  8 14:50:54 1996
    current time     Thu Aug  8 14:55:48 1996
    log volume       0
    log size         256

To create archive logs on the local host for performance metrics collected from multiple remote hosts, repeat the steps in Procedure 6-1 for each remote host.

Archive Log Administration

Assume the local host has been set up to create archive logs of performance metrics collected from one or more hosts (which may be either the local host or a remote host).

To activate the maintenance and housekeeping scripts for a collection of archive logs, execute the following tasks while logged into the local host as the superuser (root):

  1. To augment the crontab file for root, first save its current entries to a temporary file. For example:

    crontab -l >/tmp/foo

  2. Edit /tmp/foo, adding lines similar to those from /var/pcp/config/pmlogger/crontab for pmlogger_daily and pmlogger_check; for example:

    # daily processing of archive logs
    10     0     *     *     *       /usr/pcp/bin/pmlogger_daily
    # every 30 minutes, check pmlogger instances are running
    25,55  *     *     *     *       /usr/pcp/bin/pmlogger_check

  3. Make these changes permanent with this command:

    crontab </tmp/foo
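The first five fields of a crontab entry are minute, hour, day of month, month, and day of week. The following Python sketch handles only `*`, single numbers, and comma lists (not ranges or steps) and expands the minute and hour fields of the two entries above to show when they fire: pmlogger_daily once a day at 00:10, and pmlogger_check at 25 and 55 minutes past every hour. The helper names are hypothetical.

```python
from typing import List, Set, Tuple


def parse_field(spec: str, lo: int, hi: int) -> Set[int]:
    """Expand a cron field limited to '*', numbers and comma lists."""
    if spec == "*":
        return set(range(lo, hi + 1))
    values = {int(v) for v in spec.split(",")}
    if not all(lo <= v <= hi for v in values):
        raise ValueError(f"out of range: {spec}")
    return values


def daily_run_times(minute: str, hour: str) -> List[Tuple[int, int]]:
    """(hour, minute) pairs at which an entry runs on any day its
    remaining fields (all '*' in the entries above) allow."""
    return sorted((h, m)
                  for h in parse_field(hour, 0, 23)
                  for m in parse_field(minute, 0, 59))
```

Running the check every 30 minutes means a pmlogger that dies is restarted within about half an hour.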

Manipulating Archive Logs with pmlogextract

The pmlogextract tool takes a number of PCP archive logs from a single host and performs the following tasks:

  • Merges the archives into a single log, while maintaining the correct time stamps for all values.

  • Extracts all metric values within a temporal window that could encompass several archive logs.

  • Extracts only a configurable subset of metrics from the archive logs.

See the pmlogextract(1) man page for full information on this command. It replaces the functionality of the earlier pmlogmerge tool.
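The three operations above compose naturally. The following Python sketch is not pmlogextract's implementation; it models each archive as a time-ordered list of hypothetical (timestamp, metric, value) tuples, merges them by timestamp with heapq.merge, keeps only a time window, and optionally keeps only a named subset of metrics.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical record shape: (timestamp in seconds, metric name, value).
Record = Tuple[float, str, float]


def extract(archives: List[Iterable[Record]], start: float, end: float,
            metrics: Optional[Set[str]] = None) -> Iterator[Record]:
    """Merge time-ordered record streams from archives of one host,
    keeping records with start <= timestamp <= end and, optionally,
    only the named metrics."""
    for rec in heapq.merge(*archives, key=lambda r: r[0]):
        ts, name, _value = rec
        if ts > end:
            break      # inputs are time-ordered, so nothing later can match
        if ts < start:
            continue
        if metrics is not None and name not in metrics:
            continue
        yield rec
```

Because every input is already sorted by time, the merge is a single streaming pass and never needs to hold a whole archive in memory.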

Primary Logger

On each system for which PMCD is active (each PCP collector system), there is an option to have a distinguished instance of the archive logger pmlogger (the “primary” logger) launched each time PMCD is started. The primary logger can be used to ensure the creation of the minimal archive logs required for ongoing system management and capacity planning, either as a safeguard against the failure of a system where a remote pmlogger may be running, or because the preferred deployment is to run pmlogger on each PCP collector system.

Run the following command as superuser on each PCP collector system where you want to activate the primary pmlogger:

chkconfig pmlogger on

The primary logger launches the next time PMCD is started. If you wish this to happen immediately, follow up with this command:

/etc/init.d/pcp start

When it is started in this fashion, the /etc/config/pmlogger.options file provides command line options for pmlogger. In the default setup, this in turn means that the initial logging state and configuration are specified in the file /var/pcp/config/pmlogger/config.default. Either or both of these files may be modified to tailor pmlogger operation to local requirements.

Archive Logging Troubleshooting

The following issues concern the creation and use of logs using pmlogger.

Primary pmlogger Cannot Start

Symptom:

The primary pmlogger cannot be started. A message like the following appears:

pmlogger: there is already a primary pmlogger running

Cause:

There is either a primary pmlogger already running, or the previous primary pmlogger was terminated unexpectedly before it could perform its cleanup operations.

Resolution:

If there is already a primary pmlogger running and you wish to replace it with a new pmlogger, use the show loggers command in pmlc to determine the process ID of the primary pmlogger. The process ID of the primary pmlogger appears in parentheses after the word “primary.” Send a SIGINT signal to the process to shut it down (use the kill command). If the process does not exist, proceed to the manual cleanup described in the paragraph below. If the process did exist, it should now be possible to start the new pmlogger.

If pmlc's show loggers command displays a process ID for a process that does not exist, a pmlogger process was terminated before it could clean up. If it was the primary pmlogger, the corresponding control files must be removed before a new primary pmlogger can be started. It is a good idea to clean up any spurious control files even if they are not for the primary pmlogger.

The control files are kept in /var/tmp/pmlogger. A control file with the process ID of the pmlogger as its name is created when the pmlogger is started. In addition, the primary pmlogger creates a symbolic link named primary to its control file.

For the primary pmlogger, remove both the symbolic link and the file (corresponding to its process ID) to which the link points. For other pmloggers, remove just the process ID file. Do not remove any other files in the directory. If the control file for an active pmlogger is removed, pmlc is not able to contact it.
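Finding stale control files can be automated. The following Python sketch assumes a Unix system and the layout just described (PID-named control files plus a primary symbolic link). It only reports candidates and deletes nothing, leaving removal to the administrator; the function names are hypothetical. Liveness is tested with signal 0, which checks for a process without signaling it.

```python
import os
from pathlib import Path
from typing import Callable, List


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True    # exists, but owned by another user
    return True


def stale_control_files(directory: str,
                        alive: Callable[[int], bool] = pid_alive) -> List[Path]:
    """PID-named control files, and the 'primary' link, whose pmlogger
    process no longer exists. Other files are never reported."""
    d = Path(directory)
    stale = [p for p in d.iterdir()
             if p.name.isdigit() and not alive(int(p.name))]
    primary = d / "primary"
    if primary.is_symlink():
        target = os.path.basename(os.readlink(primary))
        if target.isdigit() and not alive(int(target)):
            stale.append(primary)
    return sorted(stale)
```

For example, `stale_control_files("/var/tmp/pmlogger")` lists the entries that the manual cleanup above would remove.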

pmlogger Cannot Write Log

Symptom:

The pmlogger utility does not start, and you see this message:

_pmLogNewFile: "foo.index" already exists, not over-written

Cause:

Archive logs are considered sufficiently precious that pmlogger does not empty or overwrite an existing set of archive log files. The log named foo actually consists of the physical files foo.index, foo.meta, and at least one file foo.N, where N is in the range 0, 1, 2, 3, and so on.

A message similar to the one above is produced when a new pmlogger instance encounters one of these files already in existence.

Resolution:

If you are sure you no longer need the existing archive, remove all of its parts. For example, use the following command:

rm -f foo.*

Then rerun pmlogger.

Cannot Find Log

Symptom:

The pmdumplog utility, or any tool that can read an archive log, displays this message:

Cannot open archive mylog: No such file or directory

Cause:

An archive consists of at least three physical files. If the base name for the archive is mylog, then the archive actually consists of the physical files mylog.index, mylog.meta, and at least one file mylog.N, where N is in the range 0, 1, 2, 3, and so on.

The above message is produced if one or more of the files is missing.

Resolution:

Use this command to check which of the archive's files are present:

ls mylog.*

Turn on the internal debug flag DBG_TRACE_LOG (-D 128) to see which files are being inspected by the _pmOpenLog routine, as shown in the following example:

pmdumplog -D 128 -l mylog

Locate the missing files and move them all to the same directory, or remove all of the files that are part of the archive and recreate the archive log.

Identifying an Active pmlogger Process

Symptom:

You have a PCP archive log that is demonstrably growing, but do not know the identity of the associated pmlogger process.

Cause:

The PID is not obvious from the log, or the archive name may not be obvious from the output of the ps command.

Resolution:

If the archive basename is foo, run the following commands:

pmdumplog -l foo
Log Label (Log Format Version 1)
Performance metrics from host gonzo
     commencing Wed Aug  7 00:10:09.214 1996
     ending     Wed Aug  7 16:10:09.155 1996
pminfo -a foo -f pmcd.pmlogger
pmcd.pmlogger.host
     inst [10728 or "10728"] value "gonzo.melbourne.sgi.com"
pmcd.pmlogger.port
     inst [10728 or "10728"] value 4331
pmcd.pmlogger.archive
     inst [10728 or "10728"] value "/usr/var/adm/pcplog/gonzo/foo"

All of the information describing the creator of the archive is revealed; in particular, the instance identifier for the PMCD metrics (10728 in the example above) is the PID of the pmlogger instance, which may be used to control the process via pmlc.
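When many archives are involved, the pminfo output above can be turned into a lookup table. The following Python sketch is a hypothetical helper that parses output formatted like the example and maps each pmlogger PID (the instance identifier) to its host, port, and archive values, all kept as strings.

```python
import re
from typing import Dict, Optional

INST = re.compile(r'\s*inst \[(\d+) or "[^"]*"\] value (.*)')


def pmlogger_info(text: str) -> Dict[int, Dict[str, str]]:
    """Map pmlogger PID -> {host, port, archive} from
    `pminfo -f pmcd.pmlogger` output formatted like the example above."""
    result: Dict[int, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pmcd.pmlogger."):
            current = stripped.rsplit(".", 1)[1]   # host, port or archive
            continue
        m = INST.match(line)
        if m and current:
            value = m.group(2).strip().strip('"')
            result.setdefault(int(m.group(1)), {})[current] = value
    return result
```

The resulting PID can then be used directly with pmlc's connect command.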

Illegal Label Record

Symptom:

PCP tools report:

Illegal label record at start of PCP archive log file.

Cause:

Either you are attempting to read a Version 2 archive with a PCP 1.x tool, or the archive log has become corrupted.

Resolution:

By default, pmlogger in PCP release 2.0 and later generates Version 2 archives that PCP 1.0 to 1.3 tools cannot interpret. If you must use older tools, pass the -V 1 option to pmlogger, forcing it to generate Version 1 archives.

Empty Archive Log Files or pmlogger Exits Immediately

Symptom:

Archive log files are zero size, requested metrics are not being logged, or pmlogger exits immediately with no error messages.

Cause:

There are several possible causes: pmlogger encountered errors in the configuration file; pmlogger has not yet flushed its output buffers; or the state of some (or all) of the metrics specified in the pmlogger configuration file has been changed to advisory off or mandatory off via pmlc. It is also possible that the logging interval specified in the pmlogger configuration file for some or all of the metrics is longer than the time you have been waiting since pmlogger started.

Resolution:

If pmlogger exits immediately with no error messages, check the pmlogger.log file in the directory in which pmlogger was started for any error messages. If pmlogger has not yet flushed its buffers, enter the following command:

killall -SIGUSR1 pmlogger

Otherwise, use the query command of pmlc to interrogate the internal pmlogger state of specific metrics.