This file documents NCO, a collection of utilities to manipulate and analyze netCDF files.

Copyright © 1995–2008 Charlie Zender

This is the first edition of the NCO User's Guide,
and is consistent with version 2 of texinfo.tex.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The license is available online at http://www.gnu.org/copyleft/fdl.html

The original author of this software, Charlie Zender, wants to improve it with the help of your suggestions, improvements, bug-reports, and patches.
Charlie Zender <surname at uci dot edu> (yes, my surname is zender)
3200 Croul Hall
Department of Earth System Science
University of California, Irvine
Irvine, CA 92697-3100



NCO User's Guide

Note to readers of the NCO User's Guide in HTML format: The NCO User's Guide in PDF format (also on SourceForge) contains the complete NCO documentation.
This HTML documentation is equivalent except it refers you to the printed (i.e., DVI, PostScript, and PDF) documentation for description of complex mathematical expressions.

The netCDF Operators, or NCO, are a suite of programs known as operators. The operators facilitate manipulation and analysis of data stored in the self-describing netCDF format, available from (http://www.unidata.ucar.edu/packages/netcdf). Each NCO operator (e.g., ncks) takes netCDF input file(s), performs an operation (e.g., averaging, hyperslabbing, or renaming), and outputs a processed netCDF file. Although most users of netCDF data are involved in scientific research, these data formats, and thus NCO, are generic and are equally useful in fields from agriculture to zoology. The NCO User's Guide illustrates NCO use with examples from the field of climate modeling and analysis. The NCO homepage is http://nco.sf.net, and there is a mirror at http://dust.ess.uci.edu/nco.

This documentation is for NCO version 3.9.5. It was last updated 11 May 2008. Corrections, additions, and rewrites of this documentation are very welcome.

Enjoy,
Charlie Zender



Foreword

NCO is the result of software needs that arose while I worked on projects funded by NCAR, NASA, and ARM. Thinking the operators might prove useful as tools or templates to others, I am pleased to provide them freely to the scientific community. Many users (most of whom I have never met) have encouraged the development of NCO. Thanks especially to Jan Polcher, Keith Lindsay, Arlindo da Silva, John Sheldon, and William Weibel for stimulating suggestions and correspondence. Your encouragement motivated me to complete the NCO User's Guide. So if you like NCO, send me a note! I should mention that NCO is not connected to or officially endorsed by Unidata, ACD, ASP, CGD, or Nike.


Charlie Zender
May 1997
Boulder, Colorado


Major feature improvements entitle me to write another Foreword. In the last five years a lot of work has been done to refine NCO. NCO is now an open source project and appears to be much healthier for it. The list of illustrious institutions that do not endorse NCO continues to grow, and now includes UCI.

Charlie Zender
October 2000
Irvine, California


The most remarkable advances in NCO capabilities in the last few years are due to contributions from the Open Source community. Especially noteworthy are the contributions of Henry Butowsky and Rorik Peterson.

Charlie Zender
January 2003
Irvine, California


NCO has been generously supported from 2004–2008 by US National Science Foundation (NSF) grant IIS-0431203. This support allowed me to maintain and extend core NCO code, and others to advance NCO in new directions: Gayathri Venkitachalam helped implement MPI; Harry Mangalam improved regression testing and benchmarking; Daniel Wang developed the server-side capability, SWAMP; and Henry Butowsky, a long-time contributor, developed ncap2. This support also led NCO to debut in professional journals and meetings. The personal and professional contacts made during this evolution have been immensely rewarding.

Charlie Zender
March 2008
Grenoble, France



Summary

This manual describes NCO, which stands for netCDF Operators. NCO is a suite of programs known as operators. Each operator is a standalone, command line program executed at the shell-level like, e.g., ls or mkdir. The operators take netCDF files (including HDF5 files constructed using the netCDF API) as input, perform an operation (e.g., averaging or hyperslabbing), and produce a netCDF file as output. The operators are primarily designed to aid manipulation and analysis of data. The examples in this documentation are typical applications of the operators for processing climate model output. This stems from their origin, though the operators are as general as netCDF itself.



1 Introduction



1.1 Availability

The complete NCO source distribution is currently distributed as a compressed tarfile from http://sf.net/projects/nco and from http://dust.ess.uci.edu/nco/nco.tar.gz. The compressed tarfile must be uncompressed and untarred before building NCO. Uncompress the file with ‘gunzip nco.tar.gz’. Extract the source files from the resulting tarfile with ‘tar -xvf nco.tar’. GNU tar lets you perform both operations in one step with ‘tar -xvzf nco.tar.gz’.
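If your tar lacks built-in gzip support, the two steps can also be joined with a pipe. The sketch below wraps this in a small shell function; the name unpack_tarball is purely illustrative and is not part of NCO.

```shell
# Unpack a gzip-compressed tarfile without relying on GNU tar's -z switch.
# unpack_tarball is a hypothetical helper name, not part of NCO.
unpack_tarball() {
  # gunzip -c writes the uncompressed tarfile to stdout and leaves the .gz intact;
  # tar -xf - reads the archive from stdin
  gunzip -c "$1" | tar -xf -
}

# Usage: unpack_tarball nco.tar.gz
```

Unlike a plain gunzip, this approach keeps the compressed tarfile on disk, which may be convenient if you expect to rebuild later.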

The documentation for NCO is called the NCO User's Guide. The User's Guide is available in PostScript, HTML, DVI, TeXinfo, and Info formats. These formats are included in the source distribution in the files nco.ps, nco.html, nco.dvi, nco.texi, and nco.info*, respectively. All the documentation descends from a single source file, nco.texi 1. Hence the documentation in every format is very similar. However, some of the complex mathematical expressions needed to describe ncwa can only be displayed in DVI, PostScript, and PDF formats.

If you want to quickly see what the latest improvements in NCO are (without downloading the entire source distribution), visit the NCO homepage at http://nco.sf.net. The HTML version of the User's Guide is also available online through the World Wide Web at URL http://nco.sf.net/nco.html. To build and use NCO, you must have netCDF installed. The netCDF homepage is http://www.unidata.ucar.edu/packages/netcdf.

New NCO releases are announced on the netCDF list and on the nco-announce mailing list http://lists.sf.net/mailman/listinfo/nco-announce.



1.2 Operating systems compatible with NCO

NCO has been successfully ported and tested and is known to work on the following 32- and 64-bit platforms: IBM AIX 4.x, 5.x, FreeBSD 4.x, GNU/Linux 2.x, LinuxPPC, LinuxAlpha, LinuxARM, LinuxSparc64, SGI IRIX 5.x and 6.x, MacOS X 10.x, NEC Super-UX 10.x, DEC OSF, Sun SunOS 4.1.x, Solaris 2.x, Cray UNICOS 8.x–10.x, and MS Windows95 and all later versions. If you port the code to a new operating system, please send me a note and any patches you required.

The major prerequisite for installing NCO on a particular platform is the successful, prior installation of the netCDF library (and, as of 2003, the UDUnits library). Unidata has shown a commitment to maintaining netCDF and UDUnits on all popular UNIX platforms, and is moving towards full support for the Microsoft Windows operating system (OS). Given this, the only difficulty in implementing NCO on a particular platform is standardization of various C and Fortran interface and system calls. NCO code is tested for ANSI compliance by compiling with C compilers including those from GNU (‘gcc -std=c99 -pedantic -D_BSD_SOURCE -D_POSIX_SOURCE -Wall’) 2, Comeau Computing (‘como --c99’), Cray (‘cc’), HP/Compaq/DEC (‘cc’), IBM (‘xlc -c -qlanglvl=extc99’), Intel (‘icc -std=c99’), NEC (‘cc’), PathScale (QLogic) (‘pathcc -std=c99’), PGI (‘pgcc -c9x’), SGI (‘cc -c99’), and Sun (‘cc’). NCO (all commands and the libnco library) and the C++ interface to netCDF (called libnco_c++) comply with the ISO C++ standards as implemented by Comeau Computing (‘como’), Cray (‘CC’), GNU (‘g++ -Wall’), HP/Compaq/DEC (‘cxx’), IBM (‘xlC’), Intel (‘icc’), NEC (‘c++’), PathScale (QLogic) (‘pathCC’), PGI (‘pgCC’), SGI (‘CC -LANG:std’), and Sun (‘CC -LANG:std’). See nco/bld/Makefile and nco/src/nco_c++/Makefile.old for more details and exact settings.

Until recently (and not even yet), ANSI-compliant has meant compliance with the 1989 ISO C-standard, usually called C89 (with minor revisions made in 1994 and 1995). C89 lacks variable-size arrays, restricted pointers, some useful printf formats, and many mathematical special functions. These are valuable features of C99, the 1999 ISO C-standard. NCO is C99-compliant where possible and C89-compliant where necessary. Certain branches in the code are required to satisfy the native SGI and SunOS C compilers, which are strictly ANSI C89 compliant, and cannot benefit from C99 features. However, C99 features are fully supported by modern AIX, GNU, Intel, NEC, Solaris, and UNICOS compilers. NCO requires a C99-compliant compiler as of NCO version 2.9.8, released in August, 2004.

The most time-intensive portion of NCO execution is spent in arithmetic operations, e.g., multiplication, averaging, subtraction. These operations were performed in Fortran by default until August, 1999. This was a design decision based on the relative speed of Fortran-based object code vs. C-based object code in late 1994. C compiler vectorization capabilities have dramatically improved since 1994. We have accordingly replaced all Fortran subroutines with C functions. This greatly simplifies the task of building NCO on nominally unsupported platforms. As of August 1999, NCO built entirely in C by default. This allowed NCO to compile on any machine with an ANSI C compiler. In August 2004, the first C99 feature, the restrict type qualifier, entered NCO in version 2.9.8. C compilers can obtain better performance with C99 restricted pointers since they inform the compiler when it may make Fortran-like assumptions regarding pointer contents alteration. Subsequently, NCO requires a C99 compiler to build correctly 3.

In June 2005, NCO version 3.0.1 began to take advantage of C99 mathematical special functions. These include the standardized gamma function (called tgamma() for “true gamma”). NCO automagically takes advantage of some GNU Compiler Collection (GCC) extensions to ANSI C.

As of July 2000 and NCO version 1.2, NCO no longer performs arithmetic operations in Fortran. We decided to sacrifice executable speed for code maintainability. Since no objective statistics were ever performed to quantify the difference in speed between the Fortran and C code, the performance penalty incurred by this decision is unknown. Supporting Fortran involves maintaining two sets of routines for every arithmetic operation. The USE_FORTRAN_ARITHMETIC flag is still retained in the Makefile. The file containing the Fortran code, nco_fortran.F, has been deprecated but a volunteer (Dr. Frankenstein?) could resurrect it. If you would like to volunteer to maintain nco_fortran.F please contact me.



1.2.1 Compiling NCO for Microsoft Windows OS

NCO has been successfully ported and tested on the Microsoft Windows (95/98/NT/2000/XP) operating systems. The switches necessary to accomplish this are included in the standard distribution of NCO. Using the freely available Cygwin (formerly gnu-win32) development environment 4, the compilation process is very similar to installing NCO on a UNIX system. Set the PVM_ARCH preprocessor token to WIN32. Note that defining WIN32 has the side effect of disabling Internet features of NCO (see below). NCO should now build like it does on UNIX.

The least portable section of the code is the use of standard UNIX and Internet protocols (e.g., ftp, rcp, scp, sftp, getuid, gethostname, and header files <arpa/nameser.h> and <resolv.h>). Fortunately, these UNIX-y calls are only invoked by the single NCO subroutine which is responsible for retrieving files stored on remote systems (see Remote storage). In order to support NCO on the Microsoft Windows platforms, this single feature was disabled (on Windows OS only). This was required by Cygwin 18.x—newer versions of Cygwin may support these protocols (let me know if this is the case). The NCO operators should behave identically on Windows and UNIX platforms in all other respects.



1.3 Libraries

Like all executables, the NCO operators can be built using dynamic linking. This reduces the size of the executable and can result in significant performance enhancements on multiuser systems. Unfortunately, if your library search path (usually the LD_LIBRARY_PATH environment variable) is not set correctly, or if the system libraries have been moved, renamed, or deleted since NCO was installed, it is possible NCO operators will fail with a message that they cannot find a dynamically loaded (aka shared object or ‘.so’) library. This will produce a distinctive error message, such as ‘ld.so.1: /usr/local/bin/ncea: fatal: libsunmath.so.1: can't open file: errno=2’. If you received an error message like this, ask your system administrator to diagnose whether the library is truly missing 5, or whether you simply need to alter your library search path. As a final remedy, you may re-compile and install NCO with all operators statically linked.
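If the library exists but lives outside the default search path, extending LD_LIBRARY_PATH is often enough. The following is a minimal sketch: the helper name add_lib_dir and the directory /opt/lib are assumptions chosen for illustration, and the ldd utility mentioned in the comment is available on GNU/Linux and Solaris but not on every platform.

```shell
# Prepend a directory to the dynamic library search path.
# add_lib_dir is a hypothetical helper, not part of NCO.
add_lib_dir() {
  # ${VAR:+...} adds the ':' separator only when the variable is already non-empty
  LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}

# Usage (the directory is an example only):
#   add_lib_dir /opt/lib
#   ldd /usr/local/bin/ncea   # where available, lists libraries and flags any "not found"
```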



1.4 netCDF2/3/4 and HDF4/5 Support

netCDF version 2 was released in 1993. NCO (specifically ncks) began soon after this in 1994. netCDF 3.0 was released in 1996, and we were eager to reap the performance advantages of the newer netCDF implementation. One netCDF3 interface call (nc_inq_libvers) was added to NCO in January, 1998, to aid in maintenance and debugging. In March, 2001, the final conversion of NCO to netCDF3 was completed (coincidentally on the same day netCDF 3.5 was released). NCO versions 2.0 and higher are built with the -DNO_NETCDF_2 flag to ensure no netCDF2 interface calls are used. However, the ability to compile NCO with only netCDF2 calls is worth maintaining because HDF version 4 6 (available from HDF) supports only the netCDF2 library calls (see http://hdf.ncsa.uiuc.edu/UG41r3_html/SDS_SD.fm12.html#47784). Note that there are multiple versions of HDF. Currently HDF version 4.x supports netCDF2 and thus NCO version 1.2.x. If NCO version 1.2.x (or earlier) is built with only netCDF2 calls then all NCO operators should work with HDF4 files as well as netCDF files 7. The preprocessor token NETCDF2_ONLY exists in NCO version 1.2.x to eliminate all netCDF3 calls. Only versions of NCO numbered 1.2.x and earlier have this capability. The NCO 1.2.x branch will be maintained with bugfixes only (no new features) until HDF begins to fully support the netCDF3 interface (which is employed by NCO 2.x). If, at compilation time, NETCDF2_ONLY is defined, then NCO version 1.2.x will not use any netCDF3 calls and, if linked properly, the resulting NCO operators will work with HDF4 files. The Makefile supplied with NCO 1.2.x is written to simplify building in this HDF capability. When NCO is built with make HDF4=Y, the Makefile sets all required preprocessor flags and library links to build with the HDF4 libraries (which are assumed to reside under /usr/local/hdf4; edit the Makefile to suit your installation).

HDF version 5 became available in 1999, but did not support netCDF (or, for that matter, Fortran) as of December 1999. By early 2001, HDF5 did support Fortran90. However, support for netCDF4 in HDF5 is incomplete. Much of the HDF5-netCDF interface is complete, however, and it may be separately downloaded from the netCDF4 website. We are eager for HDF5 to complete netCDF support. This is scheduled to occur sometime in 2007, with the releases of HDF version 1.8 and netCDF version 4, which are collaborations between Unidata and NCSA. NCO version 3.0.3 added support for reading/writing netCDF4-formatted HDF5 files in October, 2005. See Selecting Output File Format for more details.

NCO version 3.9.0 added full support for all netCDF4 atomic data types in May, 2007. Support for netCDF4 features will be incremental, i.e., we will add one netCDF4 feature at a time. You must build NCO with netCDF4 to obtain this support.

The main netCDF4 features that NCO currently supports are the new atomic data types and Lempel-Ziv compression. The new atomic data types are NC_UBYTE, NC_USHORT, NC_UINT, NC_INT64, and NC_UINT64. Eight-byte integer support is an especially useful improvement over netCDF3. All NCO operators support these types, e.g., ncks copies and prints them, ncra averages them, and ncap2 processes algebraic scripts with them. ncks prints compression information, if any, to screen.

Lempel-Ziv deflation is a lossless compression technique. See Deflation for more details.

netCDF4-enabled NCO handles netCDF3 files without change. In addition, it automagically handles netCDF4 (HDF5) files: If you feed NCO netCDF3 files, it produces netCDF3 output. If you feed NCO netCDF4 files, it produces netCDF4 output. Use the handy-dandy ‘-4’ switch to request netCDF4 output from netCDF3 input, i.e., to convert netCDF3 to netCDF4. See Selecting Output File Format for more details.
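As an illustration, several netCDF3 files could be converted in one pass. This is only a sketch and assumes a netCDF4-enabled ncks is on your path; the function name nc3_to_nc4 and the _nc4 output suffix are hypothetical choices, not NCO conventions.

```shell
# Convert each named netCDF3 file to netCDF4 using the -4 switch.
# nc3_to_nc4 and the "_nc4.nc" suffix are illustrative, not NCO conventions.
nc3_to_nc4() {
  for fl_in in "$@"; do
    fl_out="${fl_in%.nc}_nc4.nc"   # e.g., foo.nc -> foo_nc4.nc
    ncks -4 "$fl_in" "$fl_out" || return 1   # stop at the first failure
  done
}

# Usage: nc3_to_nc4 85.nc 86.nc
```

Writing to a new file name rather than converting in place keeps the netCDF3 originals available while netCDF4 support is still maturing.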

Use appropriate caution while netCDF4 is beta software. Problems with netCDF4 and HDF libraries are still being fixed. NCO support for netCDF4 atomic types is relatively untested. Binary NCO distributions (RPMs and debs) still use netCDF3.

For now you must build NCO from source to get netCDF4 support. Typically, one specifies the root of the netCDF4-beta installation directory. Do this with the NETCDF4_ROOT variable. Then use your preferred NCO build mechanism, e.g.,

     export NETCDF4_ROOT=/usr/local/netcdf4 # Set netCDF4 location
     cd ~/nco;./configure --enable-netcdf4  # Configure mechanism -or-
     cd ~/nco/bld;make NETCDF4=Y allinone   # Old Makefile mechanism

Our short term goal is to track the netCDF4-beta releases, keep the new netCDF4 atomic type support working, and iron out any problems. Our long term goal is to utilize more of the extensive new netCDF4 feature set. The next major netCDF4 feature we are likely to utilize is parallel I/O. We will enable this in the MPI netCDF operators.



1.5 Help Requests and Bug Reports

We generally receive three categories of mail from users: help requests, bug reports, and feature requests. Notes saying the equivalent of "Hey, NCO continues to work great and it saves me more time every day than it took to write this note" are a distant fourth.

There is a different protocol for each type of request. The preferred etiquette for all communications is via NCO Project Forums. Do not contact project members via personal e-mail unless your request comes with money or you have damaging information about our personal lives. Please use the Forums—they preserve a record of the questions and answers so that others can learn from our exchange. Also, since NCO is government-funded, this record helps us provide program officers with information they need to evaluate our project.

Before posting to the NCO forums described below, you might first register your name and email address with SourceForge.net or else all of your postings will be attributed to "nobody". Once registered you may choose to "monitor" any forum and to receive (or not) email when there are any postings including responses to your questions. We usually reply to the forum message, not to the original poster.

If you want us to include a new feature in NCO, check first to see if that feature is already on the TODO list. If it is, why not implement that feature yourself and send us the patch? If the feature is not yet on the list, then send a note to the NCO Discussion forum.

Read the manual before reporting a bug or posting a help request. Sending questions whose answers are not in the manual is the best way to motivate us to write more documentation. We would also like to accentuate the contrapositive of this statement. If you think you have found a real bug, the most helpful thing you can do is simplify the problem to a manageable size and then report it. The first thing to do is to make sure you are running the latest publicly released version of NCO.

Once you have read the manual, if you are still unable to get NCO to perform a documented function, submit a help request. Follow the same procedure as described below for reporting bugs (after all, it might be a bug). That is, describe what you are trying to do, and include the complete commands (run with ‘-D 5’), error messages, and version of NCO (with ‘-r’). Post your help request to the NCO Help forum.

If you think you used the right command when NCO misbehaves, then you might have found a bug. Incorrect numerical answers are the highest priority. We usually fix those within one or two days. Core dumps and segmentation violations receive lower priority. They are always fixed, eventually.

How do you simplify a problem that reveals a bug? Cut out extraneous variables, dimensions, and metadata from the offending files and re-run the command until it no longer breaks. Then back up one step and report the problem. Usually the file(s) will be very small, i.e., one variable with one or two small dimensions ought to suffice. Run the operator with ‘-r’ and then run the command with ‘-D 5’ to increase the verbosity of the debugging output. It is very important that your report contain the exact error messages and compile-time environment. Include a copy of your sample input file(s), or place a copy in a publicly accessible location. Post the full bug report to the NCO Project buglist.
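One way to gather this information is to capture both runs in a single log file that can be attached to the report. The sketch below is a convenience only; the function name nco_report and the log file name are hypothetical, while ‘-r’ and ‘-D 5’ are the switches described above.

```shell
# Collect version information and verbose output for a bug report.
# nco_report is a hypothetical helper; -r and -D 5 are the switches named in the text.
nco_report() {
  log="$1"; shift          # first argument: log file to write
  op="$1"; shift           # second argument: operator name, e.g., ncks
  {
    echo "Command: $op -D 5 $*"
    "$op" -r               # version and compile-time information
    "$op" -D 5 "$@"        # re-run the failing command with debugging output
  } > "$log" 2>&1          # keep error messages together with normal output
}

# Usage: nco_report bug.log ncks in.nc out.nc
```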

Build failures count as bugs. Our limited machine access means we cannot fix all build failures. The information we need to diagnose, and often fix, build failures is contained in the three files output by GNU build tools, nco.config.log.${GNU_TRP}.foo, nco.configure.${GNU_TRP}.foo, and nco.make.${GNU_TRP}.foo. The file configure.eg shows how to produce these files. Here ${GNU_TRP} is the "GNU architecture triplet", the chip-vendor-OS string returned by config.guess. Please send us your improvements to the examples supplied in configure.eg. The regressions archive at http://dust.ess.uci.edu/nco/rgr contains the build output from our standard test systems. You may find you can solve the build problem yourself by examining the differences between these files and your own.



2 Operator Strategies



2.1 Philosophy

The main design goal is command line operators which perform useful, scriptable operations on netCDF files. Many scientists work with models and observations which produce too much data to analyze in tabular format. Thus, it is often natural to reduce and massage this raw or primary level data into summary, or second level data, e.g., temporal or spatial averages. These second level data may become the inputs to graphical and statistical packages, and are often more suitable for archival and dissemination to the scientific community. NCO performs a suite of operations useful in manipulating data from the primary to the second level state. Higher level interpretive languages (e.g., IDL, Yorick, Matlab, NCL, Perl, Python), and lower level compiled languages (e.g., C, Fortran) can always perform any task performed by NCO, but often with more overhead. NCO, on the other hand, is limited to a much smaller set of arithmetic and metadata operations than these full blown languages.

Another goal has been to implement enough command line switches so that frequently used sequences of these operators can be executed from a shell script or batch file. Finally, NCO was written to consume the absolute minimum amount of system memory required to perform a given job. The arithmetic operators are extremely efficient; their exact memory usage is detailed in Memory Requirements.



2.2 Climate Model Paradigm

NCO was developed at NCAR to aid analysis and manipulation of datasets produced by General Circulation Models (GCMs). Datasets produced by GCMs share many features with all gridded scientific datasets and so provide a useful paradigm for the explication of the NCO operator set. Examples in this manual use a GCM paradigm because latitude, longitude, time, temperature and other fields related to our natural environment are as easy to visualize for the layman as the expert.



2.3 Temporary Output Files

NCO operators are designed to be reasonably fault tolerant, so that if there is a system failure or the user aborts the operation (e.g., with C-c), then no data are lost. The user-specified output-file is only created upon successful completion of the operation 8. This is accomplished by performing all operations in a temporary copy of output-file. The name of the temporary output file is constructed by appending .pid<process ID>.<operator name>.tmp to the user-specified output-file name. When the operator completes its task with no fatal errors, the temporary output file is moved to the user-specified output-file. Note the construction of a temporary output file uses more disk space than just overwriting existing files “in place” (because there may be two copies of the same file on disk until the NCO operation successfully concludes and the temporary output file overwrites the existing output-file). Also, note this feature increases the execution time of the operator by approximately the time it takes to copy the output-file. Finally, note this feature allows the output-file to be the same as the input-file without any danger of “overlap”.
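The same write-then-move pattern can be sketched in a few lines of shell. This is an illustration of the idea under simplifying assumptions, not NCO's actual implementation: safe_write is a hypothetical stand-in for an operator, and it simply copies its standard input.

```shell
# Sketch of the write-to-temporary-then-move pattern described above.
# safe_write is a hypothetical stand-in for an operator; it reads data on stdin.
safe_write() {
  fl_out="$1"
  op_nm="$2"                                # operator name used in the suffix
  fl_tmp="${fl_out}.pid$$.${op_nm}.tmp"     # output-file.pid<process ID>.<operator name>.tmp
  if cat > "$fl_tmp"; then                  # do all work in the temporary file
    mv "$fl_tmp" "$fl_out"                  # success: move into place
  else
    rm -f "$fl_tmp"                         # failure: existing output-file is untouched
    return 1
  fi
}

# Usage: printf 'new data\n' | safe_write out.txt demo
```

Because the existing output-file is replaced only by the final move, an interruption at any earlier point leaves it exactly as it was.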

Other safeguards exist to protect the user from inadvertently overwriting data. If the output-file specified for a command is a pre-existing file, then the operator will prompt the user whether to overwrite (erase) the existing output-file, attempt to append to it, or abort the operation. However, in processing large amounts of data, too many interactive questions slow productivity. Therefore NCO also implements two ways to override its own safety features, the ‘-O’ and ‘-A’ switches. Specifying ‘-O’ tells the operator to overwrite any existing output-file without prompting the user interactively. Specifying ‘-A’ tells the operator to attempt to append to any existing output-file without prompting the user interactively. These switches are useful in batch environments because they suppress interactive keyboard input.



2.4 Appending Variables

Adding variables from one file to another is often desirable. This is referred to as appending, although some prefer the terminology merging 9 or pasting. Appending is often confused with what NCO calls concatenation. In NCO, concatenation refers to splicing a variable along the record dimension. Appending, on the other hand, refers to adding variables from one file to another 10. In this sense, ncks can append variables from one file to another file. This capability is invoked by naming two files on the command line, input-file and output-file. When output-file already exists, the user is prompted whether to overwrite, append/replace, or exit from the command. Selecting overwrite tells the operator to erase the existing output-file and replace it with the results of the operation. Selecting exit causes the operator to exit; the output-file will not be touched in this case. Selecting append/replace causes the operator to attempt to place the results of the operation in the existing output-file (see ncks netCDF Kitchen Sink).

The simplest way to create the union of two files is

     ncks -A fl_1.nc fl_2.nc

This puts the contents of fl_1.nc into fl_2.nc. The ‘-A’ is optional. On output, fl_2.nc is the union of the input files, regardless of whether they share dimensions and variables, or are completely disjoint. The append fails if the input files have differently named record dimensions (since netCDF supports only one), or have dimensions of the same name but different sizes.
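In a batch script the same command can be repeated to fold several files into one existing target. The following is a sketch under that assumption; merge_into is a hypothetical helper name, and the loop stops at the first append that fails (for instance, because of the dimension conflicts just described).

```shell
# Merge the variables of several files into one existing target with -A.
# merge_into is a hypothetical helper name, not part of NCO.
merge_into() {
  fl_out="$1"; shift
  for fl_in in "$@"; do
    # -A appends without the interactive overwrite/append/exit prompt
    ncks -A "$fl_in" "$fl_out" || return 1
  done
}

# Usage: merge_into fl_all.nc fl_1.nc fl_2.nc fl_3.nc
```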



2.5 Simple Arithmetic and Interpolation

Users comfortable with NCO semantics may find it easier to perform some simple mathematical operations in NCO rather than higher level languages. ncbo (see ncbo netCDF Binary Operator) does file addition, subtraction, multiplication, division, and broadcasting. ncflint (see ncflint netCDF File Interpolator) does file addition, subtraction, multiplication and interpolation. Sequences of these commands can accomplish simple but powerful operations from the command line.



2.6 Averagers vs. Concatenators

The most frequently used operators of NCO are probably the averagers and concatenators. Because there are so many permutations of averaging (e.g., across files, within a file, over the record dimension, over other dimensions, with or without weights and masks) and of concatenating (across files, along the record dimension, along other dimensions), there are currently no fewer than five operators which tackle these two purposes: ncra, ncea, ncwa, ncrcat, and ncecat. These operators do share many capabilities 11, but each has its unique specialty. Two of these operators, ncrcat and ncecat, are for concatenating hyperslabs across files. Two others, ncra and ncea, are for averaging hyperslabs across files 12. First, let's describe the concatenators, then the averagers.



2.6.1 Concatenators ncrcat and ncecat

Joining independent files together along a record dimension is called concatenation. ncrcat is designed for concatenating record variables, while ncecat is designed for concatenating fixed length variables. Consider five files, 85.nc, 86.nc, ... 89.nc each containing a year's worth of data. Say you wish to create from them a single file, 8589.nc containing all the data, i.e., spanning all five years. If the annual files make use of the same record variable, then ncrcat will do the job nicely with, e.g., ncrcat 8?.nc 8589.nc. The number of records in the input files is arbitrary and can vary from file to file. See ncrcat netCDF Record Concatenator, for a complete description of ncrcat.
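Note that the shell, not ncrcat, expands the pattern 8?.nc into a sorted list of file names before the operator runs. The short demonstration below uses empty placeholder files (created only for this purpose, in a scratch directory) to show which names the pattern selects and in what order.

```shell
# Show which files the pattern 8?.nc selects, and in what order.
# The files are empty placeholders created only for this demonstration.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
echo 8?.nc    # prints: 85.nc 86.nc 87.nc 88.nc 89.nc
```

Since ‘?’ matches exactly one character, a previously created 8589.nc in the same directory is not swept up as an input file.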

However, suppose the annual files have no record variable, and thus their data are all fixed length. For example, the files may not be conceptually sequential, but rather members of the same group, or ensemble. Members of an ensemble may have no reason to contain a record dimension. ncecat will create a new record dimension (named record by default) with which to glue together the individual files into the single ensemble file. If ncecat is used on files which contain an existing record dimension, that record dimension is converted to a fixed-length dimension of the same name and a new record dimension (named record) is created. Consider five realizations, 85a.nc, 85b.nc, ... 85e.nc of 1985 predictions from the same climate model. Then ncecat 85?.nc 85_ens.nc glues the individual realizations together into the single file, 85_ens.nc. If an input variable was dimensioned [lat,lon], it will have dimensions [record,lat,lon] in the output file. A restriction of ncecat is that the hyperslabs of the processed variables must be the same from file to file. Normally this means all the input files are the same size, and contain data on different realizations of the same variables. See ncecat netCDF Ensemble Concatenator, for a complete description of ncecat.

ncpdq makes it possible to concatenate files along any dimension, not just the record dimension. First, use ncpdq to convert the dimension to be concatenated (i.e., extended with data from other files) into the record dimension. Second, use ncrcat to concatenate these files. Finally, if desirable, use ncpdq to revert to the original dimensionality. As a concrete example, say that files x_01.nc, x_02.nc, ... x_10.nc contain time-evolving datasets from spatially adjacent regions. The time and spatial coordinates are time and x, respectively. Initially the record dimension is time. Our goal is to create a single file that joins all the spatially adjacent regions into one single time-evolving dataset.

     for idx in 01 02 03 04 05 06 07 08 09 10; do # Bourne Shell
       ncpdq -a x,time x_${idx}.nc foo_${idx}.nc # Make x record dimension
     done
     ncrcat foo_??.nc out.nc       # Concatenate along x
     ncpdq -a time,x out.nc out.nc # Revert to time as record dimension

Note that ncrcat will not concatenate fixed-length variables, whereas ncecat concatenates both fixed-length and record variables along a new record dimension. To conserve system memory, use ncrcat where possible.



2.6.2 Averagers ncea, ncra, and ncwa

The differences between the averagers ncra and ncea are analogous to the differences between the concatenators. ncra is designed for averaging record variables from at least one file, while ncea is designed for averaging fixed length variables from multiple files. ncra performs a simple arithmetic average over the record dimension of all the input files, with each record having an equal weight in the average. ncea performs a simple arithmetic average of all the input files, with each file having an equal weight in the average. Note that ncra cannot average fixed-length variables, but ncea can average both fixed-length and record variables. To conserve system memory, use ncra rather than ncea where possible (e.g., if each input-file is one record long). The file output from ncea will have the same dimensions (meaning dimension names as well as sizes) as the input hyperslabs (see ncea netCDF Ensemble Averager, for a complete description of ncea). The file output from ncra will have the same dimensions as the input hyperslabs except for the record dimension, which will have a size of 1 (see ncra netCDF Record Averager, for a complete description of ncra).



2.6.3 Interpolator ncflint

ncflint can interpolate data between two files. Since no other operators have this ability, the description of interpolation is given fully on the ncflint reference page (see ncflint netCDF File Interpolator). Note that this capability also allows ncflint to linearly rescale any data in a netCDF file, e.g., to convert between differing units.



2.7 Large Numbers of Files

Occasionally one desires to digest (i.e., concatenate or average) hundreds or thousands of input files. Unfortunately, data archives (e.g., NASA EOSDIS) may not name netCDF files in a format understood by the ‘-n loop’ switch (see Specifying Input Files) that automagically generates arbitrary numbers of input filenames. The ‘-n loop’ switch has the virtue of being concise, and of minimizing the command line. This helps keep the output file small since the command line is stored as metadata in the history attribute (see History Attribute). However, the ‘-n loop’ switch is useless when there is no simple, arithmetic pattern to the input filenames (e.g., h00001.nc, h00002.nc, ... h90210.nc). Moreover, filename globbing does not work when the input files are too numerous or their names are too lengthy (when strung together as a single argument) to be passed by the calling shell to the NCO operator 13. When this occurs, the ANSI C-standard argc-argv method of passing arguments from the calling shell to a C-program (i.e., an NCO operator) breaks down. There are (at least) three alternative methods of specifying the input filenames to NCO in environment-limited situations.

The recommended method for sending very large numbers (hundreds or more, typically) of input filenames to the multi-file operators is to pass the filenames with the UNIX standard input feature, aka stdin:

     # Pipe large numbers of filenames to stdin
     /bin/ls | grep ${CASEID}_'......'.nc | ncecat -o foo.nc

This method avoids all constraints on command line size imposed by the operating system. A drawback to this method is that the history attribute (see History Attribute) does not record the name of any input files since the names were not passed on the command line. This makes determining the data provenance at a later date difficult. To remedy this situation, multi-file operators store the number of input files in the nco_input_file_number global attribute and the input file list itself in the nco_input_file_list global attribute (see File List Attributes). Although this does not preserve the exact command used to generate the file, it does retain all the information required to reconstruct the command and determine the data provenance.

A second option is to use the UNIX xargs command. This simple example selects as input to xargs all the filenames in the current directory that match a given pattern. For illustration, consider a user trying to average millions of files which each have a six character filename. If the shell buffer can not hold the results of the corresponding globbing operator, ??????.nc, then the filename globbing technique will fail. Instead we express the filename pattern as an extended regular expression, ......\.nc (see Subsetting Variables). We use grep to filter the directory listing for this pattern and to pipe the results to xargs which, in turn, passes the matching filenames to an NCO multi-file operator, e.g., ncecat.

     # Use xargs to transfer filenames on the command line
     /bin/ls | grep ${CASEID}_'......'.nc | xargs -x ncecat -o foo.nc

The single quotes protect the only sensitive parts of the extended regular expression (the grep argument), and allow shell interpolation (the ${CASEID} variable substitution) to proceed unhindered on the rest of the command. xargs uses the UNIX pipe feature to append the suitably filtered input file list to the end of the ncecat command options. The -o foo.nc switch ensures that the input files supplied by xargs are not confused with the output file name. xargs does, unfortunately, have its own limit (usually about 20,000 characters) on the size of command lines it can pass. Give xargs the ‘-x’ switch to ensure it dies if it reaches this internal limit. When this occurs, use either the stdin method above, or the symbolic link method presented next.
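
As a rough illustration of the size limits discussed above, the following Bourne shell sketch adds up the length of a filename list and compares it with a character budget. The function name names_fit is hypothetical and is not part of NCO or xargs. The 20,000 character budget is simply the approximate xargs limit quoted above; the true limit on a given system may differ.

```shell
# Hypothetical helper (not part of NCO or xargs): succeed if the filenames
# read from stdin, each followed by one separator character, fit within a
# command-line budget of $1 characters.
names_fit() {
  limit=$1
  total=0
  while IFS= read -r name; do
    total=$((total + ${#name} + 1))   # Name length plus one separator
  done
  [ "$total" -le "$limit" ]
}

# Usage: decide whether xargs is likely to cope with a (tiny) file list
if printf '%s\n' 85.nc 86.nc | names_fit 20000; then
  echo "list fits: xargs should work"
else
  echo "list too long: pipe the names to stdin instead"
fi
```

In practice the list would come from the same /bin/ls | grep pipeline used above rather than from printf.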

Even when its internal limits have not been reached, the xargs technique may not be sophisticated enough to handle all situations. A full scripting language like Perl can handle any level of complexity of filtering input filenames, and any number of filenames. The technique of last resort is to write a script that creates symbolic links between the irregular input filenames and a set of regular, arithmetic filenames that the ‘-n loop’ switch understands. For example, the following Perl script creates a monotonically enumerated symbolic link to each of up to one million .nc files in a directory. If there are 999,999 netCDF files present, the links are named 000001.nc to 999999.nc:

     # Create enumerated symbolic links
     /bin/ls | grep '\.nc$' | perl -e \
     '$idx=1;while(<STDIN>){chomp;symlink $_,sprintf("%06d.nc",$idx++);}'
     ncecat -n 999999,6,1 000001.nc foo.nc
     # Remove symbolic links when finished
     /bin/rm ??????.nc

The ‘-n loop’ option tells the NCO operator to automatically generate the filenames of the symbolic links. This circumvents any OS and shell limits on command line size. The symbolic links are easily removed once NCO is finished. One drawback to this method is that the history attribute (see History Attribute) retains the filename list of the symbolic links, rather than the data files themselves. This makes it difficult to determine the data provenance at a later date.
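
The same idea can be sketched in plain Bourne shell. This variant is an illustration rather than a replacement for the Perl script: the function name link_enumerated and the directory names are hypothetical, and the links are placed in a separate directory (an assumption made here) so that removing them afterwards cannot touch any data file.

```shell
# Hypothetical helper (not part of NCO): create enumerated symbolic links
# 000001.nc, 000002.nc, ... in link directory $2, one for every *.nc file
# found in data directory $1. Prints the number of links created.
link_enumerated() {
  src=$(cd "$1" && pwd) || return 1    # Absolute path keeps the links valid
  dst=$2
  mkdir -p "$dst" || return 1
  idx=1
  for f in "$src"/*.nc; do
    [ -e "$f" ] || continue            # Skip the literal pattern if nothing matches
    ln -s "$f" "$dst/$(printf '%06d' "$idx").nc" || return 1
    idx=$((idx + 1))
  done
  echo $((idx - 1))
}

# Possible usage (requires NCO, so shown as comments):
#   n=$(link_enumerated /data/usrname/model links)
#   ncecat -n "$n",6,1 -p links 000001.nc foo.nc
#   /bin/rm -r links
```

The for loop expands the wildcard inside the shell itself, so the list of names is never passed to another program on a command line.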



2.8 Large Datasets

Large datasets are those files that are comparable in size to the amount of random access memory (RAM) in your computer. Many users of NCO work with files larger than 100 MB. Files this large not only push the current edge of storage technology, they present special problems for programs which attempt to access the entire file at once, such as ncea and ncecat. If you work with a 300 MB file on a machine with only 32 MB of memory then you will need large amounts of swap space (virtual memory on disk) and NCO will work slowly, or even fail. There is no easy solution for this. The best strategy is to work on a machine with sufficient amounts of memory and swap space. Since about 2004, many users have begun to produce or analyze files exceeding 2 GB in size. These users should familiarize themselves with NCO's Large File Support (LFS) capabilities (see Large File Support). The next section will increase your familiarity with NCO's memory requirements. With this knowledge you may re-design your data reduction approach to divide the problem into pieces solvable in memory-limited situations.

If your local machine has problems working with large files, try running NCO from a more powerful machine, such as a network server. Certain machine architectures, e.g., Cray UNICOS, have special commands which allow one to increase the amount of interactive memory. On Cray systems, try to increase the available memory with the ilimit command. If you get a memory-related core dump (e.g., ‘Error exit (core dumped)’) on a GNU/Linux system, try increasing the process-available memory with ulimit.

The speed of the NCO operators also depends on file size. When processing large files the operators may appear to hang, or do nothing, for long periods of time. In order to see what the operator is actually doing, it is useful to activate a more verbose output mode. This is accomplished by supplying a number greater than 0 to the ‘-D debug-level’ (or ‘--debug-level’, or ‘--dbg_lvl’) switch. When the debug-level is nonzero, the operators report their current status to the terminal through the stderr facility. Using ‘-D’ does not slow the operators down. Choose a debug-level between 1 and 3 for most situations, e.g., ncea -D 2 85.nc 86.nc 8586.nc. A full description of how to estimate the actual amount of memory the multi-file NCO operators consume is given in Memory Requirements.



2.9 Memory Requirements

Many people use NCO on gargantuan files which dwarf the memory available (free RAM plus swap space) even on today's powerful machines. These users want NCO to consume the least memory possible so that their scripts do not have to tediously cut files into smaller pieces that fit into memory. We commend these greedy users for pushing NCO to its limits!

This section describes the memory NCO requires during operation. The required memory is based on the underlying algorithms. The description below is the memory usage per thread. Users with shared memory machines may use the threaded NCO operators (see OpenMP Threading). The peak and sustained memory usage will scale accordingly, i.e., by the number of threads. Memory consumption patterns of all operators are similar, with the exception of ncap2.



2.9.1 Single and Multi-file Operators

The multi-file operators currently comprise the record operators, ncra and ncrcat, and the ensemble operators, ncea and ncecat. The record operators require much less memory than the ensemble operators. This is because the record operators operate on one single record (i.e., time-slice) at a time, whereas the ensemble operators retrieve the entire variable into memory.

Let MS be the peak sustained memory demand of an operator, FT be the memory required to store the entire contents of all the variables to be processed in an input file, FR be the memory required to store the entire contents of a single record of each of the variables to be processed in an input file, VR be the memory required to store a single record of the largest record variable to be processed in an input file, VT be the memory required to store the largest variable to be processed in an input file, and VI be the memory required to store the largest variable which is not processed, but is copied from the initial file to the output file. All operators require MI = VI during the initial copying of variables from the first input file to the output file. This is the initial (and transient) memory demand. The sustained memory demand is that memory required by the operators during the processing (i.e., averaging, concatenation) phase which lasts until all the input files have been processed.

The operators have the following memory requirements: ncrcat requires MS <= VR. ncecat requires MS <= VT. ncra requires MS = 2FR + VR. ncea requires MS = 2FT + VT. ncbo requires MS <= 3VT (both input variables and the output variable). ncflint requires MS <= 3VT (both input variables and the output variable). ncpdq requires MS <= 2VT (one input variable and the output variable). ncwa requires MS <= 8VT (see below). Note that only variables that are processed, e.g., averaged, concatenated, or differenced, contribute to MS. Variables which do not appear in the output file (see Subsetting Variables) are never read and contribute nothing to the memory requirements.

ncwa consumes between three and eight times the memory of a variable in order to process it. Peak consumption occurs when storing simultaneously in memory one input variable, one tally array, one input weight, one conformed/working weight, one weight tally, one input mask, one conformed/working mask, and one output variable. When invoked, the weighting and masking features contribute up to three-eighths and two-eighths of these requirements, respectively. If weights and masks are not specified (i.e., no ‘-w’ or ‘-m’ options) then ncwa requirements drop to MS <= 3VT (one input variable, one tally array, and the output variable).

The above memory requirements must be multiplied by the number of threads thr_nbr (see OpenMP Threading). If this causes problems then reduce (with ‘-t thr_nbr’) the number of threads.
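
The relations in this section can be collected into a small Bourne shell sketch for back-of-the-envelope estimates. The function name nco_mem_est and its argument order are hypothetical and not part of NCO. Sizes are integers in any one consistent unit (e.g., MB), and for operators whose requirement is stated as an upper bound the result is that bound, not a measurement.

```shell
# Hypothetical helper (not part of NCO): sustained memory estimate for one
# operator, multiplied by the number of threads.
# Arguments: operator thr_nbr VR VT FR FT  (integer sizes, same unit)
nco_mem_est() {
  op=$1; thr=$2; vr=$3; vt=$4; fr=$5; ft=$6
  case $op in
    ncrcat)       ms=$vr ;;                 # MS <= VR
    ncecat)       ms=$vt ;;                 # MS <= VT
    ncra)         ms=$((2 * fr + vr)) ;;    # MS = 2FR + VR
    ncea)         ms=$((2 * ft + vt)) ;;    # MS = 2FT + VT
    ncbo|ncflint) ms=$((3 * vt)) ;;         # MS <= 3VT
    ncpdq)        ms=$((2 * vt)) ;;         # MS <= 2VT
    ncwa)         ms=$((8 * vt)) ;;         # MS <= 8VT (weights and masks in use)
    *) echo "unknown operator: $op" >&2; return 1 ;;
  esac
  echo $((ms * thr))                        # Scale by the number of threads
}

# Usage: ncea with 2 threads, VR=10, VT=100, FR=30, FT=300 (all in MB)
nco_mem_est ncea 2 10 100 30 300   # prints 1400
```

Comparing the result for ncra against that for ncea with the same inputs shows directly why the record operators are preferred where memory is tight.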



2.9.2 Memory for ncap2

ncap2 has unique memory requirements due to its ability to process arbitrarily long scripts of any complexity. All scripts acceptable to ncap2 are ultimately processed as a sequence of binary or unary operations. ncap2 requires MS <= 2VT under most conditions. An exception to this is when left hand casting (see Left hand casting) is used to stretch the size of derived variables beyond the size of any input variables. Let VC be the memory required to store the largest variable defined by left hand casting. In this case, MS <= 2VC.

ncap2 scripts are completely dynamic and may be of arbitrary length. A script that contains many thousands of operations may uncover a slow memory leak even though each single operation consumes little additional memory. Memory leaks are usually identifiable by their memory usage signature. Leaks cause peak memory usage to increase monotonically with time regardless of script complexity. Slow leaks are very difficult to find. Sometimes a malloc() (or new[]) failure is the only noticeable clue to their existence. If you have good reasons to believe that a memory allocation failure is ultimately due to an NCO memory leak (rather than inadequate RAM on your system), then we would be very interested in receiving a detailed bug report.



2.10 Performance Limitations

  1. No data buffering is performed during nc_get_var and nc_put_var operations. Hyperslabs too large to hold in core memory will suffer substantial performance penalties because of this.
  2. Since coordinate variables are assumed to be monotonic, the search for bracketing the user-specified limits should employ a quicker algorithm, like bisection, than the two-sided incremental search currently implemented.
  3. C_format, FORTRAN_format, signedness, scale_factor and add_offset attributes are ignored by ncks when printing variables to screen.
  4. In the late 1990s it was discovered that some random access operations on large files on certain architectures (e.g., UNICOS) were much slower with NCO than with similar operations performed using languages that bypass the netCDF interface (e.g., Yorick). This may be a penalty of unnecessary byte-swapping in the netCDF interface. It is unclear whether such problems exist in present day (2007) netCDF/NCO environments.



3 NCO Features

Many features have been implemented in more than one operator and are described here for brevity. The description of each feature is preceded by a box listing the operators for which the feature is implemented. Command line switches for a given feature are consistent across all operators wherever possible. If no “key switches” are listed for a feature, then that particular feature is automatic and cannot be controlled by the user.



3.1 Internationalization

Availability: All operators
NCO support for internationalization of textual input and output (e.g., Warning messages) is nascent. We hope to produce foreign language string catalogues in 2004.



3.2 Metadata Optimization

Availability: ncatted, ncks, ncrename
Short options: None
Long options: ‘--hdr_pad’, ‘--header_pad’
NCO supports padding headers to improve the speed of future metadata operations. Use the ‘--hdr_pad’ and ‘--header_pad’ switches to request that hdr_pad bytes be inserted into the metadata section of the output file. Future metadata expansions will not incur the performance penalty of copying the entire output file unless the expansion exceeds the amount of header padding. This can be beneficial when it is known that some metadata will be added at a future date.

This optimization exploits the netCDF library nc__enddef() function, which behaves differently with different versions of netCDF. It will improve speed of future metadata expansion with CLASSIC and 64bit netCDF files, but not necessarily with NETCDF4 files, i.e., those created by the netCDF interface to the HDF5 library (see Selecting Output File Format).



3.3 OpenMP Threading

Availability: ncbo, ncea, ncecat, ncflint, ncpdq, ncra, ncrcat, ncwa
Short options: ‘-t’
Long options: ‘--thr_nbr’, ‘--threads’, ‘--omp_num_threads’
NCO supports shared memory parallelism (SMP) when compiled with an OpenMP-enabled compiler. Thread requests and allocations occur in two stages. First, users may request a specific number of threads thr_nbr with the ‘-t’ switch (or its long option equivalents, ‘--thr_nbr’, ‘--threads’, and ‘--omp_num_threads’). If not user-specified, OpenMP obtains thr_nbr from the OMP_NUM_THREADS environment variable, if present, or from the OS, if not.

NCO may modify thr_nbr according to its own internal settings before it requests any threads from the system. Certain operators contain hard-coded limits to the number of threads they request. We base these limits on our experience and common sense, and to reduce potentially wasteful system usage by inexperienced users. For example, ncrcat is extremely I/O-intensive so we restrict thr_nbr <= 2 for ncrcat. This is based on the notion that the best performance that can be expected from an operator which does no arithmetic is to have one thread reading and one thread writing simultaneously. In the future (perhaps with netCDF4), we hope to demonstrate significant threading improvements with operators like ncrcat by performing multiple simultaneous writes.

Compute-intensive operators (ncwa and ncpdq) are expected to benefit the most from threading. The greatest increases in throughput due to threading will occur on large datasets where each thread performs millions or more floating point operations. Otherwise, the system overhead of setting up threads may outweigh the theoretical speed enhancements due to SMP parallelism. However, we have not yet demonstrated that the SMP parallelism scales well beyond four threads for these operators. Hence we restrict thr_nbr <= 4 for all operators. We encourage users to play with these limits (edit file nco_omp.c) and send us their feedback.

Once the initial thr_nbr has been modified for any operator-specific limits, NCO requests the system to allocate a team of thr_nbr threads for the body of the code. The operating system then decides how many threads to allocate based on this request. Users may keep track of this information by running the operator with dbg_lvl > 0.

By default, threaded operators attach one global attribute to any file they create or modify. The nco_openmp_thread_number global attribute contains the number of threads the operator used to process the input files. This information helps to verify that the answers with threaded and non-threaded operators are equal to within machine precision. This information is also useful for benchmarking.



3.4 Command Line Options

Availability: All operators
NCO achieves flexibility by using command line options. These options are implemented in all traditional UNIX commands as single letter switches, e.g., ‘ls -l’. For many years NCO used only single letter option names. In late 2002, we implemented GNU/POSIX extended or long option names for all options. This was done in a backward compatible way such that the full functionality of NCO is still available through the familiar single letter options. In the future, however, some features of NCO may require the use of long options, simply because we have nearly run out of single letter options. More importantly, mnemonics for single letter options are often non-intuitive so that long options provide a more natural way of expressing intent.

Extended options, also called long options, are implemented using the system-supplied getopt.h header file, if possible. This provides the getopt_long function to NCO 14.

The syntax of short options (single letter options) is -key value (dash-key-space-value). Here, key is the single letter option name, e.g., ‘-D 2’.

The syntax of long options (multi-letter options) is --long_name value (dash-dash-key-space-value), e.g., ‘--dbg_lvl 2’ or --long_name=value (dash-dash-key-equal-value), e.g., ‘--dbg_lvl=2’. Thus the following are all valid for the ‘-D’ (short version) or ‘--dbg_lvl’ (long version) command line option.

     ncks -D 3 in.nc        # Short option
     ncks --dbg_lvl=3 in.nc # Long option, preferred form
     ncks --dbg_lvl 3 in.nc # Long option, alternate form

The second example is preferred for two reasons. First, ‘--dbg_lvl’ is more specific and less ambiguous than ‘-D’. The long option form makes scripts more self documenting and less error prone. Often long options are named after the source code variable whose value they carry. Second, the equals sign = joins the key (i.e., long_name) to the value in an uninterruptible text block. Experience shows that users are less likely to mis-parse commands when restricted to this form.

GNU implements a superset of the POSIX standard which allows any unambiguous truncation of a valid option to be used.

     ncks -D 3 in.nc        # Short option
     ncks --dbg_lvl=3 in.nc # Long option, full form
     ncks --dbg=3 in.nc     # Long option, unambiguous truncation
     ncks --db=3 in.nc      # Long option, unambiguous truncation
     ncks --d=3 in.nc       # Long option, ambiguous truncation

The first four examples are equivalent and will work as expected. The final example will exit with an error since ncks cannot disambiguate whether ‘--d’ is intended as a truncation of ‘--dbg_lvl’, of ‘--dimension’, or of some other long option.
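
The truncation rule can be illustrated with a short Bourne shell sketch. This is not the getopt_long or NCO source code; the function name resolve_long_opt is hypothetical, and the list of known options passed to it is only a three-entry sample, whereas a real operator recognizes many more.

```shell
# Hypothetical sketch (not getopt_long itself): resolve a possibly truncated
# long option ($1) against the list of known long options ($2, $3, ...).
resolve_long_opt() {
  typed=$1; shift
  match=; count=0
  for opt in "$@"; do
    # An exact match is accepted immediately
    if [ "$opt" = "$typed" ]; then printf '%s\n' "$opt"; return 0; fi
    case $opt in
      "$typed"*) match=$opt; count=$((count + 1)) ;;   # Typed text is a prefix
    esac
  done
  if [ "$count" -eq 1 ]; then printf '%s\n' "$match"; return 0; fi
  if [ "$count" -eq 0 ]; then
    echo "unrecognized option: $typed" >&2
  else
    echo "ambiguous option: $typed" >&2
  fi
  return 1
}

# Usage with a sample option list
resolve_long_opt --dbg --dbg_lvl --debug-level --dimension   # prints --dbg_lvl
resolve_long_opt --d --dbg_lvl --debug-level --dimension || echo "rejected"
```

With this sample list, ‘--dbg’ and ‘--db’ each match exactly one option, while ‘--d’ matches all three and is rejected.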

NCO provides many long options for common switches. For example, the debugging level may be set in all operators with any of the switches ‘-D’, ‘--debug-level’, or ‘--dbg_lvl’. This flexibility allows users to choose their favorite mnemonic. For some, it will be ‘--debug’ (an unambiguous truncation of ‘--debug-level’), and others will prefer ‘--dbg’. Interactive users usually prefer the minimal amount of typing, i.e., ‘-D’. We recommend that scripts which are re-usable employ some form of the long options for future maintainability.

This manual generally uses the short option syntax. This is for historical reasons and to conserve space. The remainder of this manual specifies the full long_name of each option. Users are expected to pick the unambiguous truncation of each option name that most suits their taste.



3.5 Specifying Input Files

Availability (-n): ncea, ncecat, ncra, ncrcat
Availability (-p): All operators
Short options: ‘-n’, ‘-p’
Long options: ‘--nintap’, ‘--pth’, ‘--path’
It is important that users be able to specify multiple input files without typing every filename in full, often a tedious task even by graduate student standards. There are four different ways of specifying input files to NCO: explicitly typing each, using UNIX shell wildcards, and using the NCO ‘-p’ and ‘-n’ switches (or their long option equivalents, ‘--pth’ or ‘--path’, and ‘--nintap’, respectively). To illustrate these methods, consider the simple problem of using ncra to average five input files, 85.nc, 86.nc, ... 89.nc, and store the results in 8589.nc. Here are the four methods in order. They produce identical answers.

     ncra 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
     ncra 8[56789].nc 8589.nc
     ncra -p input-path 85.nc 86.nc 87.nc 88.nc 89.nc 8589.nc
     ncra -n 5,2,1 85.nc 8589.nc

The first method (explicitly specifying all filenames) works by brute force. The second method relies on the operating system shell to glob (expand) the regular expression 8[56789].nc. The shell passes valid filenames which match the expansion to ncra. The third method uses the ‘-p input-path’ argument to specify the directory where all the input files reside. NCO prepends input-path (e.g., /data/usrname/model) to all input-files (but not to output-file). Thus, using ‘-p’, the path to any number of input files need only be specified once. Note input-path need not end with ‘/’; the ‘/’ is automatically generated if necessary.

The last method passes (with ‘-n’) syntax concisely describing the entire set of filenames 15. This option is only available with the multi-file operators: ncra, ncrcat, ncea, and ncecat. By definition, multi-file operators are able to process an arbitrary number of input-files. This option is very useful for abbreviating lists of filenames representable as alphanumeric_prefix+numeric_suffix+.+filetype where alphanumeric_prefix is a string of arbitrary length and composition, numeric_suffix is a fixed width field of digits, and filetype is a standard filetype indicator. For example, in the file ccm3_h0001.nc, we have alphanumeric_prefix = ccm3_h, numeric_suffix = 0001, and filetype = nc.

NCO is able to decode lists of such filenames encoded using the ‘-n’ option. The simpler (3-argument) ‘-n’ usage takes the form -n file_number,digit_number,numeric_increment where file_number is the number of files, digit_number is the fixed number of numeric digits comprising the numeric_suffix, and numeric_increment is the constant, integer-valued difference between the numeric_suffix of any two consecutive files. The value of alphanumeric_prefix is taken from the input file, which serves as a template for decoding the filenames. In the example above, the encoding -n 5,2,1 along with the input file name 85.nc tells NCO to construct five (5) filenames identical to the template 85.nc except that the final two (2) digits are a numeric suffix to be incremented by one (1) for each successive file. Currently filetype may be empty, nc, cdf, hdf, or hd5. If present, these filetype suffixes (and the preceding .) are ignored by NCO as it uses the ‘-n’ arguments to locate, evaluate, and compute the numeric_suffix component of filenames.

Recently the ‘-n’ option has been extended to allow convenient specification of filenames with “circular” characteristics. This means it is now possible for NCO to automatically generate filenames which increment regularly until a specified maximum value, and then wrap back to begin again at a specified minimum value. The corresponding ‘-n’ usage becomes more complex, taking one or two additional arguments for a total of four or five, respectively: -n file_number,digit_number,numeric_increment[,numeric_max[,numeric_min]] where numeric_max, if present, is the maximum integer-value of numeric_suffix and numeric_min, if present, is the minimum integer-value of numeric_suffix. Consider, for example, the problem of specifying non-consecutive input files where the filename suffixes end with the month index. In climate modeling it is common to create summertime and wintertime averages which contain the averages of the months June–July–August, and December–January–February, respectively:

     ncra -n 3,2,1 85_06.nc 85_0608.nc
     ncra -n 3,2,1,12 85_12.nc 85_1202.nc
     ncra -n 3,2,1,12,1 85_12.nc 85_1202.nc

The first example shows that three arguments to the ‘-n’ option suffice to specify consecutive months (06, 07, 08) which do not “wrap” back to a minimum value. The second and third examples show how to use the optional fourth and fifth elements of the ‘-n’ option to specify wrap values to NCO. The fourth argument to ‘-n’, if present, specifies the maximum integer value of numeric_suffix. In this case the maximum value is 12, and will be formatted as 12 in the filename string. The fifth argument to ‘-n’, if present, specifies the minimum integer value of numeric_suffix. The default minimum filename suffix is 1, which is formatted as 01 in this case. Thus the second and third examples have the same effect, that is, they automatically generate, in order, the filenames 85_12.nc, 85_01.nc, and 85_02.nc as input to NCO.
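
To make the decoding concrete, here is a Bourne shell sketch that lists the filenames a given ‘-n’ specification and template describe. It is an illustration written for this section, not the algorithm NCO uses internally: the function name expand_n is hypothetical, and the wrap step simply resets the suffix to the minimum, which is an assumption that matches the examples above for an increment of one.

```shell
# Hypothetical helper (not part of NCO): print the filenames described by
#   -n file_number,digit_number,numeric_increment[,numeric_max[,numeric_min]]
# Arguments: the comma-separated specification, then the template filename.
expand_n() {
  spec=$1; template=$2
  old_ifs=$IFS; IFS=,
  set -- $spec                          # Split the specification on commas
  IFS=$old_ifs
  fl_nbr=$1; dgt_nbr=$2; inc=$3; max=${4:-}; min=${5:-1}
  # Separate a recognized filetype suffix (nc, cdf, hdf, hd5) from the stem
  case $template in
    *.nc|*.cdf|*.hdf|*.hd5) sfx=.${template##*.}; stem=${template%.*} ;;
    *) sfx=; stem=$template ;;
  esac
  # Build a pattern of dgt_nbr "?" characters to peel off the numeric suffix
  qs=; i=0
  while [ "$i" -lt "$dgt_nbr" ]; do qs="$qs?"; i=$((i + 1)); done
  prefix=${stem%$qs}
  num=${stem#"$prefix"}
  # Drop leading zeros so the shell does not read the suffix as octal
  while [ ${#num} -gt 1 ] && [ "${num#0}" != "$num" ]; do num=${num#0}; done
  n=0
  while [ "$n" -lt "$fl_nbr" ]; do
    printf "%s%0${dgt_nbr}d%s\n" "$prefix" "$num" "$sfx"
    num=$((num + inc))
    # Simplifying assumption: wrap straight back to the minimum
    if [ -n "$max" ] && [ "$num" -gt "$max" ]; then num=$min; fi
    n=$((n + 1))
  done
}

# Usage: the wintertime example from the text
expand_n 3,2,1,12 85_12.nc
```

Run on the wintertime example, the sketch lists 85_12.nc, 85_01.nc, and 85_02.nc, one per line.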



3.6 Specifying Output Files

Availability: All operators
Short options: ‘-o’
Long options: ‘--fl_out’, ‘--output’
NCO commands produce no more than one output file, fl_out. Traditionally, users specify fl_out as the final argument to the operator, following all input file names. This is the positional argument method of specifying input and output file names. The positional argument method works well in most applications. NCO also supports specifying fl_out using the command line switch argument method, ‘-o fl_out’.

Specifying fl_out with a switch, rather than as a positional argument, allows fl_out to precede input files in the argument list. This is particularly useful with multi-file operators for three reasons. First, multi-file operators may be invoked with hundreds (or more) filenames. Visual or automatic location of fl_out in such a list is difficult when the only syntactic distinction between input and output files is their position. Second, specification of a long list of input files may be difficult (see Large Numbers of Files). Making the input file list the final argument to an operator facilitates using xargs for this purpose. Some alternatives to xargs are very ugly and undesirable. Finally, many users are more comfortable specifying output files with ‘-o fl_out’ near the beginning of an argument list. Compilers and linkers are usually invoked this way.



3.7 Accessing Remote Files

Availability: All operators
Short options: ‘-p’, ‘-l’
Long options: ‘--pth’, ‘--path’, ‘--lcl’, ‘--local’
All NCO operators can retrieve files from remote sites as well as from the local file system. A remote site can be an anonymous FTP server, a machine on which the user has rcp, scp, or sftp privileges, NCAR's Mass Storage System (MSS), or an OPeNDAP server. Examples of each are given below, following a brief description of the particular access protocol.

To access a file via an anonymous FTP server, supply the remote file's URL. FTP is an intrinsically insecure protocol because it transfers passwords in plain text format. Users should access sites using anonymous FTP when possible. Some FTP servers require a login/password combination for a valid user account. NCO allows these transactions so long as the required information is stored in the .netrc file. Usually this information is the remote machine name, login, and password, in plain text, separated by those very keywords, e.g.,

     machine dust.ess.uci.edu login zender password bushlied

Eschew using valuable passwords for FTP transactions, since .netrc passwords are potentially exposed to eavesdropping software 16.

SFTP, i.e., secure FTP, uses SSH-based security protocols that solve the security issues associated with plain FTP. NCO supports SFTP protocol access to files specified with a homebrew syntax of the form

     sftp://machine.domain.tld:/path/to/fi