Known Problems through 2012 (version 4.2.3)
Older Generic Run-time Problems:
- MM3 slowdown:
A longstanding “feature” of netCDF3 was identified in
March, 2012, and is now known by the tag MM3.
The MM3 issue can lead to unusually slow performance.
The problem is triggered by an aggregate pattern of file access so the
workaround must be implemented in the application software (e.g., NCO)
rather than in the netCDF library itself.
The name MM3 fits because the problem is normally encountered on
Multi-record Multi-variable netCDF3 files.
And we call our “solution” the MM3-workaround.
If you encounter unusually slow NCO performance while using NCO to
analyze MM3 files on a large blocksize filesystem,
chances are you are encountering an MM3-induced slowdown.
NCO release 4.1.0 implements the MM3-workaround for ncks.
It speeds up common ncks subsetting on NCAR's GLADE by 10-50x.
MM3-induced slowdowns are present in other NCO operators, and we are
prioritizing MM3 patches for the operators encountered most often.
Thanks to Gary Strand for reporting this problem, and to Russ Rew for
creating the workaround algorithm, which is also now in nccopy.
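The layout issue behind MM3 can be sketched with a toy model. The following is a schematic illustration (not NCO or netCDF source; the slab counts and seek counting are invented for the example) of the netCDF3 record section, where each record stores one slab of every record variable consecutively on disk. Reading variable-at-a-time therefore jumps around the file, while record-at-a-time access (the order the MM3-workaround uses) reads sequentially:

```python
# Toy model (not NCO source) of the netCDF3 record section, in which each
# record holds one slab of every record variable consecutively on disk.
n_var, n_rec = 4, 3  # hypothetical file: 4 record variables, 3 records

# Disk layout: record 0 of every variable, then record 1, and so on
disk_order = [(rec, var) for rec in range(n_rec) for var in range(n_var)]

def seeks(access_order):
    """Count the non-contiguous jumps a reader makes for a given access order."""
    pos = [disk_order.index(slab) for slab in access_order]
    return sum(1 for a, b in zip(pos, pos[1:]) if b != a + 1)

# Naive order: all records of variable 0, then all records of variable 1, ...
var_major = [(rec, var) for var in range(n_var) for rec in range(n_rec)]
# Workaround order: all variables of record 0, then record 1, ...
rec_major = [(rec, var) for rec in range(n_rec) for var in range(n_var)]

print(seeks(var_major))  # every transition is a jump
print(seeks(rec_major))  # matches the disk layout: no jumps
```

On a large-blocksize filesystem each of those jumps can cost a full block read, which is why the access order alone can change runtimes by an order of magnitude.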
- NOFILL bug:
All netCDF versions prior to 4.1.3 may create corrupt netCDF3 files
when linked to any version of NCO except 4.0.8.
The solution is to install netCDF version 4.1.3 or later.
The corruption occurs silently (without warning or error messages).
The problem has been seen "in the wild" only on filesystems with
large block sizes (e.g., Lustre), although it may be more widespread.
It is caused by a netCDF bug that NCO triggers by invoking
NOFILL mode for faster writes. Hence it is called the NOFILL bug.
The bug is hard to trigger: it depends on a rare interaction of
filesystem block-size, hyperslab size, and order-of-variable writing.
The bug exists in all versions of netCDF through 4.1.2.
If you have a large block filesystem and cannot upgrade your netCDF
library, then use NCO version 4.0.8, which disables NOFILL mode (and
thus writes files more slowly).
NCO 4.0.8 works around the NOFILL bug on all versions of netCDF
(i.e., 4.1.2 and earlier).
Hence NCO 4.0.8 always correctly writes netCDF3 files.
Other temporary workarounds include creating only netCDF4 files
(e.g., ncks -4 ...) instead of netCDF3 files.
The NOFILL patch included in NCO 4.0.8 was subsequently removed
in NCO 4.0.9, which assumes that netCDF 4.1.3 or later is installed.
- Degenerate hyperslabbing bug:
Versions ???—4.0.6 could return incorrect hyperslabs when
user-specified hyperslabs did not include at least one point.
In such cases, instead of returning no data, hyperslabs could return all data.
To determine whether your NCO is affected by this bug, run these commands:
ncks -O -v lat -d lat,20.,20.001 ~/nco/data/in.nc ~/foo.nc ; ncks -H ~/foo.nc
If the returned hyperslab contains any data, then your NCO is buggy,
because that hyperslab should be empty.
The bug can thus produce incorrect answers for any hyperslab that
should be empty.
Analogous problems would occur with empty auxiliary coordinate bounding boxes.
Although most users do not specify empty hyperslabs, we urge all users
to upgrade to NCO 4.0.7+ just to be safe.
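The correct semantics can be sketched in a few lines of Python. This is a hypothetical illustration, not NCO source; hyperslab is an invented helper name:

```python
# Sketch of value-based hyperslab selection. A correct implementation
# returns an empty selection when no coordinate values fall within the
# user-specified bounds; the pre-4.0.7 bug could instead return the
# entire dimension.
def hyperslab(coords, lo, hi):
    """Return indices of coords lying in the closed interval [lo, hi]."""
    return [i for i, c in enumerate(coords) if lo <= c <= hi]

lat = [-90.0, -45.0, 0.0, 45.0, 90.0]
print(hyperslab(lat, 20.0, 20.001))  # correct: [] (empty hyperslab)
print(hyperslab(lat, -90.0, 0.0))    # correct: [0, 1, 2]
```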
- Threading problems with MSA:
NCO version 3.9.5 has a nasty bug that causes threaded arithmetic
operators, e.g., nces, to produce incorrect results under some
conditions.
The problem may occur whenever OpenMP is enabled and the operators
run on a multi-core CPU with more than one thread.
These incorrect answers, if generated, are relatively easy to notice.
The number of threads used to generate a file is, by default, recorded
in the global attribute nco_openmp_thread_number which may
be examined with ncks -M foo.nc | grep nco_openmp_thread_number.
The only action that will correct a file that you think (or know)
contains corrupted data because of this NCO bug is to re-process the
file with a non-buggy NCO version.
Version 3.9.5 is buggy and should be upgraded ASAP.
Be careful with data processed using this NCO version on multi-core CPUs.
A (one-line!) patch that fixes this bug in 3.9.5 is available.
- Index-based hyperslab problems:
NCO versions 2.7.3—2.8.3 have a nasty bug that causes
index-based hyperslabs, e.g., -d lat,1, to
behave like value-based hyperslabs, e.g., -d lat,1.0 under
some conditions.
Unfortunately, the incorrect answers generated may be hard to notice!
This problem was most often encountered by users trying to assemble
monthly averages using the stride feature of ncrcat.
One common symptom is that the time-offset of the output file is
incorrect.
Versions 2.7.3—2.8.3 are buggy and should be upgraded ASAP.
Re-do any data-processing that used index-based hyperslabbing with
these versions of NCO.
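The distinction the bug erased can be sketched as follows. NCO decides between index- and value-based limits by whether the limit contains a decimal point; in this illustrative Python stand-in (not NCO source) the Python int/float types play that role, and select is an invented helper name:

```python
# An integer limit like "-d lat,1" selects by index, while a limit with
# a decimal point like "-d lat,1.0" selects by coordinate value.
def select(coords, limit):
    if isinstance(limit, int):  # index-based: use the subscript directly
        return [limit]
    # value-based: find indices whose coordinate equals the value
    return [i for i, c in enumerate(coords) if c == limit]

lat = [-90.0, 0.0, 1.0, 90.0]
print(select(lat, 1))    # index-based: element 1 (coordinate 0.0)
print(select(lat, 1.0))  # value-based: element 2 (coordinate 1.0)
```

The buggy versions effectively treated the first form like the second, which silently changed which elements, and hence which time slices, were extracted.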
Older Operator-specific Run-time Problems:
- ncks bug with auxiliary coordinates:
Versions 4.2.x–4.3.1 of ncks did not correctly
support auxiliary coordinates (specified with -X).
Auxiliary coordinates continued to work with the other hyperslabbing
NCO operators. Auxiliary coordinates once again work in all
hyperslabbing operators, including on netCDF4 group files in operators
that support them.
Fixed in version 4.3.2.
- ncatted bug on implicit attribute names:
Versions 4.2.x–4.3.0 of ncatted could segfault when
processing attributes specified implicitly (i.e., by leaving the
attribute field blank in the -a specification).
Fixed in version 4.3.1.
- ncbo bug handling certain special variables:
Version 4.3.0 of ncbo inadvertently always turns off
certain exceptions to variable-list processing.
This may cause some grid-related variables (e.g., ntrm and nbdate)
and some non-grid variables (e.g., ORO and gw) to be
arithmetically processed (e.g., subtracted) even though doing so makes
no sense in most climate model datasets.
Fixed in version 4.3.1.
- ncks bug copying metadata:
Version 4.2.6 of ncks does not copy variable metadata by default.
Thus output files appear stripped of metadata.
One can work around this problem in 4.2.6 by specifying the -m option.
Otherwise an upgrade is recommended.
Fixed in version 4.3.0.
- ncks bug subsetting variables:
Version 4.2.4 of ncks sometimes dumps core
when subsetting variables with -v var.
Fixed in version 4.2.5.
- ncks bug with altering record dimensions:
Version 4.2.4 of ncks ignored both the
--mk_rec_dmn and the --fix_rec_dmn switches.
It exited successfully without altering the record dimension.
Fixed in version 4.2.5.
- nces bug with non-record files:
Versions 4.2.1—4.2.3 of nces incorrectly referenced
the record variable on files that do not contain one.
This caused a segmentation violation and core dump.
- ncra bug when last file(s) is/are superfluous:
Versions 4.2.1—4.2.3 of ncra incorrectly skipped
writing the results of the final normalization when trailing files
were superfluous (not used).
In the most common case, all values are zeros in the output file.
Upgrade if you call ncra with trailing superfluous files.
- ncecat bug when files generated with -n:
Version 4.2.2 of ncecat could incorrectly skip the first
input file in the default mode (RECORD_AGGREGATE) when
the -n NINTAP switch is used to automate filename generation.
Upgrade if you use ncecat -n.
- ncra bug handling CF coordinates attributes
that contain the name of the record coordinate:
Versions 4.0.3—4.0.4 of ncra incorrectly treat the
record variable (usually time) as a fixed variable if it
is specified in the coordinates attribute of any variable in
a file processed with CCM/CCSM/CF metadata conventions.
This bug caused core dumps, and even weirder behavior like
creating imaginary time slices in the output.
Upgrade recommended if you work with NCAR CCSM/CESM model output.
One workaround that does not require NCO upgrades is to remove the
record coordinate name (usually time) from
the coordinates attribute of all variables in CF-compliant
files before processing the file with ncra.
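The attribute edit that workaround requires is a simple string operation. Here is a hypothetical Python helper (drop_record_coordinate is an invented name, not part of NCO) showing the intended transformation of a CF coordinates attribute value:

```python
# Strip the record coordinate name (usually "time") from a
# space-separated CF "coordinates" attribute string, as the workaround
# for the affected ncra versions requires.
def drop_record_coordinate(coordinates_att, rec_crd="time"):
    """Remove rec_crd from a space-separated coordinates attribute."""
    return " ".join(tok for tok in coordinates_att.split() if tok != rec_crd)

print(drop_record_coordinate("time lat lon"))  # -> "lat lon"
```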
- ncra bug averaging YYYYMMDD-format date
variables in CCSM/CF-compliant files:
Versions ???—4.0.5 of ncra contain a bug which
produces an incorrect average (usually zero) of the date
variable which many CCSM/CF-compliant files use to track model dates
in the human-readable YYYYMMDD-format.
Averaging YYYYMMDD-format integers is intrinsically difficult, since
such dates have calendar assumptions built-in.
NCO attempts this in CCSM/CF-compliant files by using the
nbdate (beginning date) and time (days
since nbdate) variables to find the average date,
converting that to YYYYMMDD, and writing that as the average value
of date.
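The strategy described above can be sketched in Python. This is an illustrative reconstruction under stated assumptions (a standard Gregorian calendar via datetime, whereas model calendars may differ; average_date is an invented name, not NCO source):

```python
# Average YYYYMMDD dates via nbdate (beginning date, a YYYYMMDD integer)
# plus the mean of time (days since nbdate), then convert back to
# YYYYMMDD. Naively averaging the integers 20000131 and 20000201 would
# give 20000166, which is not a date.
from datetime import date, timedelta

def average_date(nbdate, time_days):
    base = date(nbdate // 10000, (nbdate // 100) % 100, nbdate % 100)
    mean_offset = sum(time_days) / len(time_days)
    avg = base + timedelta(days=mean_offset)
    return avg.year * 10000 + avg.month * 100 + avg.day

# days 30 and 32 after 2000-01-01 average to day 31, i.e., 2000-02-01
print(average_date(20000101, [30.0, 32.0]))  # -> 20000201
```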
- ncks bug hyperslabbing fixed netCDF4 dimensions:
Versions 4.0.3—4.0.4 of ncks contain a bug which
triggers a core-dump when hyperslabbing (along a non-record
dimension) a netCDF4-format input file into a netCDF4-format output
file, e.g., ncks -d lat,0,1 in4.nc out4.nc.
Three workarounds that do not require NCO upgrades (or downgrades) are:
explicitly specify chunking with, e.g.,
ncks --cnk_plc=all -d lat,0,1 in4.nc out4.nc; use
nces instead of ncks for hyperslabbing, e.g.,
nces -d lat,0,1 in4.nc out4.nc (nces performs a no-op
when there is only one input file); or write to a netCDF3 file, e.g.,
ncks -3 -d lat,0,1 in4.nc out3.nc.
- Core dump with ncks:
Printing variables to screen with ncks can trigger a segfault
in NCO 3.9.9—4.0.3.
Users may upgrade, downgrade, or apply a one-line patch to the 3.9.9
sources: removing the line
“*cnk_sz=(size_t)NULL;”
near line 751 of nco/src/nco/nco_netcdf.c
should fix the problem.
The segfault in later NCO versions is due to a different bug, so this
patch will not fix it there.
- ncrename erroneous error exit:
Versions 4.0.1—4.0.3 of ncrename contain a bug where
commands like ncrename -a .old_nm,new_nm in.nc out.nc
would, if old_nm did not exist, write the correct file and
then exit with an error message although no error had occurred.
The files written were fine, and the error message can be safely
ignored. This was due to not clearing an extraneous return code.
- ncbo segmentation fault:
ncbo versions 4.0.0—4.0.2 incorrectly refreshed
internal metadata, leading to segmentation faults and core dumps with
some exacting compilers, notably xlC on AIX.
- ncra segmentation fault:
ncra versions 4.0.0—4.0.1 mishandled some CF-compliant
dates, leading to segmentation faults and core dumps.
- Arithmetic problems with ncap division, modulo, and exponentiation:
ncap versions < 3.0.1 incorrectly exponentiate
variables to variable powers (V^V).
We recommend that all ncap users upgrade.
ncap versions up to 2.9.1 incorrectly handle division,
modulo, and exponentiation operations of the form S/V,
S%V, and S^V, where the first operand (S) is a
scalar (i.e., either typed directly in the ncap script or
converted from an attribute) and the second operand (V) is
a full variable (i.e., stored in a file or computed by ncap).
Instead of the requested quantity, ncap returned
V/S, V%S, and V^S.
In other words ncap treated some non-commutative operations
as commutative. This is now fixed.
The
V/V, V%V, V^V,
V/S, V%S, V^S,
S/S, S%S, and S^S operations were never
affected.
We recommend that all ncap users upgrade.
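The non-commutativity the bug violated is easy to demonstrate. Here plain Python lists stand in for netCDF variables (an illustrative sketch, not NCO source):

```python
# The buggy ncap evaluated the scalar-variable forms S/V, S%V, and S^V
# as if the operands were swapped, i.e., as V/S, V%S, and V^S.
S = 2.0
V = [1.0, 2.0, 4.0]

s_over_v = [S / v for v in V]  # correct S/V: [2.0, 1.0, 0.5]
v_over_s = [v / S for v in V]  # what buggy versions returned: [0.5, 1.0, 2.0]

print(s_over_v)
print(v_over_s)
```

Since division is non-commutative, the two results differ element-by-element, which is exactly why the swap produced wrong answers.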
- Incorrect ncbo output for packed input:
ncbo versions ???—3.2.0 incorrectly write differences
of packed input. This only affects packed variables.
- Problems with ncflint and missing_values:
The algorithm ncflint used to perform interpolation in
versions up to 2.9.4 was not commutative:
it returned either the weighted valid datum or
missing_value when the other datum was missing_value,
depending on the order in which the input files were specified.
As of version 2.9.5, ncflint always returns
missing_value when either input datum is
missing_value.
Possible future implementations are discussed in the NCO
documentation.
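The rule adopted in 2.9.5 can be sketched as follows (interpolate is a hypothetical helper, not NCO source):

```python
# The interpolated value is missing whenever either input datum is
# missing, which makes the result independent of input-file order.
def interpolate(x1, x2, w1, w2, missing_value):
    if x1 == missing_value or x2 == missing_value:
        return missing_value
    return w1 * x1 + w2 * x2

mss = -999.0
print(interpolate(1.0, 3.0, 0.5, 0.5, mss))  # valid data: 2.0
print(interpolate(1.0, mss, 0.5, 0.5, mss))  # either datum missing -> missing
print(interpolate(mss, 1.0, 0.5, 0.5, mss))  # order does not matter
```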
- Problems with ncra and nces when missing_value = 0.0:
The algorithm ncra and nces used to perform
arithmetic in versions up to 2.9.2 broke if missing_value
was 0.0.
Why, you ask?
Running average (or total, etc.) algorithms must initialize the answer
to 0.0.
This is done since the sum accumulates in place as ncra and
nces proceed across records and files.
(Normalizing this accumulation by the total number of records is the
last step.)
The old algorithm compared both the current running average and the
new record to the missing_value.
If either comparison matched, then nothing accumulated for that
record.
Because the accumulator was initialized to 0.0, which equaled
missing_value, the running average matched missing_value from the
start, so valid data could never be recognized.
As a result nothing accumulated and the answer was always zero.
The record and ensemble averages would also fail (in a non-obvious
way) whenever an intermediate sum equalled missing_value.
The chances of the latter event ever happening are exceedingly
remote.
The new algorithm compares only the new record to the
missing_value.
This fixes both problems and is faster, too.
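The old and new accumulation tests can be contrasted in a few lines of plain Python (an illustrative sketch, not NCO source; the function names are invented):

```python
# With missing_value = 0.0, the old algorithm's check of the running
# sum against missing_value always matched the 0.0 initialization, so
# valid data never accumulated.
def average_old(records, missing_value):
    total, count = 0.0, 0
    for x in records:
        if total == missing_value or x == missing_value:  # buggy test
            continue
        total += x
        count += 1
    return total / count if count else missing_value

def average_new(records, missing_value):
    total, count = 0.0, 0
    for x in records:
        if x == missing_value:  # compare only the new record
            continue
        total += x
        count += 1
    return total / count if count else missing_value

data = [1.0, 2.0, 0.0, 3.0]    # 0.0 marks missing data
print(average_old(data, 0.0))  # broken: nothing ever accumulates
print(average_new(data, 0.0))  # correct: (1 + 2 + 3) / 3 = 2.0
```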
- Packing problems with ncwa:
NCO versions ???—2.9.0 have a bug that causes ncwa
to fail (produce garbage answers) when processing packed
NC_FLOAT data. Version 2.9.1 fixes this problem.
This problem may have been noticed most by
OPeNDAP users since many
netCDF climate datasets served by
OPeNDAP are packed
NC_FLOATs.
Upgrade to 2.9.1 if you use ncwa on packed data.
- Packing problems with ncap:
NCO versions 2.8.4—2.8.6 have a bug that causes the ncap
intrinsic packing function pack() to fail.
Version 2.8.7 fixes this problem.
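For context, both packing bugs concern the standard netCDF packing convention based on the scale_factor and add_offset attributes. A minimal Python sketch of that convention (the helper names are invented; this is not NCO's pack() implementation, which also chooses the scale and offset automatically):

```python
# netCDF packing convention: packed = round((unpacked - add_offset) /
# scale_factor), stored in a narrower integer type, and
# unpacked = scale_factor * packed + add_offset.
def pack(values, scale_factor, add_offset):
    return [round((v - add_offset) / scale_factor) for v in values]

def unpack(packed, scale_factor, add_offset):
    return [scale_factor * p + add_offset for p in packed]

scale, offset = 0.01, 300.0
temps = [299.50, 300.25, 301.00]
packed = pack(temps, scale, offset)   # small integers: [-50, 25, 100]
print(unpack(packed, scale, offset))  # recovers the original values
```

Arithmetic on packed variables must unpack, operate, and repack; the bugs above stemmed from operators mishandling one of those steps.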
Older Platform-specific Run-time Problems:
- Float-valued intrinsic arithmetic functions in ncap on AIX:
ncap versions through 4.0.4 have a bug that causes all float-valued
intrinsic math functions to fail under AIX.
Float-valued math functions are the ISO C99 functions, e.g.,
cosf(), fabsf(), logf().
The user does not invoke these functions directly—
the user always specifies the generic function name, e.g.,
cos(), abs(), log().
NCO automatically calls the native single precision (i.e.,
float-valued) math functions when the generic function argument
is a native float (e.g., naked constants like 1.0f or
variables stored as NC_FLOAT).
Double precision arguments cause NCO to invoke the standard
(double-valued) form of the generic function, e.g., cos(),
fabs(), log().