This is nco.info, produced by makeinfo version 7.1 from nco.texi.

INFO-DIR-SECTION netCDF
START-INFO-DIR-ENTRY
* NCO:: User Guide for the netCDF Operator suite
END-INFO-DIR-ENTRY

This file documents NCO, a collection of utilities to manipulate and analyze netCDF files.

Copyright © 1995-2024 Charlie Zender

This is the first edition of the ‘NCO User Guide’, and is consistent with version 2 of ‘texinfo.tex’.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The license is available online at

The original author of this software, Charlie Zender, wants to improve it with the help of your suggestions, improvements, bug-reports, and patches.

Charlie Zender (yes, my surname is zender)
3200 Croul Hall
Department of Earth System Science
University of California, Irvine
Irvine, CA 92697-3100

File: nco.info, Node: Climatology and Bounds Support, Next: UDUnits Support, Prev: Regridding, Up: Shared features

3.27 Climatology and Bounds Support
===================================

Availability: ‘nces’, ‘ncra’, ‘ncrcat’
Short options: None
Long options: ‘--cb=YR_SRT,YR_END,MTH_SRT,MTH_END,TPD’,
‘--clm_bnd=YR_SRT,YR_END,MTH_SRT,MTH_END,TPD’,
‘--clm_nfo=YR_SRT,YR_END,MTH_SRT,MTH_END,TPD’,
‘--climatology_information=YR_SRT,YR_END,MTH_SRT,MTH_END,TPD’

(NB: This section describes support for generating CF-compliant bounds variables and attributes, i.e., metadata. For instructions on constructing climatologies themselves, see the ‘ncclimo’ documentation).

As of NCO version 4.9.4 (September, 2020) ‘ncra’ introduces the ‘--clm_bnd’ option, a powerful method to fully implement the CF ‘bounds’, ‘climatology’, and ‘cell_methods’ attributes defined by *note CF Conventions::. The new method updates the previous ‘--cb’ and ‘--c2b’ methods introduced in version 4.6.0, which only worked for monthly mean data. The newer ‘--cb’ method also works for climatological diurnally resolved input, and for datasets that contain more than one record.

This option takes as argument a comma-separated list of five relevant input parameters: ‘--cb=YR_SRT,YR_END,MTH_SRT,MTH_END,TPD’, where YR_SRT is the climatology start-year, YR_END is the climatology end-year, MTH_SRT is the climatology start-month (in ‘[1..12]’ format), MTH_END is the climatology end-month (in ‘[1..12]’ format), and TPD is the number of timesteps per day (with the special exception that TPD=0 indicates monthly data, not diurnally-resolved data). For example, a seasonal summer climatology created from monthly mean input data spanning June, 2000 to August, 2020 should call ‘ncra’ with ‘--clm_bnd=2000,2020,6,8,0’, whereas a diurnally resolved climatology of the same period with 6-hourly input data resolution would use ‘--clm_bnd=2000,2020,6,8,4’. The ‘ncclimo’ command internally uses ‘--clm_bnd’ extensively.
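For concreteness, here is a minimal sketch of the two invocations just described. The input filenames are hypothetical placeholders for the June-August files of each year, and the diurnal case uses ‘nces’ in keeping with the diurnally resolved examples shown later in this section:

# Seasonal summer (JJA) climatology, 2000-2020, from monthly-mean input (TPD=0)
ncra --clm_bnd=2000,2020,6,8,0 jja_??????.nc clm_JJA.nc
# Same season and period, diurnally resolved, from 6-hourly input (TPD=4)
nces --clm_bnd=2000,2020,6,8,4 jja_??????.nc clm_JJA_diurnal.nc

Further ‘--cb’ examples: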
# Average monthly means into a climatological month
ncra --cb=2014,2016,1,1,0 2014_01.nc 2015_01.nc 2016_01.nc clm_JAN.nc
# Average seasonally contiguous climatological monthly means into NH winter
ncra --cb=2013,2016,12,2,0 -w 31,31,28 DEC.nc JAN.nc FEB.nc DJF.nc
# Average seasonally discontiguous climatological means into NH winter
ncra --cb=2014,2016,1,12,0 -w 31,28,31 JAN.nc FEB.nc DEC.nc JFD.nc
# Reduce four climatological seasons to make an annual climatology
ncra --cb=2014,2016,1,12,0 -w 92,92,91,90 MAM.nc JJA.nc SON.nc DJF.nc ANN.nc
# Reduce twelve monthly climatologies into an annual climatology
ncra --cb=2014,2016,1,12,0 -w 31,28,31,30,31,30,31,31,30,31,30,31 clm_??.nc ANN.nc

In the third, fourth, and fifth examples, NCO uses the number of input files (3, 4, and 12, respectively) to discriminate between seasonal and annual climatologies, since the other arguments to ‘--cb’ are identical. When using this option, NCO expects each output file to contain ‘max(1,TPD)’ records.

‘nces’ and ‘ncra’ both accept the ‘--cb’ option. While ‘ncra’ almost always reduces the input dataset over the record dimension, ‘nces’ never does. This makes it easy to use ‘nces’ to combine and create climatologies of diurnally resolved input files.

# Average diurnally resolved monthly means into a climatology
nces --cb=2014,2016,1,1,8 2014_01.nc 2015_01.nc 2016_01.nc clm_JAN.nc
# Average seasonally contiguous diurnally resolved means into a season
nces --cb=2013,2016,12,2,8 -w 31,31,28 DEC.nc JAN.nc FEB.nc DJF.nc
# Average seasonally discontiguous diurnally resolved means into a season
nces --cb=2014,2016,1,12,8 -w 31,28,31 JAN.nc FEB.nc DEC.nc JFD.nc
# Reduce four diurnally resolved seasons to make an annual climatology
nces --cb=2014,2016,1,12,8 -w 92,92,91,90 MAM.nc JJA.nc SON.nc DJF.nc ANN.nc
# Reduce twelve diurnally resolved months into an annual climatology
nces --cb=2014,2016,1,12,8 -w 31,28,31,30,31,30,31,31,30,31,30,31 clm_??.nc ANN.nc

Every input in the above set of examples must have eight records, and that number will appear in the output as well.

File: nco.info, Node: UDUnits Support, Next: Rebasing Time Coordinate, Prev: Climatology and Bounds Support, Up: Shared features

3.28 UDUnits Support
====================

Availability: ‘ncbo’, ‘nces’, ‘ncecat’, ‘ncflint’, ‘ncks’, ‘ncpdq’, ‘ncra’, ‘ncrcat’, ‘ncwa’
Short options: ‘-d DIM,[MIN][,[MAX][,[STRIDE]]]’
Long options: ‘--dimension DIM,[MIN][,[MAX][,[STRIDE]]]’, ‘--dmn DIM,[MIN][,[MAX][,[STRIDE]]]’

There is more than one way to hyperskin a cat. The UDUnits (http://www.unidata.ucar.edu/software/udunits) package provides a library which, if present, NCO uses to translate user-specified physical dimensions into the physical dimensions of data stored in netCDF files. Unidata provides UDUnits under the same terms as netCDF, so sites should install both. Compiling NCO with UDUnits support is currently optional but may become required in a future version of NCO.

Two examples suffice to demonstrate the power and convenience of UDUnits support. First, consider extraction of a variable containing non-record coordinates with physical dimensions stored in MKS units.
In the following example, the user extracts all wavelengths in the visible portion of the spectrum in terms of the units very frequently used in visible spectroscopy, microns:

% ncks --trd -C -H -v wvl -d wvl,"0.4 micron","0.7 micron" in.nc
wvl[0]=5e-07 meter

The hyperslab returns the correct values because the WVL variable is stored on disk with a length dimension that UDUnits recognizes in the ‘units’ attribute. The automagical algorithm that implements this functionality is worth describing since understanding it helps one avoid some potential pitfalls. First, the user includes the physical units of the hyperslab dimensions she supplies, separated by a simple space from the numerical values of the hyperslab limits. She encloses each coordinate specification in quotes so that the shell does not break the _value-space-unit_ string into separate arguments before passing them to NCO. Double quotes (‘"foo"’) or single quotes (‘'foo'’) are equally valid for this purpose. Second, NCO recognizes that units translation is requested because each hyperslab argument contains text characters and non-initial spaces. Third, NCO determines whether WVL is dimensioned with a coordinate variable that has a ‘units’ attribute. In this case, WVL itself is a coordinate variable. The value of its ‘units’ attribute is ‘meter’. Thus WVL passes this test so UDUnits conversion is attempted. If the coordinate associated with the variable does not contain a ‘units’ attribute, then NCO aborts. Fourth, NCO passes the specified and desired dimension strings (microns are specified by the user, meters are required by NCO) to the UDUnits library. Fifth, the UDUnits library confirms that these dimensions are commensurate and returns to NCO the appropriate linear scaling factors to convert from microns to meters. If the units are incommensurate (i.e., not expressible in the same fundamental MKS units), or are not listed in the UDUnits database, then NCO aborts since it cannot determine the user's intent. Finally, NCO uses the scaling information to convert the user-specified hyperslab limits into the same physical dimensions as those of the corresponding coordinate variable on disk. At this point, NCO can perform a coordinate hyperslab using the same algorithm as if the user had specified the hyperslab without requesting units conversion.

The translation and dimensional interpretation of time coordinates shows a more powerful, and probably more common, UDUnits application. In this example, the user prints all data between 4 PM and 7 PM on December 8, 1999, from a variable whose time dimension is hours since the year 1900:

% ncks -u -H -C -v time_udunits -d time_udunits,"1999-12-08 \
  16:00:0.0","1999-12-08 19:00:0.0",2 in.nc
time_udunits[1]=876018 hours since 1900-01-01 00:00:0.0

Here, the user invokes the stride (*note Stride::) capability to obtain every other timeslice. This is possible because the UDUnits feature is additive, not exclusive--it works in conjunction with all other hyperslabbing (*note Hyperslabs::) options and in all operators which support hyperslabbing. The following example shows how one might average data in a time period spread across multiple input files:

ncra -d time,"1939-09-09 12:00:0.0","1945-05-08 00:00:0.0" \
  in1.nc in2.nc in3.nc out.nc

Note that there is no excess whitespace before or after the individual elements of the ‘-d’ argument. This is important since, as far as the shell knows, ‘-d’ takes only _one_ command-line argument.
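To see why the quoting matters, compare a correctly quoted hyperslab with an unquoted one (a sketch using the same ‘in.nc’ and ‘wvl’ coordinate as above):

% ncks -C -H -v wvl -d wvl,"0.4 micron","0.7 micron" in.nc  # One shell word per limit
% ncks -C -H -v wvl -d wvl,0.4 micron,0.7 micron in.nc      # Unquoted: shell splits the limits at the spaces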
Parsing this argument into its component ‘DIM,[MIN][,[MAX][,[STRIDE]]]’ elements (*note Hyperslabs::) is the job of NCO. When unquoted whitespace is present between these elements, the shell passes NCO argument fragments which will not parse as intended.

NCO implemented support for the UDUnits2 library with version 3.9.2 (August, 2007). The UDUnits2 (http://www.unidata.ucar.edu/software/udunits/udunits-2/udunits2.html) package supports non-ASCII characters and logarithmic units. We are interested in user-feedback on these features.

One aspect that deserves mention is that UDUnits, and thus NCO, supports run-time definition of the location of the relevant UDUnits databases. UDUnits2 (specifically, the function ‘ut_read_xml()’) uses the environment variable ‘UDUNITS2_XML_PATH’, if any, to find its all-important XML database, named ‘udunits2.xml’ by default. If ‘UDUNITS2_XML_PATH’ is undefined, then UDUnits2 looks in the fall-back default location that was hardcoded when the UDUnits2 library was built. This location varies depending upon your operating system and UDUnits2 compilation settings. If UDUnits2 is correctly linked yet cannot find the XML database in either of these locations, then NCO will report that the UDUnits2 library has failed to initialize. To fix this, export the full location (path+name) of the UDUnits2 XML database file ‘udunits2.xml’ to the shell:

# Example UDUnits2 XML database locations:
export UDUNITS2_XML_PATH='/opt/homebrew/share/udunits/udunits2.xml'    # Homebrew
export UDUNITS2_XML_PATH='/opt/local/share/udunits/udunits2.xml'       # MacPorts
export UDUNITS2_XML_PATH="${HOME}/anaconda/share/udunits/udunits2.xml" # Anaconda

One can then invoke (without recompilation) NCO again, and UDUnits2 should work. This run-time flexibility can enable the full functionality of pre-built binaries on machines with libraries in different locations.

The UDUnits (http://www.unidata.ucar.edu/software/udunits) package documentation describes the supported formats of time dimensions. Among the metadata conventions that adhere to these formats are the Climate and Forecast (CF) Conventions (http://cf-pcmdi.llnl.gov) and the Cooperative Ocean/Atmosphere Research Data Service (COARDS) Conventions (http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html). The following ‘-d’ arguments extract the same data using commonly encountered time dimension formats:

-d time,'1918-11-11 00:00:0.0','1939-09-09 00:00:0.0'
-d time,'1918-11-11T00:00:0.0Z','1939-09-09T00:00:0.0Z'
-d time,'1918-11-11','1939-09-09'
-d time,'1918-11-11','1939-9-9'

All of these formats include at least one dash ‘-’ in a non-leading character position (a dash in a leading character position is a negative sign). NCO assumes that a space, colon, or non-leading dash in a limit string indicates that a UDUnits units conversion is requested. Some date formats like YYYYMMDD that are valid in UDUnits are ambiguous to NCO because it cannot distinguish a purely numerical date (i.e., no dashes or text characters in it) from a coordinate or index value:

-d time,1918-11-11 # Interpreted as the date November 11, 1918
-d time,19181111   # Interpreted as time-dimension index 19181111
-d time,19181111.  # Interpreted as time-coordinate value 19181111.0

Hence, use the YYYY-MM-DD format rather than YYYYMMDD for dates.

As of version 4.0.0 (January, 2010), NCO supports some calendar attributes specified by the CF conventions.
*Supported types:* "365_day"/"noleap", "360_day", "gregorian", "standard"
*Unsupported types:* "366_day"/"all_leap", "proleptic_gregorian", "julian", "none"

Unsupported types default to mixed Gregorian/Julian as defined by UDUnits.

An example: Consider the following netCDF variable:

variables:
  double lon_cal(lon_cal) ;
    lon_cal:long_name = "lon_cal" ;
    lon_cal:units = "days since 1964-2-28 0:0:0" ;
    lon_cal:calendar = "365_day" ;
data:
  lon_cal = 1,2,3,4,5,6,7,8,9,10;

‘ncks -v lon_cal -d lon_cal,'1964-3-1 0:00:0.0','1964-3-4 00:00:0.0'’ results in ‘lon_cal=1,2,3,4’.

netCDF variables should always be stored with MKS (i.e., God's) units, so that application programs may assume MKS dimensions apply to all input variables. The UDUnits feature is intended to alleviate NCO users' pain when handling MKS units. It allows users who think in human-friendly units (e.g., miles, millibars, days) to extract data which are always stored in God's units, MKS (e.g., meters, Pascals, seconds). The feature is not intended to encourage writers to store data in esoteric units (e.g., furlongs, pounds per square inch, fortnights).

File: nco.info, Node: Rebasing Time Coordinate, Next: Multiple Record Dimensions, Prev: UDUnits Support, Up: Shared features

3.29 Rebasing Time Coordinate
=============================

Availability: ‘ncra’, ‘ncrcat’
Short options: None

Time rebasing is invoked when numerous files share a common record coordinate, and the record coordinate basetime (not the time increment, e.g., days or hours) changes among input files. The rebasing is performed automatically if and only if UDUnits is installed. Rebasing occurs when the record coordinate is a time-based variable, and times are recorded in units of a time-since-basetime, and the basetime changes from file to file. Since the output file can have only one unit (i.e., one basetime) for the record coordinate, NCO, in such cases, chooses the units of the first input file to be the units of the output file. It is necessary to "rebase" all the input record variables to this output time unit in order for the output file to have the correct values.

For example, suppose the time coordinate is in hours and each day in January is stored in its own daily file. Each daily file records the temperature variable ‘tpt(time)’ with an (unadjusted) ‘time’ coordinate value between 0-23 hours, and uses the ‘units’ attribute to advance the base time:

file01.nc time:units="hours since 1990-1-1"
file02.nc time:units="hours since 1990-1-2"
...
file31.nc time:units="hours since 1990-1-31"

// Mean noontime temperature in January
ncra -v tpt -d time,"1990-1-1 12:00:00","1990-1-31 23:59:59",24 \
  file??.nc noon.nc

// Concatenate day2 noon through day3 noon records
ncrcat -v tpt -d time,"1990-1-2 12:00:00","1990-1-3 11:59:59" \
  file01.nc file02.nc file03.nc noon.nc

// Results: time is "re-based" to the time units in "file01.nc"
time=36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, \
  51, 52, 53, 54, 55, 56, 57, 58, 59 ;

// If we repeat the above command but with only two input files...
ncrcat -v tpt -d time,"1990-1-2 12:00:00","1990-1-3 11:59:59" \
  file02.nc file03.nc noon.nc

// ...then output time coordinate is based on time units in "file02.nc"
time = 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, \
  26, 27, 28, 29, 30, 31, 32, 33, 34, 35 ;

As of NCO version 4.2.1 (August, 2012), NCO automatically rebases not only the record coordinate (‘time’, here) but also any cell boundaries associated with the record coordinate (e.g., ‘time_bnds’) (*note CF Conventions::). As of NCO version 4.4.9 (May, 2015), NCO also rebases any climatology boundaries associated with the record coordinate (e.g., ‘climatology_bounds’) (*note CF Conventions::). As of NCO version 4.6.3 (December, 2016), NCO also rebases the time coordinate when the units differ between files. For example, the first file may have ‘units="days since 2014-03-01"’ and the second file ‘units="hours since 2014-03-10 00:00"’.

File: nco.info, Node: Multiple Record Dimensions, Next: Missing Values, Prev: Rebasing Time Coordinate, Up: Shared features

3.30 Multiple Record Dimensions
===============================

Availability: ‘ncecat’, ‘ncpdq’
Short options: None
Long options: ‘--mrd’

The netCDF3 file format allows only one record dimension, and that dimension must be the first dimension (i.e., the least rapidly varying dimension) of any variable in which it appears. This imposes certain rules on how operators must perform operations that alter the ordering of dimensions or the number of record variables. The netCDF4 file format has no such restrictions. Files and variables may have any number of record dimensions in any order. This additional flexibility of netCDF4 can only be realized by selectively abandoning the constraints that would make operations behave completely consistently between netCDF3 and netCDF4 files.

NCO chooses, by default, to impose netCDF3-based constraints on netCDF4 files. This reduces the number of unanticipated consequences and keeps the operators functioning in a familiar way. Put another way, NCO limits production of additional record dimensions so that processing netCDF4 files leads to the same results as processing netCDF3 files. Users can override this default with the ‘--mrd’ (or ‘--multiple_record_dimension’) switch, which enables netCDF4 variables to accumulate additional record dimensions.

How can additional record dimensions be produced? Most commonly ‘ncecat’ (in record-aggregate mode) defines a new leading record dimension. In netCDF4 files this becomes an additional record dimension unless the original record dimension is changed to a fixed dimension (as must be done in netCDF3 files). Also, when ‘ncpdq’ reorders dimensions it can preserve the "record" property of record variables. ‘ncpdq’ tries to define as a record dimension whichever dimension ends up first in a record variable, and, in netCDF4 files, this becomes an additional record dimension unless the original record dimension is changed to a fixed dimension (as must be done in netCDF3 files). It is easier if ‘ncpdq’ and ‘ncecat’ do not increase the number of record dimensions in a variable, so that is the default. Use ‘--mrd’ to override this.
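For example, here is a minimal sketch of the difference (the filenames are hypothetical, and both inputs are assumed to be netCDF4 files whose record dimension is ‘time’):

ncecat in1.nc in2.nc out.nc       # Default: 'time' becomes fixed; the new 'record' dimension is the sole record dimension
ncecat --mrd in1.nc in2.nc out.nc # 'time' keeps its record property; output variables carry two record dimensions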
File: nco.info, Node: Missing Values, Next: Chunking, Prev: Multiple Record Dimensions, Up: Shared features

3.31 Missing values
===================

Availability: ‘ncap2’, ‘ncbo’, ‘ncclimo’, ‘nces’, ‘ncflint’, ‘ncpdq’, ‘ncra’, ‘ncremap’, ‘ncwa’
Short options: None

The phrase “missing data” refers to data points that are missing, invalid, or for any reason not intended to be arithmetically processed in the same fashion as valid data. All NCO arithmetic operators attempt to handle missing data in an intelligent fashion. There are four steps in the NCO treatment of missing data:

1. Identifying variables that may contain missing data.

NCO follows the convention that missing data should be stored with the _FILLVALUE specified in the variable's ‘_FillValue’ attribute. The _only_ way NCO recognizes that a variable _may_ contain missing data is if the variable has a ‘_FillValue’ attribute. In this case, any elements of the variable which are numerically equal to the _FILLVALUE are treated as missing data.

As of version 3.9.2 (August, 2007), NCO adopted ‘_FillValue’ as the default name of the attribute, if any, assumed to specify the value of data to ignore. Prior to that, the ‘missing_value’ attribute, if any, was assumed to specify the value of data to ignore. Supporting both of these attributes simultaneously is not practical. Hence, the behavior NCO once applied to MISSING_VALUE it now applies to any _FILLVALUE. NCO now treats any MISSING_VALUE as normal data (1).

It has been and remains most advisable to create both ‘_FillValue’ and ‘missing_value’ attributes with identical values in datasets. Many legacy datasets contain only ‘missing_value’ attributes. NCO can help migrate datasets between these conventions. One may use ‘ncrename’ (*note ncrename netCDF Renamer::) to rename all ‘missing_value’ attributes to ‘_FillValue’:

ncrename -a .missing_value,_FillValue inout.nc

Alternatively, one may use ‘ncatted’ (*note ncatted netCDF Attribute Editor::) to add a ‘_FillValue’ attribute to all variables:

ncatted -O -a _FillValue,,o,f,1.0e36 inout.nc

2. Converting the _FILLVALUE to the type of the variable, if necessary.

Consider a variable VAR of type VAR_TYPE with a ‘_FillValue’ attribute of type ATT_TYPE containing the value _FILLVALUE. As a guideline, the type of the ‘_FillValue’ attribute should be the same as the type of the variable it is attached to. If VAR_TYPE equals ATT_TYPE then NCO straightforwardly compares each value of VAR to _FILLVALUE to determine which elements of VAR are to be treated as missing data. If not, then NCO converts _FILLVALUE from ATT_TYPE to VAR_TYPE by using the implicit conversion rules of C, or, if ATT_TYPE is ‘NC_CHAR’ (2), by typecasting the results of the C function ‘strtod(_FILLVALUE)’. You may use the NCO operator ‘ncatted’ to change the ‘_FillValue’ attribute and all data whose value is _FILLVALUE to a new value (*note ncatted netCDF Attribute Editor::).

3. Identifying missing data during arithmetic operations.

When an NCO arithmetic operator processes a variable VAR with a ‘_FillValue’ attribute, it compares each value of VAR to _FILLVALUE before performing an operation. Note the _FILLVALUE comparison imposes a performance penalty on the operator. Arithmetic processing of variables which contain the ‘_FillValue’ attribute always incurs this penalty, even when none of the data are missing. Conversely, arithmetic processing of variables which do not contain the ‘_FillValue’ attribute never incurs this penalty.
In other words, do not attach a ‘_FillValue’ attribute to a variable which does not contain missing data. This exhortation can usually be obeyed for model generated data, but it may be harder to know in advance whether all observational data will be valid or not.

4. Treatment of any data identified as missing in arithmetic operators.

NCO averagers (‘ncra’, ‘nces’, ‘ncwa’) do not count any element with the value _FILLVALUE towards the average. ‘ncbo’ and ‘ncflint’ define a _FILLVALUE result when either of the input values is a _FILLVALUE. Sometimes the _FILLVALUE may change from file to file in a multi-file operator, e.g., ‘ncra’. NCO is written to account for this (it always compares a variable to the _FILLVALUE assigned to that variable in the current file). Suffice it to say that, in all known cases, NCO does "the right thing".

In certain corner cases it is impossible to determine and store the correct result of a binary operation in a single variable. One such case occurs when both operands have differing _FILLVALUE attributes, i.e., attributes with different numerical values. Since the output (result) of the operation can only have one _FILLVALUE, some information may be lost. In this case, NCO always defines the output variable to have the same _FILLVALUE as the first input variable. Prior to performing the arithmetic operation, all values of the second operand equal to the second _FILLVALUE are replaced with the first _FILLVALUE. Then the arithmetic operation proceeds as normal, comparing each element of each operand to a single _FILLVALUE. Comparing each element to two distinct _FILLVALUE's would be much slower and would be no likelier to yield a more satisfactory answer. In practice, judicious choice of _FILLVALUE values prevents any important information from being lost.

---------- Footnotes ----------

(1) The old functionality, i.e., where the ignored values are indicated by ‘missing_value’ not ‘_FillValue’, may still be selected _at NCO build time_ by compiling NCO with the token definition ‘CPPFLAGS='-UNCO_USE_FILL_VALUE'’.

(2) For example, the DOE ARM program often uses ATT_TYPE = ‘NC_CHAR’ and _FILLVALUE = ‘-99999.’.

File: nco.info, Node: Chunking, Next: Quantization Algorithms, Prev: Missing Values, Up: Shared features

3.32 Chunking
=============

Availability: ‘ncap2’, ‘ncbo’, ‘nces’, ‘ncecat’, ‘ncflint’, ‘ncks’, ‘ncpdq’, ‘ncra’, ‘ncrcat’, ‘ncwa’
Short options: none
Long options: ‘--cnk_byt SZ_BYT’, ‘--chunk_byte SZ_BYT’, ‘--cnk_csh SZ_BYT’, ‘--chunk_cache SZ_BYT’, ‘--cnk_dmn DMN_NM,SZ_LMN’, ‘--chunk_dimension DMN_NM,SZ_LMN’, ‘--cnk_map CNK_MAP’, ‘--chunk_map CNK_MAP’, ‘--cnk_min SZ_BYT’, ‘--chunk_min SZ_BYT’, ‘--cnk_plc CNK_PLC’, ‘--chunk_policy CNK_PLC’, ‘--cnk_scl SZ_LMN’, ‘--chunk_scalar SZ_LMN’

All netCDF4-enabled NCO operators that define variables support a plethora of chunksize options. Chunking can significantly accelerate or degrade read/write access to large datasets. Dataset chunking issues are described by THG and Unidata here (http://www.hdfgroup.org/HDF5/doc/H5.user/Chunking.html), here (http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_why_it_matters), and here (http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes). NCO authors are working on generalized algorithms and applications of chunking strategies (stay tuned for more in 2018).

As of NCO version 4.6.5 (March, 2017), NCO supports run-time alteration of the chunk cache size.
By default, the cache size is set (by the ‘--with-chunk-cache-size’ option to ‘configure’) at netCDF compile time. The ‘--cnk_csh SZ’ option (long-option synonym ‘--chunk_cache’) sets the cache size to SZ bytes for all variables. When the debugging level is set (with ‘-D DBG_LVL’) to three or higher, NCO prints the current value of the cache settings for informational purposes. Increasing cache size from the default can dramatically accelerate time to aggregate and rechunk multiple large input datasets, e.g.,

ncrcat -4 -L 1 --cnk_csh=1000000000 --cnk_plc=g3d --cnk_dmn=time,365 \
  --cnk_dmn=lat,1800 --cnk_dmn=lon,3600 in*.nc4 out.nc

In this example all 3-D variables in the input datasets (which may or may not already be chunked) are re-chunked to a size of 365 along the time dimension. Because the default chunk cache size of about 4 MB is too small to manipulate the large chunks, we reset the cache to 1 GB. The operation completes much faster, and subsequent reads along the time dimension will be much more rapid.

The NCO chunking implementation is designed to be flexible. Users control four aspects of the chunking implementation. These are the “chunking policy”, “chunking map”, “chunksize”, and “minimum chunksize”. The chunking policy determines _which_ variables to chunk, and the chunking map determines how (with what exact sizes) to chunk those variables. These are high-level mechanisms that apply to an entire file and all variables and dimensions. The chunksize option allows per-dimension specification of sizes that will override the selected (or default) chunking map.

The distinction between elements and bytes is subtle yet crucial to understand. Elements refers to the values of an array, whereas bytes refers to the memory size required to hold the elements. These measures differ by a factor of four or eight for ‘NC_FLOAT’ or ‘NC_DOUBLE’, respectively. The option ‘--cnk_scl’ takes an argument SZ_LMN measured in elements. The options ‘--cnk_byt’, ‘--cnk_csh’, and ‘--cnk_min’ take arguments SZ_BYT measured in bytes.

Use the ‘--cnk_min=SZ_BYT’ option to set the minimum size in bytes (not elements) of variables to chunk. This threshold is intended to restrict use of chunking to variables for which it is efficient. By default this minimum variable size for chunking is twice the system blocksize (when available) and is 8192 bytes otherwise. Users may set this to any value with the ‘--cnk_min=SZ_BYT’ switch. To guarantee that chunking is performed on all arrays, regardless of size, set the minimum size to one byte (not to zero bytes).

The chunking implementation is similar to a hybrid of the ‘ncpdq’ packing policies (*note ncpdq netCDF Permute Dimensions Quickly::) and hyperslab specifications (*note Hyperslabs::). Each aspect is intended to have a sensible default, so that many users only need to set one switch to obtain sensible chunking. Power users can tune chunking with the three switches in tandem to obtain optimal performance.

By default, NCO preserves the chunking characteristics of the input file in the output file (1). In other words, preserving chunking requires no switches or user intervention. Users specify the desired chunking policy with the ‘--cnk_plc’ switch (or its long-option equivalent, ‘--chunk_policy’) and its CNK_PLC argument. Eight chunking policies are currently implemented:

“Chunk All Variables”
Definition: Chunk all variables possible. For obvious reasons, scalar variables cannot be chunked.
Alternate invocation: ‘ncchunk’
CNK_PLC key values: ‘all’, ‘cnk_all’, ‘plc_all’
Mnemonic: All

“Chunk Variables with at least Two Dimensions [_default_]”
Definition: Chunk all variables possible with at least two dimensions
Alternate invocation: none
CNK_PLC key values: ‘g2d’, ‘cnk_g2d’, ‘plc_g2d’
Mnemonic: _G_reater than or equal to _2_ _D_imensions

“Chunk Variables with at least Three Dimensions”
Definition: Chunk all variables possible with at least three dimensions
Alternate invocation: none
CNK_PLC key values: ‘g3d’, ‘cnk_g3d’, ‘plc_g3d’
Mnemonic: _G_reater than or equal to _3_ _D_imensions

“Chunk One-Dimensional Record Variables”
Definition: Chunk all 1-D record variables
Alternate invocation: none
Any specified (with ‘--cnk_dmn’) record dimension chunksizes will be applied only to 1-D record variables (and to no other variables). Other dimensions may be chunked with their own ‘--cnk_dmn’ options that will apply to all variables.
CNK_PLC key values: ‘r1d’, ‘cnk_r1d’, ‘plc_r1d’
Mnemonic: _R_ecord _1_-_D_ variables

“Chunk Variables Containing Explicitly Chunked Dimensions”
Definition: Chunk all variables possible that contain at least one dimension whose chunksize was explicitly set with the ‘--cnk_dmn’ option.
Alternate invocation: none
CNK_PLC key values: ‘xpl’, ‘cnk_xpl’, ‘plc_xpl’
Mnemonic: E_XPL_icitly specified dimensions

“Chunk Variables that are already Chunked”
Definition: Chunk only variables that are already chunked in the input file. When used in conjunction with ‘cnk_map=xst’ this option preserves and copies the chunking parameters from the input to the output file.
Alternate invocation: none
CNK_PLC key values: ‘xst’, ‘cnk_xst’, ‘plc_xst’
Mnemonic: E_X_i_ST_ing chunked variables

“Chunk Variables with NCO recommendations”
Definition: Chunk all variables according to NCO best practices. This is a virtual option that ensures the chunking policy is (in the subjective opinion of the authors) the best policy for typical usage. As of NCO version 4.4.8 (February, 2015), this virtual policy implements ‘map_rew’ for 3-D variables and ‘map_lfp’ for all other variables.
Alternate invocation: none
CNK_PLC key values: ‘nco’, ‘cnk_nco’, ‘plc_nco’
Mnemonic: _N_et_C_DF_O_perator

“Unchunking”
Definition: Unchunk all variables possible. The HDF5 storage layer requires that record variables (i.e., variables that contain at least one record dimension) must be chunked. Also variables that are compressed or use checksums must be chunked. Such variables cannot be unchunked.
Alternate invocation: ‘ncunchunk’
CNK_PLC key values: ‘uck’, ‘cnk_uck’, ‘plc_uck’, ‘none’, ‘unchunk’
Mnemonic: _U_n_C_hun_K_

Equivalent key values are fully interchangeable. Multiple equivalent options are provided to satisfy disparate needs and tastes of NCO users working with scripts and from the command line.

The chunking algorithms must know the chunksizes of each dimension of each variable to be chunked. The correspondence between the input variable shape and the chunksizes is called the “chunking map”. The user specifies the desired chunking map with the ‘--cnk_map’ switch (or its long-option equivalent, ‘--chunk_map’) and its CNK_MAP argument. Nine chunking maps are currently implemented:

“Chunksize Equals Dimension Size”
Definition: Chunksize defaults to dimension size. Explicitly specify chunksizes for particular dimensions with the ‘--cnk_dmn’ option. In most cases this chunksize will be applied in all variables that contain the specified dimension.
Some chunking policies noted above allow (fxm), and others (fxm) prevent this chunksize from applying to all variables.
CNK_MAP key values: ‘dmn’, ‘cnk_dmn’, ‘map_dmn’
Mnemonic: _D_i_M_e_N_sion

“Chunksize Equals Dimension Size except Record Dimension”
Definition: Chunksize equals dimension size except record dimension has size one. Explicitly specify chunksizes for particular dimensions with the ‘--cnk_dmn’ option.
CNK_MAP key values: ‘rd1’, ‘cnk_rd1’, ‘map_rd1’
Mnemonic: _R_ecord _D_imension size _1_

“Chunksize Equals Scalar Size Specified”
Definition: Chunksize for all dimensions is set with the ‘--cnk_scl=SZ_LMN’ option. For this map SZ_LMN itself becomes the chunksize of each dimension. This is in contrast to the CNK_PRD map, where the Rth root of SZ_LMN becomes the chunksize of each dimension.
CNK_MAP key values: ‘scl’, ‘cnk_scl’, ‘map_scl’
Mnemonic: _SC_a_L_ar

“Chunksize Product Matches Scalar Size Specified”
Definition: The product of the chunksizes for each variable matches (approximately equals) the size specified with the ‘--cnk_scl=SZ_LMN’ option. A dimension of size one is said to be _degenerate_. For a variable of rank R (i.e., with R non-degenerate dimensions), the chunksize in each non-degenerate dimension is (approximately) the Rth root of SZ_LMN. This is in contrast to the CNK_SCL map, where SZ_LMN itself becomes the chunksize of each dimension.
CNK_MAP key values: ‘prd’, ‘cnk_prd’, ‘map_prd’
Mnemonic: _PR_o_D_uct

“Chunksize Lefter Product Matches Scalar Size Specified”
Definition: The product of the chunksizes for each variable (approximately) equals the size specified with the ‘--cnk_byt=SZ_BYT’ (not ‘--cnk_dfl’) option. This is accomplished by using dimension sizes as chunksizes for the rightmost (most rapidly varying) dimensions, and then "flexing" the chunksize of the leftmost (least rapidly varying) dimensions such that the product of all chunksizes matches the specified size. All L-dimensions to the left of and including the first record dimension define the left-hand side. To be precise, if the total size (in bytes) of the variable is VAR_SZ, and if the specified (with ‘--cnk_byt’) product of the R "righter" dimensions (those that vary more rapidly than the first record dimension) is SZ_BYT, then the chunksize (in bytes) of each of the L lefter dimensions is (approximately) the Lth root of VAR_SZ/SZ_BYT. This map was first proposed by Chris Barker.
CNK_MAP key values: ‘lfp’, ‘cnk_lfp’, ‘map_lfp’
Mnemonic: _L_e_F_ter _P_roduct

“Chunksize Equals Existing Chunksize”
Definition: Chunksizes are copied from the input to the output file for every variable that is chunked in the input file. Variables not chunked in the input file will be chunked with default mappings.
CNK_MAP key values: ‘xst’, ‘cnk_xst’, ‘map_xst’
Mnemonic: E_X_i_ST_

“Chunksize Balances 1D and (N-1)-D Access to N-D Variable [_default for netCDF4 input_]”
Definition: Chunksizes are chosen so that 1-D and (N-1)-D hyperslabs of 3-D variables (e.g., point-timeseries or latitude/longitude surfaces of 3-D fields) both require approximately the same number of chunks. Hence their access time should be balanced. Russ Rew explains the motivation and derivation for this strategy here (http://www.unidata.ucar.edu/blogs/developer/en/entry/chunking_data_choosing_shapes).
CNK_MAP key values: ‘rew’, ‘cnk_rew’, ‘map_rew’
Mnemonic: Russ _REW_

“Chunksizes use netCDF4 defaults”
Definition: Chunksizes are determined by the underlying netCDF library. All variables selected by the current chunking policy have their chunksizes determined by netCDF library defaults. The default algorithm netCDF uses to determine chunksizes has changed through the years, and thus depends on the netCDF library version. This map can be used to reset (portions of) previously chunked files to default chunking values.
CNK_MAP key values: ‘nc4’, ‘cnk_nc4’, ‘map_nc4’
Mnemonic: _N_et_C_DF_4_

“Chunksizes use NCO recommendations [_default for netCDF3 input_]”
Definition: Chunksizes are determined by the currently recommended NCO map. This is a virtual option that ensures the chunking map is (in the subjective opinion of the authors) the best map for typical usage. As of NCO version 4.4.9 (May, 2015), this virtual map calls ‘map_lfp’.
CNK_MAP key values: ‘nco’, ‘cnk_nco’, ‘map_nco’
Mnemonic: _N_et_C_DF_O_perator

It is possible to combine the above chunking map algorithms with user-specified per-dimension (though not per-variable) chunksizes that override specific chunksizes determined by the maps above. The user specifies the per-dimension chunksizes with the (equivalent) long options ‘--cnk_dmn’ or ‘--chunk_dimension’. The option takes two comma-separated arguments, DMN_NM,SZ_LMN, which are the dimension name and its chunksize (in elements, not bytes), respectively. The ‘--cnk_dmn’ option may be used as many times as necessary.

The default behavior of chunking depends on several factors. As mentioned above, when no chunking options are explicitly specified by the user, then NCO preserves the chunking characteristics of the input file in the output file. This is equivalent to specifying both CNK_PLC and CNK_MAP as "existing", i.e., ‘--cnk_plc=xst --cnk_map=xst’. Input variables that are not already chunked are chunked in netCDF4 output files with the default behavior of the netCDF4 library.

When any chunking parameter _except_ ‘cnk_plc’ or ‘cnk_map’ is specified (such as ‘cnk_dmn’ or ‘cnk_scl’), then the "existing" policy and map are retained and the output chunksizes are modified where necessary in accord with the user-specified parameter. When ‘cnk_map’ is specified and ‘cnk_plc’ is not, then NCO picks (what it thinks is) the optimal chunking policy. This has always been policy ‘plc_g2d’. When ‘cnk_plc’ is specified and ‘cnk_map’ is not, then NCO picks (what it thinks is) the optimal chunking map. This has always been map ‘map_rd1’. To start afresh and return to netCDF4 chunking defaults, select ‘cnk_map=nc4’.

# Simple chunking and unchunking
ncks -O -4 --cnk_plc=all in.nc out.nc     # Chunk in.nc
ncks -O -4 --cnk_plc=unchunk in.nc out.nc # Unchunk in.nc
# Chunk data then unchunk it, printing informative metadata
ncks -O -4 -D 4 --cnk_plc=all ~/nco/data/in.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=uck ~/foo.nc ~/foo.nc
# Set total chunksize to 8192 B
ncks -O -4 -D 4 --cnk_plc=all --cnk_byt=8192 ~/nco/data/in.nc ~/foo.nc
# More complex chunking procedures, with informative metadata
ncks -O -4 -D 4 --cnk_scl=8 ~/nco/data/in.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_scl=8 dstmch90_clm.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_dmn lat,64 --cnk_dmn lon,128 dstmch90_clm.nc \
  ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=uck ~/foo.nc ~/foo.nc
ncks -O -4 -D 4 --cnk_plc=g2d --cnk_map=rd1 --cnk_dmn lat,32 \
  --cnk_dmn lon,128 dstmch90_clm_0112.nc ~/foo.nc
# Chunking works with all operators...
ncap2 -O -4 -D 4 --cnk_scl=8 -S ~/nco/data/ncap2_tst.nco \
  ~/nco/data/in.nc ~/foo.nc
ncbo -O -4 -D 4 --cnk_scl=8 -p ~/nco/data in.nc in.nc ~/foo.nc
ncecat -O -4 -D 4 -n 12,2,1 --cnk_dmn lat,32 \
  -p /data/zender/dstmch90 dstmch90_clm01.nc ~/foo.nc
ncflint -O -4 -D 4 --cnk_scl=8 ~/nco/data/in.nc ~/foo.nc
ncpdq -O -4 -D 4 -P all_new --cnk_scl=8 -L 5 ~/nco/data/in.nc ~/foo.nc
ncrcat -O -4 -D 4 -n 12,2,1 --cnk_dmn lat,32 \
  -p /data/zender/dstmch90 dstmch90_clm01.nc ~/foo.nc
ncwa -O -4 -D 4 -a time --cnk_plc=g2d --cnk_map=rd1 --cnk_dmn lat,32 \
  --cnk_dmn lon,128 dstmch90_clm_0112.nc ~/foo.nc

Chunking policy ‘r1d’ changes the chunksize of 1-D record variables (and no other variables) to the chunksize specified with ‘--cnk_dmn’. Any specified record dimension chunksizes will be applied to 1-D record variables only. Other dimensions may be chunked with their own ‘--cnk_dmn’ options that will apply to all variables. For example,

ncks --cnk_plc=r1d --cnk_dmn=time,1000 in.nc out.nc

This sets ‘time’ chunks to 1000 only in 1-D record variables. Without the ‘r1d’ policy, ‘time’ chunks would change in all variables.

It is appropriate to conclude by informing users about an aspect of chunking that may not be expected. Three types of variables are _always_ chunked: Record variables, Deflated (compressed) variables, and Checksummed variables. Hence all variables that contain a record dimension are also chunked (since data must be chunked in all dimensions, not just one). Unless otherwise specified by the user, the other (fixed, non-record) dimensions of record variables are assigned default chunk sizes. The HDF5 layer does all this automatically to optimize the on-disk variable/file storage geometry of record variables. Do not be surprised to learn that files created without any explicit instructions to activate chunking nevertheless contain chunked variables.

---------- Footnotes ----------

(1) This behavior became the default in November 2014 with NCO version 4.4.7. Prior versions would always use netCDF default chunking in the output file when no NCO chunking switches were activated, regardless of the chunking in the input file.

File: nco.info, Node: Quantization Algorithms, Next: Compression, Prev: Chunking, Up: Shared features

3.33 Quantization Algorithms
============================

Availability: ‘ncbo’, ‘ncecat’, ‘nces’, ‘ncflint’, ‘ncks’, ‘ncpdq’, ‘ncra’, ‘ncrcat’, ‘ncwa’
Short options: None
Long options: ‘--qnt_alg ALG_NM’, ‘--quantize_algorithm ALG_NM’

As of version 5.2.5 (May 2024), NCO supports a simple API to specify quantization algorithms. This method uses the ‘--qnt_alg=alg_nm’ option, where ‘alg_nm’ is a happy, friendly abbreviation or full English string for the quantization algorithm name.
ncks -7 -L 1 --qnt default=3 in.nc out.nc               # Granular BitRound (NSD)
ncks -7 -L 1 --qnt_alg=btg --qnt default=3 in.nc out.nc # BitGroom (NSD)
ncks -7 -L 1 --qnt_alg=shv --qnt default=3 in.nc out.nc # BitShave (NSD)
ncks -7 -L 1 --qnt_alg=set --qnt default=3 in.nc out.nc # BitSet (NSD)
ncks -7 -L 1 --qnt_alg=dgr --qnt default=3 in.nc out.nc # DigitRound (NSD)
ncks -7 -L 1 --qnt_alg=gbr --qnt default=3 in.nc out.nc # Granular BitRound (NSD)
ncks -7 -L 1 --qnt_alg=bgr --qnt default=3 in.nc out.nc # BitGroomRound (NSD)
ncks -7 -L 1 --qnt_alg=sh2 --qnt default=9 in.nc out.nc # HalfShave (NSB)
ncks -7 -L 1 --qnt_alg=brt --qnt default=3 in.nc out.nc # BruteForce (NSD)
ncks -7 -L 1 --qnt_alg=btr --qnt default=9 in.nc out.nc # BitRound (NSB)

The algorithm strings shown above give only a hint as to the flexibility of synonyms recognized as algorithm names. For example, synonyms for ‘btr’ include (case-insensitive versions of) ‘bitround’, ‘bit round’, and ‘bit-round’. Try it and see!

Behind the scenes, NCO translates the algorithm name to an enumerated bit-adjustment-algorithm BAA value. The BAA interface is undocumented and unsupported, however. This gives the maintainers the freedom to change the underlying algorithm organization.

ncks -7 -L 1 --ppc default=3 in.nc out.nc         # Granular BitRound (NSD)
ncks -7 -L 1 --baa=0 --ppc default=3 in.nc out.nc # BitGroom (NSD)
ncks -7 -L 1 --baa=1 --ppc default=3 in.nc out.nc # BitShave (NSD)
ncks -7 -L 1 --baa=2 --ppc default=3 in.nc out.nc # BitSet (NSD)
ncks -7 -L 1 --baa=3 --ppc default=3 in.nc out.nc # DigitRound (NSD)
ncks -7 -L 1 --baa=4 --ppc default=3 in.nc out.nc # Granular BitRound (NSD)
ncks -7 -L 1 --baa=5 --ppc default=3 in.nc out.nc # BitGroomRound (NSD)
ncks -7 -L 1 --baa=6 --ppc default=9 in.nc out.nc # HalfShave (NSB)
ncks -7 -L 1 --baa=7 --ppc default=3 in.nc out.nc # BruteForce (NSD)
ncks -7 -L 1 --baa=8 --ppc default=9 in.nc out.nc # BitRound (NSB)

Although the ‘qnt_alg’ and BAA APIs are equivalent, the BAA values may change in the future, so using the ‘qnt_alg’ interface is recommended.

File: nco.info, Node: Compression, Next: Deflation, Prev: Quantization Algorithms, Up: Shared features

3.34 Compression
================

Availability: ‘ncbo’, ‘ncecat’, ‘nces’, ‘ncflint’, ‘ncks’, ‘ncpdq’, ‘ncra’, ‘ncrcat’, ‘ncwa’
Short options: None
Long options: ‘--ppc VAR1[,VAR2[,...]]=PRC’, ‘--precision_preserving_compression VAR1[,VAR2[,...]]=PRC’, ‘--qnt VAR1[,VAR2[,...]]=PRC’, ‘--quantize VAR1[,VAR2[,...]]=PRC’, ‘--qnt_alg ALG_NM’, ‘--quantize_algorithm ALG_NM’, ‘--cmp CMP_SNG’, ‘--cmp CODEC1[,PARAMS1[|CODEC2[,PARAMS2[|...]]]]’, ‘--codec CODEC1[,PARAMS1[|CODEC2[,PARAMS2[|...]]]]’

Compression is a rapidly developing area in geoscientific software, and NCO is no exception. Documentation of these features can quickly become out-of-date. A brief review of compression support from the early days until now: NCO first supported Linear Packing with ‘ncpdq’ in 2004. The advent of netCDF4 allowed NCO to support lossless compression with the DEFLATE algorithm beginning in 2007. Nearly a decade elapsed before the next features came in 2015 when, thanks to support from the DOE, we developed the lossy BitGroom quantization algorithm. NCO soon introduced a flexible per-variable API (‘--ppc’ and ‘--qnt’) to support it, and its relatives BitShave and BitSet, in all arithmetic operators. This work helped spur interest and research on other Bit Adjustment Algorithms (BAAs) that perform quantization.
In 2020 we reduced the quantization error of BitGroom by implementing IEEE-rounding (aka BitRound), and by implementing newer quantization algorithms including BitGroomRound, HalfShave, and DigitRound (1). These algorithms are all accessible via the ‘--baa’ option. In 2020 NSF awarded support for us to develop the Community Codec Repository to put a friendlier API on the HDF5 shared-library filter mechanism so that all netCDF users could shift to more modern and efficient codecs than DEFLATE. This strategy aligned with the needs of operational forecasters supported by software engineers at the NOAA Environmental Modeling Center (EMC), who contributed to developing and testing the CCR (2). By the end of 2021 the CCR supported codecs for Bzip2, Zstandard, BitGroom, and Granular BitRound.

The CCR performance helped persuade the netCDF team at Unidata of the importance and practicality of expanding compression options in the base netCDF C-library beyond the venerable DEFLATE algorithm. Together, we merged the quantization, Bzip2, and Zstandard codecs and API from the CCR into what became (in June, 2022) netCDF version 4.9.0 (3). NCO version 5.1.0 (released in July, 2022) unified these advances by fully supporting its own quantization methods, CCR codecs, generic (i.e., non-CCR) HDF5 codecs, and the quantization algorithms and modern lossless codecs in netCDF 4.9.0.

Access to the quantization and compression options is available through three complementary, backwards-compatible, and overlapping APIs designed to accommodate the successive generations of compression features. The original generation of compression options remains accessible through the standard ‘ncpdq’ (for Linear Packing) and NCO-wide ‘-L’ option (or its synonyms ‘--deflate’ and ‘--dfl_lvl’) for DEFLATE. The second generation of compression options refers to the ‘--qnt’, ‘--ppc’, and ‘--baa’ (and related synonyms) options that control the type and level of quantization algorithm, and the variables to operate on. These options call quantization and rounding routines implemented within NCO itself, rather than in external libraries. The new ‘--cmp_sng’ (and synonyms) option provides an API for NCO to invoke all lossy and lossless codecs in external libraries, including the netCDF C-library, the CCR, and generic HDF5 codecs.

The ‘--cmp_sng’ (and synonyms ‘--cmp’ and ‘--compression’) options take as argument a string CMP_SNG which contains a list of quantization and compression algorithms and their respective parameters. The CMP_SNG must adhere to a superset of the filter-list API introduced by the ‘nccopy’ command and reported in the netCDF ‘_Filter’ attribute. This API uses the UNIX pipe symbol ‘|’ to separate the codecs applied as HDF5 filters to a variable:

% ncks --hdn -m in.nc | grep _Filter
u:_Filter = "307,2|32015,3" ;
U:_Filter = "32001,2,2,4,4185932,5,1,1" ;
% ncdump -h -s in.nc | grep _Filter
u:_Filter = "307,2|32015,3" ;
U:_Filter = "32001,2,2,4,4185932,5,1,1" ;

The above example shows variables compressed with two successive codecs. The variable ‘u’ was compressed with codecs with HDF5 filter IDs 307 and 32015, respectively.
NCO translates these mysterious HDF5 numeric filter IDs into filter names in the CDL comments when invoked with a higher debugging level:

% ncks -D 2 --hdn -m in.nc | grep _Filter
u:_Filter = "307,2|32015,3" ; // char codec(s): Bzip2, Zstandard
U:_Filter = "32001,2,2,4,4185932,5,1,1" ; // char codec(s): \
  Blosc Shuffle, Blosc LZ4

You may provide any operator (besides ‘ncrename’ and ‘ncatted’) with a CMP_SNG comprised of an arbitrary number of HDF5 filters specified either by numeric ID or by name, including NCO-supported synonyms:

% ncks --cmp="307,2|32015,3" in.nc out.nc       # Filter numeric IDs
% ncks --cmp="Bzip2,2|Zstandard,3" in.nc out.nc # Filter names
% ncks --cmp="bzp,2|zst,3" in.nc out.nc         # Filter abbreviations

NCO also uses this API to invoke the netCDF quantization algorithms such as Granular BitGroom and BitRound. netCDF records the operation of quantization algorithms in a ‘_Quantize’ attribute.

% ncks --cmp="gbr,2|zst,3" in.nc out.nc
% ncks -D 2 --hdn -m out.nc | grep -E 'Filt|Quant'
u:_QuantizeGranularBitRoundNumberOfSignificantDigits = 2 ;
u:_Filter = "32015,3" ; // char Codec(s): Zstandard

NCO calls the filters in the order specified. Thus, it is important to follow the above example and to specify compression pre-filters like quantization and Shuffle prior to any lossless codecs. However, the netCDF library imposes rules that can override the user-specified order for the Shuffle and Fletcher32 filters as described below (these are always helpful in real-world situations).

The ‘--cmp’ option specifies a global compression configuration. This is fine for lossless codecs, since there is little evidence to motivate per-variable lossless compression levels for real-world data. By contrast, it is often desirable to configure quantization levels on a per-variable basis. This is because the real information content of geophysical variables can differ strongly due to a multitude of factors including the field meaning, spatial location, and dimensional units. NCO applies any quantization filter specified in CMP_SNG uniformly to all variables encountered (the ‘--qnt’ quantization option/API is still available for per-variable quantization parameters, as discussed below). The one exception is that NCO prohibits quantization of "coordinate-like variables". Variables that are traditional (1-dimensional) coordinates, or that are mentioned in the values of CF ‘bounds’, ‘climatology’, ‘coordinates’, or ‘grid_mapping’ attributes, all count as coordinate-like variables. Such variables include quantities like gridcell areas and boundaries. NCO eschews quantizing such variables to avoid unforeseen or unanticipated degradation of numerical accuracy due to propagation of quantization errors in post-processing.

Specifying the compression string with codec names should make lossy and lossless compression easier for users to understand and employ. There are, however, a few wrinkles and legacy conventions that might surprise users when first encountered. For instance, the codecs for DEFLATE, Shuffle, and Fletcher32 have always been built-in to netCDF. Users can instruct NCO to apply these filters with the new API, yet their application to a dataset will still be reported using the "old" per-filter attributes:

% ncks --cmp="fletcher32|shuffle|deflate" in.nc out.nc
% ncks --hdn -m out.nc
...
u:_DeflateLevel = 1 ;
u:_Shuffle = "true" ;
u:_Fletcher32 = "true" ;
...

As this example shows, it is not required to give filter parameter arguments to all filters.
When the user omits filter parameters (e.g., compression level, NSD, NSB, or other filter configurator) for select filters that require such a parameter, NCO automatically inserts an appropriate default filter parameter. NCO assumes default parameters 1, 4, 1, and 3, for the lossless filters DEFLATE, Shuffle, Bzip2, and Zstandard, respectively. NCO assumes default parameters 3, 3, and 9, for the lossy quantization algorithms BitGroom, Granular BitGroom, and BitRound, respectively. Note that the netCDF filter for the Fletcher32 checksum algorithm does not accept a parameter argument, and NCO ignores any parameters provided to this filter.

% ncks --cmp="fletcher32|shuffle|granularbr|deflate|zstandard" ...
% ncks --cmp="f32|shf|gbr|dfl|zst" ...         # Shorthand, default parameters
% ncks --cmp="f32|shf,4|gbr,3|dfl,1|zst,3" ... # Explicit parameters

The ‘cmp_sng’ option supports an arbitrary number of filters. An example that compresses then reads a file with most netCDF-supported algorithms shows the mixture of filter-specific attributes (‘_Shuffle’, ‘_DeflateLevel’, ‘_Fletcher32’) and generic filter attributes (‘_Filter’) that result:

% ncks --cmp='f32|shf|gbr|dfl|bz2|zst' in.nc out.nc
% ncks --hdn -m out.nc
float u(time) ;
  u:long_name = "Zonal wind speed" ;
  u:units = "meter second-1" ;
  u:_QuantizeGranularBitRoundNumberOfSignificantDigits = 3 ;
  u:_Storage = "chunked" ;
  u:_ChunkSizes = 1024 ;
  u:_Filter = "307,1|32015,3" ;
  u:_DeflateLevel = 1 ;
  u:_Shuffle = "true" ;
  u:_Fletcher32 = "true" ;
  u:_Endianness = "little" ;

Note that the ‘_Filter’ value is a valid CMP_SNG for use as an argument to the ‘--cmp_sng’ option. This enables users to easily duplicate the compression settings of one dataset in another dataset. One can also find the global CMP_SNG that NCO used to compress a dataset in the global ‘history’ attribute.

In the absence of instructions to the contrary, NCO preserves the compression settings of datasets that it copies or subsets. It simply defaults to copying the per-variable compression settings from the input file. If the copy or subset command includes global compression instructions (i.e., the ‘--cmp’ or ‘-L’ options), those instructions will override the per-variable settings in the input file. The user can eliminate all compression filters by setting CMP_SNG to the special value ‘none’ (or to its synonyms ‘uncompress’, ‘decompress’, ‘defilter’, or ‘unset’).

% ncks --cmp='f32|shf|gbr|dfl|bz2|zst' in.nc out.nc
% ncks --cmp='none' out.nc none.nc
% ncks --hdn -m none.nc
float u(time) ;
  u:long_name = "Zonal wind speed" ;
  u:units = "meter second-1" ;
  u:_QuantizeGranularBitRoundNumberOfSignificantDigits = 3 ;
  u:_Storage = "chunked" ;
  u:_ChunkSizes = 1024 ;
  u:_Endianness = "little" ;

The uncompressed copy has no filter attributes remaining because all filters have been removed. The ‘_Quantize’ attribute remains because the quantization was applied as an internal algorithm not as an HDF5 filter. In contrast to lossless compression filters, lossy quantization algorithms can never be "removed" much less undone because, by definition, they are lossy. Removing compression is "all or nothing" in that there is currently no way to remove only one or a few codecs and leave the rest in place. To do that, the user must instead recompress the dataset using a CMP_SNG that includes only the desired codecs.

% ncks --cmp='shf|zst' ... # Compress
% ncks --cmp='none' ...    # Decompress (remove Shuffle and Zstd)
% ncks --cmp=zst,4 ...     # Recompress to new Zstd level, no Shuffle
Shuffle is an important filter that can boost lossless compression ratios of geoscientific data by 10-20% (*note Figure 3.1: fgr:qnt_cr_dfl.).

[image src="fgr/qnt_cr_dfl.png" text="Quantization and compression by DEFLATE"]

Figure 3.1: Quantization and then compression by DEFLATE, including the contribution of Shuffle.

Both the netCDF library and NCO continue to treat the Shuffle filter specially. If the Shuffle and DEFLATE algorithms are both invoked through the standard netCDF API (i.e., ‘nc_def_var_deflate()’), then the netCDF library ensures that the Shuffle filter is called before DEFLATE, indeed before any filter except ‘Fletcher32’ (which performs a checksum, not compression). This behavior is welcome as it avoids inadvertent mis-use of Shuffle. Prior to version 5.1.0, NCO always invoked Shuffle with DEFLATE, and did not expose the Shuffle filter to user control. NCO now exposes Shuffle to user control for all filters. To preserve backward compatibility, invoking the DEFLATE algorithm with the ‘dfl’ or ‘deflate’ names (or with the numeric HDF5 filter ID of ‘1’) still sets the Shuffle filter to true. To invoke DEFLATE without Shuffle, use the special filter names ‘dns’ or ‘DEFLATE No Shuffle’. Specifying the Shuffle filter along with any Blosc compressor causes NCO to invoke the Blosc version of Shuffle instead of the HDF5 version of Shuffle. The Blosc Shuffle should execute more rapidly because it takes advantage of AVX2 instructions, etc. In summary, users must explicitly request Shuffle for all non-DEFLATE codecs, otherwise Shuffle will not be employed prior to those codecs.

% ncks --cmp='dfl' ...       # Invoke Shuffle then DEFLATE
% ncks --cmp='shf|dfl' ...   # Same (default Shuffle stride 4-bytes)
% ncks --cmp='dns' ...       # Invoke DEFLATE (no Shuffle)
% ncks --cmp='shf|dns' ...   # Invoke Shuffle then DEFLATE
% ncks --cmp='zst' ...       # Invoke Zstandard (no Shuffle)
% ncks --cmp='shf|zst' ...   # Invoke Shuffle then Zstandard
% ncks --cmp='zst|shf' ...   # Same (netCDF enforces Shuffle-first rule)
% ncks --cmp='shf' ...       # Invoke only Shuffle
% ncks --cmp='shf,8|zst' ... # Shuffle stride 8-bytes then Zstandard
% ncks --cmp='shf,8|dfl' ... # Shuffle stride remains default 4-bytes
% ncks --cmp='bls' ...       # Invoke Blosc (LZ by default) without Shuffle
% ncks --cmp='shf|bls' ...   # Invoke Blosc (not HDF5) Shuffle, then Blosc LZ

The ‘shf,8’ examples above show how to invoke Shuffle with a non-default byte stride (4). The netCDF library, and thus NCO, always uses the default Shuffle stride (4 bytes) with the DEFLATE filter. Specifying a different stride only has an effect with non-DEFLATE filters. This ensures NCO's default behavior with DEFLATE does not change. As explained above, invoke DEFLATE with ‘--cmp=dns’ instead of ‘--cmp=dfl’ if you wish to suppress the otherwise automatic invocation of Shuffle.

Note that NCO and netCDF implement their quantization algorithms internally, whereas the CCR implements them as external shared-library codecs (valid only with netCDF4 files). Since these quantization algorithms leave data in IEEE format, no codec/filter is required to read (or write) them. Quantization therefore works with netCDF3 datasets, not just netCDF4. netCDF applications that attempt to read data compressed with shared-library filters must be linked to the same shared filters or the "decompression" step will fail.
Datasets produced with netCDF or CCR-supported codecs (Bzip2, DEFLATE, Zstandard) will be readable by all users who upgrade to netCDF 4.9.0 or later or who install the CCR. There is no difference between a losslessly compressed dataset produced with a CCR-supplied vs. a netCDF-supplied filter. However, reading a dataset quantized by a CCR filter (e.g., BitGroom or GranularBG) requires access to the CCR filter, which forces users to take an extra step to install the CCR. This is an unfortunate artifact of implementing quantization as a codec (which the CCR must do) vs. an internal numerical function (which netCDF and NCO do). Where possible, people should encode datasets with netCDF-supported algorithms and codecs in preference to CCR or raw HDF5 codecs. Doing so will increase dataset portability.

Limitations of Current Compression API
--------------------------------------

The NCO filter API will evolve in a (hopefully) backward-compatible manner as experience and feedback are gained. All filter parameters are eventually passed to HDF5 as unsigned integers. By contrast, the current API treats all input arguments in CMP_SNG as signed integers for ease-of-use. For example, it is easier to specify a Zstandard filter level of negative one as ‘zstd,-1’ than as ‘zstd,4294967295’. NCO currently has no easy way to specify the floating-point configuration parameters required by some CCR filters and many external filters. That said, most of the code to support the filter parameter syntax documented at and implemented in ‘ncgen’ and ‘nccopy’ is ready and we expect to support that syntax in NCO 5.0.9. That syntax is backward compatible with the integer-only input assumptions already embedded in NCO 5.1.0. In the meantime, we request your feedback, use-cases, and suggestions before adopting a final approach. Another limitation of NCO 5.1.0 was that the Blosc filters would fail without explanation. This is why the manual does not yet document much about Blosc filters. The Blosc issues were fixed upstream in netCDF version 4.9.1. netCDF 4.9.0 contained some other inadvertent mistakes that were fixed in 4.9.1. First, the quantization algorithms internal to netCDF work only on datasets of type NETCDF4 in netCDF 4.9.0. A recently discovered bug prevents them from working on NETCDF4_CLASSIC-format datasets (5). The fix to the NETCDF4_CLASSIC bug was officially released in netCDF 4.9.1. Note that this bug did not affect the same quantization algorithms as implemented in NCO (or in the CCR, for that matter). In other words, quantization to netCDF3 and NETCDF4_CLASSIC-format output files _always_ works when invoked through the ‘--qnt’ (not ‘--cmp’) option. This restriction only affects netCDF versions prior to 4.9.1. Per-variable quantization settings must also be invoked through ‘--qnt’ (not ‘--cmp’) for all output formats, until and unless this feature is migrated to ‘--cmp’ (there are no plans to do so). Another sticky wicket that was fixed in netCDF 4.9.1 was the use of the Blosc codec. NCZarr uses Blosc internally; however, the HDF5 Blosc codec in netCDF 4.9.0 was not robust. We plan to advertise the advantages of Blosc more fully in future versions of NCO. Feedback on any or all of these constraints is welcome.

Best Practices for Real World Lossy Compression
-----------------------------------------------

The workflow to compress large-scale, user-facing datasets requires consideration of multiple factors including storage space, speed, accuracy, information content, portability, and user-friendliness.
We have found that this blend is best obtained by using per-variable quantization together with global lossless compression. NCO can configure per-variable quantization levels together with global lossless filters when the quantization algorithm is specified with the ‘--qnt’ option/API and the lossless filters are specified with the ‘--cmp_sng’ option/API. Granular BitRound (GranularBR) and BitRound are state-of-the-art quantization algorithms that are configured with the number of significant decimal digits (NSD) or number of significant bits (NSB), respectively. One can devise an optimal approach in about four steps: First, select a global lossless codec that produces a reasonable tradeoff between compression ratio (CR) and speed. The speed of the final workflow will depend mostly on the lossless codec employed and its compression settings. Try a few standard codecs with Shuffle to explore this tradeoff:

     ncks -7 --cmp='shf|zst' ...
     ncks -7 --cmp='shf|dfl' ...
     ncks -7 --cmp='shf|bz2' ...

The ‘-7’ switch creates output in NETCDF4_CLASSIC format. This highly portable format supports codecs and is also mandated by many archives such as CMIP6. The only other viable format choice is ‘-4’ for NETCDF4. That format must be used if any variables make use of the extended netCDF4 atomic types. Our experience with ESM data shows that Bzip2 often yields the best CR (*note Figure 3.2: fgr:qnt_cr_bz2.). However, Bzip2 is much slower than Zstandard, which yields a comparable CR.

[image src="fgr/qnt_cr_bz2.png" text="Quantization and compression by Bzip2"]
Figure 3.2: Quantization and then compression by Bzip2, including the contribution of Shuffle.

The second step is to choose the quantization algorithm and its default level. Quantization can significantly improve the CR without sacrificing any scientifically meaningful data. Lossy quantization algorithms are unlikely to significantly alter the workflow throughput unless applied so aggressively that they considerably reduce the entropy seen by the lossless codec. The goal in this step is to choose the strongest quantization that preserves all the meaningful precision of _most_ fields, and that dials-in the CR to the desired value.

     ncks -7 --qnt default=3 ...                 # GranularBR, NSD=3
     ncks -7 --ppc default=3 ...                 # Same
     ncks -7 --qnt_alg=gbr --ppc default=3 ...   # Same
     ncks -7 --qnt_alg=gbr --ppc dfl=3 ...       # Same
     ncks -7 --qnt_alg=btr --ppc dfl=9 ...       # BitRound, NSB=9
     ncks -7 --baa=8 --ppc default=9 ...         # Same

As an argument to ‘--qnt’, the keyword ‘dfl’ is just a synonym for ‘default’ and has nothing to do with DEFLATE. The third step is to develop per-variable exceptions to the default quantization of the previous step. This can be a process of trial-and-error, or semi-automated through techniques such as determining an acceptable information content threshold for each variable (6). The per-variable arguments to ‘--qnt’ can take many forms:

     ncks --qnt p,w,z=5 --qnt q,RH=4 --qnt T,u,v=3 ...       # Multiple options
     ncks --qnt p,w,z=5#q,RH=4#T,u,v=3 ...                   # Combined option
     ncks --qnt Q.?=5#FS.?,FL.?=4#RH=.3 ...                  # Regular expressions
     ncks --qnt_alg=btr --qnt p,w,z=15#q,RH=12#T,u,v=9 ...   # BitRound (NSB)
     ncks --qnt_alg=btr --qnt CO2=15#AER.?=12#U,V=6 ...      # Klower et al.

No compression is necessary in this step, which presumably involves evaluating the adequacy of the quantized values at matching observations or at meeting other error metrics. The fourth and final step combines the lossless and lossy algorithms to produce the final workflow:

     ncks -7 --qnt dfl=3 --cmp='shf|zst' ...   # A useful starting point?
     ncks -7 --qnt default=3#Q.?=5#FS.?,FL.?=4 --cmp='shf|zst' ...
     ncks -7 --qnt_alg=gbr --qnt default=3#Q.?=5#FS.?,FL.?=4 --cmp='shf|zst' ...
     ncks -7 --qnt_alg=btr --qnt default=9#Q.?=15#FS.?,FL.?=12 --cmp='shf|zst' ...

The example above uses Zstandard (*note Figure 3.3: fgr:qnt_cr_zst.) because it is significantly faster than other codecs with comparable CRs, e.g., Bzip2.

[image src="fgr/qnt_cr_zst.png" text="Quantization and compression by Zstandard"]
Figure 3.3: Quantization and then compression by Zstandard, including the contribution of Shuffle.

Older Compression API
---------------------

NCO implements or accesses four different compression algorithms: the standard lossless DEFLATE algorithm and three lossy compression algorithms. All four algorithms reduce the on-disk size of a dataset while sacrificing no (lossless) or a tolerable amount (lossy) of precision. First, NCO can access the lossless DEFLATE algorithm, a combination of Lempel-Ziv encoding and Huffman coding, on any netCDF4 dataset (*note Deflation::). Because it is lossless, this algorithm re-inflates deflated data to their full original precision. This algorithm is accessed via the HDF5 library layer (which itself calls the ‘zlib’ library also used by ‘gzip’), and is unavailable with netCDF3.

* Menu:

* Linear Packing::
* Precision-Preserving Compression::

---------- Footnotes ----------

(1) R. Kouznetsov contributed the masking techniques used in BitRound, BitGroomRound, and HalfShave. Thanks Rostislav!

(2) E. Hartnett of NOAA EMC is co-founder and co-maintainer of the CCR. Thanks Ed!

(3) D. Heimbigner (Unidata) helped implement all these features into netCDF. Thanks Dennis!

(4) Full disclosure: Documentation of the meaning of the Shuffle parameter is scarce. I think, though am not certain, that the Shuffle parameter refers to the number of contiguous byte-groups that the algorithm rearranges a chunk of data into. I call this the stride. Thus the default stride of 4 means that Shuffle rearranges a chunk of 4-byte integers into four consecutive sequences: the first comprises all the leading bytes, the second comprises all the second bytes, etc. A well-behaved stride should evenly divide the number of bytes in a data chunk.

(5) Quantization may never be implemented in netCDF for any CLASSIC or other netCDF3 formats since there is no compression advantage to doing so. Use the NCO implementation to quantize to netCDF3 output formats.

(6) See, e.g., the procedure described in "Compressing atmospheric data into its real information content" by M. Klower et al., available at .

File: nco.info, Node: Linear Packing, Next: Precision-Preserving Compression, Prev: Compression, Up: Compression

3.34.1 Linear Packing
---------------------

The three lossy compression algorithms are Linear Packing (*note Packed data::), and two precision-preserving algorithms. Linear packing quantizes data of a higher precision type into a lower precision type (often ‘NC_SHORT’) that thus stores fewer (though a constant number of) bytes per value. Linearly packed data unpacks into a (much) smaller dynamic range than the floating-point data can represent. The type-conversion and reduced dynamic range of the data allow packing to eliminate bits typically used to store an exponent, thus improving its packing efficiency. Packed data can also be deflated for additional space savings.
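For instance, packing and deflation can be combined in a single ‘ncpdq’ command, as the Deflation examples later in this chapter also show (a sketch; the deflation level and file names are arbitrary):

     ncpdq in.nc pck.nc             # Linear packing alone
     ncpdq -4 -L 1 in.nc pck_dfl.nc # Pack, then DEFLATE the packed values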
A limitation of linear packing is that unpacking data stored as integers into the linear range defined by ‘scale_factor’ and ‘add_offset’ rapidly loses precision outside of a narrow range of floating-point values. Variables packed as ‘NC_SHORT’, for example, can represent only about 64000 discrete values in the range -32768*scale_factor+add_offset to 32767*scale_factor+add_offset. The precision of packed data equals the value of ‘scale_factor’, and ‘scale_factor’ is usually chosen to span the range of valid data, not to represent the intrinsic precision of the variable. In other words, the precision of packed data cannot be specified in advance because it depends on the range of values to quantize.

File: nco.info, Node: Precision-Preserving Compression, Prev: Linear Packing, Up: Compression

3.34.2 Precision-Preserving Compression
---------------------------------------

NCO implemented the final two lossy compression algorithms in version 4.4.8 (February, 2015). These are both “Precision-Preserving Compression” (PPC) algorithms, and since standard terminology for precision is remarkably imprecise, so is our nomenclature. The operational definition of "significant digit" in our precision-preserving algorithms is that the exact value, before rounding or quantization, is within one-half the value of the decimal place occupied by the “Least Significant Digit” (LSD) of the rounded value. For example, the value pi = 3.14 correctly represents the exact mathematical constant PI to three significant digits because the LSD of the rounded value (i.e., 4) is in the one-hundredths digit place, and the difference between the exact value and the rounded value is less than one-half of one one-hundredth, i.e., (3.14159265358979323844 - 3.14 = 0.00159 < 0.005). One PPC algorithm preserves the specified total “Number of Significant Digits” (NSD) of the value. For example, there is only one significant digit in the weight of most "eight-hundred pound gorillas" that you will encounter, i.e., NSD=1. This is the most straightforward measure of precision, and thus NSD is the default PPC algorithm. The other PPC algorithm preserves the number of “Decimal Significant Digits” (DSD), i.e., the number of significant digits following (positive, by convention) or preceding (negative) the decimal point. For example, ‘0.008’ and ‘800’ have, respectively, three and negative two digits following the decimal point, corresponding to DSD=3 and DSD=-2. The only justifiable NSD for a given value depends on intrinsic accuracy and error characteristics of the model or measurements, and not on the units with which the value is stored. The appropriate DSD for a given value depends on these intrinsic characteristics and, in addition, the units of storage. This is the fundamental difference between the NSD and DSD approaches. The eight-hundred pound gorilla always has NSD=1 regardless of whether the value is stored in pounds or in some other unit. DSD corresponding to this weight is DSD=-2 if the value is stored in pounds, DSD=4 if stored in megapounds. Users may wish to express the precision to be preserved as either NSD or DSD. Invoke PPC with the long option ‘--ppc var=prc’, or give the same arguments to the synonyms ‘--precision_preserving_compression’, ‘--qnt’, or ‘--quantize’. Here VAR is the variable to quantize, and PRC is its precision. The option ‘--qnt’ (and its long option equivalents such as ‘--ppc’ and ‘--quantize’) indicates the argument syntax will be KEY=VAL.
As such, ‘--qnt’ and its synonyms are indicator options that accept arguments supplied one-by-one like ‘--qnt KEY1=VAL1 --qnt KEY2=VAL2’, or aggregated together in multi-argument format like ‘--qnt KEY1=VAL1#KEY2=VAL2’ (*note Multi-arguments::). The default algorithm assumes PRC specifies NSD precision, e.g., ‘T=2’ means NSD=2. Prepend PRC with a decimal point to specify DSD precision, e.g., ‘T=.2’ means DSD=2. NSD precision must be specified as a positive integer. DSD precision may be a positive or negative integer, and is specified as the negative base 10 logarithm of the desired precision, in accord with common usage. For example, specifying ‘T=.3’ or ‘T=.-2’ tells the DSD algorithm to store only enough bits to preserve the value of T rounded to the nearest thousandth or hundred, respectively. Setting VAR to ‘default’ has the special meaning of applying the associated NSD or DSD algorithm to all floating point variables except coordinate variables. Variables _not affected_ by ‘default’ include integer and non-numeric atomic types, coordinates, and variables mentioned in the ‘bounds’, ‘climatology’, or ‘coordinates’ attribute of any variable. NCO applies PPC to coordinate variables only if those variables are explicitly specified (i.e., not with the ‘default=PRC’ mechanism). NCO applies PPC to integer-type variables only if those variables are explicitly specified (i.e., not with the ‘default=PRC’ mechanism), and only if the DSD algorithm is invoked with a negative PRC. To prevent PPC from applying to certain non-coordinate variables (e.g., ‘gridcell_area’ or ‘gaussian_weight’), explicitly specify a precision exceeding 7 (for ‘NC_FLOAT’) or 15 (for ‘NC_DOUBLE’) for those variables. Since these are the maximum representable precisions in decimal digits, NCO _turns-off_ PPC (i.e., does nothing) when more precision is requested. The time-penalty for compressing and uncompressing data varies according to the algorithm. The Number of Significant Digits (NSD) algorithm quantizes by bitmasking, and employs no floating-point math. The Decimal Significant Digit (DSD) algorithm quantizes by rounding, which does require floating-point math. Hence NSD is likely faster than DSD, though the difference has not been measured. NSD creates a bitmask to alter the “significand” of IEEE 754 floating-point data. The bitmask is one for all bits to be retained and zero or one for all bits to be ignored. The algorithm assumes that the number of binary digits (i.e., bits) necessary to represent a single base-10 digit is ln(10)/ln(2) = 3.32. The exact numbers of bits NBIT retained for single and double precision values are ceil(3.32*NSD)+1 and ceil(3.32*NSD)+2, respectively. Once these reach 23 and 53, respectively, bitmasking is completely ineffective. This occurs at NSD=6.3 and 15.4, respectively. The DSD algorithm, by contrast, uses rounding to remove undesired precision. The rounding (1) zeroes the greatest number of significand bits consistent with the desired precision. To demonstrate the change in IEEE representation caused by PPC rounding algorithms, consider again the case of PI, represented as an ‘NC_FLOAT’.
The IEEE 754 single precision representations of the exact value (3.141592...), the value with only three significant digits treated as exact (3.140000...), and the value as stored (3.140625) after PPC-rounding with either the NSD (PRC=3) or DSD (PRC=2) algorithm are, respectively,

     S Exponent Fraction (Significand)     Decimal      Notes
     0 10000000 10010010000111111011011 # 3.14159265   Exact
     0 10000000 10010001111010111000011 # 3.14000000
     0 10000000 10010010000000000000000 # 3.14062500   NSD = 3
     0 10000000 10010010000000000000000 # 3.14062500   DSD = 2

The string of trailing zero-bits in the rounded values facilitates byte-stream compression. Note that the NSD and DSD algorithms do not always produce results that are bit-for-bit identical, although they do in this particular case. Reducing the preserved precision of NSD-rounding produces increasingly long strings of identical bits amenable to compression:

     S Exponent Fraction (Significand)     Decimal      Notes
     0 10000000 10010010000111111011011 # 3.14159265   Exact
     0 10000000 10010010000111111011011 # 3.14159265   NSD = 8
     0 10000000 10010010000111111011010 # 3.14159262   NSD = 7
     0 10000000 10010010000111111011000 # 3.14159203   NSD = 6
     0 10000000 10010010000111111000000 # 3.14158630   NSD = 5
     0 10000000 10010010000111100000000 # 3.14154053   NSD = 4
     0 10000000 10010010000000000000000 # 3.14062500   NSD = 3
     0 10000000 10010010000000000000000 # 3.14062500   NSD = 2
     0 10000000 10010000000000000000000 # 3.12500000   NSD = 1

The consumption of about 3 bits per digit of base-10 precision is evident, as is the coincidence of a quantized value that greatly exceeds the mandated precision for NSD = 2. Although the NSD algorithm generally masks some bits for all NSD <= 7 (for ‘NC_FLOAT’), compression algorithms like DEFLATE may need byte-size-or-greater (i.e., at least eight-bit) bit patterns before their algorithms can take advantage of encoding such patterns for compression. Do not expect significantly enhanced compression from NSD > 5 (for ‘NC_FLOAT’) or NSD > 14 (for ‘NC_DOUBLE’). Clearly values stored as ‘NC_DOUBLE’ (i.e., eight-bytes) are susceptible to much greater compression than ‘NC_FLOAT’ for a given precision because their significands explicitly contain 52 bits rather than 23 bits. Maintaining non-biased statistical properties during lossy compression requires special attention. The DSD algorithm uses ‘rint()’, which rounds to the nearest integer (ties to even). Thus DSD has no systematic bias. However, the NSD algorithm uses a bitmask technique susceptible to statistical bias. Zeroing all non-significant bits is guaranteed to produce numbers quantized to the specified tolerance, i.e., half of the decimal value of the position occupied by the LSD. However, always zeroing the non-significant bits results in quantized numbers that never exceed the exact number. This would produce a negative bias in statistical quantities (e.g., the average) subsequently derived from the quantized numbers. To avoid this bias, our NSD implementation rounds non-significant bits down (to zero) or up (to one) in an alternating fashion when processing array data. In general, the first element is rounded down, the second up, and so on. This results in a mean bias quite close to zero. The only exception is that the floating-point value of zero is never quantized upwards. For simplicity, NSD always rounds scalars downwards. Although NSD and DSD are different algorithms under the hood, they both replace the (unwanted) least significant bits of the IEEE significand with a string of consecutive zeroes.
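One quick way to see this rounding in action is to quantize a single value and print the result. The following is only a sketch: the variable name ‘pi’, the file names, and the use of ‘in.nc’ as a template are assumptions, and the exact quantized value depends on which quantization algorithm is in effect (the classic NSD bitmask yields 3.140625, as in the table above).

     # Create a float variable holding pi, quantize it to NSD=3, then print it
     ncap2 -O -s 'pi=3.14159265f' in.nc pi.nc
     ncks -O --qnt pi=3 pi.nc pi_nsd3.nc
     ncks -C -v pi pi_nsd3.nc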
Byte-stream compression techniques, such as the ‘gzip’ DEFLATE algorithm compression available in HDF5, always compress zero-strings more efficiently than random digits. The net result is netCDF files that utilize compression can be significantly reduced in size. This feature only works when the data are compressed, either internally (by netCDF) or externally (by another user-supplied mechanism). It is most straightfoward to compress data internally using the built-in compression and decompression supported by netCDF4. For convenience, NCO automatically activates file-wide Lempel-Ziv deflation (*note Deflation::) level one (i.e., ‘-L 1’) when PPC is invoked on any variable in a netCDF4 output file. This makes PPC easier to use effectively, since the user need not explicitly specify deflation. Any explicitly specified deflation (including no deflation, ‘-L 0’) will override the PPC deflation default. If the output file is a netCDF3 format, NCO will emit a message suggesting internal netCDF4 or external netCDF3 compression. netCDF3 files compressed by an external utility such as ‘gzip’ accrue approximately the same benefits (shrinkage) as netCDF4, although with netCDF3 the user or provider must uncompress (e.g., ‘gunzip’) the file before accessing the data. There is no benefit to rounding numbers and storing them in netCDF3 files unless such custom compression/decompression is employed. Without that, one may as well maintain the undesired precision. The user accesses PPC through a single switch, ‘--ppc’, repeated as many times as necessary. To apply the NSD algorithm to variable U use, e.g., ncks -7 --qnt u=2 in.nc out.nc The output file will preserve only two significant digits of U. The options ‘-4’ or ‘-7’ ensure a netCDF4-format output (regardless of the input file format) to support internal compression. It is recommended though not required to write netCDF4 files after PPC. For clarity the ‘-4/-7’ switches are omitted in subsequent examples. NCO attaches attributes that indicate the algorithm used and degree of precision retained for each variable affected by PPC. The NSD and DSD algorithms store the attributes ‘QuantizeBitGroomNumberOfSignificantDigits’ (2) and ‘least_significant_digit’ (3), respectively. As of version 5.0.3 (October 2021), NCO supports a more complete set of Precision-Preserving Quantization (PPQ) filters than was previously documented here. The default algorithm has been changed from BitGroom with BitRound masks from R. Kouznetsov (2021), to what we call Granular BitRound (GBR). GBR combines features of BitGroom, BitRound, and DigitRound by Delaunay et al. (2019). GBR improves compression ratios by ~20% relative to BitGroom for NSD=3 on our benchmark 1 GB climate model output dataset. Since it quantizes a few more bits than BitGroom (and BitGroomRound) for a given NSD, GBR produces significantly larger quantization errors than those algorithms as well. These NSD algorithms write an algorithm-specific attribute, e.g., ‘QuantizeGranularBitRoundNumberOfSignificantDigits’ or ‘QuantizeDigitRoundNumberOfSignificantDigits’. Documentation on these algorithms is best found in the literature. While Bit-Groom/Shave/Set are described above, documentation on Bit-Adjustment-Algorithms (BAA) 3-8 will be improved in the future. As of version 5.0.5 (January 2022), NCO supports two quantization codecs (BitRound and HalfShave) that expect a user-specified number of explicit significant bits (NSB, or "keepbits") to retain (4). 
The NSB argument contrasts with the number of significant digits (NSD) parameter expected by the other codecs (like BitGroom, DigitRound, and GranularBR). The valid ranges of NSD are 1-7 for NC_FLOAT and 1-15 for NC_DOUBLE. The valid ranges of NSB are 1-23 for NC_FLOAT and 1-52 for NC_DOUBLE. The upper limits of NSB are the number of explicitly represented bits in the IEEE single- and double-precision formats (the implicit bit does not count). It is safe to attempt PPC on input that has already been rounded. Variables can be made rounder, not sharper, i.e., variables cannot be "un-rounded". Thus PPC attempted on an input variable with an existing PPC attribute proceeds only if the new rounding level exceeds the old, otherwise no new rounding occurs (i.e., a "no-op"), and the original PPC attribute is retained rather than replaced with the newer value of PRC. To request, say, five significant digits (NSD=5) for all fields, except, say, wind speeds which are only known to integer values (DSD=0) in the supplied units, requires ‘--ppc’ twice: ncks -4 --qnt default=5 --qnt u,v=.0 in.nc out.nc To preserve five digits in all variables except coordinate variables and U and V, first specify a default value with the ‘default’ keyword (or synonym keywords ‘dfl’, ‘global’, or ‘glb’), and separately specify the exceptions with per-variable key-value pairs: ncks --qnt default=5 --qnt u,v=20 in.nc out.nc The ‘--qnt’ option may be specified any number of times to support varying precision types and levels, and each option may aggregate all the variables with the same precision ncks --qnt p,w,z=5 --qnt q,RH=4 --qnt T,u,v=3 in.nc out.nc ncks --qnt p,w,z=5#q,RH=4#T,u,v=3 in.nc out.nc # Multi-argument format Any VAR argument may be a regular expression. This simplifies generating lists of related variables: ncks --qnt Q.?=5 --qnt FS.?,FL.?=4 --qnt RH=.3 in.nc out.nc ncks --qnt Q.?=5#FS.?,FL.?=4#RH=.3 in.nc out.nc # Multi-argument format Although PPC-rounding instantly reduces data precision, on-disk storage reduction only occurs once the data are compressed. How can one be sure the lossy data are sufficiently precise? PPC preserves all significant digits of every value. The DSD algorithm uses floating-point math to round each value optimally so that it has the maximum number of zeroed bits that preserve the specified precision. The NSD algorithm uses a theoretical approach (3.2 bits per base-10 digit), tuned and tested to ensure the _worst_ case quantization error is less than half the value of the minimum increment in the least significant digit. _Note for Info users_: The definition of error metrics relies heavily on mathematical expressions which cannot be easily represented in Info. _See the printed manual (./nco.pdf) for much more detailed and complete documentation of this subject._ All three metrics are expressed in terms of the fraction of the ten's place occupied by the LSD. If the LSD is the hundreds digit or the thousandths digit, then the metrics are fractions of 100, or of 1/100, respectively. PPC algorithms should produce maximum absolute errors no greater than 0.5 in these units. If the LSD is the hundreds digit, then quantized versions of true values will be within fifty of the true value. It is much easier to satisfy this tolerance for a true value of 100 (only 50% accuracy required) than for 999 (5% accuracy required). Thus the minimum accuracy guaranteed for NSD=1 ranges from 5-50%. 
For this reason, the best and worst cast performance usually occurs for true values whose LSD value is close to one and nine, respectively. Of course most users prefer PRC > 1 because accuracies increase exponentially with PRC. Continuing the previous example to PRC=2, quantized versions of true values from 1000-9999 will also be within 50 of the true value, i.e., have accuracies from 0.5-5%. In other words, only two significant digits are necessary to guarantee better than 5% accuracy in quantization. We recommend that dataset producers and users consider quantizing datasets with NSD=3. This guarantees accuracy of 0.05-0.5% for individual values. Statistics computed from ensembles of quantized values will, assuming the mean error EMEAN is small, have much better accuracy than 0.5%. This accuracy is the most that can be justified for many applications. To demonstrate these principles we conduct error analyses on an artificial, reproducible dataset, and on an actual dataset of observational analysis values. (5) The table summarizes quantization accuracy based on the three metrics. ‘NSD’ Number of Significant Digits. ‘Emabs’ Maximum absolute error. ‘Emebs’ Mean absolute error. ‘Emean’ Mean error. Artificial Data: N=1000000 values in [1.0,2.0) in steps of 1.0e-6 Single-Precision Double-Precision Single-Precision NSD Emabs Emebs Emean Emabs Emebs Emean DSD Emabs Emebs Emean 1 0.31 0.11 4.1e-4 0.31 0.11 4.0e-4 1 0.30 0.11 -8.1e-4 2 0.39 0.14 6.8e-5 0.39 0.14 5.5e-5 2 0.39 0.14 -1.3e-4 3 0.49 0.17 1.0e-6 0.49 0.17 -5.5e-7 3 0.49 0.17 -2.0e-5 4 0.30 0.11 3.2e-7 0.30 0.11 -6.1e-6 4 0.30 0.11 5.1e-8 5 0.37 0.13 3.1e-7 0.38 0.13 -5.6e-6 5 0.38 0.13 2.6e-6 6 0.36 0.12 -4.4e-7 0.48 0.17 -4.1e-7 6 0.48 0.17 7.2e-6 7 0.00 0.00 0.0 0.30 0.10 1.5e-7 7 0.00 0.00 0.0 Observational Analysis: N=13934592 values MERRA Temperature 20130601 Single-Precision NSD Emabs Emebs Emean 1 0.31 0.11 2.4e-3 2 0.39 0.14 3.8e-4 3 0.49 0.17 -9.6e-5 4 0.30 0.11 2.3e-3 5 0.37 0.13 2.2e-3 6 0.36 0.13 1.7e-2 7 0.00 0.00 0.0 All results show that PPC quantization performs as expected. Absolute maximum errors EMABS < 0.5 for all PRC. For 1 <= PRC <= 6, quantization results in comparable maximum absolute and mean absolute errors EMABS and EMEBS, respectively. Mean errors EMEAN are orders of magnitude smaller because quantization produces over- and under-estimated values in balance. When PRC=7, quantization of single-precision values is ineffective, because all available bits are used to represent the maximum precision of seven digits. The maximum and mean absolute errors EMABS and EMEBS are nearly identical across algorithms, precisions, and dataset types. This is consistent with both the artificial data and empirical data being random, and thus exercising equally strengths and weaknesses of the algorithms over the course of millions of input values. We generated artificial arrays with many different starting values and interval spacing and all gave qualitatively similar results. The results presented are the worst obtained. The artificial data has much smaller mean error EMEAN than the observational analysis. The reason why is unclear. It may be because the temperature field is concentrated in particular ranges of values (and associated quantization errors) prevalent on Earth, e.g., 200 < T < 320. It is worth noting that the mean error EMEAN < 0.01 for 1 <= PRC < 6, and that EMEAN is typically at least two or more orders of magnitude less than EMABS. 
Thus quantized values with precisions as low as PRC=1 still yield highly significant statistics by contemporary scientific standards. Testing shows that PPC quantization enhances compression of typical climate datasets. The degree of enhancement depends, of course, on the required precision. Model results are often computed as ‘NC_DOUBLE’ then archived as ‘NC_FLOAT’ to save space. This table summarizes the performance of lossless and lossy compression on four typical, or at least random, netCDF data files. The files were taken from representative model-simulated and satellite-retrieved datasets. Only floating-point data were compressed. No attempt was made to compress integer-type variables as they occupy an insignificant fraction of every dataset. The columns are

‘Type’
     File-type: ‘N3’ for netCDF ‘CLASSIC’, ‘N4’ for ‘NETCDF4’, ‘N7’ for ‘NETCDF4_CLASSIC’ (which comprises netCDF3 data types and structures with netCDF4 storage features like compression), ‘H4’ for HDF4, and ‘H5’ for HDF5. ‘N4/7’ means results apply to both ‘N4’ and ‘N7’ filetypes.

‘LLC’
     Type of lossless compression employed, if any. Bare numbers refer to the strength of the DEFLATE algorithm employed internally by netCDF4/HDF5, while numbers prefixed with ‘B’ refer to the block size employed by the Burrows-Wheeler algorithm in ‘bzip2’.

‘PPC’
     Number of significant digits retained by the precision-preserving compression NSD algorithm.

‘Pck’
     ‘Y’ if the default ‘ncpdq’ packing algorithm (convert floating-point types to ‘NC_SHORT’) was employed.

‘Size’
     Resulting filesize in MB.

‘%’
     Compression ratio, i.e., resulting filesize relative to original size, in percent. In some cases the original files are already losslessly compressed. The compression ratios reported are relative to the size of the original file as distributed, not as optimally losslessly compressed.

A dash (‘-’) indicates the associated compression feature was not employed.
# dstmch90_clm.nc Type LLC PPC Pck Size % Flags and Notes N3 - - - 34.7 100.0 Original is not compressed N3 B1 - - 28.9 83.2 bzip2 -1 N3 B9 - - 29.3 84.4 bzip2 -9 N7 - - - 35.0 101.0 N7 1 - - 28.2 81.3 -L 1 N7 9 - - 28.0 80.8 -L 9 N7 - - Y 17.6 50.9 ncpdq -L 0 N7 1 - Y 7.9 22.8 ncpdq -L 1 N7 1 7 - 28.2 81.3 --ppc default=7 N7 1 6 - 27.9 80.6 --ppc default=6 N7 1 5 - 25.9 74.6 --ppc default=5 N7 1 4 - 22.3 64.3 --ppc default=4 N7 1 3 - 18.9 54.6 --ppc default=3 N7 1 2 - 14.5 43.2 --ppc default=2 N7 1 1 - 10.0 29.0 --ppc default=1 # b1850c5cn_doe_polar_merged_0_cesm1_2_0_HD+MAM4+tun2b.hp.e003.cam.h0.0001-01.nc Type LLC PPC Pck Size % Flags and Notes N3 - - - 119.8 100.0 Original is not compressed N3 B1 - - 84.2 70.3 bzip2 -1 N3 B9 - - 84.8 70.9 bzip2 -9 N7 - - - 120.5 100.7 N7 1 - - 82.6 69.0 -L 1 N7 9 - - 82.1 68.6 -L 9 N7 - - Y 60.7 50.7 ncpdq -L 0 N7 1 - Y 26.0 21.8 ncpdq -L 1 N7 1 7 - 82.6 69.0 --ppc default=7 N7 1 6 - 81.9 68.4 --ppc default=6 N7 1 5 - 77.2 64.5 --ppc default=5 N7 1 4 - 69.0 57.6 --ppc default=4 N7 1 3 - 59.3 49.5 --ppc default=3 N7 1 2 - 49.5 41.3 --ppc default=2 N7 1 1 - 38.2 31.9 --ppc default=1 # MERRA300.prod.assim.inst3_3d_asm_Cp.20130601.hdf Type LLC PPC Pck Size % Flags and Notes H4 5 - - 244.3 100.0 Original is compressed H4 B1 - - 244.7 100.1 bzip2 -1 N4 5 - - 214.5 87.8 N7 5 - - 210.6 86.2 N4 B1 - - 215.4 88.2 bzip2 -1 N4 B9 - - 214.8 87.9 bzip2 -9 N3 - - - 617.1 252.6 N4/7 - - - 694.0 284.0 -L 0 N4/7 1 - - 223.2 91.3 -L 1 N4/7 9 - - 207.3 84.9 -L 9 N4/7 - - Y 347.1 142.1 ncpdq -L 0 N4/7 1 - Y 133.6 54.7 ncpdq -L 1 N4/7 1 7 - 223.1 91.3 --ppc default=7 N4/7 1 6 - 225.1 92.1 --ppc default=6 N4/7 1 5 - 221.4 90.6 --ppc default=5 N4/7 1 4 - 201.4 82.4 --ppc default=4 N4/7 1 3 - 185.3 75.9 --ppc default=3 N4/7 1 2 - 150.0 61.4 --ppc default=2 N4/7 1 1 - 100.8 41.3 --ppc default=1 # OMI-Aura_L2-OMIAuraSO2_2012m1222-o44888_v01-00-2014m0107t114720.h5 Type LLC PPC Pck Size % Flags and Notes H5 5 - - 29.5 100.0 Original is compressed H5 B1 - - 29.3 99.6 bzip2 -1 N4 5 - - 29.5 100.0 N4 B1 - - 29.3 99.6 bzip2 -1 N4 B9 - - 29.3 99.4 bzip2 -9 N4 - - - 50.7 172.3 -L 0 N4 1 - - 29.8 101.3 -L 1 N4 9 - - 29.4 99.8 -L 9 N4 - - Y 27.7 94.0 ncpdq -L 0 N4 1 - Y 12.9 43.9 ncpdq -L 1 N4 1 7 - 29.7 100.7 --ppc default=7 N4 1 6 - 29.7 100.8 --ppc default=6 N4 1 5 - 27.3 92.8 --ppc default=5 N4 1 4 - 23.8 80.7 --ppc default=4 N4 1 3 - 20.3 69.0 --ppc default=3 N4 1 2 - 15.1 51.2 --ppc default=2 N4 1 1 - 9.9 33.6 --ppc default=1 A selective, per-variable approach to PPC yields the best balance of precision and compression yet requires the dataset producer to understand the intrinsic precision of each variable. Such a specification for a GCM dataset might look like this (using names for the NCAR CAM model): # Be conservative on non-explicit quantities, so default=5 # Some quantities deserve four significant digits # Many quantities, such as aerosol optical depths and burdens, are # highly uncertain and only useful to three significant digits. ncks -7 -O \ --qnt default=5 \ --qnt AN.?,AQ.?=4 \ --qnt AER.?,AOD.?,ARE.?,AW.?,BURDEN.?=3 \ ncar_cam.nc ~/foo.nc ---------- Footnotes ---------- (1) Rounding is performed by the internal math library ‘rint()’ family of functions that were standardized in C99. The exact alorithm employed is VAL := rint(SCALE*VAL)/SCALE where SCALE is the nearest power of 2 that exceeds 10**PRC, and the inverse of SCALE is used when PRC < 0. For QNT = 3 or QNT = -2, for example, we have SCALE = 1024 and SCALE = 1/128. 
(2) Prior to NCO version 5.0.3 (October, 2021), NCO stored the NSD attribute ‘number_of_significant_digits’. However, this was deemed too ambiguous, given the increasing number of supported quantization methods. The new attribute names better disambiguate which algorithm was used to quantize the data. They also harmonize better with the metadata produced by the upcoming netCDF library quantization features. (3) A suggestion by Rich Signell and the ‘nc3tonc4’ tool by Jeff Whitaker inspired NCO to implement PPC. Note that NCO implements a different DSD algorithm than ‘nc3tonc4’, and produces slightly different (not bit-for-bit) though self-consistent and equivalent results. ‘nc3tonc4’ records the precision of its DSD algorithm in the attribute ‘least_significant_digit’ and NCO does the same for consistency. The Unidata blog here (http://www.unidata.ucar.edu/blogs/developer/en/entry/compression_by_bit_shaving) also shows how to compress IEEE floating-point data by zeroing insignificant bits. The author, John Caron, writes that the technique has been called "bit-shaving". We call the algorithm of always rounding-up "bit-setting". And we named the algorithm produced by alternately rounding up and down (with a few other bells and whistles) "bit-grooming". Imagine orthogonally raking an already-groomed Japanese rock garden. The criss-crossing tracks increase the pattern's entropy, and this entropy produces self-compensating instead of accumulating errors during statistical operations. (4) The terminology of significant bits (not to mention digits) can be confusing. The IEEE standard devotes 24 and 53 bits, respectively, to the mantissas that determine the precision of single and double precision floating-point numbers. However, the first (i.e., the most significant) of those bits is _implicit_, and is not explicitly stored. Its value is one unless all of the exponent bits are zero. The implicit bit is _significant_ thought it is not explicitly stored and so cannot be quantized. Therefore single and double precision floats have only 23 and 52 explicitly stored bits, respectively, that can be "kept" and therefore quantized. Each explicit bit kept is as significant as the implicit bit. Thus the number of "keepbits" is one less than the number of significant bits, i.e., the bits that contribute to the precision of an IEEE value. The BitRound quantization algorithm in NCO and in netCDF accept as an input parameter the number of keepbits, i.e., the number of explicit significant bits NESB to retain (i.e., not mask to zero). Unfortunately the acronym NSB has been used instead of the more accurate acronym NESB, and at this point it is difficult to change. Therefore the NSB acronym and parameter as used by NCO and netCDF should be interpreted as "number of stored bits" (i.e., keepbits) not the "number of significant bits". (5) The artificial dataset employed is one million evenly spaced values from 1.0-2.0. The analysis data are N=13934592 values of the temperature field from the NASA MERRA analysis of 20130601.  File: nco.info, Node: Deflation, Next: MD5 digests, Prev: Compression, Up: Shared features 3.35 Deflation ============== Availability: ‘ncap2’, ‘ncbo’, ‘ncclimo’, ‘nces’, ‘ncecat’, ‘ncflint’, ‘ncks’, ‘ncpdq’, ‘ncra’, ‘ncrcat’, ‘ncremap’, ‘ncwa’ Short options: ‘-L’ Long options: ‘--dfl_lvl’, ‘--deflate’ All NCO operators that define variables support the netCDF4 feature of storing variables compressed with the lossless DEFLATE compression algorithm. 
DEFLATE combines the Lempel-Ziv encoding with Huffman coding. The specific version used by netCDF4/HDF5 is that implemented in the ‘zlib’ library used by ‘gzip’. Activate deflation with the ‘-L DFL_LVL’ short option (or with the same argument to the ‘--dfl_lvl’ or ‘--deflate’ long options). Specify the deflation level DFL_LVL on a scale from no deflation (DFL_LVL = 0) to maximum deflation (DFL_LVL = 9). Under the hood, this selects the compression blocksize. Minimal deflation (DFL_LVL = 1) achieves considerable storage compression with little time penalty. Higher deflation levels require more time for compression. File sizes resulting from minimal (DFL_LVL = 1) and maximal (DFL_LVL = 9) deflation levels typically differ by less than 10% in size. To compress an entire file using deflation, use ncks -4 -L 0 in.nc out.nc # No deflation (fast, no time penalty) ncks -4 -L 1 in.nc out.nc # Minimal deflation (little time penalty) ncks -4 -L 9 in.nc out.nc # Maximal deflation (much slower) Unscientific testing shows that deflation compresses typical climate datasets by 30-60%. Packing, a lossy compression technique available for all netCDF files (see *note Packed data::), can easily compress files by 50%. Packed data may be deflated to squeeze datasets by about 80%: ncks -4 -L 1 in.nc out.nc # Minimal deflation (~30-60% compression) ncks -4 -L 9 in.nc out.nc # Maximal deflation (~31-63% compression) ncpdq in.nc out.nc # Standard packing (~50% compression) ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% compression) ‘ncks’ prints deflation parameters, if any, to screen (*note ncks netCDF Kitchen Sink::).  File: nco.info, Node: MD5 digests, Next: Buffer sizes, Prev: Deflation, Up: Shared features 3.36 MD5 digests ================ Availability: ‘ncecat’, ‘ncks’, ‘ncrcat’ Short options: Long options: ‘--md5_dgs’, ‘--md5_digest’, ‘--md5_wrt_att’, ‘--md5_write_attribute’ As of NCO version 4.1.0 (April, 2012), NCO supports data integrity verification using the MD5 digest algorithm. This support is currently implemented in ‘ncks’ and in the multi-file concatenators ‘ncecat’ and ‘ncrcat’. Activate it with the ‘--md5_dgs’ or ‘--md5_digest’ long options. As of NCO version 4.3.3 (July, 2013), NCO will write the MD5 digest of each variable as an ‘NC_CHAR’ attribute named ‘MD5’. This support is currently implemented in ‘ncks’ and in the multi-file concatenators ‘ncecat’ and ‘ncrcat’. Activate it with the ‘--md5_wrt_att’ or ‘--md5_write_attribute’ long options. The behavior and verbosity of the MD5 digest is operator-dependent. MD5 digests may be activated in both ‘ncks’ invocation types, the one-filename argument mode for printing sub-setted and hyperslabbed data, and the two-filename argument mode for copying that data to a new file. Both modes will incur minor overhead from performing the hash algorithm for each variable read, and each variable written will have an additional attribute named ‘MD5’. When activating MD5 digests with ‘ncks’ it is assumed that the user wishes to print the digest of every variable when the debugging level exceeds one. ‘ncks’ displays an MD5 digest as a 32-character hexadecimal string in which each two characters represent one byte of the 16-byte digest: > ncks --trd -D 2 -C --md5 -v md5_a,md5_abc ~/nco/data/in.nc ... ncks: INFO MD5(md5_a) = 0cc175b9c0f1b6a831c399e269772661 md5_a = 'a' ncks: INFO MD5(md5_abc) = 900150983cd24fb0d6963f7d28e17f72 lev[0]=100 md5_abc[0--2]='abc' > ncks --trd -D 2 -C -d lev,0 --md5 -v md5_a,md5_abc ~/nco/data/in.nc ... 
ncks: INFO MD5(md5_a) = 0cc175b9c0f1b6a831c399e269772661 md5_a = 'a' ncks: INFO MD5(md5_abc) = 0cc175b9c0f1b6a831c399e269772661 lev[0]=100 md5_abc[0--0]='a' In fact these examples demonstrate the validity of the hash algorithm since the MD5 hashes of the strings "a" and "abc" are widely known. The second example shows that the hyperslab of variable ‘md5_abc’ (= "abc") consisting of only its first letter (= "a") has the same hash as the variable ‘md5_a’ ("a"). This illustrates that MD5 digests act only on variable data, not on metadata. When activating MD5 digests with ‘ncecat’ or ‘ncrcat’ it is assumed that the user wishes to verify that every variable written to disk has the same MD5 digest as when it is subsequently read from disk. This incurs the major additional overhead of reading in each variable after it is written and performing the hash algorithm again on that to compare to the original hash. Moreover, it is assumed that such operations are generally done in "production mode" where the user is not interested in actually examining the digests herself. The digests proceed silently unless the debugging level exceeds three: > ncecat -O -D 4 --md5 -p ~/nco/data in.nc in.nc ~/foo.nc | grep MD5 ... ncecat: INFO MD5(wnd_spd) = bec190dd944f2ce2794a7a4abf224b28 ncecat: INFO MD5 digests of RAM and disk contents for wnd_spd agree > ncrcat -O -D 4 --md5 -p ~/nco/data in.nc in.nc ~/foo.nc | grep MD5 ... ncrcat: INFO MD5(wnd_spd) = 74699bb0a72b7f16456badb2c995f1a1 ncrcat: INFO MD5 digests of RAM and disk contents for wnd_spd agree Regardless of the debugging level, an error is returned when the digests of the variable read from the source file and from the output file disagree. These rules may further evolve as NCO pays more attention to data integrity. We welcome feedback and suggestions from users.  File: nco.info, Node: Buffer sizes, Next: RAM disks, Prev: MD5 digests, Up: Shared features 3.37 Buffer sizes ================= Availability: All operators Short options: Long options: ‘--bfr_sz_hnt’, ‘--buffer_size_hint’ As of NCO version 4.2.0 (May, 2012), NCO allows the user to request specific buffer sizes to allocate for reading and writing files. This buffer size determines how many system calls the netCDF layer must invoke to read and write files. By default, netCDF uses the preferred I/O block size returned as the ‘st_blksize’ member of the ‘stat’ structure returned by the ‘stat()’ system call (1). Otherwise, netCDF uses twice the system pagesize. Larger sizes can increase access speed by reducing the number of system calls netCDF makes to read/write data from/to disk. Because netCDF cannot guarantee the buffer size request will be met, the actual buffer size granted by the system is printed as an INFO statement. # Request 2 MB file buffer instead of default 8 kB buffer > ncks -O -D 3 --bfr_sz=2097152 ~/nco/data/in.nc ~/foo.nc ... ncks: INFO nc__open() will request file buffer size = 2097152 bytes ncks: INFO nc__open() opened file with buffer size = 2097152 bytes ... ---------- Footnotes ---------- (1) On modern Linux systems the block size defaults to 8192 B. The GLADE filesystem at NCAR has a block size of 512 kB.  
File: nco.info, Node: RAM disks, Next: Unbuffered I/O, Prev: Buffer sizes, Up: Shared features 3.38 RAM disks ============== Availability: All operators (works with netCDF3 files only) Short options: Long options: ‘--ram_all’, ‘--create_ram’, ‘--open_ram’, ‘--diskless_all’ As of NCO version 4.2.1 (August, 2012), NCO supports the use of diskless files, aka RAM disks, for access and creation of netCDF3 files (these options have no effect on netCDF4 files). Two independent switches, ‘--open_ram’ and ‘--create_ram’, control this feature. Before describing the specifics of these switches, we describe why many NCO operations will not benefit from them. Essentially, reading/writing from/to RAM rather than disk only hastens the task when reads/writes to disk are avoided. Most NCO operations are simple enough that they require a single read-from/write-to disk for every block of input/output. Diskless access does not change this, but it does add an extra read-from/write-to RAM. However this extra RAM write/read does avoid contention for limited system resources like disk-head access. Operators which may benefit from RAM disks include ‘ncwa’, which may need to read weighting variables multiple times, the multi-file operators ‘ncra’, ‘ncrcat’, and ‘ncecat’, which may try to write output at least once per input file, and ‘ncap2’ scripts which may be arbitrarily long and convoluted. The ‘--open_ram’ switch causes input files to copied to RAM when opened. All further metadata and data access occurs in RAM and thus avoids access time delays caused by disk-head movement. Usually input data is read at most once so it is unlikely that requesting input files be stored in RAM will save much time. The likeliest exceptions are files that are accessed numerous times, such as those repeatedly analyzed by ‘ncap2’. Invoking ‘--open_ram’, ‘--ram_all’, or ‘--diskless_all’ uses much more system memory. To copy the input file to RAM increases the sustained memory use by exactly the on-disk filesize of the input file, i.e., MS += FT. For large input files this can be a huge memory burden that starves the rest of the NCO analysis of sufficient RAM. To be safe, use ‘--open_ram’, ‘--ram_all’, or ‘--diskless_all’ only on files that are much (say at least a factor of four) smaller than your available system RAM. See *note Memory Requirements:: for further details. The ‘--create_ram’ switch causes output files to be created in RAM, rather than on disk. These files are copied to disk only when closed, i.e., when the operator completes. Creating files in RAM may save time, especially with ‘ncap2’ computations that are iterative, e.g., loops, and for multi-file operators that write output every record (timestep) or file. RAM files provide many of the same benefits as RAM variables in such cases (*note RAM variables::). Two switches, ‘--ram_all’ and ‘--diskless_all’, are convenient shortcuts for specifying both ‘--create_ram’ and ‘--open_ram’. Thus ncks in.nc out.nc # Default: Open in.nc on disk, write out.nc to disk ncks --open_ram in.nc out.nc # Open in.nc in RAM, write out.nc to disk ncks --create_ram in.nc out.nc # Create out.nc in RAM, write to disk # Open in.nc in RAM, create out.nc in RAM, then write out.nc to disk ncks --open_ram --create_ram in.nc out.nc ncks --ram_all in.nc out.nc # Same as above ncks --diskless_all in.nc out.nc # Same as above It is straightforward to demonstrate the efficacy of RAM disks. 
For NASA we constructed a test that employs ‘ncecat’ an arbitrary number (set to one hundred thousand) of files that are all symbolically linked to the same file. Everything is on the local filesystem (not DAP). # Create symbolic links for benchmark cd ${DATA}/nco # Do all work here for idx in {1..99999}; do idx_fmt=`printf "%05d" ${idx}` /bin/ln -s ${DATA}/nco/LPRM-AMSR_E_L3_D_SOILM3_V002-20120512T111931Z_20020619.nc \ ${DATA}/nco/${idx_fmt}.nc done # Benchmark time to ncecat one hundred thousand files time ncecat --create_ram -O -u time -v ts -d Latitude,40.0 \ -d Longitude,-105.0 -p ${DATA}/nco -n 99999,5,1 00001.nc ~/foo.nc Run normally on a laptop in 201303, this completes in 21 seconds. The ‘--create_ram’ reduces the elapsed time to 9 seconds. Some of this speed may be due to using symlinks and caching. However, the efficacy of ‘--create_ram’ is clear. Placing the output file in RAM avoids thousands of disk writes. It is not unreasonable to for NCO to process a million files like this in a few minutes. However, there is no substitute for benchmarking with real files. A completely independent way to reduce time spent writing files is to refrain from writing temporary output files. This is accomplished with the ‘--no_tmp_fl’ switch (*note Temporary Output Files::).  File: nco.info, Node: Unbuffered I/O, Next: Packed data, Prev: RAM disks, Up: Shared features 3.39 Unbuffered I/O =================== Availability: All operators (works on netCDF3 files only) Short options: Long options: ‘--share_all’, ‘--create_share’, ‘--open_share’, ‘--unbuffered_io’, ‘--uio’ As of NCO version 4.9.4 (July, 2020), NCO supports unbuffered I/O with netCDF3 files when requested with the ‘--unbuffered_io’ flag, or its synonyms ‘--uio’ or ‘--share_all’. (Note that these options work only with netCDF3 files and have no affect on netCDF4 files). These flags turn-off the default I/O buffering mode for both newly created and existing datasets. For finer-grained control, use the ‘--create_share’ switch to request unbuffered I/O only for newly created datasets, and the ‘--open_share’ switch to request unbuffered I/O only for existing datasets. Typically these options only significantly reduce throughput time when large record variables are written or read. Normal I/O buffering copies the data to be read/written into an intermediate buffer in order to avoid numerous small reads/writes. Unbuffered I/O avoids this intermediate step and can therefore execute (sometimes much) faster when read/write lengths are large.  File: nco.info, Node: Packed data, Next: Operation Types, Prev: Unbuffered I/O, Up: Shared features 3.40 Packed data ================ Availability: ‘ncap2’, ‘ncbo’, ‘nces’, ‘ncflint’, ‘ncpdq’, ‘ncra’, ‘ncwa’ Short options: None Long options: ‘--hdf_upk’, ‘--hdf_unpack’ The phrase “packed data” refers to data which are stored in the standard netCDF3 lossy linear packing format. See *note ncks netCDF Kitchen Sink:: for a description of deflation, a lossless compression technique available with netCDF4 only. Packed data may be deflated to save additional space. Standard Packing Algorithm -------------------------- “Packing” The standard netCDF linear packing algorithm (described here (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html)) produces packed data with the same dynamic range as the original but which requires no more than half the space to store. NCO will always use this algorithm for packing. Like all packing algorithms, linear packing is _lossy_. 
Just how lossy depends on the values themselves, especially their range. The packed variable is stored (usually) as type ‘NC_SHORT’ with the two attributes required to unpack the variable, ‘scale_factor’ and ‘add_offset’, stored at the original (unpacked) precision of the variable (1). Let MIN and MAX be the minimum and maximum values of X.

     SCALE_FACTOR = (MAX-MIN)/NDRV
     ADD_OFFSET = 0.5*(MIN+MAX)
     PCK = (UPK-ADD_OFFSET)/SCALE_FACTOR = (UPK-0.5*(MIN+MAX))*NDRV/(MAX-MIN)

where NDRV is the number of discrete representable values for the given type of packed variable. The theoretical maximum value for NDRV is two raised to the number of bits used to store the packed variable. Thus if the variable is packed into type ‘NC_SHORT’, a two-byte datatype, then there are at most 2^{16} = 65536 distinct values representable. In practice, the number of discretely representable values is taken to be two less than the theoretical maximum. This leaves space for a missing value and solves potential problems with rounding that may occur during the unpacking of the variable. Thus for ‘NC_SHORT’, ndrv = 65536 - 2 = 65534. Less often, the variable may be packed into type ‘NC_CHAR’, where ndrv = 2^{8} - 2 = 256 - 2 = 254, or type ‘NC_INT’, where ndrv = 2^{32} - 2 = 4294967296 - 2 = 4294967294. One useful feature of the (lossy) netCDF packing algorithm is that lossless compression algorithms perform well on top of it.

Standard (Default) Unpacking Algorithm
--------------------------------------

“Unpacking” The unpacking algorithm depends on the presence of two attributes, ‘scale_factor’ and ‘add_offset’. If ‘scale_factor’ is present for a variable, the data are multiplied by the value SCALE_FACTOR after the data are read. If ‘add_offset’ is present for a variable, then the ADD_OFFSET value is added to the data after the data are read. If both ‘scale_factor’ and ‘add_offset’ attributes are present, the data are first scaled by SCALE_FACTOR before the offset ADD_OFFSET is added.

     UPK = SCALE_FACTOR*PCK + ADD_OFFSET = (MAX-MIN)*PCK/NDRV + 0.5*(MIN+MAX)

NCO will use this algorithm for unpacking unless told otherwise as described below. When ‘scale_factor’ and ‘add_offset’ are used for packing, the associated variable (containing the packed data) is typically of type ‘byte’ or ‘short’, whereas the unpacked values are intended to be of type ‘int’, ‘float’, or ‘double’. The attributes ‘scale_factor’, ‘add_offset’, and ‘_FillValue’, if any, should all be of the type intended for the unpacked data, i.e., ‘int’, ‘float’ or ‘double’.

Non-Standard Packing and Unpacking Algorithms
---------------------------------------------

Many (most?) files originally written in HDF4 format use poorly documented packing/unpacking algorithms that are incompatible and easily confused with the netCDF packing algorithm described above. The unpacking component of the "conventional" HDF algorithm (described here (http://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html) and in Section 3.10.6 of the HDF4 Users Guide here (http://www.hdfgroup.org/release4/doc/UsrGuide_html/UG_PDF.pdf), and in the FAQ for MODIS MOD08 data here (http://modis-atmos.gsfc.nasa.gov/MOD08_D3/faq.html)) is

     UPK = SCALE_FACTOR*(PCK - ADD_OFFSET)

The unpacking component of the HDF algorithm employed for MODIS MOD13 data is

     UPK = (PCK - ADD_OFFSET)/SCALE_FACTOR

The unpacking component of the HDF algorithm employed for MODIS MOD04 data is the same as the netCDF algorithm.
Confusingly, the (incompatible) netCDF and HDF algorithms both store their parameters in attributes with the same names (‘scale_factor’ and ‘add_offset’). Data packed with one algorithm should never be unpacked with the other; doing so will result in incorrect answers. Unfortunately, few users are aware that their datasets may be packed, and fewer know the details of the packing algorithm employed. This is what we in the "bizness" call an “interoperability” issue because it hampers data analysis performed on heterogeneous systems. As described below, NCO automatically unpacks data before performing arithmetic. This automatic unpacking occurs silently since there is usually no reason to bother users with these details. There is as yet no generic way for NCO to know which packing convention was used, so NCO _assumes_ the netCDF convention was used. NCO uses the same convention for unpacking unless explicitly told otherwise with the ‘--hdf_upk’ (also ‘--hdf_unpack’) switch. Until and unless a method of automatically detecting the packing method is devised, it must remain the user's responsibility to tell NCO when to use the HDF convention instead of the netCDF convention to unpack. If your data originally came from an HDF file (e.g., NASA EOS) then it was likely packed with the HDF convention and must be unpacked with the same convention. Our recommendation is to only request HDF unpacking when you are certain. Most packed datasets encountered by NCO will have used the netCDF convention. Those that were not will hopefully produce noticeably weird values when unpacked by the wrong algorithm. Before or after panicking, treat this as a clue to re-try your commands with the ‘--hdf_upk’ switch. See *note ncpdq netCDF Permute Dimensions Quickly:: for an easy technique to unpack data packed with the HDF convention, and then re-pack it with the netCDF convention. Handling of Packed Data by Other Operators ------------------------------------------ All NCO arithmetic operators understand packed data. The operators automatically unpack any packed variable in the input file which will be arithmetically processed. For example, ‘ncra’ unpacks all record variables, and ‘ncwa’ unpacks all variable which contain a dimension to be averaged. These variables are stored unpacked in the output file. On the other hand, arithmetic operators do not unpack non-processed variables. For example, ‘ncra’ leaves all non-record variables packed, and ‘ncwa’ leaves packed all variables lacking an averaged dimension. These variables (called fixed variables) are passed unaltered from the input to the output file. Hence fixed variables which are packed in input files remain packed in output files. Completely packing and unpacking files is easily accomplished with ‘ncpdq’ (*note ncpdq netCDF Permute Dimensions Quickly::). Pack and unpack individual variables with ‘ncpdq’ and the ‘ncap2’ ‘pack()’ and ‘unpack()’ functions (*note Methods and functions::). ---------- Footnotes ---------- (1) Although not a part of the standard, NCO enforces the policy that the ‘_FillValue’ attribute, if any, of a packed variable is also stored at the original precision.  
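As a sketch of the re-packing workflow referenced above (file names are placeholders, and the packing-policy arguments assumed here are documented in *note ncpdq netCDF Permute Dimensions Quickly::), one might first unpack using the HDF convention and then re-pack using the netCDF convention:

# Unpack with the HDF convention, then re-pack with the netCDF convention
ncpdq --hdf_upk -P upk modis_in.nc modis_upk.nc
ncpdq -P all_new modis_upk.nc modis_nco.nc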
File: nco.info, Node: Operation Types, Next: Type Conversion, Prev: Packed data, Up: Shared features

3.41 Operation Types
====================

Availability: ‘ncap2’, ‘ncra’, ‘nces’, ‘ncwa’
Short options: ‘-y’
Long options: ‘--operation’, ‘--op_typ’

The ‘-y OP_TYP’ switch allows specification of many different types of operations. Set OP_TYP to the abbreviated key for the corresponding operation:

‘avg’ Mean value
‘sqravg’ Square of the mean
‘avgsqr’ Mean of sum of squares
‘max’ Maximum value
‘min’ Minimum value
‘mabs’ Maximum absolute value
‘mebs’ Mean absolute value
‘mibs’ Minimum absolute value
‘rms’ Root-mean-square (normalized by N)
‘rmssdn’ Root-mean-square (normalized by N-1)
‘sqrt’ Square root of the mean
‘tabs’ Sum of absolute values
‘ttl’ Sum of values

NCO assumes coordinate variables represent grid axes, e.g., longitude. The only rank-reduction which makes sense for coordinate variables is averaging. Hence NCO implements the operation type requested with ‘-y’ on all non-coordinate variables, not on coordinate variables. When an operation requires a coordinate variable to be reduced in rank, i.e., from one dimension to a scalar or from one dimension to a degenerate (single value) array, then NCO _always averages_ the coordinate variable regardless of the arithmetic operation type performed on the non-coordinate variables.

The mathematical definition of each arithmetic operation is given below. *Note ncwa netCDF Weighted Averager::, for additional information on masks and normalization. If an operation type is not specified with ‘-y’ then the operator performs an arithmetic average by default. Averaging is described first so the terminology for the other operations is familiar.

_Note for Info users_: The definition of mathematical operations involving rank reduction (e.g., averaging) relies heavily on mathematical expressions which cannot be easily represented in Info. _See the printed manual (./nco.pdf) for much more detailed and complete documentation of this subject._

The definitions of some of these operations are not universally useful. Mostly they were chosen to facilitate standard statistical computations within the NCO framework. We are open to redefining and/or adding to the above. If you are interested in having other statistical quantities defined in NCO please contact the NCO project (*note Help Requests and Bug Reports::).

EXAMPLES

Suppose you wish to examine the variable ‘prs_sfc(time,lat,lon)’ which contains a time series of the surface pressure as a function of latitude and longitude. Find the minimum value of ‘prs_sfc’ over all dimensions:

ncwa -y min -v prs_sfc in.nc foo.nc

Find the maximum value of ‘prs_sfc’ at each time interval for each latitude:

ncwa -y max -v prs_sfc -a lon in.nc foo.nc

Find the root-mean-square value of the time-series of ‘prs_sfc’ at every gridpoint:

ncra -y rms -v prs_sfc in.nc foo.nc
ncwa -y rms -v prs_sfc -a time in.nc foo.nc

The previous two commands give the same answer but ‘ncra’ is preferred because it has a smaller memory footprint. A dimension of size one is said to be “degenerate”. By default, ‘ncra’ leaves the (degenerate) ‘time’ dimension in the output file (which is usually useful) whereas ‘ncwa’ removes the ‘time’ dimension (unless ‘-b’ is given). These operations work as expected in multi-file operators. Suppose that ‘prs_sfc’ is stored in multiple timesteps per file across multiple files, say ‘jan.nc’, ‘feb.nc’, ‘march.nc’. We can now find the three-month maximum surface pressure at every point.
nces -y max -v prs_sfc jan.nc feb.nc march.nc out.nc

It is possible to use a combination of these operations to compute the variance and standard deviation of a field stored in a single file or across multiple files. The procedure to compute the temporal standard deviation of the surface pressure at all points in a single file ‘in.nc’ involves three steps.

ncwa -O -v prs_sfc -a time in.nc out.nc
ncbo -O -v prs_sfc in.nc out.nc out.nc
ncra -O -y rmssdn out.nc out.nc

First construct the temporal mean of ‘prs_sfc’ in the file ‘out.nc’. Next overwrite ‘out.nc’ with the anomaly (deviation from the mean). Finally overwrite ‘out.nc’ with the root-mean-square of itself. Note the use of ‘-y rmssdn’ (rather than ‘-y rms’) in the final step. This ensures the standard deviation is correctly normalized by one fewer than the number of time samples. The procedure to compute the variance is identical except for the use of ‘-y avgsqr’ instead of ‘-y rmssdn’ in the final step.

‘ncap2’ can also compute statistics like standard deviations. Brute-force implementation of formulae is one option, e.g.,

ncap2 -s 'prs_sfc_sdn=sqrt(((prs_sfc-prs_sfc.avg($time))^2).total($time)/($time.size-1))' in.nc out.nc

The operation may, of course, be broken into multiple steps in order to archive intermediate quantities, such as the time-anomalies

ncap2 -s 'prs_sfc_anm=prs_sfc-prs_sfc.avg($time)' \
      -s 'prs_sfc_sdn=sqrt((prs_sfc_anm^2).total($time)/($time.size-1))' \
      in.nc out.nc

‘ncap2’ supports intrinsic standard deviation functions (*note Operation Types::) which simplify the above expression to

ncap2 -s 'prs_sfc_sdn=(prs_sfc-prs_sfc.avg($time)).rmssdn($time)' in.nc out.nc

These intrinsic functions compute the answer quickly and concisely.

The procedure to compute the spatial standard deviation of a field in a single file ‘in.nc’ involves three steps.

ncwa -O -v prs_sfc,gw -a lat,lon -w gw in.nc out.nc
ncbo -O -v prs_sfc,gw in.nc out.nc out.nc
ncwa -O -y rmssdn -v prs_sfc -a lat,lon -w gw out.nc out.nc

First the spatially weighted (by ‘-w gw’) mean values are written to the output file, as are the mean weights. The initial output file is then overwritten with the gridpoint deviations from the spatial mean. It is important that the output file after the second line contain the original, non-averaged weights. This will be the case if the weights are named so that NCO treats them like a coordinate (*note CF Conventions::). One such name is ‘gw’, and any variable whose name begins with ‘msk_’ (for "mask") or ‘wgt_’ (for "weight") will likewise be treated as a coordinate, and will be copied (not differenced) straight from ‘in.nc’ to ‘out.nc’ in the second step. When using weights to compute standard deviations one must remember to include the weights in the initial output files so that they may be used again in the final step. Finally the root-mean-square of the appropriately weighted spatial deviations is taken.

No elegant ‘ncap2’ solution exists to compute weighted standard deviations. Those brave of heart may try to formulate one. A general formula should allow weights to have fewer dimensions, and variables to have more dimensions, than the minimal spatial dimensions (latitude and longitude). The procedure to compute the standard deviation of a time-series across multiple files involves one extra step since all the input must first be collected into one file.
ncrcat -O -v tpt in.nc in.nc foo1.nc ncwa -O -a time foo1.nc foo2.nc ncbo -O -v tpt foo1.nc foo2.nc foo3.nc ncra -O -y rmssdn foo3.nc out.nc The first step assembles all the data into a single file. Though this may consume a lot of temporary disk space, it is more or less required by the ‘ncbo’ operation in the third step.  File: nco.info, Node: Type Conversion, Next: Batch Mode, Prev: Operation Types, Up: Shared features 3.42 Type Conversion ==================== Availability (automatic type conversion): ‘ncap2’, ‘ncbo’, ‘nces’, ‘ncflint’, ‘ncra’, ‘ncwa’ Short options: None (it's _automatic_) Availability (manual type conversion): ‘nces’, ‘ncra’, ‘ncwa’ Short options: None Long options: ‘--dbl’, ‘--flt’, ‘--rth_dbl’, ‘--rth_flt’ Type conversion refers to the casting or coercion of one fundamental or atomic data type to another, e.g., converting ‘NC_SHORT’ (two bytes) to ‘NC_DOUBLE’ (eight bytes). Type conversion always “promotes” or “demotes” the range and/or precision of the values a variable can hold. Type conversion is automatic when the language carries out this promotion according to an internal set of rules without explicit user intervention. In contrast, manual type conversion refers to explicit user commands to change the type of a variable or attribute. Most type conversion happens automatically, yet there are situations in which manual type conversion is advantageous. * Menu: * Automatic type conversion:: * Promoting Single-precision to Double:: * Manual type conversion::  File: nco.info, Node: Automatic type conversion, Next: Promoting Single-precision to Double, Prev: Type Conversion, Up: Type Conversion 3.42.1 Automatic type conversion -------------------------------- There are at least two reasons to avoid type conversions. First, type conversions are expensive since they require creating (temporary) buffers and casting each element of a variable from its storage type to some other type and then, often, converting it back. Second, a dataset's creator perhaps had a good reason for storing data as, say, ‘NC_FLOAT’ rather than ‘NC_DOUBLE’. In a scientific framework there is no reason to store data with more precision than the observations merit. Normally this is single-precision, which guarantees 6-9 digits of precision. Reasons to engage in type conversion include avoiding rounding errors and out-of-range limitations of less-precise types. This is the case with most integers. Thus NCO defaults to automatically promote integer types to floating-point when performing lengthy arithmetic, yet NCO defaults to not promoting single to double-precision floats. Before discussing the more subtle floating-point issues, we first examine integer promotion. We will show how following parsimonious conversion rules dogmatically can cause problems, and what NCO does about that. That said, there are situations in which implicit conversion of single- to double-precision is also warranted. Understanding the narrowness of these situations takes time, and we hope the reader appreciates the following detailed discussion. Consider the average of the two ‘NC_SHORT’s ‘17000s’ and ‘17000s’. A straightforward average without promotion results in garbage since the intermediate value which holds their sum is also of type ‘NC_SHORT’ and thus overflows on (i.e., cannot represent) values greater than 32,767 (1). There are valid reasons for expecting this operation to succeed and the NCO philosophy is to make operators do what you want, not what is purest. 
Thus, unlike C and Fortran, but like many other higher level interpreted languages, NCO arithmetic operators will perform automatic type conversion on integers when all the following conditions are met (2): 1. The requested operation is arithmetic. This is why type conversion is limited to the operators ‘ncap2’, ‘ncbo’, ‘nces’, ‘ncflint’, ‘ncra’, and ‘ncwa’. 2. The arithmetic operation could benefit from type conversion. Operations that could benefit include averaging, summation, or any "hard" arithmetic that could overflow or underflow. Larger representable sums help avoid overflow, and more precision helps to avoid underflow. Type conversion does not benefit searching for minima and maxima (‘-y min’, or ‘-y max’). 3. The variable on disk is of type ‘NC_BYTE’, ‘NC_CHAR’, ‘NC_SHORT’, or ‘NC_INT’. Type ‘NC_DOUBLE’ is not promoted because there is no type of higher precision. Conversion of type ‘NC_FLOAT’ is discussed in detail below. When it occurs, it follows the same procedure (promotion then arithmetic then demotion) as conversion of integer types. When these criteria are all met, the operator promotes the variable in question to type ‘NC_DOUBLE’, performs all the arithmetic operations, casts the ‘NC_DOUBLE’ type back to the original type, and finally writes the result to disk. The result written to disk may not be what you expect, because of incommensurate ranges represented by different types, and because of (lack of) rounding. First, continuing the above example, the average (e.g., ‘-y avg’) of ‘17000s’ and ‘17000s’ is written to disk as ‘17000s’. The type conversion feature of NCO makes this possible since the arithmetic and intermediate values are stored as ‘NC_DOUBLE’s, i.e., ‘34000.0d’ and only the final result must be represented as an ‘NC_SHORT’. Without the type conversion feature of NCO, the average would have been garbage (albeit predictable garbage near ‘-15768s’). Similarly, the total (e.g., ‘-y ttl’) of ‘17000s’ and ‘17000s’ written to disk is garbage (actually ‘-31536s’) since the final result (the true total) of 34000 is outside the range of type ‘NC_SHORT’. After arithmetic is computed in double-precision for promoted variables, the intermediate double-precision values must be demoted to the variables' original storage type (e.g., from ‘NC_DOUBLE’ to ‘NC_SHORT’). NCO has handled this demotion in three ways in its history. Prior to October, 2011 (version 4.0.8), NCO employed the C library truncate function, ‘trunc()’ (3). Truncation rounds X to the nearest integer not larger in absolute value. For example, truncation rounds ‘1.0d’, ‘1.5d’, and ‘1.8d’ to the same value, ‘1s’. Clearly, truncation does not round floating-point numbers to the nearest integer! Yet truncation is how the C language performs implicit conversion of real numbers to integers. NCO stopped using truncation for demotion when an alert user (Neil Davis) informed us that this caused a small bias in the packing algorithm employed by ‘ncpdq’. This led to NCO adopting rounding functions for demotion. Rounding functions eliminated the small bias in the packing algorithm. From February, 2012 through March, 2013 (versions 4.0.9-4.2.6), NCO employed the C library family of rounding functions, ‘lround()’. These functions round X to the nearest integer, halfway cases away from zero. The problem with ‘lround()’ is that it always rounds real values ending in ‘.5’ away from zero. This rounds, for example, ‘1.5d’ and ‘2.5d’ to ‘2s’ and ‘3s’, respectively. 
Since April, 2013 (version 4.3.0), NCO has employed the other C library family of rounding functions, ‘lrint()’. This algorithm rounds X to the nearest integer, using the current rounding direction. Halfway cases are rounded to the nearest even integer. This rounds, for example, both ‘1.5d’ and ‘2.5d’ to the same value, ‘2s’, as recommended by the IEEE. This rounding is symmetric: up half the time, down half the time. This is the current and hopefully final demotion algorithm employed by NCO.

Hence because of automatic conversion, NCO will compute the average of ‘2s’ and ‘3s’ in double-precision arithmetic as (‘2.0d’ + ‘3.0d’)/‘2.0d’ = ‘2.5d’. It then demotes this intermediate result back to ‘NC_SHORT’ and stores it on disk as ‘trunc(2.5d)’ = ‘2s’ (versions up to 4.0.8), ‘lround(2.5d)’ = ‘3s’ (versions 4.0.9-4.2.6), and ‘lrint(2.5d)’ = ‘2s’ (versions 4.3.0 and later).

---------- Footnotes ----------

(1) 32767 = 2^15-1

(2) Operators began performing automatic type conversions before arithmetic in NCO version 1.2, August, 2000. Previous versions never performed unnecessary type conversion for arithmetic.

(3) The actual type conversions with truncation were handled by intrinsic type conversion, so the ‘trunc()’ function was never explicitly called, although the results would be the same if it were.

File: nco.info, Node: Promoting Single-precision to Double, Next: Manual type conversion, Prev: Automatic type conversion, Up: Type Conversion

3.42.2 Promoting Single-precision to Double
-------------------------------------------

Promotion of real numbers from single- to double-precision is fundamental to scientific computing. When it should occur depends on the precision of the inputs and the number of operations. Single-precision (four-byte) numbers contain about seven significant figures, while double-precision contain about sixteen. More, err, precisely, the IEEE single-precision representation gives from 6 to 9 significant decimal digits of precision (1). And the IEEE double-precision representation gives from 15 to 17 significant decimal digits of precision (2). Hence double-precision numbers represent about nine digits more precision than single-precision numbers. Given these properties, there are at least two possible arithmetic conventions for the treatment of real numbers:

1. Conservative, aka Fortran Convention
   Automatic type conversion during arithmetic in the Fortran language is, by default, performed only when necessary. All operands in an operation are converted to the most precise type involved in the operation before the arithmetic operation. Expressions which involve only single-precision numbers are computed entirely in single-precision. Expressions involving mixed precision types are computed in the type of higher precision. NCO by default employs the Fortran Convention for promotion.

2. Aggressive, aka C Convention
   The C language is by default much more aggressive (and thus wasteful) than Fortran, and will always implicitly convert single- to double-precision numbers, even when there is no good reason. All real-number standard C library functions are double-precision, and C programmers must take extra steps to only utilize single-precision arithmetic. The high-level interpreted data analysis languages IDL, Matlab, and NCL all adopt the C Convention.

NCO does not automatically promote ‘NC_FLOAT’ because, in our judgement, the performance penalty of always doing so would outweigh the potential benefits.
The now-classic text "Numerical Recipes in C" discusses this point under the section "Implicit Conversion of Float to Double" (3). That said, such promotion is warranted in some circumstances. For example, rounding errors can accumulate to worrisome levels during arithmetic performed on large arrays of single-precision floats. This use-case occurs often in geoscientific studies of climate where thousands-to-millions of gridpoints may contribute to a single average. If the inputs are all single-precision, then so should be the output. However the intermediate results where running sums are accumulated may suffer from too much rounding or from underflow unless computed in double-precision. The order of operations matters to floating-point math even when the analytic expressions are equal. Cautious users feel disquieted when results from equally valid analyses differ in the final bits instead of agreeing bit-for-bit. For example, averaging arrays in multiple stages produces different answers than averaging them in one step. This is easily seen in the computation of ensemble averages by two different methods. The NCO test file ‘in.nc’ contains single- and double-precision representations of the same temperature timeseries as ‘tpt_flt’ and ‘tpt_dbl’. Pretend each datapoint in this timeseries represents a monthly-mean temperature. We will mimic the derivation of a fifteen-year ensemble-mean January temperature by concatenating the input file five times, and then averaging the datapoints representing January two different ways. In Method 1 we derive the 15-year ensemble January average in two steps, as the average of three five-year averages. This method is naturally used when each input file contains multiple years and multiple input files are needed (4). In Method 2 we obtain 15-year ensemble January average in a single step, by averaging all 15 Januaries at one time: # tpt_flt and tpt_dbl are identical except for precision ncks -C -v tpt_flt,tpt_dbl ~/nco/data/in.nc # tpt_dbl = 273.1, 273.2, 273.3, 273.4, 273.5, 273.6, 273.7, 273.8, 273.9, 274 # tpt_flt = 273.1, 273.2, 273.3, 273.4, 273.5, 273.6, 273.7, 273.8, 273.9, 274 # Create file with five "ten-month years" (i.e., 50 timesteps) of temperature data ncrcat -O -v tpt_flt,tpt_dbl -p ~/nco/data in.nc in.nc in.nc in.nc in.nc ~/foo.nc # Average 1st five "Januaries" (elements 1, 11, 21, 31, 41) ncra --flt -O -F -d time,1,,10 ~/foo.nc ~/foo_avg1.nc # Average 2nd five "Januaries" (elements 2, 12, 22, 32, 42) ncra --flt -O -F -d time,2,,10 ~/foo.nc ~/foo_avg2.nc # Average 3rd five "Januaries" (elements 3, 13, 23, 33, 43) ncra --flt -O -F -d time,3,,10 ~/foo.nc ~/foo_avg3.nc # Method 1: Obtain ensemble January average by averaging the averages ncra --flt -O ~/foo_avg1.nc ~/foo_avg2.nc ~/foo_avg3.nc ~/foo_avg_mth1.nc # Method 2: Obtain ensemble January average by averaging the raw data # Employ ncra's "subcycle" feature (http://nco.sf.net/nco.html#ssc) ncra --flt -O -F -d time,1,,10,3 ~/foo.nc ~/foo_avg_mth2.nc # Difference the two methods ncbo -O ~/foo_avg_mth1.nc ~/foo_avg_mth2.nc ~/foo_avg_dff.nc ncks ~/foo_avg_dff.nc # tpt_dbl = 5.6843418860808e-14 ; # tpt_flt = -3.051758e-05 ; Although the two methods are arithmetically equivalent, they produce slightly different answers due to the different order of operations. Moreover, it appears at first glance that the single-precision answers suffer from greater error than the double-precision answers. In fact both precisions suffer from non-zero rounding errors. 
The answers differ negligibly to machine precision, which is about seven significant figures for single precision floats (‘tpt_flt’), and sixteen significant figures for double precision (‘tpt_dbl’). The input precision determines the answer precision.

IEEE arithmetic guarantees that two methods will produce bit-for-bit identical answers only if they compute the same operations in the same order. Bit-for-bit identical answers may also occur by happenstance when rounding errors exactly compensate one another. This is demonstrated by repeating the example above with the ‘--dbl’ (or ‘--rth_dbl’ for clarity) option which forces conversion of single-precision numbers to double-precision prior to arithmetic. Now ‘ncra’ will treat the first value of ‘tpt_flt’, ‘273.1000f’, as ‘273.1000000000000d’. Arithmetic on ‘tpt_flt’ then proceeds in double-precision until the final answer, which is converted back to single-precision for final storage.

# Average 1st five "Januaries" (elements 1, 11, 21, 31, 41)
ncra --dbl -O -F -d time,1,,10 ~/foo.nc ~/foo_avg1.nc
# Average 2nd five "Januaries" (elements 2, 12, 22, 32, 42)
ncra --dbl -O -F -d time,2,,10 ~/foo.nc ~/foo_avg2.nc
# Average 3rd five "Januaries" (elements 3, 13, 23, 33, 43)
ncra --dbl -O -F -d time,3,,10 ~/foo.nc ~/foo_avg3.nc
# Method 1: Obtain ensemble January average by averaging the averages
ncra --dbl -O ~/foo_avg1.nc ~/foo_avg2.nc ~/foo_avg3.nc ~/foo_avg_mth1.nc
# Method 2: Obtain ensemble January average by averaging the raw data
# Employ ncra's "subcycle" feature (http://nco.sf.net/nco.html#ssc)
ncra --dbl -O -F -d time,1,,10,3 ~/foo.nc ~/foo_avg_mth2.nc
# Difference the two methods
ncbo -O ~/foo_avg_mth1.nc ~/foo_avg_mth2.nc ~/foo_avg_dff.nc
# Show differences
ncks ~/foo_avg_dff.nc
# tpt_dbl = 5.6843418860808e-14 ;
# tpt_flt = 0 ;

The ‘--dbl’ switch has no effect on the results computed from double-precision inputs. But now the two methods produce bit-for-bit identical results from the single-precision inputs! This is due to the happenstance of rounding along with the effects of the ‘--dbl’ switch. The ‘--flt’ and ‘--rth_flt’ switches are provided for symmetry. They enforce the traditional NCO and Fortran convention of keeping single-precision arithmetic in single-precision unless a double-precision number is explicitly involved.

We have shown that forced promotion of single- to double-precision prior to arithmetic has advantages and disadvantages. The primary disadvantages are speed and size. Double-precision arithmetic is 10-60% slower than, and requires twice the memory of, single-precision arithmetic. The primary advantage is that rounding errors in double-precision are much less likely to accumulate to values near the precision of the underlying geophysical variable.

For example, if we know temperature to five significant digits, then a rounding error of 1-bit could affect the least precise digit of temperature after 1,000-10,000 consecutive one-sided rounding errors under the worst possible scenario. Many geophysical grids have tens-of-thousands to millions of points that must be summed prior to normalization to compute an average. It is possible for single-precision rounding errors to accumulate and degrade the precision in such situations. Double-precision arithmetic mitigates this problem, so ‘--dbl’ would be warranted. This can be seen with another example, averaging a global surface temperature field with ‘ncwa’.
The input contains a single-precision global temperature field (stored in ‘TREFHT’) produced by the CAM3 general circulation model (GCM) run and stored at 1.9 by 2.5 degrees resolution. This requires 96 latitudes and 144 longitudes, or 13,824 total surface gridpoints, a typical GCM resolution in 2008-2013. These input characteristics are provided only to show the context to the interested reader; equivalent results would be found in statistics of any dataset of comparable size. Models often represent Earth on a spherical grid where global averages must be created by weighting each gridcell by its latitude-dependent weight (e.g., a Gaussian weight stored in ‘gw’), or by the surface area of each contributing gridpoint (stored in ‘area’).

Like many geophysical models and most GCMs, CAM3 runs completely in double-precision yet stores its archival output in single-precision to save space. In practice such models usually save multi-dimensional prognostic and diagnostic fields (like ‘TREFHT(lat,lon)’) as single-precision, while saving all one-dimensional coordinates and weights (here ‘lat’, ‘lon’, and ‘gw(lat)’) as double-precision. The gridcell area ‘area(lat,lon)’ is an extensive grid property that should be, but often is not, stored as double-precision. To obtain pure double-precision arithmetic _and_ storage of the global mean temperature, we first create and store double-precision versions of the single-precision fields:

ncap2 -O -s 'TREFHT_dbl=double(TREFHT);area_dbl=double(area)' in.nc in.nc

The single- and double-precision temperatures may each be averaged globally using four permutations for the precision of the weight and of the intermediate arithmetic representation:

1. Single-precision weight (‘area’), single-precision arithmetic
2. Double-precision weight (‘gw’), single-precision arithmetic
3. Single-precision weight (‘area’), double-precision arithmetic
4. Double-precision weight (‘gw’), double-precision arithmetic

# NB: Values below are printed with C-format %5.6f using
# ncks -H -C -s '%5.6f' -v TREFHT,TREFHT_dbl out.nc

# Single-precision weight (area), single-precision arithmetic
ncwa --flt -O -a lat,lon -w area in.nc out.nc
# TREFHT     = 289.246735
# TREFHT_dbl = 289.239964

# Double-precision weight (gw), single-precision arithmetic
ncwa --flt -O -a lat,lon -w gw in.nc out.nc
# TREFHT     = 289.226135
# TREFHT_dbl = 289.239964

# Single-precision weight (area), double-precision arithmetic
ncwa --dbl -O -a lat,lon -w area in.nc out.nc
# TREFHT     = 289.239960
# TREFHT_dbl = 289.239964

# Double-precision weight (gw), double-precision arithmetic
ncwa --dbl -O -a lat,lon -w gw in.nc out.nc
# TREFHT     = 289.239960
# TREFHT_dbl = 289.239964

First note that the ‘TREFHT_dbl’ average never changes because ‘TREFHT_dbl(lat,lon)’ is double-precision in the input file. As described above, NCO automatically converts all operands to the highest precision involved in the operation, so specifying ‘--dbl’ is redundant for double-precision inputs. Second, the single-precision arithmetic averages of the single-precision input ‘TREFHT’ differ by 289.246735 - 289.226135 = 0.0206 from each other, and, more importantly, by as much as 289.239964 - 289.226135 = 0.013829 from the correct (double-precision) answer. These averages differ in the fifth digit, i.e., they agree only to four significant figures! Given that climate scientists are concerned about global temperature variations of a tenth of a degree or less, this difference is large.
Global mean temperature changes significant to climate scientists are comparable in size to the numerical artifacts produced by the averaging procedure. Why are the single-precision numerical artifacts so large? Each global average is the result of multiplying almost 15,000 elements each by its weight, summing those, and then dividing by the summed weights. Thus about 50,000 single-precision floating-point operations caused the loss of two to three significant digits of precision.

The net error of a series of independent rounding errors is a random walk phenomenon (5). Successive rounding errors displace the answer further from the truth. An ensemble of such averages will, on average, have no net bias. In other words, the expectation value of a series of IEEE rounding errors is zero. And the error of any given sequence of rounding errors obeys, for large series, a Gaussian distribution centered on zero.

Single-precision numbers use three of their four eight-bit bytes to represent the mantissa, so the machine epsilon for single precision is \epsilon \equiv 2^{-23} = 1.19209 \times 10^{-7}. This \epsilon is the smallest x such that 1.0 + x \ne 1.0, i.e., the relative rounding error incurred by values that are not exactly representable in single precision. Applying random walk theory to rounding, it can be shown that the expected rounding error after N inexact operations is \sqrt{2N/\pi} (in units of \epsilon) for large N. The expected (i.e., mean absolute) rounding error in our example with 13,824 additions is about \sqrt{2 \times 13824 / \pi} \approx 93.8, i.e., roughly 94\epsilon \approx 1.1 \times 10^{-5}. Hence, addition alone of about fifteen thousand single-precision floats is expected to consume about two significant digits of precision. This neglects the error due to the inner product (weights times values) and normalization (division by tally) aspects of a weighted average. The ratio of two numbers each containing a numerical bias can magnify the size of the bias. In summary, a global mean number computed from about 15,000 gridpoints each with weights can be expected to lose up to three significant digits. Since single-precision starts with about seven significant digits, we should not expect to retain more than four significant digits after computing weighted averages in single-precision. The above example with ‘TREFHT’ shows the expected four digits of agreement.

The NCO results have been independently validated to the extent possible in three other languages: C, Matlab, and NCL. C and NCO are the only languages tested that permit single-precision numbers to be treated with single-precision arithmetic:

# Double-precision weight (gw), single-precision arithmetic (C)
ncwa_3528514.exe
# TREFHT = 289.240112
# Double-precision weight (gw), double-precision arithmetic (C)
# TREFHT = 289.239964
# Single-precision weight (area), double-precision arithmetic (Matlab)
# TREFHT = 289.239964
# Double-precision weight (gw), double-precision arithmetic (Matlab)
# TREFHT = 289.239964
# Single-precision weight (area), double-precision arithmetic (NCL)
ncl < ncwa_3528514.ncl
# TREFHT     = 289.239960
# TREFHT_dbl = 289.239964
# Double-precision weight (gw), double-precision arithmetic (NCL)
# TREFHT     = 289.239960
# TREFHT_dbl = 289.239964

All languages tested (C, Matlab, NCL, and NCO) agree to machine precision with double-precision arithmetic. Users are fortunate to have a variety of high quality software that liberates them from the drudgery of coding their own. Many packages are free (as in beer)! As shown above, NCO permits users to select their float-promotion preferences as desired.
No other language allows this with a simple switch.

To summarize, until version 4.3.6 (September, 2013), the default arithmetic convention of NCO adhered to the Fortran behavior: single-precision was automatically promoted to double-precision in all mixed-precision expressions, while pure single-precision expressions were left alone. This is faster and more memory efficient than other conventions. However, pure single-precision arithmetic can lose too much precision when used to condense (e.g., average) large arrays. Statistics involving about N = 10,000 single-precision inputs will lose about 2-3 digits if not promoted to double-precision prior to arithmetic. The loss scales with the square root of N. For larger N, users should promote floats with the ‘--dbl’ option if they want to preserve more than four significant digits in their results.

The ‘--dbl’ and ‘--flt’ switches are only available with the NCO arithmetic operators that could potentially perform more than a few single-precision floating-point operations per result. These are ‘nces’, ‘ncra’, and ‘ncwa’. Each is capable of thousands to millions or more operations per result. By contrast, the arithmetic operators ‘ncbo’ and ‘ncflint’ perform at most one floating-point operation per result. Providing the ‘--dbl’ option for such trivial operations makes little sense, so the option is not currently made available.

We are interested in users' opinions on these matters. The default behavior was changed from ‘--flt’ to ‘--dbl’ with the release of NCO version 4.3.6 (October, 2013). We will change the default back to ‘--flt’ if users prefer. Or we could set a threshold (e.g., n \ge 10000) after which single- to double-precision promotion is automatically invoked. Or we could make the default promotion convention settable via an environment variable (GSL does this a lot). Please let us know what you think of the selected defaults and options.

---------- Footnotes ----------

(1) According to Wikipedia's summary of IEEE standard 754, "If a decimal string with at most 6 significant digits is converted to IEEE 754 single-precision and then converted back to a string with the same number of significant digits, then the final string should match the original; and if an IEEE 754 single-precision number is converted to a decimal string with at least 9 significant digits and then converted back to single, then the final number must match the original".

(2) According to Wikipedia's summary of IEEE standard 754, "If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original; and if an IEEE 754 double precision is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original".

(3) See page 21 in Section 1.2 of the First edition for this gem: One does not need much experience in scientific computing to recognize that the implicit conversion rules are, in fact, sheer madness! In effect, they make it impossible to write efficient numerical programs.

(4) For example, the CMIP5 archive tends to distribute monthly average timeseries in 50-year chunks.

(5) Thanks to Michael J. Prather for explaining this to me.
File: nco.info, Node: Manual type conversion, Prev: Promoting Single-precision to Double, Up: Type Conversion

3.42.3 Manual type conversion
-----------------------------

‘ncap2’ provides intrinsic functions for performing manual type conversions. The following, for example, converts variable ‘tpt’ to external type ‘NC_SHORT’ (a C-type ‘short’), and variable ‘prs’ to external type ‘NC_DOUBLE’ (a C-type ‘double’).

ncap2 -s 'tpt=short(tpt);prs=double(prs)' in.nc out.nc

‘ncap2’ also provides the ‘convert()’ method, which takes an integer argument. For example, the above statements become:

ncap2 -s 'tpt=tpt.convert(NC_SHORT);prs=prs.convert(NC_DOUBLE)' in.nc out.nc

One can also use ‘convert()’ in combination with ‘type()’; to make variable ‘ilev_new’ the same type as ‘ilev’, just do:

ncap2 -s 'ilev_new=ilev_new.convert(ilev.type())' in.nc out.nc

*Note ncap2 netCDF Arithmetic Processor::, for more details.

File: nco.info, Node: Batch Mode, Next: Global Attribute Addition, Prev: Type Conversion, Up: Shared features

3.43 Batch Mode
===============

Availability: All operators
Short options: ‘-O’, ‘-A’
Long options: ‘--ovr’, ‘--overwrite’, ‘--apn’, ‘--append’

If the OUTPUT-FILE specified for a command is a pre-existing file, then the operator will prompt the user whether to overwrite (erase) the existing OUTPUT-FILE, attempt to append to it, or abort the operation. However, interactive questions reduce productivity when processing large amounts of data. Therefore NCO also implements two ways to override its own safety features, the ‘-O’ and ‘-A’ switches. Specifying ‘-O’ tells the operator to overwrite any existing OUTPUT-FILE without prompting the user interactively. Specifying ‘-A’ tells the operator to attempt to append to any existing OUTPUT-FILE without prompting the user interactively. These switches are useful in batch environments because they suppress interactive keyboard input. NB: As of 20120515, ‘ncap2’ is unable to append to files that already contain the appended dimensions.

File: nco.info, Node: Global Attribute Addition, Next: History Attribute, Prev: Batch Mode, Up: Shared features

3.44 Global Attribute Addition
==============================

Availability: All operators
Short options: None
Long options: ‘--glb’, ‘--gaa’, ‘--glb_att_add’ ‘--glb ATT_NM=ATT_VAL’ (multiple invocations allowed)

All operators can add user-specified global attributes to output files. As of NCO version 4.5.2 (July, 2015), NCO supports multiple uses of the ‘--glb’ (or equivalent ‘--gaa’ or ‘--glb_att_add’) switch. The option ‘--gaa’ (and its long option equivalents such as ‘--glb_att_add’) indicates the argument syntax will be KEY=VAL. As such, ‘--gaa’ and its synonyms are indicator options that accept arguments supplied one-by-one like ‘--gaa KEY1=VAL1 --gaa KEY2=VAL2’, or aggregated together in multi-argument format like ‘--gaa KEY1=VAL1#KEY2=VAL2’ (*note Multi-arguments::). The switch takes mandatory arguments ‘--glb ATT_NM=ATT_VAL’ where ATT_NM is the desired name of the global attribute to add, and ATT_VAL is its value. Currently only text attributes are supported (recorded as type ‘NC_CHAR’), and regular expressions are not allowed (unlike *note ncatted netCDF Attribute Editor::). Attributes are added in "Append" mode, meaning that values are appended to pre-existing values, if any.
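For instance, a single multi-argument invocation can add several attributes at once; the attribute names and values below are arbitrary placeholders:

# Add two text global attributes with one multi-argument switch
ncks --gaa project=MyProject#contact=data_manager -O in.nc out.nc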
Multiple invocations can simplify the annotation of output files at creation (or modification) time:

ncra --glb machine=${HOSTNAME} --glb created_by=${USER} in*.nc out.nc

As of NCO version 4.6.2 (October, 2016), one may instead combine the separate invocations into a single list of invocations separated by colons:

ncra --glb machine=${HOSTNAME}:created_by=${USER} in*.nc out.nc

The list may contain any number of key-value pairs. Special care must be taken should a key or value contain a delimiter (i.e., a colon), otherwise NCO will interpret the colon as a delimiter and will attempt to create a new attribute. To protect a colon from being interpreted as an argument delimiter, precede it with a backslash.

The global attribute addition feature helps to avoid the performance penalty incurred by using ‘ncatted’ separately to annotate large files. Should users emit a loud hue and cry, we will consider adding the functionality of ‘ncatted’ to the front-end of all operators, i.e., accepting valid ‘ncatted’ arguments to modify attributes of any type and to apply regular expressions.

File: nco.info, Node: History Attribute, Next: File List Attributes, Prev: Global Attribute Addition, Up: Shared features

3.45 History Attribute
======================

Availability: All operators
Short options: ‘-h’
Long options: ‘--hst’, ‘--history’

All operators automatically append a ‘history’ global attribute to any file they create or modify. The ‘history’ attribute consists of a timestamp and the full string of the invocation command to the operator, e.g., ‘Mon May 26 20:10:24 1997: ncks in.nc out.nc’. The full contents of an existing ‘history’ attribute are copied from the first INPUT-FILE to the OUTPUT-FILE. The timestamps appear in reverse chronological order, with the most recent timestamp appearing first in the ‘history’ attribute. Since NCO adheres to the ‘history’ convention, the entire data processing path of a given netCDF file may often be deduced from examination of its ‘history’ attribute. As of May, 2002, NCO is case-insensitive to the spelling of the ‘history’ attribute name. Thus attributes named ‘History’ or ‘HISTORY’ (which are non-standard and not recommended) will be treated as valid history attributes. When more than one global attribute fits the case-insensitive search for "history", the first one found is used. To avoid information overkill, all operators have an optional switch (‘-h’, ‘--hst’, or ‘--history’) to override automatically appending the ‘history’ attribute (*note ncatted netCDF Attribute Editor::). Note that the ‘-h’ switch also turns off writing the ‘nco_input_file_list’-attribute for multi-file operators (*note File List Attributes::).

As of NCO version 4.5.0 (June, 2015), NCO supports its own convention to retain the ‘history’-attribute contents of all files that were appended to a file (1). This convention stores those contents in the ‘history_of_appended_files’ attribute, which complements the ‘history’-attribute to provide a more complete provenance. These attributes may appear something like this in output:

// global attributes:
:history = "Thu Jun 4 14:19:04 2015: ncks -A /home/zender/foo3.nc /home/zender/tmp.nc\n",
"Thu Jun 4 14:19:04 2015: ncks -A /home/zender/foo2.nc /home/zender/tmp.nc\n",
"Thu Jun 4 14:19:04 2015: ncatted -O -a att1,global,o,c,global metadata only in foo1 /home/zender/foo1.nc\n",
"original history from the ur-file serving as the basis for subsequent appends."
; :history_of_appended_files = "Thu Jun 4 14:19:04 2015: Appended file \ /home/zender/foo3.nc had following \"history\" attribute:\n", "Thu Jun 4 14:19:04 2015: ncatted -O -a att2,global,o,c,global metadata only in foo3 /home/zender/foo3.nc\n", "history from foo3 from which data was appended to foo1 after data from foo2 was appended\n", "Thu Jun 4 14:19:04 2015: Appended file /home/zender/foo2.nc had following \"history\" attribute:\n", "Thu Jun 4 14:19:04 2015: ncatted -O -a att2,global,o,c,global metadata only in foo2 /home/zender/foo2.nc\n", "history of some totally different file foo2 from which data was appended to foo1 before foo3 was appended\n", :att1 = "global metadata only in foo1" ; Note that the ‘history_of_appended_files’-attribute is only created, and will only exist, in a file that is, or descends from a file that was, appended to. The optional switch ‘-h’ (or ‘--hst’ or ‘--history’) also overrides automatically appending the ‘history_of_appended_files’ attribute. ---------- Footnotes ---------- (1) Note that before version 4.5.0, NCO could, in append (‘-A’) mode only, inadvertently overwrite the global metadata (including ‘history’) of the output file with that of the input file. This is opposite the behavior most would want.  File: nco.info, Node: File List Attributes, Next: CF Conventions, Prev: History Attribute, Up: Shared features 3.46 File List Attributes ========================= Availability: All binary executables Short options: ‘-H’ Long options: ‘--fl_lst_in’, ‘--file_list’ Many methods of specifying large numbers of input file names pass these names via pipes, encodings, or argument transfer programs (*note Large Numbers of Files::). When these methods are used, the input file list is not explicitly passed on the command line. This results in a loss of information since the ‘history’ attribute no longer contains the complete input information from which the file was created. NCO solves this dilemma by archiving an attribute that contains the input file list. When the input file list to an operator is specified via ‘stdin’, the operator, by default, attaches two global attributes to any file(s) they create or modify. The ‘nco_input_file_number’ global attribute contains the number of input files, and ‘nco_input_file_list’ contains the file names, specified as standard input to the multi-file operator. This information helps to verify that all input files the user thinks were piped through ‘stdin’ actually arrived. Without the ‘nco_input_file_list’ attribute, the information is lost forever and the "chain of evidence" would be broken. The ‘-H’ switch overrides (turns off) the default behavior of writing the input file list global attributes when input is from ‘stdin’. The ‘-h’ switch does this too, and turns off the ‘history’ attribute as well (*note History Attribute::). Hence both switches allows space-conscious users to avoid storing what may amount to many thousands of filenames in a metadata attribute.  File: nco.info, Node: CF Conventions, Next: ARM Conventions, Prev: File List Attributes, Up: Shared features 3.47 CF Conventions =================== Availability: ‘ncbo’, ‘nces’, ‘ncecat’, ‘ncflint’, ‘ncpdq’, ‘ncra’, ‘ncwa’ Short options: None NCO recognizes some Climate and Forecast (CF) metadata conventions, and applies special rules to such data. NCO was contemporaneous with COARDS and still contains some rules to handle older model datasets that pre-date CF, such as NCAR CCM and early CCSM datasets. 
Such datasets may not contain an explicit ‘Conventions’ attribute (e.g., ‘CF-1.0’). Nevertheless, we refer to all such metadata collectively as CF metadata. Skip this section if you never work with CF metadata. The latest CF netCDF conventions are described here (http://cfconventions.org/1.10.html). Most CF netCDF conventions are transparent to NCO. There are no known pitfalls associated with using any NCO operator on files adhering to these conventions. NCO applies some rules that are not in CF, or anywhere else, because experience shows that they simplify data analysis, and stay true to the NCO mantra to do what users want. Here is a general sense of NCO's CF-support:

• Understand and implement NUG recommendations such as the history attribute, packing conventions, and attention to units.

• Special handling of variables designated as coordinates, bounds, or ancillary variables, so that users subsetting a certain variable automatically obtain all related variables.

• Special handling and prevention of meaningless operations (e.g., the root-mean-square of latitude) so that coordinates and bounds preserve meaningful information even as normal (non-coordinate) fields are statistically transformed.

• Understand units and certain calendars so that hyperslabs may be specified in physical units, and so that users need not manually decode per-file time specifications.

• Understand auxiliary coordinates so that irregular hyperslabs may be specified on complex geometric grids.

• Check for CF-compliance on netCDF3 and netCDF4 and HDF files.

• Convert netCDF4 and HDF files to netCDF3 for strict CF-compliance.

Finally, a main use of NCO is to "produce CF", i.e., to improve CF-compliance by annotating metadata, renaming objects (attributes, variables, and dimensions), permuting and inverting dimensions, recomputing values, and data compression.

Currently, NCO determines whether a datafile is a CF output datafile simply by checking (case-insensitively) whether the value of the global attribute ‘Conventions’ (if any) equals ‘CF-1.0’ or ‘NCAR-CSM’. Should ‘Conventions’ equal either of these in the (first) INPUT-FILE, NCO will apply special rules to certain variables because of their usual meaning in CF files. NCO will not average the following variables often found in CF files: ‘ntrm’, ‘ntrn’, ‘ntrk’, ‘ndbase’, ‘nsbase’, ‘nbdate’, ‘nbsec’, ‘mdt’, ‘mhisf’. These variables contain scalar metadata such as the resolution of the host geophysical model and it makes no sense to change their values.

Furthermore, the “size and rank-preserving arithmetic operators” try not to operate on certain grid properties. These operators are ‘ncap2’, ‘ncbo’, ‘nces’, ‘ncflint’, and ‘ncpdq’ (when used for packing, not for permutation). These operators do not operate, by default, on (i.e., add, subtract, pack, etc.) the following variables: ‘ORO’, ‘area’, ‘datesec’, ‘date’, ‘gw’, ‘hyai’, ‘hyam’, ‘hybi’, ‘hybm’, ‘lat_bnds’, ‘lon_bnds’, ‘msk_*’, and ‘wgt_*’. These variables represent Gaussian weights, land/sea masks, time fields, hybrid pressure coefficients, and latitude/longitude boundaries. We call these fields non-coordinate “grid properties”. Coordinate grid properties are easy to identify because they are coordinate variables such as ‘latitude’ and ‘longitude’. Users usually want _all_ grid properties to remain unaltered in the output file. To be treated as a grid property, the variable name must _exactly_ match a name in the above list, or be a coordinate variable.
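As an illustration of this behavior (the file names here are hypothetical), differencing two CF-style model files subtracts normal fields such as temperature while grid properties like ‘gw’, ‘area’, and ‘lat_bnds’ are copied to the output unchanged:

# gw, area, and lat_bnds are copied, not differenced
ncbo --op_typ=sbt jan.nc feb.nc dff.nc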
Handling of ‘msk_*’ and ‘wgt_*’ is exceptional in that _any_ variable whose name starts with ‘msk_’ or ‘wgt_’ is considered to be a "mask" or a "weight" and is thus preserved (not operated on when arithmetic can be avoided).

As of NCO version 4.7.7 (September, 2018), NCO began to explicitly identify files adhering to the MPAS convention. These files have a global ‘Conventions’ attribute that contains the string ‘MPAS’. Size and rank-preserving arithmetic operators will not operate on these MPAS non-coordinate grid properties: ‘angleEdge’, ‘areaCell’, ‘areaTriangle’, ‘cellMask’, ‘cellsOnCell’, ‘cellsOnEdge’, ‘cellsOnVertex’, ‘dcEdge’, ‘dvEdge’, ‘edgesOnCell’, ‘edgesOnEdge’, ‘edgesOnVertex’, ‘indexToCellID’, ‘indexToEdgeID’, ‘indexToVertexID’, ‘kiteAreasOnVertex’, ‘latCell’, ‘latEdge’, ‘latVertex’, ‘lonCell’, ‘lonEdge’, ‘lonVertex’, ‘maxLevelCell’, ‘meshDensity’, ‘nEdgesOnCell’, ‘nEdgesOnEdge’, ‘vertexMask’, ‘verticesOnCell’, ‘verticesOnEdge’, ‘weightsOnEdge’, ‘xCell’, ‘xEdge’, ‘xVertex’, ‘yCell’, ‘yEdge’, ‘yVertex’, ‘zCell’, ‘zEdge’, and ‘zVertex’.

As of NCO version 4.5.0 (June, 2015), NCO began to support behavior required for the DOE E3SM/ACME program, and we refer to these rules collectively as the E3SM/ACME convention. The first E3SM/ACME rule implemented is that the contents of INPUT-FILE variables named ‘date_written’ and ‘time_written’, if any, will be updated to the current system-supplied (with ‘gmtime()’) GMT-time as the variables are copied to the OUTPUT-FILE.

You must spoof NCO if you would like any grid properties or other special CF fields processed normally. For example, rename the variables first with ‘ncrename’, or alter the ‘Conventions’ attribute.

As of NCO version 4.0.8 (April, 2011), NCO supports the CF ‘bounds’ convention for cell boundaries described here (http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries). This convention allows coordinate variables (including multidimensional coordinates) to describe the boundaries of their cells. This is done by naming the variable which contains the bounds in the ‘bounds’ attribute. Note that coordinates of rank N have bounds of rank N+1. NCO-generated subsets of CF-compliant files with ‘bounds’ attributes will include the coordinates specified by the ‘bounds’ attribute, if any. Hence the subsets will themselves be CF-compliant. Bounds are subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::.

The CAM/EAM family of atmospheric models does not output a ‘bounds’ variable or attribute corresponding to the ‘lev’ coordinate. This prevents NCO from activating its CF bounds machinery when ‘lev’ is extracted. As of version 4.7.7 (September, 2018), NCO works around this by outputting the ‘ilev’ coordinate (and ‘hyai’, ‘hybi’) whenever the ‘lev’ coordinate is also output.

As of NCO version 4.4.9 (May, 2015), NCO supports the CF ‘climatology’ convention for climatological statistics described here (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/cf-conventions.html#climatological-statistics). This convention allows coordinate variables (including multidimensional coordinates) to describe the (possibly nested) periods and statistical methods of their associated statistics. This is done by naming the variable which contains the periods and methods in the ‘climatology’ attribute. Note that coordinates of rank N have climatology bounds of rank N+1.
NCO-generated subsets of CF-compliant files with ‘climatology’ attributes will include the variables specified by the ‘climatology’ attribute, if any. Hence the subsets will themselves be CF-compliant. Climatology variables are subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::.

As of NCO version 4.4.5 (July, 2014), NCO supports the CF ‘ancillary_variables’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#ancillary-data). This convention allows ancillary variables to be associated with one or more primary variables. NCO attaches any such variables to the extraction list along with the primary variable and its usual (one-dimensional) coordinates, if any. Ancillary variables are subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::.

As of NCO version 4.6.4 (January, 2017), NCO supports the CF ‘cell_measures’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#cell-measures). This convention allows variables to indicate which other variable or variables contain area or volume information about a gridcell. These measures variables are pointed to by the ‘cell_measures’ attribute. The CDL specification of a measures variable for area looks like

orog:cell_measures = "area: areacella"

where ‘areacella’ is the name of the measures variable. Unless the default behavior is overridden, NCO attaches any measures variables to the extraction list along with the primary variable and other associated variables. By definition, measures variables span a subset of the dimensions of the variable they measure. The most common case is that the measures variable for area is the same size as 2D fields (like surface air temperature) and much smaller than 3D fields (like full air temperature). In such cases the measures variable might occupy 50% of the space of a dataset consisting of only one 2D field. Extraction of measures variables is subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::. To conserve space without sacrificing too much metadata, NCO makes it possible to override the extraction of measures variables independent of extracting other associated variables. Override the default with ‘--no_cell_measures’ or ‘--no_cll_msr’. These options are available in all operators that perform subsetting (i.e., all operators except ‘ncatted’ and ‘ncrename’).

As of NCO version 4.6.4 (January, 2017), NCO supports the CF ‘formula_terms’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#formula-terms). This convention encodes formulas used to construct (usually vertical) coordinate grids. The CDL specification of a vertical coordinate formula looks like

lev:standard_name = "atmosphere_hybrid_sigma_pressure_coordinate"
lev:formula_terms = "a: hyam b: hybm p0: P0 ps: PS"

where ‘standard_name’ contains the standardized name of the formula variable and ‘formula_terms’ contains a list of the variables used, called formula variables. Above, the formula variables are ‘hyam’, ‘hybm’, ‘P0’, and ‘PS’. Unless the default behavior is overridden, NCO attaches any formula variables to the extraction list along with the primary variable and other associated variables. By definition, formula variables span a subset of the dimensions of the variable they define.
One common case is that the formula variables for constructing a 3D height grid involve a 2D variable (like surface pressure, or elevation). In such cases the formula variables typically constitute only a small fraction of a dataset consisting of one 3D field. Extraction of formula variables is subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::. To conserve space without sacrificing too much metadata, NCO makes it possible to override the extraction of formula variables independent of extracting other associated variables. Override the default with ‘--no_formula_terms’ or ‘--no_frm_trm’. These options are available in all operators that perform subsetting (i.e., all operators except ‘ncatted’ and ‘ncrename’).

As of NCO version 4.6.0 (May, 2016), NCO supports the CF ‘grid_mapping’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#grid-mappings-and-projections). This convention allows descriptions of map-projections to be associated with variables. NCO attaches any such map-projection variables to the extraction list along with the primary variable and its usual (one-dimensional) coordinates, if any. Map-projection variables are subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::.

As of NCO version 3.9.6 (January, 2009), NCO supports the CF ‘coordinates’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#coordinate-system). This convention allows variables to specify additional coordinates (including multidimensional coordinates) in a space-separated string attribute named ‘coordinates’. NCO attaches any such coordinates to the extraction list along with the variable and its usual (one-dimensional) coordinates, if any. These auxiliary coordinates are subject to the user-specified override switches (including ‘-c’ and ‘-C’) described in *note Subsetting Coordinate Variables::.

Elimination of reduced dimensions from the ‘coordinates’ attribute helps ensure that rank-reduced variables become completely independent from their former dimensions. As of NCO version 4.4.9 (May, 2015), NCO may modify the ‘coordinates’ attribute to assist this. In particular, ‘ncwa’ eliminates from the ‘coordinates’ attribute any dimension that it collapses, e.g., by averaging. The former presence of this dimension will usually be indicated by the CF ‘cell_methods’ convention described here (http://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods). Hence the CF ‘cell_methods’ and ‘coordinates’ conventions can be said to work in tandem to characterize the state and history of a variable's analysis.

As of NCO version 4.4.2 (February, 2014), NCO supports some of the CF ‘cell_methods’ convention (http://cfconventions.org/cf-conventions/cf-conventions.html#cell-methods) to describe the analysis procedures that have been applied to data. The convention creates (or appends to an existing) ‘cell_methods’ attribute a space-separated list of couplets of the form DMN: OP where DMN is a comma-separated list of dimensions previously contained in the variable that have been reduced by the arithmetic operation OP. For example, the ‘cell_methods’ value ‘time: mean’ says that the variable in question was averaged over the ‘time’ dimension. In such cases ‘time’ will either be a scalar variable or a degenerate dimension or coordinate. This simply means that it has been averaged over.
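For instance, a time-average computed with ‘ncwa’ typically records exactly such metadata (the filenames here are placeholders):

     ncwa -O -a time in.nc out.nc # Averaged variables gain or append cell_methods = "time: mean"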
The value ‘time, lon: mean lat: max’ says that the variable in question is the maximum zonal mean of the time-averaged original variable. That is, the variable was first averaged over time and longitude, and then the residual latitudinal array was reduced by choosing the maximum value. Since the ‘cell_methods’ convention may alter metadata in an undesirable (or possibly incorrect) fashion, we provide switches to ensure it is always or never used. Use long-options ‘--cll_mth’ or ‘--cell_methods’ to invoke the algorithm (true by default), and options ‘--no_cll_mth’ or ‘--no_cell_methods’ to turn it off. These options are only available in the operators ‘ncwa’ and ‘ncra’.

File: nco.info, Node: ARM Conventions, Next: Operator Version, Prev: CF Conventions, Up: Shared features

3.48 ARM Conventions
====================

Availability: ‘ncrcat’
Short options: None

‘ncrcat’ has been programmed to correctly handle data files which utilize the Atmospheric Radiation Measurement (ARM) Program convention (http://www.arm.gov/data/time.stm) for time and time offsets. If you do not work with ARM data then you may skip this section. ARM data files store time information in two variables, a scalar, ‘base_time’, and a record variable, ‘time_offset’. Subtle but serious problems can arise when these types of files are blindly concatenated without CF or ARM support. NCO implements rebasing (*note Rebasing Time Coordinate::) as necessary on both CF and ARM files. Rebasing chains together consecutive INPUT-FILES and produces an OUTPUT-FILE which contains the correct time information. For ARM files this is especially complex because the time coordinates are often stored as type ‘NC_CHAR’. Currently, ‘ncrcat’ determines whether a datafile is an ARM datafile simply by testing for the existence of the variables ‘base_time’, ‘time_offset’, and the dimension ‘time’. If these are found in the INPUT-FILE then ‘ncrcat’ will automatically perform two non-standard, but hopefully useful, procedures. First, ‘ncrcat’ will ensure that values of ‘time_offset’ appearing in the OUTPUT-FILE are relative to the ‘base_time’ appearing in the first INPUT-FILE (and presumably, though not necessarily, also appearing in the OUTPUT-FILE). Second, if a coordinate variable named ‘time’ is not found in the INPUT-FILES, then ‘ncrcat’ automatically creates the ‘time’ coordinate in the OUTPUT-FILE. The values of ‘time’ are defined by the ARM conventions TIME = BASE_TIME + TIME_OFFSET. Thus, if OUTPUT-FILE contains the ‘time_offset’ variable, it will also contain the ‘time’ coordinate. A short message is added to the ‘history’ global attribute whenever these ARM-specific procedures are executed.

File: nco.info, Node: Operator Version, Prev: ARM Conventions, Up: Shared features

3.49 Operator Version
=====================

Availability: All operators
Short options: ‘-r’
Long options: ‘--revision’, ‘--version’, or ‘--vrs’

All operators can be told to print their version information, library version, copyright notice, and compile-time configuration with the ‘-r’ switch, or its long-option equivalent ‘--revision’. The ‘--version’ or ‘--vrs’ switches print the operator version information only. The internal version number varies between operators, and indicates the most recent change to a particular operator's source code. This is useful in making sure you are working with the most recent operators. The version of NCO you are using might be, e.g., ‘3.9.5’.
Using ‘-r’ on, say, ‘ncks’, produces something like ‘NCO netCDF Operators version "3.9.5" last modified 2008/05/11 built May 12 2008 on neige by zender Copyright (C) 1995--2008 Charlie Zender ncks version 20090918’. This tells you that ‘ncks’ contains all patches up to version ‘3.9.5’, which dates from May 11, 2008.

File: nco.info, Node: Reference Manual, Next: Contributing, Prev: Shared features, Up: Top

4 Reference Manual
******************

This chapter presents reference pages for each of the operators individually. The operators are presented in alphabetical order. All valid command line switches are included in the syntax statement. Recall that descriptions of many of these command line switches are provided only in *note Shared features::, to avoid redundancy. Only options specific to, or most useful with, a particular operator are described in any detail in the sections below.

* Menu:

* ncap2 netCDF Arithmetic Processor::
* ncatted netCDF Attribute Editor::
* ncbo netCDF Binary Operator::
* ncchecker netCDF Compliance Checker::
* ncclimo netCDF Climatology Generator::
* ncecat netCDF Ensemble Concatenator::
* nces netCDF Ensemble Statistics::
* ncflint netCDF File Interpolator::
* ncks netCDF Kitchen Sink::
* ncpdq netCDF Permute Dimensions Quickly::
* ncra netCDF Record Averager::
* ncrcat netCDF Record Concatenator::
* ncremap netCDF Remapper::
* ncrename netCDF Renamer::
* ncwa netCDF Weighted Averager::

File: nco.info, Node: ncap2 netCDF Arithmetic Processor, Next: ncatted netCDF Attribute Editor, Prev: Reference Manual, Up: Reference Manual

4.1 ‘ncap2’ netCDF Arithmetic Processor
=======================================

‘ncap2’ understands a relatively full-featured language of operations, including loops, conditionals, arrays, and math functions. ‘ncap2’ is the most rapidly changing NCO operator and its documentation is incomplete. The distribution file ‘data/ncap2_tst.nco’ contains an up-to-date overview of its syntax and capabilities. The ‘data/*.nco’ distribution files (especially ‘bin_cnt.nco’, ‘psd_wrf.nco’, and ‘rgr.nco’) contain in-depth examples of ‘ncap2’ solutions to complex problems.

SYNTAX

     ncap2 [-3] [-4] [-5] [-6] [-7] [-A] [-C] [-c] [-D DBG] [-F] [-f] [--glb ...] [-H] [-h] [--hdf] [--hdr_pad NBR] [--hpss] [-L DFL_LVL] [-l PATH] [--no_tmp_fl] [-O] [-o OUTPUT-FILE] [-p PATH] [-R] [-r] [--ram_all] [-s ALGEBRA] [-S FL.NCO] [-t THR_NBR] [-v] [INPUT-FILE] [OUTPUT-FILE]

DESCRIPTION

‘ncap2’ arithmetically processes netCDF files. ‘ncap2’ is the successor to ‘ncap’ which was put into maintenance mode in November, 2006, and completely removed from NCO in March, 2018. This documentation refers to ‘ncap2’, which implements its own domain-specific language to produce a powerful superset of ‘ncap’ functionality. ‘ncap2’ may be renamed ‘ncap’ one day! The processing instructions are contained either in the NCO script file ‘fl.nco’ or in a sequence of command line arguments. The options ‘-s’ (or long options ‘--spt’ or ‘--script’) are used for in-line scripts and ‘-S’ (or long options ‘--fl_spt’, ‘--nco_script’, or ‘--script-file’) are used to provide the filename where (usually multiple) scripting commands are pre-stored. ‘ncap2’ was written to perform arbitrary algebraic transformations of data and archive the results as easily as possible. *Note Missing Values::, for treatment of missing values. The results of the algebraic manipulations are called “derived fields”.
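A minimal sketch of a derived field (this assumes the input file contains wind components named ‘u’ and ‘v’; all names and filenames are illustrative):

     ncap2 -O -s 'wnd_spd=sqrt(u*u+v*v)' in.nc out.nc # Derive wind speed from u and v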
Unlike the other operators, ‘ncap2’ does not accept a list of variables to be operated on as an argument to the ‘-v’ option (*note Subsetting Files::). In ‘ncap2’, ‘-v’ is a switch that takes no arguments and indicates that ‘ncap2’ should output _only_ user-defined variables (and coordinates associated with variables used in deriving them). ‘ncap2’ neither accepts nor understands the ‘-X’ switch. We recommend making this distinction clear by using ‘--usr_dfn_var’ (or its synonym, ‘--output_user_defined_variables’, both introduced in NCO version 5.1.9 in October, 2023) instead of ‘-v’, which may be deprecated.

NB: As of 20120515, ‘ncap2’ is unable to append to files that already contain the appended dimensions.

Providing a name for OUTPUT-FILE is optional if INPUT-FILE is a netCDF3 format, in which case ‘ncap2’ attempts to write modifications directly to INPUT-FILE (similar to the behavior of ‘ncrename’ and ‘ncatted’). Format-constraints prevent this type of appending from working on a netCDF4 format INPUT-FILE. In any case, reading and writing the same file can be risky and lead to unexpected consequences (since the file is being both read and written), so in normal usage we recommend providing OUTPUT-FILE (which can be the same as INPUT-FILE since the changes are first written to an intermediate file).

As of NCO version 4.8.0 (released May, 2019), ‘ncap2’ does not require that INPUT-FILE be specified when OUTPUT-FILE has no dependency on it. Prior to this, ‘ncap2’ required users to specify a dummy INPUT-FILE even if it was not used to construct OUTPUT-FILE. Input files are always read by ‘ncap2’, and dummy input files are read though neither used nor modified. Now

     ncap2 -s 'quark=1' ~/foo.nc # Create new foo.nc
     ncap2 -s 'print(quark)' ~/foo.nc # Print existing foo.nc
     ncap2 -O -s 'quark=1' ~/foo.nc # Overwrite old with new foo.nc
     ncap2 -s 'quark=1' ~/foo.nc ~/foo.nc # Add to old foo.nc

* Menu:

* Syntax of ncap2 statements::
* Expressions::
* Dimensions::
* Left hand casting::
* Arrays and hyperslabs::
* Attributes::
* Value List::
* Number literals::
* if statement::
* Print & String methods::
* Missing values ncap2::
* Methods and functions::
* RAM variables::
* Where statement::
* Loops::
* Include files::
* Sort methods::
* UDUnits script::
* Vpointer::
* Irregular grids::
* Bilinear interpolation::
* GSL special functions::
* GSL interpolation::
* GSL least-squares fitting::
* GSL statistics::
* GSL random number generation::
* Examples ncap2::
* Intrinsic mathematical methods::
* Operator precedence and associativity ::
* ID Quoting::
* make_bounds() function::
* solar_zenith_angle function::

Defining new variables in terms of existing variables is a powerful feature of ‘ncap2’. Derived fields inherit the metadata (i.e., attributes) of their ancestors, if any, in the script or input file. When the derived field is completely new (no identically-named ancestors exist), then it inherits the metadata (if any) of the left-most variable on the right hand side of the defining expression. This metadata inheritance is called “attribute propagation”. Attribute propagation is intended to facilitate well-documented data analysis, and we welcome suggestions to improve this feature. The only exception to this rule of attribute propagation is in cases of left hand casting (*note Left hand casting::). The user must manually define the proper metadata for variables defined using left hand casting.
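A brief illustration of attribute propagation (this sketch assumes the input file contains a variable ‘T_kelvin’ with a ‘units’ attribute; the names are illustrative):

     ncap2 -O -s 'T_dgC=T_kelvin-273.15; T_dgC@units="degrees_celsius"' in.nc out.nc

Here ‘T_dgC’ first inherits all attributes of ‘T_kelvin’ (including ‘units’), so the script immediately overwrites the inherited ‘units’ with the correct value.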
File: nco.info, Node: Syntax of ncap2 statements, Next: Expressions, Prev: ncap2 netCDF Arithmetic Processor, Up: ncap2 netCDF Arithmetic Processor

4.1.1 Syntax of ‘ncap2’ statements
----------------------------------

Mastering ‘ncap2’ is relatively simple. Each valid statement STATEMENT consists of a standard forward algebraic expression. The ‘fl.nco’, if present, is simply a list of such statements, whitespace, and comments. The syntax of statements is most like the computer language C. The following characteristics of C are preserved:

Array syntax
     Array elements are placed within ‘[]’ characters;
Array indexing
     Arrays are 0-based;
Array storage
     Last dimension is most rapidly varying;
Assignment statements
     A semi-colon ‘;’ indicates the end of an assignment statement.
Comments
     Multi-line comments are enclosed within ‘/* */’ characters. Single line comments are preceded by ‘//’ characters.
Nesting
     Files may be nested in scripts using ‘#include SCRIPT’. The ‘#include’ command is not followed by a semi-colon because it is a pre-processor directive, not an assignment statement. The filename ‘script’ is interpreted relative to the run directory.
Attribute syntax
     The at-sign ‘@’ is used to delineate an attribute name from a variable name.

File: nco.info, Node: Expressions, Next: Dimensions, Prev: Syntax of ncap2 statements, Up: ncap2 netCDF Arithmetic Processor

4.1.2 Expressions
-----------------

Expressions are the fundamental building block of ‘ncap2’. Expressions are composed of variables, numbers, literals, and attributes. The following C operators are "overloaded" and work with scalars and multi-dimensional arrays:

     Arithmetic Operators: * / % + - ^
     Binary Operators: > >= < <= == != || && >> <<
     Unary Operators: + - ++ -- !
     Conditional Operator: exp1 ? exp2 : exp3
     Assign Operators: = += -= /= *=

In the following section a “variable” also refers to a number literal which is read in as a scalar variable:

*Arithmetic and Binary Operators*

Consider _var1 'op' var2_

*Precision*
• When both operands are variables, the result has the precision of the higher precision operand.
• When one operand is a variable and the other an attribute, the result has the precision of the variable.
• When both operands are attributes, the result has the precision of the more precise attribute.
• The exponentiation operator "^" is an exception to the above rules. When both operands have type less than ‘NC_FLOAT’, the result is ‘NC_FLOAT’. When either type is ‘NC_DOUBLE’, the result is also ‘NC_DOUBLE’.

*Rank*
• The Rank of the result is generally equal to the Rank of the operand that has the greatest number of dimensions.
• If the dimensions in var2 are a subset of the dimensions in var1 then it is possible to make var2 conform to var1 through broadcasting and/or dimension reordering.
• Broadcasting a variable means creating data in non-existing dimensions by copying data in existing dimensions.
• More specifically: If the number of dimensions in var1 is greater than or equal to the number of dimensions in var2 then an attempt is made to make var2 conform to var1, else var1 is made to conform to var2. If conformance is not possible then an error message will be emitted and script execution will cease.

Even though the logical operators return True(1) or False(0) they are treated in the same way as the arithmetic operators with regard to precision and rank.
Examples: dimensions: time=10, lat=2, lon=4

Suppose we have the two variables:

     double P(time,lat,lon);
     float PZ0(lon,lat); // PZ0=1,2,3,4,5,6,7,8;

Consider now the expression: PZ=P-PZ0

PZ0 is made to conform to P and the result is

     PZ0 =
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,
     1,3,5,7,2,4,6,8,

Once the expression is evaluated then PZ will be of type double.

Consider now

     start=four-att_var@double_att; // start =-69 and is of type integer;
     four_pow=four^3.0f // four_pow=64 and is of type float
     three_nw=three_dmn_var_sht*1.0f; // type is now float
     start@n1=att_var@short_att*att_var@int_att; // start@n1=5329 and is type int

*Binary Operators*
Unlike C the binary operators return an array of values. There is no such thing as short circuiting with the AND/OR operators. Missing values are carried into the result in the same way they are with the arithmetic operators. When an expression is evaluated in an if() the missing values are treated as true. The binary operators are, in order of precedence:

     !   Logical Not
     ----------------------------
     <<  Less Than Selection
     >>  Greater Than Selection
     ----------------------------
     >   Greater than
     >=  Greater than or equal to
     <   Less than
     <=  Less than or equal to
     ----------------------------
     ==  Equal to
     !=  Not equal to
     ----------------------------
     &&  Logical AND
     ----------------------------
     ||  Logical OR
     ----------------------------

To see all operators: *note Operator precedence and associativity::

Examples:

     tm1=time>2 && time <7; // tm1=0, 0, 1, 1, 1, 1, 0, 0, 0, 0 double
     tm2=time==3 || time>=6; // tm2=0, 0, 1, 0, 0, 1, 1, 1, 1, 1 double
     tm3=int(!tm1); // tm3=1, 1, 0, 0, 0, 0, 1, 1, 1, 1 int
     tm4=tm1 && tm2; // tm4=0, 0, 1, 0, 0, 1, 0, 0, 0, 0 double
     tm5=!tm4; // tm5=1, 1, 0, 1, 1, 0, 1, 1, 1, 1 double

*Regular Assign Operator*
_var1 '=' exp1_
If var1 does not already exist in Input or Output then var1 is written to Output with the values, type and dimensions from expr1. If var1 is in Input only it is copied to Output first. Once the var is in Output then the only requirement on expr1 is that the number of elements must match the number already on disk. The type of expr1 is converted as necessary to the disk type. If you wish to change the type or shape of a variable in Input then you must cast the variable. See *note Left hand casting::

     time[time]=time.int();
     three_dmn_var_dbl[time,lon,lat]=666L;

*Other Assign Operators +=, -=, *=, /=*
_var1 'ass_op' exp1_
If exp1 is a variable and it doesn't conform to var1 then an attempt is made to make it conform to var1. If exp1 is an attribute it must have unity size or else have the same number of elements as var1. If expr1 has a different type to var1 then it is converted to the var1 type.

     z1=four+=one*=10 // z1=14 four=14 one=10;
     time-=2 // time= -1,0,1,2,3,4,5,6,7,8

*Increment/Decrement Operators*
These work in a similar fashion to their regular C counterparts. If say the variable ‘four’ is input only then the statement ‘++four’ effectively means: read ‘four’ from Input, increment each element by one, then write the new values to Output.

Example:

     n2=++four; n2=5, four=5
     n3=one--+20; n3=21 one=0;
     n4=--time; n4=time=0.,1.,2.,3.,4.,5.,6.,7.,8.,9.;

*Conditional Operator ?:*
_exp1 ? exp2 : exp3_
The conditional operator (or ternary operator) is a succinct way of writing an if/then/else. If exp1 evaluates to true then exp2 is returned else exp3 is returned.
Example:

     weight_avg=weight.avg();
     weight_avg@units= (weight_avg == 1 ? "kilo" : "kilos");
     PS_nw=PS-(PS.min() > 100000 ? 100000 : 0);

*Clipping Operators*

<< Less-than Clipping
     For arrays, the less-than selection operator selects all values in the left operand that are less than the corresponding value in the right operand. If the value of the left side is greater than or equal to the corresponding value of the right side, then the right side value is placed in the result.
>> Greater-than Clipping
     For arrays, the greater-than selection operator selects all values in the left operand that are greater than the corresponding value in the right operand. If the value of the left side is less than or equal to the corresponding value of the right side, then the right side value is placed in the result.

Example:

     RDM2=RDM >> 100.0 // 100,100,100,100,126,126,100,100,100,100 double
     RDM2=RDM << 90s // 1, 9, 36, 84, 90, 90, 84, 36, 9, 1 int

File: nco.info, Node: Dimensions, Next: Left hand casting, Prev: Expressions, Up: ncap2 netCDF Arithmetic Processor

4.1.3 Dimensions
----------------

Dimensions are defined in Output using the ‘defdim()’ function.

     defdim("cnt",10); # Dimension size is fixed by default
     defdim("cnt",10,NC_UNLIMITED); # Dimension is unlimited (record dimension)
     defdim("cnt",10,0); # Dimension is unlimited (record dimension)
     defdim("cnt",10,1); # Dimension size is fixed
     defdim("cnt",10,737); # All non-zero values indicate dimension size is fixed

This dimension name must then be prefixed with a dollar-sign ‘$’ when referred to in method arguments or left-hand-casting, e.g.,

     new_var[$cnt]=time;
     temperature[$time,$lat,$lon]=35.5;
     temp_avg=temperature.avg($time);

The ‘size’ method allows dimension sizes to be used in arithmetic expressions:

     time_avg=time.total()/$time.size;

Increase the size of a new variable by one and set new member to zero:

     defdim("cnt_new",$cnt.size+1);
     new_var[$cnt_new]=0.0;
     new_var(0:($cnt_new.size-2))=old_var;

To define an unlimited dimension, simply set the size to zero:

     defdim("time2",0)

*Dimension Abbreviations*
It is possible to use dimension abbreviations as method arguments:
‘$0’ is the first dimension of a variable
‘$1’ is the second dimension of a variable
‘$n’ is the (n+1)th dimension of a variable

     float four_dmn_rec_var(time,lat,lev,lon);
     double three_dmn_var_dbl(time,lat,lon);

     four_nw=four_dmn_rec_var.reverse($time,$lon)
     four_nw=four_dmn_rec_var.reverse($0,$3);

     four_avg=four_dmn_rec_var.avg($lat,$lev);
     four_avg=four_dmn_rec_var.avg($1,$2);

     three_mw=three_dmn_var_dbl.permute($time,$lon,$lat);
     three_mw=three_dmn_var_dbl.permute($0,$2,$1);

*ID Quoting*
If the dimension name contains non-regular characters use ID quoting: See *note ID Quoting::

     defdim("a--list.A",10);
     A1['$a--list.A']=30.0;

*GOTCHA*
It is not possible to manually define in Output any dimensions that exist in Input. When a variable from Input appears in an expression or statement its dimensions in Input are automagically copied to Output (if they are not already present).
If LAT and LON are one dimensional coordinates of dimensions LAT and LON, respectively, then addition of these two one-dimensional arrays is intrinsically ill-defined because whether LAT_LON should be dimensioned LAT by LON or LON by LAT is ambiguous (assuming that addition is to remain a “commutative” procedure, i.e., one that does not depend on the order of its arguments). Differing dimensions are said to be “orthogonal” to one another, and sets of dimensions which are mutually exclusive are orthogonal as a set and any arithmetic operation between variables in orthogonal dimensional spaces is ambiguous without further information.

The ambiguity may be resolved by enumerating the desired dimension ordering of the output expression inside square brackets on the left hand side (LHS) of the equals sign. This is called “left hand casting” because the user resolves the dimensional ordering of the RHS of the expression by specifying the desired ordering on the LHS.

     ncap2 -s 'lat_lon[lat,lon]=lat+lon' in.nc out.nc
     ncap2 -s 'lon_lat[lon,lat]=lat+lon' in.nc out.nc

The explicit list of dimensions on the LHS, ‘[lat,lon]’, resolves the otherwise ambiguous ordering of dimensions in LAT_LON. In effect, the LHS “casts” its rank properties onto the RHS. Without LHS casting, the dimensional ordering of LAT_LON would be undefined and, hopefully, ‘ncap2’ would print an error message.

Consider now a slightly more complex example. In geophysical models, a coordinate system based on a blend of terrain-following and density-following surfaces is called a “hybrid coordinate system”. In this coordinate system, four variables must be manipulated to obtain the pressure of the vertical coordinate: P0 is the domain-mean surface pressure offset (a scalar), PS is the local (time-varying) surface pressure (usually two horizontal spatial dimensions, i.e., latitude by longitude), HYAM is the weight given to surfaces of constant density (one spatial dimension, pressure, which is orthogonal to the horizontal dimensions), and HYBM is the weight given to surfaces of constant elevation (also one spatial dimension). This command constructs a four-dimensional pressure ‘prs_mdp’ from the four input variables of mixed rank and orthogonality:

     ncap2 -s 'prs_mdp[time,lat,lon,lev]=P0*hyam+PS*hybm' in.nc out.nc

Manipulating the four fields which define the pressure in a hybrid coordinate system is easy with left hand casting.

Finally, we show how to use interface quantities to define midpoint quantities. In particular, we will define interface pressures using the standard CESM output hybrid coordinate parameters, and then difference those interface pressures to obtain the pressure difference between the interfaces. The pressure difference is necessary to obtain gridcell mass path and density (which are midpoint quantities). Definitions are as in the above example, with new variables HYAI and HYBI defined at grid cell vertical interfaces (rather than midpoints like HYAM and HYBM).
The approach naturally fits into two lines:

     cat > ~/pdel.nco << 'EOF'
     *prs_ntf[time,lat,lon,ilev]=P0*hyai+PS*hybi;
     // Requires NCO 4.5.4 and later:
     prs_dlt[time,lat,lon,lev]=prs_ntf(:,:,:,1:$ilev.size-1)-prs_ntf(:,:,:,0:$ilev.size-2);
     // Derived variables that require pressure thickness:
     // Divide by gravity to obtain total mass path in layer aka mpl [kg m-2]
     mpl=prs_dlt/grv_sfc;
     // Multiply by mass mixing ratio to obtain mass path of constituent
     mpl_CO2=mpl*mmr_CO2;
     EOF
     ncap2 -O -v -S ~/pdel.nco ~/nco/data/in.nc ~/foo.nc
     ncks -O -C -v prs_dlt ~/foo.nc

The first line defines the four-dimensional interface pressures ‘prs_ntf’ as a RAM variable because those are not desired in the output file. The second differences each pressure level from the pressure above it to obtain the pressure difference. This line employs both left-hand casting and array hyperslabbing. However, this syntax only works with NCO version 4.5.4 (November, 2015) and later because earlier versions require that LHS and RHS dimension names (not just sizes) match. From the pressure differences, one can obtain the mass path in each layer as shown.

Another reason to cast a variable is to modify the shape or type of a variable already in Input:

     gds_var[gds_crd]=gds_var.double();
     three_dmn_var_crd[lat,lon,lev]=10.0d;
     four[]=four.int();

File: nco.info, Node: Arrays and hyperslabs, Next: Attributes, Prev: Left hand casting, Up: ncap2 netCDF Arithmetic Processor

4.1.5 Arrays and hyperslabs
---------------------------

Generating a regularly spaced n-dimensional array with ‘ncap2’ is simple with the ‘array()’ function. The function comes in three (overloaded) forms:

     (A) var_out=array(val_srt,val_inc,$dmn_nm); // One-dimensional output
     (B) var_out=array(val_srt,val_inc,var_tpl); // Multi-dimensional output
     (C) var_out=array(val_srt,val_inc,/$dmn1,$dmn2...,$dmnN/); // Multi-dimensional output

“val_srt”
     Starting value of the array. The TYPE of the array will be the TYPE of this starting value.
“val_inc”
     Spacing (or increment) between elements.
“var_tpl”
     Variable from which the array can derive its shape, 1D or nD

*One-Dimensional Arrays*
Use form (A) or (B) above for 1D arrays:

     # var_out will be NC_DOUBLE:
     var_out=array(10.0,2,$time) // 10.5,12.5,14.5,16.5,18.5,20.5,22.5,24.5,26.5,28.5
     // var_out will be NC_UINT, and "shape" will duplicate "ilev"
     var_out=array(0ul,2,ilev) // 0,2,4,6
     // var_out will be NC_FLOAT
     var_out=array(99.0f,2.5,$lon) // 99,101.5,104,106.5
     // Create an array of zeros
     var_out=array(0,0,$time) // 0,0,0,0,0,0,0,0,0,0
     // Create array of ones
     var_out=array(1.0,0.0,$lon) // 1.0,1.0,1.0,1.0

*n-Dimensional Arrays*
Use form (B) or (C) for creating n-D arrays. NB: In (C) the final argument is a list of dimensions

     // These are equivalent
     var_out=array(1.0,2.0,three_dmn_var);
     var_out=array(1.0,2.0,/$lat,$lev,$lon/);
     // var_out is NC_BYTE
     var_out=array(20b, -4, /$lat,$lon/); // 20,16,12,8,4,0,-4,-8
     srt=3.14159f;
     inc=srt/2.0f;
     var_out=array(srt,inc,var_2D_rrg); // 3.14159, 4.712385, 6.28318, 7.853975, 9.42477, 10.99557, 12.56636, 14.13716 ;

Hyperslabs in ‘ncap2’ are more limited than hyperslabs with the other NCO operators. ‘ncap2’ does not understand the shell command-line syntax used to specify multi-slabs, wrapped co-ordinates, negative stride or coordinate value limits. However with a bit of syntactic magic they are all possible. ‘ncap2’ accepts (in fact, it requires) N-hyperslab arguments for a variable of rank N: var1(arg1,arg2 ...
argN); where each hyperslab argument is of the form start:end:stride and the arguments for different dimensions are separated by commas. If START is omitted, it defaults to zero. If END is omitted, it defaults to dimension size minus one. If STRIDE is omitted, it defaults to one. If a single value is present then it is assumed that that dimension collapses to a single value (i.e., a cross-section). The number of hyperslab arguments MUST equal the variable's rank.

*Hyperslabs on the Right Hand Side of an assign*
A simple 1D example: ($time.size=10)

     od[$time]={20,22,24,26,28,30,32,34,36,38};
     od(7); // 34
     od(7:); // 34,36,38
     od(:7); // 20,22,24,26,28,30,32,34
     od(::4); // 20,28,36
     od(1:6:2) // 22,26,30
     od(:) // 20,22,24,26,28,30,32,34,36,38

A more complex three dimensional example: ($lat.size=2,$lon.size=4)

     th[$time,$lat,$lon]=
     {1, 2, 3, 4, 5, 6, 7, 8,
     9,10,11,12,13,14,15,16,
     17,18,19,20,21,22,23,24,
     -99,-99,-99,-99,-99,-99,-99,-99,
     33,34,35,36,37,38,39,40,
     41,42,43,44,45,46,47,48,
     49,50,51,52,53,54,55,56,
     -99,58,59,60,61,62,63,64,
     65,66,67,68,69,70,71,72,
     -99,74,75,76,77,78,79,-99 };

     th(1,1,3); // 16
     th(2,0,:); // 17, 18, 19, 20
     th(:,1,3); // 8, 16, 24, -99, 40, 48, 56, 64, 72, -99
     th(::5,:,0:3:2); // 1, 3, 5, 7, 41, 43, 45, 47

If hyperslab arguments collapse to a single value (a cross-section has been specified), then that dimension is removed from the returned variable. If all the values collapse then a scalar variable is returned. So, for example, the following is valid:

     th_nw=th(0,:,:)+th(9,:,:);
     // th_nw has dimensions $lat,$lon
     // NB: the time dimension has become degenerate

The following is invalid:

     th_nw=th(0,:,0:1)+th(9,:,0:1);

because the ‘$lon’ dimension now only has two elements. The above can be calculated by using a LHS cast with ‘$lon_nw’ as replacement dim for ‘$lon’:

     defdim("lon_nw",2);
     th_nw[$lat,$lon_nw]=th(0,:,0:1)+th(9,:,0:1);

*Hyperslabs on the Left Hand Side of an assign*
When hyperslabbing on the LHS, the expression on the RHS must evaluate to a scalar or a variable/attribute with the same number of elements as the LHS hyperslab.

Set all elements of the last record to zero:

     th(9,:,:)=0.0;

Set the first element of the lon dimension to 1.0:

     th(:,:,0)=1.0;

One may hyperslab on both sides of an assign. For example, this sets the last record to the first record:

     th(9,:,:)=th(0,:,:);

Say TH0 represents pressure at height=0 and TH1 represents pressure at height=1. Then it is possible to insert these hyperslabs into the records:

     prs[$time,$height,$lat,$lon]=0.0;
     prs(:,0,:,:)=th0;
     prs(:,1,:,:)=th1;

*Reverse method*
Use the ‘reverse()’ method to reverse a dimension's elements in a variable with at least one dimension. This is equivalent to a negative stride, e.g.,

     th_rv=th(1,:,:).reverse($lon); // {12,11,10,9 }, {16,15,14,13}
     od_rv=od.reverse($time); // {38,36,34,32,30,28,26,24,22,20}

*Permute method*
Use the ‘permute()’ method to swap the dimensions of a variable. The number and names of dimension arguments must match the dimensions in the variable. If the first dimension in the variable is of record type then this must remain the first dimension. If you want to change the record dimension then consider using ‘ncpdq’.
Consider the variable:

     float three_dmn_var(lat,lev,lon);
     three_dmn_var_prm=three_dmn_var.permute($lon,$lat,$lev);
     // The permuted values are
     three_dmn_var_prm=
     0,4,8,
     12,16,20,
     1,5,9,
     13,17,21,
     2,6,10,
     14,18,22,
     3,7,11,
     15,19,23;

File: nco.info, Node: Attributes, Next: Value List, Prev: Arrays and hyperslabs, Up: ncap2 netCDF Arithmetic Processor

4.1.6 Attributes
----------------

Refer to attributes with _var_nm@att_nm_. The following are all valid statements:

     global@text="Test Attributes"; /* Assign a global variable attribute */
     a1[$time]=time*20;
     a1@long_name="Kelvin";
     a1@min=a1.min();
     a1@max=a1.max();
     a1@min++;
     --a1@max;
     a1(0)=a1@min;
     a1($time.size-1)=a1@max;

NetCDF allows all attribute types to have a size between one and ‘NC_MAX_ATTRS’. Here is the metadata for variable A1:

     double a1(time) ;
       a1:long_name = "Kelvin" ;
       a1:max = 199. ;
       a1:min = 21. ;
       a1:trip1 = 1, 2, 3 ;
       a1:triplet = 21., 110., 199. ;

These basic methods can be used with attributes: ‘size()’, ‘type()’, and ‘exists()’. For example, to save an attribute text string in a variable:

     defdim("sng_len",a1@long_name.size());
     sng_arr[$sng_len]=a1@long_name; // sng_arr now contains "Kelvin"

Attributes defined in a script are stored in memory and are written to the output file after script completion. To stop the attribute being written use the ‘ram_delete()’ method or use a bogus variable name.

*Attribute Propagation and Inheritance*
• Attribute propagation occurs in a regular assign statement. The variable being defined on the LHS gets copies of the attributes from the leftmost variable on the RHS.
• Attribute Inheritance: The LHS variable "inherits" attributes from an Input variable with the same name.
• It is possible to have a regular assign statement for which both propagation and inheritance occur.

     // prs_mdp inherits attributes from P0:
     prs_mdp[time,lat,lon,lev]=P0*hyam+hybm*PS;
     // th_min inherits attributes from three_dmn_var_dbl:
     th_min=1.0 + 2*three_dmn_var_dbl.min($time);

*Attribute Concatenation*
The push() function concatenates attributes, or appends an "expression" to a pre-existing attribute. It comes in two forms:

     (A) att_new=push(att_exp, expr)
     (B) att_size=push(&att_nm,expr)

In form (A) the first argument should be an attribute identifier or an expression that evaluates to an attribute. The second argument can evaluate to an attribute or a variable. The second argument is then converted to the type of ATT_EXP, appended to ATT_EXP, and the resulting attribute is returned. In form (B) the first argument is a call-by-reference attribute identifier (which may not yet exist). The second argument is then evaluated (and type-converted as needed) and appended to the call-by-reference attribute. The final size of the attribute is then returned.

     temp@range=-10.0;
     push(&temp@range,12.0); // temp@range=-10.0,12.0

     numbers@squares=push(1,4);
     numbers@squares=push(numbers@squares,9);
     push(&numbers@squares,16.0);
     push(&numbers@squares,25ull); // numbers@squares=1,4,9,16,25

Now some text examples. Remember, an attribute identifier that begins with @ implies a global attribute. For example, '@institution' is short for 'global@institution'.
     global@greetings=push("hello"," world !!");
     global@greek={"alpha"s,"beta"s,"gamma"s};
     // Append an NC_STRING
     push(&@greek,"delta"s);
     // Pushing an NC_CHAR to an NC_STRING attribute is allowed; it is converted to an NC_STRING
     @e="epsilon";
     push(&@greek,@e);
     push(&@greek,"zeta");
     // Pushing a single NC_STRING to an NC_CHAR is not allowed
     @h="hello";
     push(&@h," again"s); // BAD PUSH

If the attribute name contains non-regular characters use ID quoting:

     'b..m1@c--lost'=23;

See *note ID Quoting::.

File: nco.info, Node: Value List, Next: Number literals, Prev: Attributes, Up: ncap2 netCDF Arithmetic Processor

4.1.7 Value List
----------------

A _value list_ is a special type of attribute. It can only be used on the RHS of the assign family of statements. That is, _=, +=, -=, *=, /=_. A value list CANNOT be involved in any logical, binary, or arithmetical operations (except those above). A value list CANNOT be used as a function argument. A value list CANNOT have nested value lists. The type of a value list is the type of the member with the highest type.

     a1@trip={1,2,3};
     a1@trip+={3,2,1}; // 4,4,4
     a1@triplet={a1@min,(a1@min+a1@max)/2,a1@max};
     lon[lon]={0.0,90.0,180.0,270.0};
     lon*={1.0,1.1,1.2,1.3};
     dlon[lon]={1b,2s,3ull,4.0f}; // final type NC_FLOAT
     a1@ind={1,2,3}+{4,4,4}; // BAD
     a1@s=sin({1.0,16.0}); // BAD

One can also use a value_list to create an attribute of type NC_STRING. Remember, a literal string of type NC_STRING has a postfix 's'. A value list of NC_CHAR has no semantic meaning and is plain wrong.

     array[lon]={1.0,2.,4.0,7.0};
     array@numbers={"one"s, "two"s, "four"s, "seven"s}; // GOOD
     ar[lat]={0,20};
     ar@numbers={"zero","twenty"}; // BAD

File: nco.info, Node: Number literals, Next: if statement, Prev: Value List, Up: ncap2 netCDF Arithmetic Processor

4.1.8 Number literals
---------------------

The table below lists the postfix character(s) to add to a number literal (aka, a naked constant) for explicit type specification. The same type-specification rules are used for variables and attributes. A floating-point number without a postfix defaults to ‘NC_DOUBLE’, while an integer without a postfix defaults to type ‘NC_INT’:

     var[$rlev]=0.1; // Variable will be type NC_DOUBLE
     var[$lon_grd]=2.0; // Variable will be type NC_DOUBLE
     var[$gds_crd]=2e3; // Variable will be type NC_DOUBLE
     var[$gds_crd]=2.0f; // Variable will be type NC_FLOAT (note "f")
     var[$gds_crd]=2e3f; // Variable will be type NC_FLOAT (note "f")
     var[$gds_crd]=2; // Variable will be type NC_INT
     var[$gds_crd]=-3; // Variable will be type NC_INT
     var[$gds_crd]=2s; // Variable will be type NC_SHORT
     var[$gds_crd]=-3s; // Variable will be type NC_SHORT
     var@att=41.; // Attribute will be type NC_DOUBLE
     var@att=41.f; // Attribute will be type NC_FLOAT
     var@att=41; // Attribute will be type NC_INT
     var@att=-21s; // Attribute will be type NC_SHORT
     var@units="kelvin"; // Attribute will be type NC_CHAR

There is no postfix for characters; use a quoted string instead for ‘NC_CHAR’. ‘ncap2’ interprets a standard double-quoted string as a value of type ‘NC_CHAR’. In this case, any receiving variable must be dimensioned as an array of ‘NC_CHAR’ long enough to hold the value.
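For example, a short NC_CHAR value can be stored in a variable dimensioned to hold it (the names ‘lbl_len’ and ‘lbl’ are illustrative):

     defdim("lbl_len",9);
     lbl[$lbl_len]="Greenland"; // lbl is an NC_CHAR array holding nine characters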
To use the newer netCDF4 types NCO must be compiled/linked to the netCDF4 library and the output file must be of type ‘NETCDF4’:

     var[$time]=1UL; // Variable will be type ‘NC_UINT’
     var[$lon]=4b; // Variable will be type ‘NC_BYTE’
     var[$lat]=5ull; // Variable will be type ‘NC_UINT64’
     var[$lat]=5ll; // Variable will be type ‘NC_INT64’
     var@att=6.0d; // Attribute will be type ‘NC_DOUBLE’
     var@att=-666L; // Attribute will be type ‘NC_INT’
     var@att="kelvin"s; // Attribute will be type ‘NC_STRING’ (note the "s")

Use a post-quote ‘s’ for ‘NC_STRING’. Place the letter ‘s’ immediately following the double-quoted string to indicate that the value is of type ‘NC_STRING’. In this case, the receiving variable need not have any memory allocated to hold the string because netCDF4 handles that memory allocation. Suppose one creates a file containing an ensemble of model results, and wishes to label the record coordinate with the name of each model. The ‘NC_STRING’ type is well-suited to this because it facilitates storing arrays of strings of arbitrary length. This is sophisticated, though easy with ‘ncap2’:

     % ncecat -O -u model cesm.nc ecmwf.nc giss.nc out.nc
     % ncap2 -4 -O -s 'model[$model]={"cesm"s,"ecmwf"s,"giss"s}' out.nc out.nc

The key here is to place an ‘s’ character after each double-quoted string value to indicate an ‘NC_STRING’ type. The ‘-4’ ensures the output filetype is netCDF4 in case the input filetype is not.

*netCDF3/4 Types*
b|B ‘NC_BYTE’, a signed 1-byte integer
none ‘NC_CHAR’, an ISO/ASCII character
s|S ‘NC_SHORT’, a signed 2-byte integer
l|L ‘NC_INT’, a signed 4-byte integer
f|F ‘NC_FLOAT’, a single-precision (4-byte) floating-point number
d|D ‘NC_DOUBLE’, a double-precision (8-byte) floating-point number

*netCDF4 Types*
ub|UB ‘NC_UBYTE’, an unsigned 1-byte integer
us|US ‘NC_USHORT’, an unsigned 2-byte integer
u|U|ul|UL ‘NC_UINT’, an unsigned 4-byte integer
ll|LL ‘NC_INT64’, a signed 8-byte integer
ull|ULL ‘NC_UINT64’, an unsigned 8-byte integer
s ‘NC_STRING’, a string of arbitrary length

File: nco.info, Node: if statement, Next: Print & String methods, Prev: Number literals, Up: ncap2 netCDF Arithmetic Processor

4.1.9 if statement
------------------

The syntax of the if statement is similar to its C counterpart. The _Conditional Operator (ternary operator)_ has also been implemented.

     if(exp1)
       stmt1;
     else if(exp2)
       stmt2;
     else
       stmt3;

     # Can use code blocks as well:
     if(exp1){
       stmt1;
       stmt1a;
       stmt1b;
     }else if(exp2)
       stmt2;
     else{
       stmt3;
       stmt3a;
       stmt3b;
     }

For a variable or attribute expression to be logically true all its non-missing value elements must be logically true, i.e., non-zero. The expression can be of any type. Unlike C there is no short-circuiting of an expression with the OR (‘||’) and AND (‘&&’) operators. The whole expression is evaluated regardless of whether one of the AND/OR operands is True/False.
     # Simple example
     if(time > 0)
       print("All values of time are greater than zero\n");
     else if(time < 0)
       print("All values of time are less than zero\n");
     else {
       time_max=time.max();
       time_min=time.min();
       print("min value of time=");print(time_min,"%f");
       print("max value of time=");print(time_max,"%f");
     }

     # Example from ddra.nco
     if(fl_typ == fl_typ_gcm){
       var_nbr_apx=32;
       lmn_nbr=1.0*var_nbr_apx*varsz_gcm_4D; /* [nbr] Variable size */
       if(nco_op_typ==nco_op_typ_avg){
         lmn_nbr_avg=1.0*var_nbr_apx*varsz_gcm_4D; // Block size
         lmn_nbr_wgt=dmnsz_gcm_lat; /* [nbr] Weight size */
       } // !nco_op_typ_avg
     }else if(fl_typ == fl_typ_stl){
       var_nbr_apx=8;
       lmn_nbr=1.0*var_nbr_apx*varsz_stl_2D; /* [nbr] Variable size */
       if(nco_op_typ==nco_op_typ_avg){
         lmn_nbr_avg=1.0*var_nbr_apx*varsz_stl_2D; // Block size
         lmn_nbr_wgt=dmnsz_stl_lat; /* [nbr] Weight size */
       } // !nco_op_typ_avg
     } // !fl_typ

*Conditional Operator*

     // netCDF4 needed for this example
     th_nw=(three_dmn_var_sht >= 0 ? three_dmn_var_sht.uint() : \
                                     three_dmn_var_sht.int());

File: nco.info, Node: Print & String methods, Next: Missing values ncap2, Prev: if statement, Up: ncap2 netCDF Arithmetic Processor

4.1.10 Print & String methods
-----------------------------

The print statement comes in a variety of forms:

     (A) print(variable_name, format string?);
     (A1) print(expression/string, format string?);
     (B) sprint(expression/string, format string?);
     (B1) sprint4(expression/string, format string?);

*print()*
If the variable exists in I/O then it is printed in a similar fashion to ‘ncks -H’.

     print(lon);
     lon[0]=0
     lon[1]=90
     lon[2]=180
     lon[3]=270

     print(byt_2D)
     lat[0]=-90 lon[0]=0 byt_2D[0]=0
     lat[0]=-90 lon[1]=90 byt_2D[1]=1
     lat[0]=-90 lon[2]=180 byt_2D[2]=2
     lat[0]=-90 lon[3]=270 byt_2D[3]=3
     lat[1]=90 lon[0]=0 byt_2D[4]=4
     lat[1]=90 lon[1]=90 byt_2D[5]=5
     lat[1]=90 lon[2]=180 byt_2D[6]=6
     lat[1]=90 lon[3]=270 byt_2D[7]=7

If the first argument is NOT a variable the form (A1) is invoked.

     print(mss_val_fst@_FillValue);
     mss_val_fst@_FillValue, size = 1 NC_FLOAT, value = -999
     print("This function \t is monotonic\n");
     This function    is monotonic
     print(att_var@float_att)
     att_var@float_att, size = 7 NC_FLOAT, value = 73, 72, 71, 70.01, 69.001, 68.01, 67.01
     print(lon*10.0)
     lon, size = 4 NC_DOUBLE, value = 0, 900, 1800, 2700

If the format string is specified then the results from (A) and (A1) forms are the same:

     print(lon_2D_rrg,"%3.2f,");
     0.00,0.00,180.00,0.00,180.00,0.00,180.00,0.00,
     print(lon*10.0,"%g,")
     0,900,1800,2700,
     print(att_var@float_att,"%g,")
     73,72,71,70.01,69.001,68.01,67.01,

*sprint() & sprint4()*
These functions work in an identical fashion to (A1) except that ‘sprint()’ outputs a regular netCDF3 ‘NC_CHAR’ attribute and ‘sprint4()’ outputs a netCDF4 ‘NC_STRING’ attribute.

     time@units=sprint(yyyy,"days since %d-1-1")
     bnd@num=sprint4(bnd_idx,"Band number=%d")
     time@arr=sprint4(time,"%.2f,") // "1.00,2.00,3.00,4.00,5.00,6.00,7.00,8.00,9.00,10.00,"

You can also use ‘sprint4()’ to convert a ‘NC_CHAR’ string to a ‘NC_STRING’ string and ‘sprint()’ to convert a ‘NC_STRING’ to a ‘NC_CHAR’:

     lat_1D_rct@long_name = "Latitude for 2D rectangular grid stored as 1D arrays";
     // convert to NC_STRING
     lat_1D_rct@long_name = sprint4(lat_1D_rct@long_name)

*hyperslab a netCDF string*
It is possible to index into an NC_CHAR string, similar to a C-String. Unlike a C-String, however, an NC_CHAR string has no null-character to mark its termination. One CANNOT index into an NC_STRING string. One must convert to an NC_CHAR first.

     global@greeting="hello world!!!"
     @h=@greeting(0:4); // "hello"
     @w=@greeting(6:11); // "world"
     // can use negative indices
     @x=@greeting(-3:-1); // "!!!"
     // can use stride
     @n=@greeting(::2); // "hlowrd!"
     // concatenation
     global@new_greeting=push(@h, " users !!!"); // "hello users!!!"

     @institution="hotel california"s;
     @h=@institution(0:4); // BAD
     // convert NC_STRING to NC_CHAR
     @is=sprint(@institution);
     @h=@is(0:4); // "hotel"
     // convert NC_CHAR to NC_STRING
     @h=sprint4(@h);

*get_vars_in() & get_vars_out()*

     att_lst=get_vars_in(att_regexp?)
     att_lst=get_vars_out(att_regexp?)

These functions are used to create a list of vars in Input or Output. The optional argument 'att_regexp' can be an NC_CHAR att or an NC_STRING att. If NC_CHAR then only a single reg-exp can be specified. If NC_STRING then multiple reg-exps can be specified. The output is always an NC_STRING att. The matching works in an identical fashion to the ‘-v’ switch in ‘ncks’. If there is no argument then all variables are returned.

     @slist=get_vars_in("^time");
     // "time", "time_bnds", "time_lon", "time_udunits"
     // Use NC_STRINGS
     @regExp={".*_bnd"s,".*_grd"s}
     @slist=get_vars_in(@regExp);
     // "lat_bnd", "lat_grd", "lev_bnd", "lon_grd", "time_bnds", "cnv_CF_grd"

File: nco.info, Node: Missing values ncap2, Next: Methods and functions, Prev: Print & String methods, Up: ncap2 netCDF Arithmetic Processor

4.1.11 Missing values ncap2
---------------------------

Missing values operate slightly differently in ‘ncap2’. Consider the expression var1 'op' var2, where op is any of the following operators (excluding '='):

     Arithmetic operators ( * / % + - ^ )
     Binary operators ( > >= < <= == != || && >> << )
     Assign operators ( += -= /= *= )

If var1 has a missing value then this is the value used in the operation, otherwise the missing value for var2 is used. If during the element-by-element operation an element from either operand is equal to the missing value then the missing value is carried through. In this way missing values 'percolate' or propagate through an expression. Missing values associated with Output variables are stored in memory and are written to disk after the script finishes. During script execution it is possible (and legal) for the missing value of a variable to take on several different values.

     # Consider the variable:
     int rec_var_int_mss_val_int(time); =-999,2,3,4,5,6,7,8,-999,-999;
     rec_var_int_mss_val_int:_FillValue = -999;

     n2=rec_var_int_mss_val_int + rec_var_int_mss_val_int.reverse($time);
     n2=-999,-999,11,11,11,11,11,11,-999,-999;

The following methods query or manipulate missing value (aka ‘_FillValue’) information associated with a variable. The methods that "manipulate" only succeed on variables in Output.

‘set_miss(expr)’
     The numeric argument EXPR becomes the new missing value, overwriting the old missing value, if any. The argument given is converted if necessary to the variable's type. NB: This only changes the missing value attribute. Missing values in the original variable remain unchanged, and thus are no longer considered missing values. They are effectively "orphaned". Thus ‘set_miss()’ is normally used only when creating new variables. The intrinsic function ‘change_miss()’ (see below) is typically used to edit values of existing variables.
‘change_miss(expr)’
     Sets or changes (any pre-existing) missing value attribute and missing data values to EXPR. NB: This is an expensive function since all values must be examined. Use this function when changing missing values for pre-existing variables.
‘get_miss()’
     Returns the missing value of a variable.
If the variable exists in Input and Output then the missing value of the variable in Output is returned. If the variable has no missing value then an error is returned.

‘delete_miss()’
     Delete the missing value associated with a variable.
‘number_miss()’
     Count the number of missing values a variable contains.
‘has_miss()’
     Returns 1 (True) if the variable has a missing value associated with it, else returns 0 (False).
‘missing()’
     This function creates a True/False mask array of where the missing value is set. It is syntactically equivalent to ‘(var_in == var_in.get_miss())’, except that the latter requires deleting the missing value beforehand.

     th=three_dmn_var_dbl;
     th.change_miss(-1e10d);
     /* Set values less than 0 or greater than 50 to missing value */
     where(th < 0.0 || th > 50.0) th=th.get_miss();

     # Another example:
     new[$time,$lat,$lon]=1.0;
     new.set_miss(-997.0);
     // Extract all elements evenly divisible by 3
     where (three_dmn_var_dbl%3 == 0)
       new=three_dmn_var_dbl;
     elsewhere
       new=new.get_miss();

     // Print missing value and variable summary
     mss_val_nbr=three_dmn_var_dbl.number_miss();
     print(three_dmn_var_dbl@_FillValue);
     print("Number of missing values in three_dmn_var_dbl: ");
     print(mss_val_nbr,"%d");
     print(three_dmn_var_dbl);

     // Find total number of missing values along dims $lat and $lon
     mss_ttl=three_dmn_var_dbl.missing().ttl($lat,$lon);
     print(mss_ttl); // 0, 0, 0, 8, 0, 0, 0, 1, 0, 2 ;

‘simple_fill_miss(var)’
     This function takes a variable and attempts to fill missing values using an average of up to the 4 nearest neighbour grid points. The method used is iterative (up to 1000 cycles). For very large areas of missing values results can be unpredictable. The given variable must be at least 2D, and the algorithm assumes that the last two dims are lat/lon or y/x.
‘weighted_fill_miss(var)’
     ‘weighted_fill_miss()’ is more sophisticated. Up to 8 nearest neighbours are used to calculate a weighted average. The weighting used is the inverse square of distance. Again the method is iterative (up to 1000 cycles). The area filled is defined by the final two dims of the variable. In addition this function assumes the existence of coordinate variables with the same names as the last two dims. If it does not find these coordinates it will gently exit with a warning.

File: nco.info, Node: Methods and functions, Next: RAM variables, Prev: Missing values ncap2, Up: ncap2 netCDF Arithmetic Processor

4.1.12 Methods and functions
----------------------------

The convention within this document is that methods can be used as functions. However, functions are not and cannot be used as methods. Methods can be daisy-chained and their syntax is cleaner than functions. Method names are reserved words and CANNOT be used as variable names. The command ‘ncap2 -f’ shows the complete list of methods available on your build.

     n2=sin(theta)
     n2=theta.sin()
     n2=sin(theta)^2 + cos(theta)^2
     n2=theta.sin().pow(2) + theta.cos()^2

This statement chains together methods to convert three_dmn_var_sht to type double, average it, then convert this back to type short:

     three_avg=three_dmn_var_sht.double().avg().short();

*Aggregate Methods*
These methods mirror the averaging types available in ‘ncwa’. The arguments to the methods are the dimensions to average over. Specifying no dimensions is equivalent to specifying all dimensions, i.e., averaging over all dimensions. A masking variable and a weighting variable can be manually created and applied as needed.
‘avg()’ Mean value
‘sqravg()’ Square of the mean
‘avgsqr()’ Mean of sum of squares
‘max()’ Maximum value
‘min()’ Minimum value
‘mabs()’ Maximum absolute value
‘mebs()’ Mean absolute value
‘mibs()’ Minimum absolute value
‘rms()’ Root-mean-square (normalize by N)
‘rmssdn()’ Root-mean square (normalize by N-1)
‘tabs() or ttlabs()’ Sum of absolute values
‘ttl() or total() or sum()’ Sum of values

     // Average a variable over time
     four_time_avg=four_dmn_rec_var.avg($time);

*Packing Methods*
For more information see *note Packed data:: and *note ncpdq netCDF Permute Dimensions Quickly::

‘pack() & pack_short()’ The default packing algorithm is applied and the variable is packed to ‘NC_SHORT’
‘pack_byte()’ Variable is packed to ‘NC_BYTE’
‘pack_short()’ Variable is packed to ‘NC_SHORT’
‘pack_int()’ Variable is packed to ‘NC_INT’
‘unpack()’ The standard unpacking algorithm is applied.

NCO automatically unpacks packed data before arithmetically modifying it. After modification NCO stores the unpacked data. To store it as packed data again, repack it with, e.g., the ‘pack()’ function. To ensure that ‘temperature’ is packed in the output file, regardless of whether it is packed in the input file, one uses, e.g.,

     ncap2 -s 'temperature=pack(temperature-273.15)' in.nc out.nc

All the above pack functions also take two additional arguments, ‘scale_factor, add_offset’. Both arguments must be included:

     ncap2 -v -O -s 'rec_pck=pack(three_dmn_rec_var,-0.001,40.0);' in.nc foo.nc

*Basic Methods*
These methods work with variables and attributes. They have no arguments.

‘size()’ Total number of elements
‘ndims()’ Number of dimensions in variable
‘type()’ Returns the netCDF type (see previous section)
‘exists()’ Return 1 (true) if var or att is present in I/O, else return 0 (false)
‘getdims()’ Returns an NC_STRING attribute of all the dim names of a variable

*Utility Methods*
These functions are used to manipulate missing values and RAM variables. *note Missing values ncap2::

‘set_miss(expr)’ Takes one argument, the missing value. Sets or overwrites the existing missing value. The argument given is converted if necessary to the variable type. (NB: pre-existing missing values, if any, are not converted).
‘change_miss(expr)’ Changes the missing value elements of the variable to the new missing value (NB: an expensive function).
‘get_miss()’ Returns the missing value of a variable in Input or Output
‘delete_miss()’ Deletes the missing value associated with a variable.
‘has_miss()’ Returns 1 (True) if the variable has a missing value, else returns 0 (False)
‘number_miss()’ Returns the number of missing values a variable contains
‘ram_write()’ Writes a RAM variable to disk, i.e., converts it to a regular disk type variable
‘ram_delete()’ Deletes a RAM variable or an attribute

*PDQ Methods*
See *note ncpdq netCDF Permute Dimensions Quickly::

‘reverse(dim args)’ Reverse the dimension ordering of elements in a variable.
‘permute(dim args)’ Re-shape variables by re-ordering the dimensions. All the dimensions of the variable must be specified in the arguments. A limitation of this permute (unlike ‘ncpdq’) is that the record dimension cannot be re-assigned.

     // Swap dimensions about and reorder along lon
     lat_2D_rrg_new=lat_2D_rrg.permute($lon,$lat).reverse($lon);
     lat_2D_rrg_new=0,90,-30,30,-30,30,-90,0

*Type Conversion Methods and Functions*
These methods allow ‘ncap2’ to convert variables and attributes to the different netCDF types. For more details on automatic and manual type conversion see (*note Type Conversion::).
The netCDF4 types are only available if NCO was compiled/linked with the netCDF4 library and the output file is netCDF4 (an HDF5-based format).

‘*netCDF3/4 Types*’
‘byte()’ convert to ‘NC_BYTE’, a signed 1-byte integer
‘char()’ convert to ‘NC_CHAR’, an ISO/ASCII character
‘short()’ convert to ‘NC_SHORT’, a signed 2-byte integer
‘int()’ convert to ‘NC_INT’, a signed 4-byte integer
‘float()’ convert to ‘NC_FLOAT’, a single-precision (4-byte) floating-point number
‘double()’ convert to ‘NC_DOUBLE’, a double-precision (8-byte) floating-point number

‘*netCDF4 Types*’
‘ubyte()’ convert to ‘NC_UBYTE’, an unsigned 1-byte integer
‘ushort()’ convert to ‘NC_USHORT’, an unsigned 2-byte integer
‘uint()’ convert to ‘NC_UINT’, an unsigned 4-byte integer
‘int64()’ convert to ‘NC_INT64’, a signed 8-byte integer
‘uint64()’ convert to ‘NC_UINT64’, an unsigned 8-byte integer

You can also use the ‘convert()’ method to do type conversion. This takes a single integer argument. For convenience, ‘ncap2’ defines the netCDF pre-processor type tokens (e.g., ‘NC_FLOAT’, ‘NC_DOUBLE’) as RAM variables. For example, you may wish to convert a non-floating-point variable to the same type as another variable:

     lon_type=lon.type();
     if(time.type() != NC_DOUBLE && time.type() != NC_FLOAT)
       time=time.convert(lon_type);

*Intrinsic Mathematical Methods*

The list of mathematical methods is system dependent. For the full list, see *note Intrinsic mathematical methods::. All the mathematical methods take a single argument except ‘atan2()’ and ‘pow()’, which take two. If the operand type is less than _float_ then the result will be of type _float_. Arguments of type _double_ yield results of type _double_. Like the other methods, you are free to use the mathematical methods as functions.

     n1=pow(2,3.0f) // n1 type float
     n2=atan2(2,3.0) // n2 type double
     n3=1/(three_dmn_var_dbl.cos().pow(2))-tan(three_dmn_var_dbl)^2; // n3 type double

File: nco.info, Node: RAM variables, Next: Where statement, Prev: Methods and functions, Up: ncap2 netCDF Arithmetic Processor

4.1.13 RAM variables
--------------------

Unlike regular variables, RAM variables are never written to disk. Hence using RAM variables in place of regular variables (especially within loops) significantly increases execution speed. Variables that are frequently accessed within ‘for’ or ‘where’ clauses provide the greatest opportunities for optimization. To declare and define a RAM variable simply prefix the variable name with an asterisk (‘*’) when the variable is declared/initialized. To delete RAM variables (and recover their memory) use the ‘ram_delete()’ method. To write a RAM variable to disk (like a regular variable) use ‘ram_write()’.

     *temp[$time,$lat,$lon]=10.0; // Cast
     *temp_avg=temp.avg($time); // Regular assign
     temp.ram_delete(); // Delete RAM variable
     temp_avg.ram_write(); // Write variable to output

     // Create and increment a RAM variable from "one" in Input
     *one++;
     // Create RAM variables from the variables three and four in Input.
     // Multiply three by 10 and add it to four.
     *four+=*three*=10; // three=30, four=34

File: nco.info, Node: Where statement, Next: Loops, Prev: RAM variables, Up: ncap2 netCDF Arithmetic Processor

4.1.14 Where statement
----------------------

The ‘where()’ statement combines the definition and application of a mask and can lead to succinct code. The syntax of a ‘where()’ statement is:

     // Single assign ('elsewhere' is optional)
     where(mask)
       var1=expr1;
     elsewhere
       var1=expr2;

     // Multiple assigns
     where(mask){
       var1=expr1;
       var2=expr2;
       ...
     }elsewhere{
       var1=expr3;
       var2=expr4;
       var3=expr5;
       ...
     }

• The only statement allowed in the body of a ‘where’ is an assignment, i.e., ‘var=expr’. This assignment differs from a regular ‘ncap2’ assignment: the LHS variable must already exist in Input or Output, and the RHS expression must evaluate to a scalar or to a variable/attribute of the same size as the LHS variable.

• Consider when both the LHS and RHS are variables: for every element where the mask condition is True, the corresponding LHS element is re-assigned to its partner element on the RHS. In the elsewhere part the mask is logically inverted and the assignment process proceeds as before.

• If the mask dimensions are a subset of the LHS variable's dimensions, then the mask is made to conform; if it cannot be made to conform then script execution halts.

• Missing values in the mask evaluate to False in the where block and to True in the elsewhere block.

• LHS variable elements set to the missing value are treated just like any other elements and can be re-assigned as the mask dictates.

• The LHS variable cannot include subscripts; if it does, script execution will terminate. See the example below for a workaround.

Consider the variables ‘float lon_2D_rct(lat,lon);’ and ‘float var_msk(lat,lon);’. Suppose we wish to multiply by two the elements for which ‘var_msk’ equals 1:

     where(var_msk == 1) lon_2D_rct=2*lon_2D_rct;

Suppose that we have the variable ‘int RDM(time)’ and that we want to set its values less than 8 or greater than 80 to 0:

     where(RDM < 8 || RDM > 80) RDM=0;

To use ‘where’ on a variable hyperslab, define and use a temporary variable, e.g.,

     *var_tmp=var2(:,0,:,:);
     where (var1 < 0.5) var_tmp=1234;
     var2(:,0,:,:)=var_tmp;
     ram_delete(var_tmp);

Consider irregularly gridded data, described using rank 2 coordinates: ‘double lat(south_north,east_west)’, ‘double lon(south_north,east_west)’, ‘double temperature(south_north,east_west)’. This type of structure is often found in regional weather/climate model (such as WRF) output, and in satellite swath data. For this reason we call it "Swath-like Data", or SLD. To find the average temperature in a region bounded by [LAT_MIN,LAT_MAX] and [LON_MIN,LON_MAX]:

     temperature_msk[$south_north,$east_west]=0.0;
     where((lat >= lat_min && lat <= lat_max) && (lon >= lon_min && lon <= lon_max))
       temperature_msk=temperature;
     elsewhere
       temperature_msk=temperature@_FillValue;

     temp_avg=temperature_msk.avg();
     temp_max=temperature.max();

For North American Regional Reanalysis (NARR) data (example dataset (http://dust.ess.uci.edu/diwg/narr_uwnd.199605.nc)) the procedure looks like this:

     ncap2 -O -v -S ~/narr.nco ${DATA}/hdf/narr_uwnd.199605.nc ~/foo.nc

where ‘narr.nco’ is an ‘ncap2’ script like this:

     /* North American Regional Reanalysis (NARR) Statistics
        NARR stores grids with 2-D latitude and longitude, aka Swath-like Data (SLD)
        Here we work with three variables: lat(y,x), lon(y,x), and uwnd(time,level,y,x);
        To study sub-regions of SLD, we use masking techniques:
        1. Define mask as zero times variable to be masked
           Then mask automatically inherits variable attributes
           And average below will inherit mask attributes
        2. Optionally, create mask as RAM variable (as below with asterisk *)
           NCO does not write RAM variable to output
           Masks are often unwanted, and can be big, so this speeds execution
        3. Example could be extended to preserve mean lat and lon of sub-region
           Follow uwnd example to do this: lat_sk=0.0*lat ...
        lat_avg=lat.avg($y,$x) */
     *uwnd_msk=0.0*uwnd;
     where((lat >= 35.6 && lat <= 37.0) && (lon >= -100.5 && lon <= -99.0))
       uwnd_msk=uwnd;
     elsewhere
       uwnd_msk=uwnd@_FillValue;
     // Average only over horizontal dimensions x and y (preserve level and time)
     uwnd_avg=uwnd_msk.avg($y,$x);

Stripped of comments and formatting, this example is a three-statement script executed by a one-line command. NCO needs only this meagre input to unpack and copy the input data and attributes, compute the statistics, and then define and write the output file. Unless the comments pointed out that the wind variable (‘uwnd’) is four-dimensional and the latitude/longitude grid variables are both two-dimensional, there would be no way to tell. This shows how NCO hides from the user the complexity of analyzing multi-dimensional SLD. We plan to extend such SLD features to more operators soon.

File: nco.info, Node: Loops, Next: Include files, Prev: Where statement, Up: ncap2 netCDF Arithmetic Processor

4.1.15 Loops
------------

‘ncap2’ supplies ‘for()’ loops and ‘while()’ loops. They are completely unoptimized, so use them only with RAM variables unless you want to thrash your disk to death. To break out of a loop use the ‘break’ command. To iterate to the next cycle use the ‘continue’ command.

     // Set elements in variable double temp(time,lat)
     // If element < 0 set to 0, if element > 100 set to 100
     *sz_idx=$time.size;
     *sz_jdx=$lat.size;
     for(*idx=0;idx<sz_idx;idx++)
       for(*jdx=0;jdx<sz_jdx;jdx++)
         if(temp(idx,jdx) > 100) temp(idx,jdx)=100.0;
         else if(temp(idx,jdx) < 0) temp(idx,jdx)=0.0;

     // Are values of co-ordinate variable double lat(lat) monotonic?
     *sz=$lat.size;
     for(*idx=1;idx<sz;idx++)
       if(lat(idx)-lat(idx-1) < 0.0) break;

cat > ~/ncap2_foo.nco << 'EOF'
// Purpose: Save irregular 1-D regions based on variable values
// Included in NCO User Guide at http://nco.sf.net/nco.html#sort
/* NB: Single quotes around EOF above turn off shell parameter expansion in
   "here documents". This in turn prevents the need for protecting dollarsign
   characters in NCO scripts with backslashes when the script is cut-and-pasted
   (aka "moused") from an editor or e-mail into a shell console window */
/* Copy coordinates and variable(s) of interest into RAM variable(s)
   Benefits:
   1. ncap2 defines and writes all variables on the LHS of an expression to disk
      Only exception is RAM variables, which are stored in RAM only
      Repeated operations on regular variables take more time, because changes
      are written to the disk copy after every change. RAM variables are only
      changed in RAM, so the script works faster
      RAM variables can be written to disk at the end with ram_write()
   2. Script permutes variables of interest during processing
      Safer to work with copies that have different names
      This discourages accidental, mistaken use of permuted versions
   3. Makes this script a more generic template: var_in instead of
      specific variable names everywhere */
*var_in=one_dmn_rec_var;
*crd_in=time;
*dmn_in_sz=$time.size; // [nbr] Size of input arrays
/* Create all other "intermediate" variables as RAM variables to prevent them
   from cluttering the output file.
   Mask flag and sort map are same size as variable of interest */
*msk_flg=var_in;
*srt_map=var_in;
/* In this example we mask for all values evenly divisible by 3
   This is the key, problem-specific portion of the template
   Replace this where() condition by that for your problem
   Mask variable is Boolean: 1=Meets condition, 0=Fails condition */
where(var_in % 3 == 0) msk_flg=1; elsewhere msk_flg=0;
// print("msk_flg = ");print(msk_flg); // For debugging...
/* The sort() routine is overloaded, and takes one or two arguments The second argument (optional) is the "sort map" (srt_map below) Pass the sort map by reference, i.e., prefix with an ampersand If the sort map does not yet exist, then it will be created and returned as an integer type the same shape as the input variable. The output of sort(), on the LHS, is a sorted version of the input msk_flg is not needed in its original order after sort() Hence we use msk_flg as both input to and output from sort() Doing this prevents the need to define a new, unneeded variable */ msk_flg=sort(msk_flg,&srt_map); // Count number of valid points in mask by summing the one's *msk_nbr=msk_flg.total(); // Define output dimension equal in size to number of valid points defdim("crd_out",msk_nbr); /* Now sort the variable of interest using the sort map and remap() The output, on the LHS, is the input re-arranged so that all points meeting the mask condition are contiguous at the end of the array Use same srt_map to hyperslab multiple variables of the same shape Remember to apply srt_map to the coordinate variables */ crd_in=remap(crd_in,srt_map); var_in=remap(var_in,srt_map); /* Hyperslab last msk_nbr values of variable(s) of interest */ crd_out[crd_out]=crd_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1)); var_out[crd_out]=var_in((dmn_in_sz-msk_nbr):(dmn_in_sz-1)); /* NB: Even though we created all variables possible as RAM variables, the original coordinate of interest, time, is written to the ouput. I'm not exactly sure why. For now, delete it from the output with: ncks -O -x -v time ~/foo.nc ~/foo.nc */ EOF ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc ncks -O -x -v time ~/foo.nc ~/foo.nc ncks ~/foo.nc Here is an extended example of how to use ‘ncap2’ features to sort multi-dimensional arrays based on the coordinate values along a single dimension. cat > ~/ncap2_foo.nco << 'EOF' /* Purpose: Sort multi-dimensional array based on coordinate values This example sorts the variable three_dmn_rec_var(time,lat,lon) based on the values of the time coordinate. */ // Included in NCO User Guide at http://nco.sf.net/nco.html#sort // Randomize the time coordinate time=10.0*gsl_rng_uniform(time); //print("original randomized time = \n");print(time); /* The sort() routine is overloaded, and takes one or two arguments The first argument is a one dimensional array The second argument (optional) is the "sort map" (srt_map below) Pass the sort map by reference, i.e., prefix with an ampersand If the sort map does not yet exist, then it will be created and returned as an integer type the same shape as the input variable. The output of sort(), on the LHS, is a sorted version of the input */ time=sort(time,&srt_map); //print("sorted time (ascending order) and associated sort map =\n");print(time);print(srt_map); /* sort() always sorts in ascending order The associated sort map therefore re-arranges the original, randomized time array into ascending order. There are two methods to obtain the descending order the user wants 1) We could solve the problem in ascending order (the default) and then apply the reverse() method to re-arrange the results. 2) We could change the sort map to return things in descending order of time and solve the problem directly in descending order. */ // Following shows how to do method one: /* Expand the sort map to srt_map_3d, the size of the data array 1. Use data array to provide right shape for the expanded sort map 2. Coerce data array into an integer so srt_map_3d is an integer 3. 
Multiply data array by zero so 3-d map elements are all zero
   4. Add the 1-d sort map to the 3-d sort map (NCO automatically resizes)
   5. Add the spatial (lat,lon) offsets to each time index
   6. De-sort using the srt_map_3d
   7. Use reverse to obtain descending-in-time order
   Loops could accomplish the same thing (exercise left for reader).
   However, loops are slow for large datasets */
/* Following index manipulation requires understanding the correspondence
   between 1-d (unrolled, memory-order) storage and access into that memory
   as a multidimensional (3-d, in this case) rectangular array.
   Key idea to understand is how dimensionality affects offsets */
// Copy 1-d sort map into 3-d sort map
srt_map_3d=(0*int(three_dmn_rec_var))+srt_map;
// Multiply base offset by product of the sizes of the lesser dimensions
srt_map_3d*=$lat.size*$lon.size;
lon_idx=array(0,1,$lon);
lat_idx=array(0,1,$lat)*$lon.size;
lat_lon_idx[$lat,$lon]=lat_idx+lon_idx;
srt_map_3d+=lat_lon_idx;
print("sort map 3d =\n");print(srt_map_3d);
// Use remap() to re-map the data
three_dmn_rec_var=remap(three_dmn_rec_var,srt_map_3d);
// Finally, reverse data so time coordinate is descending
time=time.reverse($time);
//print("sorted time (descending order) =\n");print(time);
three_dmn_rec_var=three_dmn_rec_var.reverse($time);
// Method two: Key difference is srt_map=$time.size-srt_map-1;
EOF
ncap2 -O -v -S ~/ncap2_foo.nco ~/nco/data/in.nc ~/foo.nc

File: nco.info, Node: UDUnits script, Next: Vpointer, Prev: Sort methods, Up: ncap2 netCDF Arithmetic Processor

4.1.18 UDUnits script
---------------------

As of NCO version 4.6.3 (December, 2016), ncap2 includes support for UDUnits conversions. The function is called ‘udunits’. Its syntax is

     varOut=udunits(varIn,"UnitsOutString")

The ‘udunits()’ function looks for the attribute ‘varIn@units’ and fails if it is not found. A quirk of this function is that, due to attribute propagation, ‘varOut@units’ will be overwritten by ‘varIn@units’. It is best to re-initialize this attribute AFTER the call.

In addition, if ‘varIn@units’ is of the form ‘"time_interval since basetime"’ then the calendar attribute ‘varIn@calendar’, if present, will be read and used in the conversion. If it does not exist then the calendar used defaults to the mixed Gregorian/Julian calendar as defined by UDUnits. If ‘varIn’ is not a floating-point type then it is promoted to ‘NC_DOUBLE’ for the call into the UDUnits library, and then demoted back to its original type afterwards.

     T[$lon]={0.0,100.0,150.0,200.0};
     T@units="Celsius";

     // Overwrite variable
     T=udunits(T,"kelvin");
     print(T); // 273.15, 373.15, 423.15, 473.15 ;
     T@units="kelvin";

     // Rebase coordinate days to hours
     timeOld=time;
     print(timeOld); // 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ;
     timeOld@units="days since 2012-01-30";
     @units="hours since 2012-02-01 01:00";
     timeNew=udunits(timeOld, @units);
     timeNew@units=@units;
     print(timeNew); // -25, -1, 23, 47, 71, 95, 119, 143, 167, 191 ;

     tOld=time;
     // NB: Calendar=365_day has NO Leap year
     tOld@calendar="365_day";
     tOld@units="minutes since 2012-02-28 23:58:00.00";
     @units="seconds since 2012-03-01 00:00";
     tNew=udunits(tOld, @units);
     tNew@units=@units;
     print(tNew); // -60, 0, 60, 120, 180, 240, 300, 360, 420, 480

strftime()

The ‘var_str=strftime(var_time,fmt_sng)’ method takes a time-based variable and a format string and returns an ‘NC_STRING’ variable (of the same shape as VAR_TIME) of time-stamps in the form specified by FMT_SNG. In order to run this command the output type must be netCDF4.
     ncap2 -4 -v -O -s 'time_str=strftime(time,"%Y-%m-%d");' in.nc foo.nc

     time_str="1964-03-13", "1964-03-14", "1964-03-15", "1964-03-16", "1964-03-17", "1964-03-18", "1964-03-19", "1964-03-20", "1964-03-21", "1964-03-22" ;

Under the hood there are a few steps involved: First the method reads ‘var_time@units’ and ‘var_time@calendar’ (if present), then converts ‘var_time’ to ‘seconds since 1970-01-01’. It then converts these (possibly UTC) seconds to the standard C time structure ‘struct tm’. Finally ‘strftime()’ is called with FMT_SNG and this time structure. The C-standard ‘strftime()’ is used as defined in ‘time.h’.

If the method is called without FMT_SNG then the following default is used: ‘"%Y-%m-%d %H:%M:%S"’.

The method ‘regular()’ takes a single variable argument and uses the above default format string.

     ncap2 -4 -v -O -s 'time_str=regular(time);' in.nc foo.nc

     time_str = "1964-03-13 21:09:00", "1964-03-14 21:09:00", "1964-03-15 21:09:00", "1964-03-16 21:09:00", "1964-03-17 21:09:00", "1964-03-18 21:09:00", "1964-03-19 21:09:00", "1964-03-20 21:09:00", "1964-03-21 21:09:00", "1964-03-22 21:09:00" ;

Another working example:

     ncap2 -v -O -s 'ts=strftime(frametime(0),"%Y-%m-%d/envlog_netcdf_L1_ua-mac_%Y-%m-%d.nc");' in.nc out.nc

     ts="2017-08-11/envlog_netcdf_L1_ua-mac_2017-08-11.nc"

File: nco.info, Node: Vpointer, Next: Irregular grids, Prev: UDUnits script, Up: ncap2 netCDF Arithmetic Processor

4.1.19 Vpointer
---------------

A variable-pointer or _vpointer_ is a pointer to a variable or attribute. It is most useful when one needs to apply a set of operations on a list of variables. For example, after regular processing one may wish to set the ‘_FillValue’ of all ‘NC_FLOAT’ variables to a particular value, or to create min/max attributes for all 3D variables of type ‘NC_DOUBLE’. A vpointer is not a 'pointer' to a memory location in the C/C++ sense. Rather the vpointer is a text attribute that contains the name of a variable. To use the pointer simply prefix it with ‘*’. Then, in most places where you would use VAR_ID, you can use ‘*vpointer_nm’ instead.

There are a variety of ways to maintain a list of strings in ‘ncap2’. The easiest method is to use an ‘NC_STRING’ attribute. Below is a simple illustration that uses a vpointer of type ‘NC_CHAR’. Remember that an attribute starting with ‘@’ implies 'global', e.g., ‘@vpx’ is short for ‘global@vpx’.

     idx=9;
     idy=20;
     t2=time;

     global@vpx="idx";
     // Increment idx by one
     *global@vpx++;
     print(idx);
     // Multiply by 5
     *@vpx*=5; // idx now 50
     print(idx);
     // Add 200 (long method)
     *@vpx=*@vpx+200; // idx now 250
     print(idx);

     @vpy="idy";
     // Add idx and idy to get idz
     idz=*@vpx+*@vpy; // idz == 270
     print(idz);

     // We can also reference variables in the input file
     // Can use an existing attribute pointer since attributes are not written
     // to the netCDF file until after the script has finished.
     @vpx="three_dmn_var";
     // We can convert this variable to type NC_DOUBLE and
     // write it to output all at once
     *@vpx=*@vpx.double();

The following script writes to the output file all variables that are of type ‘NC_DOUBLE’ and that have at least two dimensions. It then changes their ‘_FillValue’ to ‘1.0E-9’. The function ‘get_vars_in()’ creates an ‘NC_STRING’ attribute that contains all of the variable names in the input file. Note that a vpointer must be a plain attribute, NOT an attribute expression. Thus in the below script using ‘*all(idx)’ would be a fundamental mistake. In the below example the vpointer ‘var_nm’ is of type ‘NC_STRING’.
@all=get_vars_in(); *sz=@all.size(); *idx=0; for(idx=0;idx= 2){ *@var_nm=*@var_nm; *@var_nm.change_miss(1e-9d); } } The following script writes to the output file all 3D/4D variables of type ‘NC_FLOAT’. Then for each variable it calculates a ‘range’ attribute that contains the maximum and minimum values, and a ‘total’ attribute that is the sum of all the elements. In this example vpointers are used to 'point' to attributes. @all=get_vars_in(); *sz=@all.size(); for(*idx=0;idx= 3){ *@var_nm=*@var_nm.float(); // The push function also takes a call-by-ref attribute: if it does not already exist then it will be created // The call below pushes a NC_STRING to an att so the end result is a list of NC_STRINGS push(&@prc,@var_nm); } } *sz=@prc.size(); for(*idx=0;idx 1 often use missing values to indicated empty points. For example, so-called "staggered grids" will use fewer east_west points near the poles and more near the equator. netCDF only accepts rectangular arrays so space must be allocated for the maximum number of east_west points at all latitudes. Then the application writes missing values into the unused points near the poles. We demonstrate the ‘ncap2’ analysis technique for irregular regions by constructing a mask for an R=2 grid. We wish to find, say, the mean temperature within [LAT_MIN,LAT_MAX] and [LON_MIN,LON_MAX]: ncap2 -s 'mask_var= (lat >= lat_min && lat <= lat_max) && \ (lon >= lon_min && lon <= lon_max);' in.nc out.nc Arbitrarily shaped regions can be defined by more complex conditional statements. Once defined, masks can be applied to specific variables, and to entire files: ncap2 -s 'temperature_avg=(temperature*mask_var).avg()' in.nc out.nc ncwa -a lat,lon -m mask_var -w area in.nc out.nc Crafting such commands on the command line is possible though unwieldy. In such cases, a script is often cleaner and allows you to document the procedure: cat > ncap2.in << 'EOF' mask_var = (lat >= lat_min && lat <= lat_max) && (lon >= lon_min && > lon <= lon_max); if(mask_var.total() > 0){ // Check that mask contains some valid values temperature_avg=(temperature*mask_var).avg(); // Average temperature temperature_max=(temperature*mask_var).max(); // Maximum temperature } EOF ncap2 -S ncap2.in in.nc out.nc Grids like those produced by the WRF model are complex because one must use global metadata to determine the grid staggering and offsets to translate ‘XLAT’ and ‘XLONG’ into real latitudes, longitudes, and missing points. The WRF grid documentation should describe this. For WRF files creating regional masks looks, in general, like mask_var = (XLAT >= lat_min && XLAT <= lat_max) && (XLONG >= lon_min && XLONG <= lon_max); A few notes: Irregular regions are the union of arrays of lat/lon min/max's. The mask procedure is identical for all R.  File: nco.info, Node: Bilinear interpolation, Next: GSL special functions, Prev: Irregular grids, Up: ncap2 netCDF Arithmetic Processor 4.1.21 Bilinear interpolation ----------------------------- As of version 4.0.0 NCO has internal routines to perform bilinear interpolation on gridded data sets. In mathematics, bilinear interpolation is an extension of linear interpolation for interpolating functions of two variables on a regular grid. The idea is to perform linear interpolation first in one direction, and then again in the other direction. Suppose we have an irregular grid of data ‘temperature[lat,lon]’, with co-ordinate vars ‘lat[lat], lon[lon]’. We wish to find the temperature at an arbitary point [X,Y] within the grid. 
If we can locate lat_min,lat_max and lon_min,lon_max such that ‘lat_min <= X <= lat_max’ and ‘lon_min <= Y <= lon_max’ then we can interpolate in two dimensions the temperature at [X,Y]. The general form of the ‘ncap2’ interpolation function is var_out=bilinear_interp(grid_in,grid_out,grid_out_x,grid_out_y,grid_in_x,grid_in_y) where ‘grid_in’ Input function data. Usually a two dimensional variable. It must be of size ‘grid_in_x.size()*grid_in_y.size()’ ‘grid_out’ This variable is the shape of ‘var_out’. Usually a two dimensional variable. It must be of size ‘grid_out_x.size()*grid_out_y.size()’ ‘grid_out_x’ X output values ‘grid_out_y’ Y output values ‘grid_in_x’ X input values values. Must be monotonic (increasing or decreasing). ‘grid_in_y’ Y input values values. Must be monotonic (increasing or decreasing). Prior to calculations all arguments are converted to type ‘NC_DOUBLE’. After calculations ‘var_out’ is converted to the input type of ‘grid_in’. Suppose the first part of an ‘ncap2’ script is defdim("X",4); defdim("Y",5); // Temperature T_in[$X,$Y]= {100, 200, 300, 400, 500, 101, 202, 303, 404, 505, 102, 204, 306, 408, 510, 103, 206, 309, 412, 515.0 }; // Coordinate variables x_in[$X]={0.0,1.0,2.0,3.01}; y_in[$Y]={1.0,2.0,3.0,4.0,5}; Now we interpolate with the following variables: defdim("Xn",3); defdim("Yn",4); T_out[$Xn,$Yn]=0.0; x_out[$Xn]={0.0,0.02,3.01}; y_out[$Yn]={1.1,2.0,3,4}; var_out=bilinear_interp(T_in,T_out,x_out,y_out,x_in,y_in); print(var_out); // 110, 200, 300, 400, // 110.022, 200.04, 300.06, 400.08, // 113.3, 206, 309, 412 ; It is possible to interpolate a single point: var_out=bilinear_interp(T_in,0.0,3.0,4.99,x_in,y_in); print(var_out); // 513.920594059406 *Wrapping and Extrapolation* The function ‘bilinear_interp_wrap()’ takes the same arguments as ‘bilinear_interp()’ but performs wrapping (Y) and extrapolation (X) for points off the edge of the grid. If the given range of longitude is say (25-335) and we have a point at 20 degrees, then the endpoints of the range are used for the interpolation. This is what wrapping means. For wrapping to occur Y must be longitude and must be in the range (0,360) or (-180,180). There are no restrictions on the longitude (X) values, though typically these are in the range (-90,90). This ‘ncap2’ script illustrates both wrapping and extrapolation of end points: defdim("lat_in",6); defdim("lon_in",5); // Coordinate input vars lat_in[$lat_in]={-80,-40,0,30,60.0,85.0}; lon_in[$lon_in]={30, 110, 190, 270, 350.0}; T_in[$lat_in,$lon_in]= {10,40,50,30,15, 12,43,52,31,16, 14,46,54,32,17, 16,49,56,33,18, 18,52,58,34,19, 20,55,60,35,20.0 }; defdim("lat_out",4); defdim("lon_out",3); // Coordinate variables lat_out[$lat_out]={-90,0,70,88.0}; lon_out[$lon_out]={0,190,355.0}; T_out[$lat_out,$lon_out]=0.0; T_out=bilinear_interp_wrap(T_in,T_out,lat_out,lon_out,lat_in,lon_in); print(T_out); // 13.4375, 49.5, 14.09375, // 16.25, 54, 16.625, // 19.25, 58.8, 19.325, // 20.15, 60.24, 20.135 ;  File: nco.info, Node: GSL special functions, Next: GSL interpolation, Prev: Bilinear interpolation, Up: ncap2 netCDF Arithmetic Processor 4.1.22 GSL special functions ---------------------------- As of version 3.9.6 (released January, 2009), NCO can link to the GNU Scientific Library (GSL). ‘ncap2’ can access most GSL special functions including Airy, Bessel, error, gamma, beta, hypergeometric, and Legendre functions and elliptical integrals. GSL must be version 1.4 or later. To list the GSL functions available with your NCO build, use ‘ncap2 -f | grep ^gsl’. 
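For example, this shell snippet (assuming a GSL-enabled NCO build and the sample file ‘in.nc’ distributed with NCO) first lists the error-function wrappers, then evaluates one of them on a scaled coordinate variable:

     # List the available error-function wrappers
     ncap2 -f | grep ^gsl_sf_erf
     # Evaluate the error function on a scaled coordinate variable
     ncap2 -O -v -s 'erf_out=gsl_sf_erf(0.1*lon);' in.nc out.nc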
The function names used by ncap2 mirror their GSL names. The NCO wrappers for GSL functions automatically call the error-handling version of the GSL function when available (1). This allows NCO to return a missing value when the GSL library encounters a domain error or a floating-point exception. The slow-down due to calling the error-handling version of the GSL numerical functions was found to be negligible (please let us know if you find otherwise).

Consider the gamma function. The GSL function prototype is

     int gsl_sf_gamma_e(const double x, gsl_sf_result * result)

The ‘ncap2’ script would be:

     lon_in[$lon]={-1,0.1,0.2,0.3};
     lon_out=gsl_sf_gamma(lon_in);
     lon_out= _, 9.5135, 4.5908, 2.9915

The first value is set to ‘_FillValue’ since the gamma function is undefined for negative integers. If the input variable has a missing value then this value is used. Otherwise, the default double fill value is used (defined in the netCDF header ‘netcdf.h’ as ‘NC_FILL_DOUBLE = 9.969e+36’).

Consider a call to a Bessel function with GSL prototype

     int gsl_sf_bessel_Jn_e(int n, double x, gsl_sf_result * result)

An ‘ncap2’ script would be

     lon_out=gsl_sf_bessel_Jn(2,lon_in);
     lon_out=0.11490, 0.0012, 0.00498, 0.011165

This computes the Bessel function of order N=2 for every value in ‘lon_in’. The Bessel order argument, an integer, can also be a non-scalar variable, i.e., an array.

     n_in[$lon]={0,1,2,3};
     lon_out=gsl_sf_bessel_Jn(n_in,0.5);
     lon_out= 0.93846, 0.24226, 0.03060, 0.00256

Arguments to GSL wrapper functions in ‘ncap2’ must conform to one another, i.e., they must share the same sub-set of dimensions. For example, ‘three_out=gsl_sf_bessel_Jn(n_in,three_dmn_var_dbl)’ is valid because the variable ‘three_dmn_var_dbl’ has a LON dimension, so ‘n_in’ can be broadcast to conform to ‘three_dmn_var_dbl’. However ‘time_out=gsl_sf_bessel_Jn(n_in,time)’ is invalid.

Consider the elliptical integral with prototype

     int gsl_sf_ellint_RD_e(double x, double y, double z, gsl_mode_t mode, gsl_sf_result * result)

     three_out=gsl_sf_ellint_RD(0.5,time,three_dmn_var_dbl);

The three arguments are all conformable so the above ‘ncap2’ call is valid. The mode argument in the function prototype controls the convergence of the algorithm. It also appears in the Airy function prototypes. It can be set by defining the environment variable ‘GSL_PREC_MODE’. If unset it defaults to the value ‘GSL_PREC_DOUBLE’. See the GSL manual for more details.

     export GSL_PREC_MODE=0 # GSL_PREC_DOUBLE
     export GSL_PREC_MODE=1 # GSL_PREC_SINGLE
     export GSL_PREC_MODE=2 # GSL_PREC_APPROX

The ‘ncap2’ wrappers to the array functions are slightly different. Consider the following GSL prototype

     int gsl_sf_bessel_Jn_array(int nmin, int nmax, double x, double *result_array)

     b1=lon.double();
     x=0.5;
     status=gsl_sf_bessel_Jn_array(1,4,x,&b1);
     print(status);
     b1=0.24226,0.0306,0.00256,0.00016;

This calculates the Bessel function of X=0.5 for N=1 to 4. The first three arguments are scalar values. If a non-scalar variable is supplied as an argument then only the first value is used. The final argument is the variable where the results are stored (NB: the ‘&’ indicates this is a call by reference). This final argument must be of type ‘double’ and must be at least of size NMAX-NMIN+1. If either of these conditions is not met then the function returns an error message. The function/wrapper returns a status flag. Zero indicates success.
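A minimal sketch of checking that status flag follows; it assumes a GSL-enabled build and borrows the four-element ‘lon’ dimension from NCO's sample ‘in.nc’ to hold the results.

     // Result array must be type double with at least nmax-nmin+1 elements
     In_out[$lon]=0.0;
     status=gsl_sf_bessel_In_array(0,3,1.0,&In_out);
     if(status == 0) print(In_out); else print("gsl_sf_bessel_In_array failed\n");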
Consider another array function ‘int gsl_sf_legendre_Pl_array(int lmax, double x, double *result_array);’ a1=time.double(); x=0.3; status=gsl_sf_legendre_Pl_array(a1.size()-1, x,&a1); print(status); This call calculates P_L(0.3) for L=0..9. Note that |X|<=1, otherwise there will be a domain error. See the GSL documentation for more details. The GSL functions implemented in NCO are listed in the table below. This table is correct for GSL version 1.10. To see what functions are available on your build run the command ‘ncap2 -f |grep ^gsl’ . To see this table along with the GSL C-function prototypes look at the spreadsheet *doc/nco_gsl.ods*. *GSL NAME* *I* *NCAP FUNCTION CALL* gsl_sf_airy_Ai_e Y gsl_sf_airy_Ai(dbl_expr) gsl_sf_airy_Bi_e Y gsl_sf_airy_Bi(dbl_expr) gsl_sf_airy_Ai_scaled_e Y gsl_sf_airy_Ai_scaled(dbl_expr) gsl_sf_airy_Bi_scaled_e Y gsl_sf_airy_Bi_scaled(dbl_expr) gsl_sf_airy_Ai_deriv_e Y gsl_sf_airy_Ai_deriv(dbl_expr) gsl_sf_airy_Bi_deriv_e Y gsl_sf_airy_Bi_deriv(dbl_expr) gsl_sf_airy_Ai_deriv_scaled_eY gsl_sf_airy_Ai_deriv_scaled(dbl_expr) gsl_sf_airy_Bi_deriv_scaled_eY gsl_sf_airy_Bi_deriv_scaled(dbl_expr) gsl_sf_airy_zero_Ai_e Y gsl_sf_airy_zero_Ai(uint_expr) gsl_sf_airy_zero_Bi_e Y gsl_sf_airy_zero_Bi(uint_expr) gsl_sf_airy_zero_Ai_deriv_eY gsl_sf_airy_zero_Ai_deriv(uint_expr) gsl_sf_airy_zero_Bi_deriv_eY gsl_sf_airy_zero_Bi_deriv(uint_expr) gsl_sf_bessel_J0_e Y gsl_sf_bessel_J0(dbl_expr) gsl_sf_bessel_J1_e Y gsl_sf_bessel_J1(dbl_expr) gsl_sf_bessel_Jn_e Y gsl_sf_bessel_Jn(int_expr,dbl_expr) gsl_sf_bessel_Jn_array Y status=gsl_sf_bessel_Jn_array(int,int,double,&var_out) gsl_sf_bessel_Y0_e Y gsl_sf_bessel_Y0(dbl_expr) gsl_sf_bessel_Y1_e Y gsl_sf_bessel_Y1(dbl_expr) gsl_sf_bessel_Yn_e Y gsl_sf_bessel_Yn(int_expr,dbl_expr) gsl_sf_bessel_Yn_array Y gsl_sf_bessel_Yn_array gsl_sf_bessel_I0_e Y gsl_sf_bessel_I0(dbl_expr) gsl_sf_bessel_I1_e Y gsl_sf_bessel_I1(dbl_expr) gsl_sf_bessel_In_e Y gsl_sf_bessel_In(int_expr,dbl_expr) gsl_sf_bessel_In_array Y status=gsl_sf_bessel_In_array(int,int,double,&var_out) gsl_sf_bessel_I0_scaled_e Y gsl_sf_bessel_I0_scaled(dbl_expr) gsl_sf_bessel_I1_scaled_e Y gsl_sf_bessel_I1_scaled(dbl_expr) gsl_sf_bessel_In_scaled_e Y gsl_sf_bessel_In_scaled(int_expr,dbl_expr) gsl_sf_bessel_In_scaled_arrayY staus=gsl_sf_bessel_In_scaled_array(int,int,double,&var_out) gsl_sf_bessel_K0_e Y gsl_sf_bessel_K0(dbl_expr) gsl_sf_bessel_K1_e Y gsl_sf_bessel_K1(dbl_expr) gsl_sf_bessel_Kn_e Y gsl_sf_bessel_Kn(int_expr,dbl_expr) gsl_sf_bessel_Kn_array Y status=gsl_sf_bessel_Kn_array(int,int,double,&var_out) gsl_sf_bessel_K0_scaled_e Y gsl_sf_bessel_K0_scaled(dbl_expr) gsl_sf_bessel_K1_scaled_e Y gsl_sf_bessel_K1_scaled(dbl_expr) gsl_sf_bessel_Kn_scaled_e Y gsl_sf_bessel_Kn_scaled(int_expr,dbl_expr) gsl_sf_bessel_Kn_scaled_arrayY status=gsl_sf_bessel_Kn_scaled_array(int,int,double,&var_out) gsl_sf_bessel_j0_e Y gsl_sf_bessel_J0(dbl_expr) gsl_sf_bessel_j1_e Y gsl_sf_bessel_J1(dbl_expr) gsl_sf_bessel_j2_e Y gsl_sf_bessel_j2(dbl_expr) gsl_sf_bessel_jl_e Y gsl_sf_bessel_jl(int_expr,dbl_expr) gsl_sf_bessel_jl_array Y status=gsl_sf_bessel_jl_array(int,double,&var_out) gsl_sf_bessel_jl_steed_arrayY gsl_sf_bessel_jl_steed_array gsl_sf_bessel_y0_e Y gsl_sf_bessel_Y0(dbl_expr) gsl_sf_bessel_y1_e Y gsl_sf_bessel_Y1(dbl_expr) gsl_sf_bessel_y2_e Y gsl_sf_bessel_y2(dbl_expr) gsl_sf_bessel_yl_e Y gsl_sf_bessel_yl(int_expr,dbl_expr) gsl_sf_bessel_yl_array Y status=gsl_sf_bessel_yl_array(int,double,&var_out) gsl_sf_bessel_i0_scaled_e Y gsl_sf_bessel_I0_scaled(dbl_expr) gsl_sf_bessel_i1_scaled_e 
Y gsl_sf_bessel_I1_scaled(dbl_expr) gsl_sf_bessel_i2_scaled_e Y gsl_sf_bessel_i2_scaled(dbl_expr) gsl_sf_bessel_il_scaled_e Y gsl_sf_bessel_il_scaled(int_expr,dbl_expr) gsl_sf_bessel_il_scaled_arrayY status=gsl_sf_bessel_il_scaled_array(int,double,&var_out) gsl_sf_bessel_k0_scaled_e Y gsl_sf_bessel_K0_scaled(dbl_expr) gsl_sf_bessel_k1_scaled_e Y gsl_sf_bessel_K1_scaled(dbl_expr) gsl_sf_bessel_k2_scaled_e Y gsl_sf_bessel_k2_scaled(dbl_expr) gsl_sf_bessel_kl_scaled_e Y gsl_sf_bessel_kl_scaled(int_expr,dbl_expr) gsl_sf_bessel_kl_scaled_arrayY status=gsl_sf_bessel_kl_scaled_array(int,double,&var_out) gsl_sf_bessel_Jnu_e Y gsl_sf_bessel_Jnu(dbl_expr,dbl_expr) gsl_sf_bessel_Ynu_e Y gsl_sf_bessel_Ynu(dbl_expr,dbl_expr) gsl_sf_bessel_sequence_Jnu_eN gsl_sf_bessel_sequence_Jnu gsl_sf_bessel_Inu_scaled_eY gsl_sf_bessel_Inu_scaled(dbl_expr,dbl_expr) gsl_sf_bessel_Inu_e Y gsl_sf_bessel_Inu(dbl_expr,dbl_expr) gsl_sf_bessel_Knu_scaled_eY gsl_sf_bessel_Knu_scaled(dbl_expr,dbl_expr) gsl_sf_bessel_Knu_e Y gsl_sf_bessel_Knu(dbl_expr,dbl_expr) gsl_sf_bessel_lnKnu_e Y gsl_sf_bessel_lnKnu(dbl_expr,dbl_expr) gsl_sf_bessel_zero_J0_e Y gsl_sf_bessel_zero_J0(uint_expr) gsl_sf_bessel_zero_J1_e Y gsl_sf_bessel_zero_J1(uint_expr) gsl_sf_bessel_zero_Jnu_e N gsl_sf_bessel_zero_Jnu gsl_sf_clausen_e Y gsl_sf_clausen(dbl_expr) gsl_sf_hydrogenicR_1_e N gsl_sf_hydrogenicR_1 gsl_sf_hydrogenicR_e N gsl_sf_hydrogenicR gsl_sf_coulomb_wave_FG_e N gsl_sf_coulomb_wave_FG gsl_sf_coulomb_wave_F_arrayN gsl_sf_coulomb_wave_F_array gsl_sf_coulomb_wave_FG_arrayN gsl_sf_coulomb_wave_FG_array gsl_sf_coulomb_wave_FGp_arrayN gsl_sf_coulomb_wave_FGp_array gsl_sf_coulomb_wave_sphF_arrayNgsl_sf_coulomb_wave_sphF_array gsl_sf_coulomb_CL_e N gsl_sf_coulomb_CL gsl_sf_coulomb_CL_array N gsl_sf_coulomb_CL_array gsl_sf_coupling_3j_e N gsl_sf_coupling_3j gsl_sf_coupling_6j_e N gsl_sf_coupling_6j gsl_sf_coupling_RacahW_e N gsl_sf_coupling_RacahW gsl_sf_coupling_9j_e N gsl_sf_coupling_9j gsl_sf_coupling_6j_INCORRECT_eNgsl_sf_coupling_6j_INCORRECT gsl_sf_dawson_e Y gsl_sf_dawson(dbl_expr) gsl_sf_debye_1_e Y gsl_sf_debye_1(dbl_expr) gsl_sf_debye_2_e Y gsl_sf_debye_2(dbl_expr) gsl_sf_debye_3_e Y gsl_sf_debye_3(dbl_expr) gsl_sf_debye_4_e Y gsl_sf_debye_4(dbl_expr) gsl_sf_debye_5_e Y gsl_sf_debye_5(dbl_expr) gsl_sf_debye_6_e Y gsl_sf_debye_6(dbl_expr) gsl_sf_dilog_e N gsl_sf_dilog gsl_sf_complex_dilog_xy_e N gsl_sf_complex_dilog_xy_e gsl_sf_complex_dilog_e N gsl_sf_complex_dilog gsl_sf_complex_spence_xy_eN gsl_sf_complex_spence_xy_e gsl_sf_multiply_e N gsl_sf_multiply gsl_sf_multiply_err_e N gsl_sf_multiply_err gsl_sf_ellint_Kcomp_e Y gsl_sf_ellint_Kcomp(dbl_expr) gsl_sf_ellint_Ecomp_e Y gsl_sf_ellint_Ecomp(dbl_expr) gsl_sf_ellint_Pcomp_e Y gsl_sf_ellint_Pcomp(dbl_expr,dbl_expr) gsl_sf_ellint_Dcomp_e Y gsl_sf_ellint_Dcomp(dbl_expr) gsl_sf_ellint_F_e Y gsl_sf_ellint_F(dbl_expr,dbl_expr) gsl_sf_ellint_E_e Y gsl_sf_ellint_E(dbl_expr,dbl_expr) gsl_sf_ellint_P_e Y gsl_sf_ellint_P(dbl_expr,dbl_expr,dbl_expr) gsl_sf_ellint_D_e Y gsl_sf_ellint_D(dbl_expr,dbl_expr,dbl_expr) gsl_sf_ellint_RC_e Y gsl_sf_ellint_RC(dbl_expr,dbl_expr) gsl_sf_ellint_RD_e Y gsl_sf_ellint_RD(dbl_expr,dbl_expr,dbl_expr) gsl_sf_ellint_RF_e Y gsl_sf_ellint_RF(dbl_expr,dbl_expr,dbl_expr) gsl_sf_ellint_RJ_e Y gsl_sf_ellint_RJ(dbl_expr,dbl_expr,dbl_expr,dbl_expr) gsl_sf_elljac_e N gsl_sf_elljac gsl_sf_erfc_e Y gsl_sf_erfc(dbl_expr) gsl_sf_log_erfc_e Y gsl_sf_log_erfc(dbl_expr) gsl_sf_erf_e Y gsl_sf_erf(dbl_expr) gsl_sf_erf_Z_e Y gsl_sf_erf_Z(dbl_expr) gsl_sf_erf_Q_e Y gsl_sf_erf_Q(dbl_expr) 
gsl_sf_hazard_e Y gsl_sf_hazard(dbl_expr) gsl_sf_exp_e Y gsl_sf_exp(dbl_expr) gsl_sf_exp_e10_e N gsl_sf_exp_e10 gsl_sf_exp_mult_e Y gsl_sf_exp_mult(dbl_expr,dbl_expr) gsl_sf_exp_mult_e10_e N gsl_sf_exp_mult_e10 gsl_sf_expm1_e Y gsl_sf_expm1(dbl_expr) gsl_sf_exprel_e Y gsl_sf_exprel(dbl_expr) gsl_sf_exprel_2_e Y gsl_sf_exprel_2(dbl_expr) gsl_sf_exprel_n_e Y gsl_sf_exprel_n(int_expr,dbl_expr) gsl_sf_exp_err_e Y gsl_sf_exp_err(dbl_expr,dbl_expr) gsl_sf_exp_err_e10_e N gsl_sf_exp_err_e10 gsl_sf_exp_mult_err_e N gsl_sf_exp_mult_err gsl_sf_exp_mult_err_e10_e N gsl_sf_exp_mult_err_e10 gsl_sf_expint_E1_e Y gsl_sf_expint_E1(dbl_expr) gsl_sf_expint_E2_e Y gsl_sf_expint_E2(dbl_expr) gsl_sf_expint_En_e Y gsl_sf_expint_En(int_expr,dbl_expr) gsl_sf_expint_E1_scaled_e Y gsl_sf_expint_E1_scaled(dbl_expr) gsl_sf_expint_E2_scaled_e Y gsl_sf_expint_E2_scaled(dbl_expr) gsl_sf_expint_En_scaled_e Y gsl_sf_expint_En_scaled(int_expr,dbl_expr) gsl_sf_expint_Ei_e Y gsl_sf_expint_Ei(dbl_expr) gsl_sf_expint_Ei_scaled_e Y gsl_sf_expint_Ei_scaled(dbl_expr) gsl_sf_Shi_e Y gsl_sf_Shi(dbl_expr) gsl_sf_Chi_e Y gsl_sf_Chi(dbl_expr) gsl_sf_expint_3_e Y gsl_sf_expint_3(dbl_expr) gsl_sf_Si_e Y gsl_sf_Si(dbl_expr) gsl_sf_Ci_e Y gsl_sf_Ci(dbl_expr) gsl_sf_atanint_e Y gsl_sf_atanint(dbl_expr) gsl_sf_fermi_dirac_m1_e Y gsl_sf_fermi_dirac_m1(dbl_expr) gsl_sf_fermi_dirac_0_e Y gsl_sf_fermi_dirac_0(dbl_expr) gsl_sf_fermi_dirac_1_e Y gsl_sf_fermi_dirac_1(dbl_expr) gsl_sf_fermi_dirac_2_e Y gsl_sf_fermi_dirac_2(dbl_expr) gsl_sf_fermi_dirac_int_e Y gsl_sf_fermi_dirac_int(int_expr,dbl_expr) gsl_sf_fermi_dirac_mhalf_eY gsl_sf_fermi_dirac_mhalf(dbl_expr) gsl_sf_fermi_dirac_half_e Y gsl_sf_fermi_dirac_half(dbl_expr) gsl_sf_fermi_dirac_3half_eY gsl_sf_fermi_dirac_3half(dbl_expr) gsl_sf_fermi_dirac_inc_0_eY gsl_sf_fermi_dirac_inc_0(dbl_expr,dbl_expr) gsl_sf_lngamma_e Y gsl_sf_lngamma(dbl_expr) gsl_sf_lngamma_sgn_e N gsl_sf_lngamma_sgn gsl_sf_gamma_e Y gsl_sf_gamma(dbl_expr) gsl_sf_gammastar_e Y gsl_sf_gammastar(dbl_expr) gsl_sf_gammainv_e Y gsl_sf_gammainv(dbl_expr) gsl_sf_lngamma_complex_e N gsl_sf_lngamma_complex gsl_sf_taylorcoeff_e Y gsl_sf_taylorcoeff(int_expr,dbl_expr) gsl_sf_fact_e Y gsl_sf_fact(uint_expr) gsl_sf_doublefact_e Y gsl_sf_doublefact(uint_expr) gsl_sf_lnfact_e Y gsl_sf_lnfact(uint_expr) gsl_sf_lndoublefact_e Y gsl_sf_lndoublefact(uint_expr) gsl_sf_lnchoose_e N gsl_sf_lnchoose gsl_sf_choose_e N gsl_sf_choose gsl_sf_lnpoch_e Y gsl_sf_lnpoch(dbl_expr,dbl_expr) gsl_sf_lnpoch_sgn_e N gsl_sf_lnpoch_sgn gsl_sf_poch_e Y gsl_sf_poch(dbl_expr,dbl_expr) gsl_sf_pochrel_e Y gsl_sf_pochrel(dbl_expr,dbl_expr) gsl_sf_gamma_inc_Q_e Y gsl_sf_gamma_inc_Q(dbl_expr,dbl_expr) gsl_sf_gamma_inc_P_e Y gsl_sf_gamma_inc_P(dbl_expr,dbl_expr) gsl_sf_gamma_inc_e Y gsl_sf_gamma_inc(dbl_expr,dbl_expr) gsl_sf_lnbeta_e Y gsl_sf_lnbeta(dbl_expr,dbl_expr) gsl_sf_lnbeta_sgn_e N gsl_sf_lnbeta_sgn gsl_sf_beta_e Y gsl_sf_beta(dbl_expr,dbl_expr) gsl_sf_beta_inc_e N gsl_sf_beta_inc gsl_sf_gegenpoly_1_e Y gsl_sf_gegenpoly_1(dbl_expr,dbl_expr) gsl_sf_gegenpoly_2_e Y gsl_sf_gegenpoly_2(dbl_expr,dbl_expr) gsl_sf_gegenpoly_3_e Y gsl_sf_gegenpoly_3(dbl_expr,dbl_expr) gsl_sf_gegenpoly_n_e N gsl_sf_gegenpoly_n gsl_sf_gegenpoly_array Y gsl_sf_gegenpoly_array gsl_sf_hyperg_0F1_e Y gsl_sf_hyperg_0F1(dbl_expr,dbl_expr) gsl_sf_hyperg_1F1_int_e Y gsl_sf_hyperg_1F1_int(int_expr,int_expr,dbl_expr) gsl_sf_hyperg_1F1_e Y gsl_sf_hyperg_1F1(dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_U_int_e Y gsl_sf_hyperg_U_int(int_expr,int_expr,dbl_expr) gsl_sf_hyperg_U_int_e10_e N 
gsl_sf_hyperg_U_int_e10 gsl_sf_hyperg_U_e Y gsl_sf_hyperg_U(dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_U_e10_e N gsl_sf_hyperg_U_e10 gsl_sf_hyperg_2F1_e Y gsl_sf_hyperg_2F1(dbl_expr,dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_2F1_conj_e Y gsl_sf_hyperg_2F1_conj(dbl_expr,dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_2F1_renorm_eY gsl_sf_hyperg_2F1_renorm(dbl_expr,dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_2F1_conj_renorm_eYgsl_sf_hyperg_2F1_conj_renorm(dbl_expr,dbl_expr,dbl_expr,dbl_expr) gsl_sf_hyperg_2F0_e Y gsl_sf_hyperg_2F0(dbl_expr,dbl_expr,dbl_expr) gsl_sf_laguerre_1_e Y gsl_sf_laguerre_1(dbl_expr,dbl_expr) gsl_sf_laguerre_2_e Y gsl_sf_laguerre_2(dbl_expr,dbl_expr) gsl_sf_laguerre_3_e Y gsl_sf_laguerre_3(dbl_expr,dbl_expr) gsl_sf_laguerre_n_e Y gsl_sf_laguerre_n(int_expr,dbl_expr,dbl_expr) gsl_sf_lambert_W0_e Y gsl_sf_lambert_W0(dbl_expr) gsl_sf_lambert_Wm1_e Y gsl_sf_lambert_Wm1(dbl_expr) gsl_sf_legendre_Pl_e Y gsl_sf_legendre_Pl(int_expr,dbl_expr) gsl_sf_legendre_Pl_array Y status=gsl_sf_legendre_Pl_array(int,double,&var_out) gsl_sf_legendre_Pl_deriv_arrayNgsl_sf_legendre_Pl_deriv_array gsl_sf_legendre_P1_e Y gsl_sf_legendre_P1(dbl_expr) gsl_sf_legendre_P2_e Y gsl_sf_legendre_P2(dbl_expr) gsl_sf_legendre_P3_e Y gsl_sf_legendre_P3(dbl_expr) gsl_sf_legendre_Q0_e Y gsl_sf_legendre_Q0(dbl_expr) gsl_sf_legendre_Q1_e Y gsl_sf_legendre_Q1(dbl_expr) gsl_sf_legendre_Ql_e Y gsl_sf_legendre_Ql(int_expr,dbl_expr) gsl_sf_legendre_Plm_e Y gsl_sf_legendre_Plm(int_expr,int_expr,dbl_expr) gsl_sf_legendre_Plm_array Y status=gsl_sf_legendre_Plm_array(int,int,double,&var_out) gsl_sf_legendre_Plm_deriv_arrayNgsl_sf_legendre_Plm_deriv_array gsl_sf_legendre_sphPlm_e Y gsl_sf_legendre_sphPlm(int_expr,int_expr,dbl_expr) gsl_sf_legendre_sphPlm_arrayY status=gsl_sf_legendre_sphPlm_array(int,int,double,&var_out) gsl_sf_legendre_sphPlm_deriv_arrayNgsl_sf_legendre_sphPlm_deriv_array gsl_sf_legendre_array_sizeN gsl_sf_legendre_array_size gsl_sf_conicalP_half_e Y gsl_sf_conicalP_half(dbl_expr,dbl_expr) gsl_sf_conicalP_mhalf_e Y gsl_sf_conicalP_mhalf(dbl_expr,dbl_expr) gsl_sf_conicalP_0_e Y gsl_sf_conicalP_0(dbl_expr,dbl_expr) gsl_sf_conicalP_1_e Y gsl_sf_conicalP_1(dbl_expr,dbl_expr) gsl_sf_conicalP_sph_reg_e Y gsl_sf_conicalP_sph_reg(int_expr,dbl_expr,dbl_expr) gsl_sf_conicalP_cyl_reg_e Y gsl_sf_conicalP_cyl_reg(int_expr,dbl_expr,dbl_expr) gsl_sf_legendre_H3d_0_e Y gsl_sf_legendre_H3d_0(dbl_expr,dbl_expr) gsl_sf_legendre_H3d_1_e Y gsl_sf_legendre_H3d_1(dbl_expr,dbl_expr) gsl_sf_legendre_H3d_e Y gsl_sf_legendre_H3d(int_expr,dbl_expr,dbl_expr) gsl_sf_legendre_H3d_array N gsl_sf_legendre_H3d_array gsl_sf_legendre_array_sizeN gsl_sf_legendre_array_size gsl_sf_log_e Y gsl_sf_log(dbl_expr) gsl_sf_log_abs_e Y gsl_sf_log_abs(dbl_expr) gsl_sf_complex_log_e N gsl_sf_complex_log gsl_sf_log_1plusx_e Y gsl_sf_log_1plusx(dbl_expr) gsl_sf_log_1plusx_mx_e Y gsl_sf_log_1plusx_mx(dbl_expr) gsl_sf_mathieu_a_array N gsl_sf_mathieu_a_array gsl_sf_mathieu_b_array N gsl_sf_mathieu_b_array gsl_sf_mathieu_a N gsl_sf_mathieu_a gsl_sf_mathieu_b N gsl_sf_mathieu_b gsl_sf_mathieu_a_coeff N gsl_sf_mathieu_a_coeff gsl_sf_mathieu_b_coeff N gsl_sf_mathieu_b_coeff gsl_sf_mathieu_ce N gsl_sf_mathieu_ce gsl_sf_mathieu_se N gsl_sf_mathieu_se gsl_sf_mathieu_ce_array N gsl_sf_mathieu_ce_array gsl_sf_mathieu_se_array N gsl_sf_mathieu_se_array gsl_sf_mathieu_Mc N gsl_sf_mathieu_Mc gsl_sf_mathieu_Ms N gsl_sf_mathieu_Ms gsl_sf_mathieu_Mc_array N gsl_sf_mathieu_Mc_array gsl_sf_mathieu_Ms_array N gsl_sf_mathieu_Ms_array gsl_sf_pow_int_e N gsl_sf_pow_int 
gsl_sf_psi_int_e Y gsl_sf_psi_int(int_expr) gsl_sf_psi_e Y gsl_sf_psi(dbl_expr) gsl_sf_psi_1piy_e Y gsl_sf_psi_1piy(dbl_expr) gsl_sf_complex_psi_e N gsl_sf_complex_psi gsl_sf_psi_1_int_e Y gsl_sf_psi_1_int(int_expr) gsl_sf_psi_1_e Y gsl_sf_psi_1(dbl_expr) gsl_sf_psi_n_e Y gsl_sf_psi_n(int_expr,dbl_expr) gsl_sf_synchrotron_1_e Y gsl_sf_synchrotron_1(dbl_expr) gsl_sf_synchrotron_2_e Y gsl_sf_synchrotron_2(dbl_expr) gsl_sf_transport_2_e Y gsl_sf_transport_2(dbl_expr) gsl_sf_transport_3_e Y gsl_sf_transport_3(dbl_expr) gsl_sf_transport_4_e Y gsl_sf_transport_4(dbl_expr) gsl_sf_transport_5_e Y gsl_sf_transport_5(dbl_expr) gsl_sf_sin_e N gsl_sf_sin gsl_sf_cos_e N gsl_sf_cos gsl_sf_hypot_e N gsl_sf_hypot gsl_sf_complex_sin_e N gsl_sf_complex_sin gsl_sf_complex_cos_e N gsl_sf_complex_cos gsl_sf_complex_logsin_e N gsl_sf_complex_logsin gsl_sf_sinc_e N gsl_sf_sinc gsl_sf_lnsinh_e N gsl_sf_lnsinh gsl_sf_lncosh_e N gsl_sf_lncosh gsl_sf_polar_to_rect N gsl_sf_polar_to_rect gsl_sf_rect_to_polar N gsl_sf_rect_to_polar gsl_sf_sin_err_e N gsl_sf_sin_err gsl_sf_cos_err_e N gsl_sf_cos_err gsl_sf_angle_restrict_symm_eN gsl_sf_angle_restrict_symm gsl_sf_angle_restrict_pos_eN gsl_sf_angle_restrict_pos gsl_sf_angle_restrict_symm_err_eNgsl_sf_angle_restrict_symm_err gsl_sf_angle_restrict_pos_err_eNgsl_sf_angle_restrict_pos_err gsl_sf_zeta_int_e Y gsl_sf_zeta_int(int_expr) gsl_sf_zeta_e Y gsl_sf_zeta(dbl_expr) gsl_sf_zetam1_e Y gsl_sf_zetam1(dbl_expr) gsl_sf_zetam1_int_e Y gsl_sf_zetam1_int(int_expr) gsl_sf_hzeta_e Y gsl_sf_hzeta(dbl_expr,dbl_expr) gsl_sf_eta_int_e Y gsl_sf_eta_int(int_expr) gsl_sf_eta_e Y gsl_sf_eta(dbl_expr) ---------- Footnotes ---------- (1) These are the GSL standard function names postfixed with ‘_e’. NCO calls these functions automatically, without the NCO command having to specifically indicate the ‘_e’ function suffix.