Hector both allows (internally to each component, and when reading inputs) and requires (passing data between components, and writing outputs) values to have associated units. The internal unit system is simple, not providing unit conversion for example, but also lightweight and easy. Our use of units is prompted primarily by safety–to avoid the “Mars Climate Orbiter” problem of mismatched unit–as well as having easy-to-interpret outputs.

The internal C++ type is called unitval, and its code is located in headers/data/unitval.hpp.

Creating a variable with associated unit

Two ways to create a unitval:

unitval x(1, U_PGC);  # x now holds 1 petagram of carbon
unitval y;
y.set(2, U_CO2_PPMV);

Doing math

The unitval class supports basic mathematical operations.

# assume that y and z are already defined unitvals
unitval x = y + z;  # y and z must have identical units or an error is thrown
unitval x = y * 2.0;  # multiplying by a constant always works

Extracting a unitval value

Internally within components, we often simply want to work with numbers, extracting them from unitval objects. This is straightforward, but first we have to ‘prove’ that we know the correct units:

double x = y.value(U_PGC); # we assert that y's units are petagrams C

When is unitval required?

Using unitvals within a component is optional, but it’s required when passing data between components. This falls into three categories:

  • Querying the core for data. Read more about Hector’s internal data passing.
  • Responding to a query for data.
  • Output. Read more about Hector’s output visitors.

When is unitval optional?

In input files.

TODO.

if( varName == D_PREINDUSTRIAL_N2O ) {`
            `H_ASSERT( data.date == Core::undefinedIndex() , "date not allowed" );`
            `N0 = unitval::parse_unitval( data.value_str, data.units_str, U_PPBV_N2O );`

Missing values

Briefly, when you need to return a missing value, use the MISSING_FLOAT macro defined in inst/include/unitval.hpp. This macro will return NA_REAL (from Rcpp) if Hector is compiled as an R package (technically, if the USE_RCPP macro is defined) and NAN (from std) otherwise.

Why this complexity? R (and, by extension, Rcpp) distinguish between two kinds of “placeholder” numeric values: A missing value (NA_real_ in R; often abbreviated to NA) represents an unobserved value; for instance, if you take annual observations from 1990 to 2000 but failed to make a measurement in 1996. On the other hand, not a number (NaN) represents the result of an invalid mathematical operation, such as taking the square root of a negative number. Both of these values propagate through any calculations in which they are involved, both in R and C++, and both can be detected and removed by R’s is.na() function and removed by na.omit, na.rm = TRUE, and friends. (The R function is.nan() can used to check specifically for NaN values – is.nan(NaN) => TRUE, but is.nan(NA) => FALSE). However, because they mean fundamentally different things (and suggest different errors), we try to preserve this distinction if we can. (For example, if some code you run returns NA then you would suspect that required values were not set somewhere. On the other hand, if the code returns NaN, you know that all values were set but resulted in a mathematically invalid calculation. Your approach to debugging these problems would likely be different.)