Estimate missing observations
If some observations are missing, OpenDA has less information to calculate the innovation at a given location. However, by using the other available observations, we can estimate the innovation, allowing this location to still play a role in the data assimilation.
This page describes how missing observations can be calculated using
observations that are available. Before being able to calculate these,
\(H*K\) should already be calculated and stored in KalmanGainStorage.
How this is done is explained below.
The vector \(d\) is called the residual vector or the innovation vector, and it contains the difference between measurements and predictions.
The innovation of missing stations can be computed using the following formula:
The matrices \(M_1\) and \(M_2\) are constructed from the matrix \(H*K\). We will explain how this is done using an example with 9 stations where the observations at locations 2, 4 and 5 are missing. The matrix \(H*K\) can be colored as follows:
hk_11 |
hk_12 |
hk_13 |
hk_14 |
hk_15 |
hk_16 |
hk_17 |
hk_18 |
hk_19 |
hk_21 |
hk_22 |
hk_23 |
hk_24 |
hk_25 |
hk_26 |
hk_27 |
hk_28 |
hk_29 |
hk_31 |
hk_32 |
hk_33 |
hk_34 |
hk_35 |
hk_36 |
hk_37 |
hk_38 |
hk_39 |
hk_41 |
hk_42 |
hk_43 |
hk_44 |
hk_45 |
hk_46 |
hk_47 |
hk_48 |
hk_49 |
hk_51 |
hk_52 |
hk_53 |
hk_54 |
hk_55 |
hk_56 |
hk_57 |
hk_58 |
hk_59 |
hk_61 |
hk_62 |
hk_63 |
hk_64 |
hk_65 |
hk_66 |
hk_67 |
hk_68 |
hk_69 |
hk_71 |
hk_72 |
hk_73 |
hk_74 |
hk_75 |
hk_76 |
hk_77 |
hk_78 |
hk_79 |
hk_81 |
hk_82 |
hk_83 |
hk_84 |
hk_85 |
hk_86 |
hk_87 |
hk_88 |
hk_89 |
hk_91 |
hk_92 |
hk_93 |
hk_94 |
hk_95 |
hk_96 |
hk_97 |
hk_98 |
hk_99 |
Then \(M_1\) is constructed from the blue elements in \(H*K\): the rows related to the missing stations and the columns related to the available stations:
hk_21 |
hk_23 |
hk_26 |
hk_27 |
hk_28 |
hk_29 |
hk_41 |
hk_43 |
hk_46 |
hk_47 |
hk_48 |
hk_49 |
hk_51 |
hk_53 |
hk_56 |
hk_57 |
hk_58 |
hk_59 |
The number of rows in \(M_1\) relates to the number of missing stations, and the number of columns is equal to the number of available stations.
\(M_2\) is a square matrix with dimensions equal to the number of missing stations. It is constructed from the orange elements in \(H*K\):
hk_22 |
hk_24 |
hk_25 |
hk_42 |
hk_44 |
hk_45 |
hk_52 |
hk_54 |
hk_55 |
These matrices \(M_1\) and \(M_2\) are created and filled via
SteadyStateFilter.HKCalculator.createMatrices(), and \(d_{available}\)
is filled in SteadyStateFilter.HKCalculator.setDAvailableValue() during the
loop in SteadyStateFilter.analysis(), where the innovation values of
non-missing observations are processed. With a filled \(d_{available}\)
the innovation for missing observations can be calculated and applied. This
happens in
SteadyStateFilter.HKCalculator.compensateForMissingObservationsWithHK().
Calculating and writing \(H*K\)
In order to use \(H*K\) in the steady-state filter, the Kalman gain matrix
(saved in KalmanGainStorage) should be available in NetCDF CF format.
We distinguish two different cases:
Standard case: \(H*K\) is calculated in
EnKF.computeHK()and the calculation is based onEnKF.computeGainMatrix()where \(H*K\) was already calculated. In the logging and the code \(H*K\) is also known as \(K_{pred}\).Hamill localization: When Hamill localization is used, \(H*K\) needs to be element-wise multiplied with the weight factor between observations called
rho.rhowill be retrieved via a call toIModelInstance.getRhoForLocalization(). The weight factorrhois currently only calculated for the black-box model inBBStochModelInstance.java. For all other models, a default implementation (IModelInstance.getDefaultRhoForLocalization()) is available such that the interface contracts will not be broken. This default implementation will return a rho matrix consisting of only the value 1, so it does not affect \(H*K\).For all other localizations, \(H*K\) is not computed and the user is informed by a message.
\(H*K\) will be added to KalmanGainStorage in NetCDF CF
format in the method KalmanGainStorage.writeKalmanGainToNetcdfCF().