Estimate missing observations

If some observations are missing, OpenDA has less information to calculate the innovation at a given location. However, by using the other available observations, we can estimate the innovation, allowing this location to still play a role in the data assimilation.

This page describes how missing observations can be calculated using observations that are available. Before being able to calculate these, \(H*K\) should already be calculated and stored in KalmanGainStorage. How this is done is explained below.

The vector \(d\) is called the residual vector or the innovation vector, and it contains the difference between measurements and predictions.

The innovation of missing stations can be computed using the following formula:

\[d_{missing} = (I - M_2)^{-1} * M_1 * d_{available}.\]

The matrices \(M_1\) and \(M_2\) are constructed from the matrix \(H*K\). We will explain how this is done using an example with 9 stations where the observations at locations 2, 4 and 5 are missing. The matrix \(H*K\) can be colored as follows:

hk_11

hk_12

hk_13

hk_14

hk_15

hk_16

hk_17

hk_18

hk_19

hk_21

hk_22

hk_23

hk_24

hk_25

hk_26

hk_27

hk_28

hk_29

hk_31

hk_32

hk_33

hk_34

hk_35

hk_36

hk_37

hk_38

hk_39

hk_41

hk_42

hk_43

hk_44

hk_45

hk_46

hk_47

hk_48

hk_49

hk_51

hk_52

hk_53

hk_54

hk_55

hk_56

hk_57

hk_58

hk_59

hk_61

hk_62

hk_63

hk_64

hk_65

hk_66

hk_67

hk_68

hk_69

hk_71

hk_72

hk_73

hk_74

hk_75

hk_76

hk_77

hk_78

hk_79

hk_81

hk_82

hk_83

hk_84

hk_85

hk_86

hk_87

hk_88

hk_89

hk_91

hk_92

hk_93

hk_94

hk_95

hk_96

hk_97

hk_98

hk_99

Then \(M_1\) is constructed from the blue elements in \(H*K\): the rows related to the missing stations and the columns related to the available stations:

hk_21

hk_23

hk_26

hk_27

hk_28

hk_29

hk_41

hk_43

hk_46

hk_47

hk_48

hk_49

hk_51

hk_53

hk_56

hk_57

hk_58

hk_59

The number of rows in \(M_1\) relates to the number of missing stations, and the number of columns is equal to the number of available stations.

\(M_2\) is a square matrix with dimensions equal to the number of missing stations. It is constructed from the orange elements in \(H*K\):

hk_22

hk_24

hk_25

hk_42

hk_44

hk_45

hk_52

hk_54

hk_55

These matrices \(M_1\) and \(M_2\) are created and filled via SteadyStateFilter.HKCalculator.createMatrices(), and \(d_{available}\) is filled in SteadyStateFilter.HKCalculator.setDAvailableValue() during the loop in SteadyStateFilter.analysis(), where the innovation values of non-missing observations are processed. With a filled \(d_{available}\) the innovation for missing observations can be calculated and applied. This happens in SteadyStateFilter.HKCalculator.compensateForMissingObservationsWithHK().

Calculating and writing \(H*K\)

In order to use \(H*K\) in the steady-state filter, the Kalman gain matrix (saved in KalmanGainStorage) should be available in NetCDF CF format. We distinguish two different cases:

  • Standard case: \(H*K\) is calculated in EnKF.computeHK() and the calculation is based on EnKF.computeGainMatrix() where \(H*K\) was already calculated. In the logging and the code \(H*K\) is also known as \(K_{pred}\).

  • Hamill localization: When Hamill localization is used, \(H*K\) needs to be element-wise multiplied with the weight factor between observations called rho. rho will be retrieved via a call to IModelInstance.getRhoForLocalization(). The weight factor rho is currently only calculated for the black-box model in BBStochModelInstance.java. For all other models, a default implementation (IModelInstance.getDefaultRhoForLocalization()) is available such that the interface contracts will not be broken. This default implementation will return a rho matrix consisting of only the value 1, so it does not affect \(H*K\).

  • For all other localizations, \(H*K\) is not computed and the user is informed by a message.

\(H*K\) will be added to KalmanGainStorage in NetCDF CF format in the method KalmanGainStorage.writeKalmanGainToNetcdfCF().