Estimate missing observations

If some observations are missing, OpenDA has less information to calculate the innovation at a given location. However, by using the other available observations, we can estimate the innovation, allowing this location to still play a role in the data assimilation.

This page describes how missing observations can be calculated using observations that are available. Before being able to calculate these, \(H*K\) should already be calculated and stored in KalmanGainStorage. How this is done is explained below.

The vector \(d\) is called the residual vector or the innovation vector, and it contains the difference between measurements and predictions.

The innovation of missing stations can be computed using the following formula:

\[d_{missing} = (I - M_2)^{-1} * M_1 * d_{available}.\]

The matrices \(M_1\) and \(M_2\) are constructed from the matrix \(H*K\). We will explain how this is done using an example with 9 stations where the observations at locations 2, 4 and 5 are missing. The matrix \(H*K\) can be colored as follows:

hk_11	hk_12	hk_13	hk_14	hk_15	hk_16	hk_17	hk_18	hk_19
hk_21	hk_22	hk_23	hk_24	hk_25	hk_26	hk_27	hk_28	hk_29
hk_31	hk_32	hk_33	hk_34	hk_35	hk_36	hk_37	hk_38	hk_39
hk_41	hk_42	hk_43	hk_44	hk_45	hk_46	hk_47	hk_48	hk_49
hk_51	hk_52	hk_53	hk_54	hk_55	hk_56	hk_57	hk_58	hk_59
hk_61	hk_62	hk_63	hk_64	hk_65	hk_66	hk_67	hk_68	hk_69
hk_71	hk_72	hk_73	hk_74	hk_75	hk_76	hk_77	hk_78	hk_79
hk_81	hk_82	hk_83	hk_84	hk_85	hk_86	hk_87	hk_88	hk_89
hk_91	hk_92	hk_93	hk_94	hk_95	hk_96	hk_97	hk_98	hk_99

Then \(M_1\) is constructed from the blue elements in \(H*K\): the rows related to the missing stations and the columns related to the available stations:

hk_21	hk_23	hk_26	hk_27	hk_28	hk_29
hk_41	hk_43	hk_46	hk_47	hk_48	hk_49
hk_51	hk_53	hk_56	hk_57	hk_58	hk_59

The number of rows in \(M_1\) relates to the number of missing stations, and the number of columns is equal to the number of available stations.

\(M_2\) is a square matrix with dimensions equal to the number of missing stations. It is constructed from the orange elements in \(H*K\):

hk_22	hk_24	hk_25
hk_42	hk_44	hk_45
hk_52	hk_54	hk_55

These matrices \(M_1\) and \(M_2\) are created and filled via SteadyStateFilter.HKCalculator.createMatrices(), and \(d_{available}\) is filled in SteadyStateFilter.HKCalculator.setDAvailableValue() during the loop in SteadyStateFilter.analysis(), where the innovation values of non-missing observations are processed. With a filled \(d_{available}\) the innovation for missing observations can be calculated and applied. This happens in SteadyStateFilter.HKCalculator.compensateForMissingObservationsWithHK().

Calculating and writing \(H*K\)

In order to use \(H*K\) in the steady-state filter, the Kalman gain matrix (saved in KalmanGainStorage) should be available in NetCDF CF format. We distinguish two different cases:

Standard case: \(H*K\) is calculated in EnKF.computeHK() and the calculation is based on EnKF.computeGainMatrix() where \(H*K\) was already calculated. In the logging and the code \(H*K\) is also known as \(K_{pred}\).
Hamill localization: When Hamill localization is used, \(H*K\) needs to be element-wise multiplied with the weight factor between observations called rho. rho will be retrieved via a call to IModelInstance.getRhoForLocalization(). The weight factor rho is currently only calculated for the black-box model in BBStochModelInstance.java. For all other models, a default implementation (IModelInstance.getDefaultRhoForLocalization()) is available such that the interface contracts will not be broken. This default implementation will return a rho matrix consisting of only the value 1, so it does not affect \(H*K\).
For all other localizations, \(H*K\) is not computed and the user is informed by a message.

\(H*K\) will be added to KalmanGainStorage in NetCDF CF format in the method KalmanGainStorage.writeKalmanGainToNetcdfCF().