Bias and Error Graphed Against Forecast Hour

This page shows the results of the 525 forecast runs described near the end of the Results and Discussion page, but graphed with time into the forecast as the horizontal axis. The main reason for doing this is to see how much worse the model gets as it predicts farther into the future. It is also a way of seeing how much error was introduced by the data used to initialize the model and how well WRF removes that error.


This graph shows the wind speed as forecast by WRF initialized with the RUC data. The standard lines representing different combinations of months are shown as usual, but a black line showing the average actual speed as measured by the sodar over the entire 8 months has been added.

There are some obvious trends in this graph, but they are easier to point out with the next graph down, which shows the closely related bias with respect to forecast hour.

For this graph and the other graphs on this page, sodar data and a RUC forecast are available at the starting point of the forecast, but the WRF data does not start until 10 minutes later. This is true for all graphs on this web site but is not obvious elsewhere because other pages do not show forecast hour. The RUC or NAM data used to initialize WRF is less dense than the sodar and WRF data, and it is instantaneous: the RUC provides instantaneous data every hour on the hour, and the NAM provides instantaneous data every 3 hours on the hour. The sodar continuously produces data that is averaged over 10-minute periods. By default, WRF also outputs instantaneous data at whatever interval is chosen. However, custom changes have been made to WRF so that it can output data averaged over any chosen time period; these source code changes are described on the page titled WRF Source Code Changes. Since the sodar, which is used to verify the WRF data, outputs 10-minute averages, the WRF output is set to produce 10-minute averages as well. And since it takes 10 minutes of forecast time to accumulate the first WRF average, the WRF output starts at the 10-minute forecast time rather than at zero.
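To make the averaging concrete, here is a minimal sketch of the kind of 10-minute binning involved. The function name and the sample data are hypothetical, for illustration only, and are not taken from the actual WRF source code changes; the point is simply that the first 10-minute average cannot be stamped until 10 minutes of data exist, which is why the WRF output begins at forecast minute 10.

```python
from datetime import datetime, timedelta

def ten_minute_averages(samples, start):
    """Average (time, speed) samples into consecutive 10-minute bins.

    Returns a list of (bin_end_time, mean_speed). Each average is
    stamped at the END of its bin, so the first value appears at
    start + 10 minutes rather than at the start time itself.
    """
    bins = {}
    for t, v in samples:
        k = int((t - start).total_seconds() // 600)  # 10-minute bin index
        bins.setdefault(k, []).append(v)
    return [(start + timedelta(minutes=10 * (k + 1)), sum(vs) / len(vs))
            for k, vs in sorted(bins.items())]

# Hypothetical 1-minute wind speed samples covering 20 minutes:
start = datetime(2009, 1, 1, 0, 0)
samples = [(start + timedelta(minutes=m), 5.0 + 0.1 * m) for m in range(20)]
for t_end, avg in ten_minute_averages(samples, start):
    print(t_end.strftime("%H:%M"), round(avg, 2))
```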


This graph shows bias graphed against time into the forecast. The WRF bias starts out fairly high and drops significantly over the first two forecast hours. The RUC bias starts out fairly high and stays there. Since the RUC data is used throughout the WRF domains for initialization, the WRF bias starts out matching the RUC bias, and it takes time for that bias to be removed from the forecast. But the fact that the WRF bias moves toward zero even though the RUC bias stays level above 2 m/s indicates that the outer WRF domains are large enough to counteract the bias in the RUC data, which is at 13 km horizontal resolution, as it is transitioned to the 1 km resolution of the innermost WRF domain.
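The bias curves on this page are just the signed forecast-minus-observed difference averaged within each forecast hour across all runs. A minimal sketch, with hypothetical verification pairs standing in for the real sodar comparisons:

```python
def bias_by_hour(pairs):
    """Mean (forecast - observed) grouped by forecast hour.

    pairs: iterable of (forecast_hour, forecast_speed, observed_speed).
    Returns {hour: bias in m/s}. A positive bias means the model blows
    too hard on average, as the RUC does here by more than 2 m/s.
    """
    sums, counts = {}, {}
    for hour, fcst, obs in pairs:
        sums[hour] = sums.get(hour, 0.0) + (fcst - obs)
        counts[hour] = counts.get(hour, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}

# Hypothetical pairs from two runs:
pairs = [(1, 7.0, 5.0), (1, 6.0, 5.0), (2, 5.5, 5.0), (2, 5.0, 5.5)]
print(bias_by_hour(pairs))  # hour 1: +1.5 m/s, hour 2: 0.0 m/s
```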

As mentioned elsewhere, it takes approximately one hour after the beginning time of a RUC forecast before that forecast is available for download. Every third RUC forecast is 12 hours in length rather than 9 hours, so those take a little longer to become available. There is an additional delay of 20 to 30 minutes for the RUC forecasts that start at 0 and 12 GMT, because those forecasts are held back so that radiosonde data can be incorporated into their initialization. Added to the hour of availability delay are the time needed to download the data, the time to preprocess it for use in WRF, and the time to actually run WRF. So by the time the two hours of forecast time needed to remove the RUC bias have elapsed, most of that period is already in the past and thus unusable.
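The timing argument can be made concrete with a small latency budget. Only the one-hour availability delay comes from this page; the download, preprocessing, and run times below are assumptions chosen purely for illustration:

```python
# Illustrative latency budget, in minutes. Only availability_delay is
# stated on this page; the other two figures are assumptions.
availability_delay = 60   # RUC forecast posted ~1 hour after its start time
download_and_prep  = 20   # download plus preprocessing for WRF (assumed)
wrf_runtime        = 45   # wall-clock time to run WRF (assumed)

total_latency = availability_delay + download_and_prep + wrf_runtime
spin_up = 120             # forecast minutes needed to shed the RUC bias

# By the time WRF finishes, this many minutes of its forecast period
# have already passed in the real world:
print("forecast minutes already past when WRF finishes:", total_latency)
print("spin-up minutes that were still usable:", max(0, spin_up - total_latency))
```

Under these assumed numbers the entire two-hour spin-up period is already in the past when the run completes, which is the point the paragraph above is making.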

Just after 2 hours of forecast time, the bias becomes slightly negative and then makes its way back up to zero. This may suggest that WRF overshoots as it corrects the RUC bias and then gradually corrects itself. I could imagine it overshooting through some kind of numerical momentum in the model, but it does not make sense to me that it would take so long to get back to zero, so I do not have an explanation for this with which I am comfortable.


This graph shows wind speed errors with forecast hour as the horizontal axis. Since some of the forecast error is accounted for by bias, this graph partially shows the same pattern as the graph of WRF and RUC bias above. Another point worth making is that, unfortunately, the lines are not very smooth, even though quite a large number of data points were averaged to create each point on the graph (the graph two below shows how many). This implies that some forecasts were quite bad at certain times, unless several merely fairly bad forecasts happened to have their worst errors at about the same point in the forecast. Another possible, and more likely, explanation is that fairly fast transitions in wind speed were simply mistimed. These transitions can be either slow to fast or fast to slow and are sometimes called ramping events. Either way, if a forecast is used for power generation predictions, a timing error in a wind speed transition should not be taken lightly.
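The ramp-mistiming effect is easy to demonstrate. In this sketch (all numbers hypothetical) the forecast reproduces a 4-to-10 m/s ramp perfectly except that it places it one hour too late, and the mean absolute error spikes at the transition hour even though both series look reasonable:

```python
def mae_by_hour(pairs):
    """Mean absolute error grouped by forecast hour.

    pairs: iterable of (forecast_hour, forecast_speed, observed_speed).
    """
    sums, counts = {}, {}
    for hour, fcst, obs in pairs:
        sums[hour] = sums.get(hour, 0.0) + abs(fcst - obs)
        counts[hour] = counts.get(hour, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}

# Observed ramp at hour 3; the hypothetical forecast ramps at hour 4.
obs  = {1: 4.0, 2: 4.0, 3: 10.0, 4: 10.0, 5: 10.0}
fcst = {1: 4.0, 2: 4.0, 3: 4.0,  4: 10.0, 5: 10.0}
pairs = [(h, fcst[h], obs[h]) for h in obs]
print(mae_by_hour(pairs))  # {1: 0.0, 2: 0.0, 3: 6.0, 4: 0.0, 5: 0.0}
```

A single mistimed ramp like this, averaged in with many good forecasts, is enough to put a visible bump on an otherwise smooth error curve.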


This graph shows the error relative to forecast hour when the bias shown two graphs up is removed from the error shown one graph up.

Since the bias in the WRF model is near zero for most of the forecast time after 2 hours, very little improvement is achieved by removing bias relative to forecast hour. As with the graphs on other pages, removing the bias, even though it is substantial, reduced the error less than I would have expected. And, also as on other pages, the RUC error is reduced more than the WRF error. Apparently, more of the RUC error is of a bias nature, which would be called "systematic error" as opposed to "random error."
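This decomposition into systematic and random parts has a tidy form if the error metric is RMSE (the page does not state which metric it plots, so take this as an illustration rather than the method used here): the squared RMSE is exactly the squared bias plus the squared bias-removed RMSE, so a large bias can still leave most of the error behind once removed.

```python
from math import sqrt

def rmse(errs):
    """Root-mean-square of a list of signed errors."""
    return sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical signed errors (forecast - observed) at one forecast hour:
errors = [2.5, 1.5, 3.0, 2.0, 1.0]
bias = sum(errors) / len(errors)        # systematic part: 2.0 m/s
debiased = [e - bias for e in errors]   # remove the systematic part

# The identity: rmse(errors)^2 == bias^2 + rmse(debiased)^2
print(round(rmse(errors), 3), round(bias, 3), round(rmse(debiased), 3))
```

With these numbers the bias (2.0 m/s) dominates, so removing it helps a lot; for the WRF error on this page the situation is evidently reversed, with most of the error being random rather than systematic.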

Other pages show that there is bias with respect to other variables, such as measured wind speed, forecast wind speed, and time of day. Since these are independent of the forecast hour, that bias can average out to near zero when graphed against forecast hour and instead show up as "random error." In a way, most error is just bias with respect to some variable that is misrepresented or missing in the model.
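This cancellation can be demonstrated with a contrived example. Assume a model with a purely diurnal bias of +1 m/s at night and -1 m/s by day (an invented pattern, not the one measured here). Across runs launched at all hours of the day, that bias averages to zero at every forecast hour, yet the absolute error at every forecast hour is still a full 1 m/s:

```python
import statistics

# Hypothetical diurnal bias: +1 m/s for local hours 0-11, -1 m/s for 12-23.
errors_by_fh = {}
for start_hour in range(24):          # runs launched around the clock
    for fh in range(1, 9):            # forecast hours 1..8
        local = (start_hour + fh) % 24
        err = 1.0 if local < 12 else -1.0
        errors_by_fh.setdefault(fh, []).append(err)

for fh in sorted(errors_by_fh):
    es = errors_by_fh[fh]
    mean_bias = statistics.mean(es)               # cancels to 0 at every hour
    mean_abs = statistics.mean(abs(e) for e in es)  # but stays at 1 m/s
    print(fh, mean_bias, mean_abs)
```

Graphed against forecast hour, this model would look unbiased; graphed against time of day, the same error would appear as pure bias.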


This graph shows the counts of data points that are averaged together for each point in the above graphs. As explained elsewhere (on the Results and Discussion page), the data is based on 525 WRF runs, but the times when the sodar data did not have a high enough confidence level were ignored. And, of course, each two-month colored grouping in the graphs has about one quarter of the data points of the combined 8-month period, which is shown in black.
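The count-with-filtering step amounts to something like the sketch below. The function, the record layout, and the confidence threshold are all hypothetical stand-ins; the page does not say how the sodar reports its confidence level or where the cutoff was set:

```python
def valid_counts(records, min_confidence=0.9):
    """Count verification points per forecast hour, dropping sodar
    samples below a confidence threshold (threshold value assumed).

    records: iterable of (forecast_hour, sodar_speed, confidence).
    """
    counts = {}
    for hour, speed, confidence in records:
        if confidence >= min_confidence:
            counts[hour] = counts.get(hour, 0) + 1
    return counts

# Hypothetical sodar records, one of which fails the confidence check:
records = [(1, 5.0, 0.95), (1, 4.0, 0.50), (2, 6.0, 0.92)]
print(valid_counts(records))  # {1: 1, 2: 1}
```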