Results and Discussion

This page is a starting point for multiple pages that show graphs and discuss some recent wind forecasting research done at Boise State University. The graphs shown on these pages are fairly small, with details that are difficult to see, but you can click on them or open them in a different tab to see a larger version.

Unless some indication is given to the contrary, the data in the graphs comes from WRF runs that are initialized with RUC data, forecast 9 hours ahead, and have a horizontal resolution of one kilometer. Some graphs include WRF runs at three kilometer or 200 meter horizontal resolution. The RUC data has a bias that WRF can remove over time, so many of the graphs have data from the first two forecast hours removed. Since it takes time for the RUC data to become available and it takes time to run WRF, the first two forecast hours are already in the past by the time the runs complete, so they are not usable anyway. Even if they were usable, it might still be prudent to ignore them since they still contain so much bias from the RUC data. Bias and its related error being removed as the WRF forecast hour progresses can be seen in graphs on the page Bias and Error Graphed Against Forecast Hour or on the page Attributing Error to RUC and NAM.

When I include RUC or NAM data in my graphs, please note that I took shortcuts in doing so. Since evaluating RUC or NAM was not my primary goal, I do not mind if my portrayal of them is a little skewed. When preparing a WRF run, RUC or NAM data is used to create initial conditions and ongoing lateral boundary condition files gridded the same as the WRF run. The wind speed values I use come from these files rather than from the original data sets. From these files, I interpolate to the location and height of the sodar readings, so I am using interpolations of interpolations. There are also some problems with interpolating to the correct height after the original data has been re-gridded to WRF grids and terrain heights. Like I said, evaluating the incoming data was not the primary focus, so I use shortcuts that involve less programming and fewer data integration techniques to get a reasonable approximation of that data.

Many people use Root Mean Squared Error (RMSE) as a way to analyze error. It helps capture some of the second order moment of the error, like variance and standard deviation do. It is a little like combining error and standard deviation together. The squaring process magnifies the largest errors, making them more important than smaller errors. I use RMSE graphs too, but do not include them on this website. They generally do not provide additional information for the rather casual discussion format I use here.

The rest of this page is about an identified bias with respect to the measured wind speed. It is a good place to start since it has slightly more explanation for the kinds of graphs and techniques that are used throughout the Results and Discussion sections. It also shows how things that seem simple end up having complexities that diminish their usefulness. Side comments that have links to other pages can be followed immediately if you want to understand them better as you go or you can stick to this topic and go to the other pages later.

Graphs and discussion of data graphed with the wind speed forecast by WRF as the horizontal axis can be found on the Bias and Error Graphed Against Forecast Wind Speed page. Similarly, there are pages graphing data by forecast hour, Bias and Error Graphed Against Forecast Hour, and by time of day, Bias and Error Graphed Against Time of Day. A page called Attributing Error to RUC and NAM explores how much of the error in forecasts can be attributed to/blamed on the data used to initialize WRF. I also have a page that shows some of the same kinds of data when NAM is used to initialize the WRF runs. It is called WRF with NAM initialization.

Bias and Error Graphed Against Measured Wind Speed

The discussion mostly applies to the wind speed graphs on the left side of the page. The graphs on the right side show similar data but related to power output from wind turbines. This is done by putting the wind speed into the power curve for a Suzlon S88 turbine and normalizing it. Normalizing is simply dividing by the maximum power capacity of the turbine. This results in output, bias, and error being expressed as a portion of the capacity. This method was suggested by a paper from Project Anemos. For a while, I had difficulty getting this file from the internet, so I made a copy in my downloadable files, which you can reference here: ANEMOS_D2.3_EvaluationProtocol.pdf.
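To make the normalization concrete, here is a minimal sketch in Python. The power curve below is a simplified stand-in, not the actual Suzlon S88 specification; the cut-in speed, rated speed, capacity, and the cubic ramp between them are assumptions for illustration only.

```python
# Sketch of normalizing a wind speed through a turbine power curve.
# The numbers here are illustrative assumptions, NOT the real S88 curve.
CUT_IN, RATED_SPEED, CAPACITY_KW = 4.0, 14.0, 2100.0

def power_kw(speed):
    """Toy power curve: zero below cut-in, cubic ramp to rated, flat above."""
    if speed < CUT_IN:
        return 0.0
    if speed >= RATED_SPEED:
        return CAPACITY_KW
    # cubic interpolation between cut-in and rated speed
    frac = (speed**3 - CUT_IN**3) / (RATED_SPEED**3 - CUT_IN**3)
    return CAPACITY_KW * frac

def normalized_power(speed):
    """Divide by capacity so output is a fraction of rated power."""
    return power_kw(speed) / CAPACITY_KW

print(normalized_power(3.0))   # below cut-in -> 0.0
print(normalized_power(20.0))  # above rated -> 1.0
```

Once speeds are mapped through the curve this way, bias and error computed on the normalized outputs come out directly as fractions of capacity.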


This graph shows how both the RUC data and the WRF results have a bias with respect to the actual wind speed, as measured by our sodar. What seems most remarkable is that the relationship is fairly linear, so it erroneously looks like a great candidate for a correction that would improve results. However, the linear relationship in the graph is between the measured wind speed and the bias of the forecast speed. To use it, I would have to already know either the bias in each individual forecast speed or the corresponding measured speed. In a forecast, the measured speed has not occurred yet. And if I already knew the bias/error for each forecast speed, which I don't, I would not need this near-linear relationship. It also does not work to use this graph to assume and calculate a linear relationship between forecast speed and bias. I tried several ways to do that, but the results actually became worse. That assumption seems reasonable, but it depends on the bias with respect to the forecast speed increasing the way it does with the measured speed. If you go to the page Bias and Error Graphed Against Forecast Wind Speed, you will see that the bias with respect to the forecast speed actually decreases as speed increases.

Different colors are used to represent different times of the year. Some seasons tend to have more bias than others. The black data represents all 8 months combined.  Both the RUC and WRF data have the same color schemes but use triangles that point in different directions, as indicated in the legend. This scheme is used in most of the graphs in the Results and Discussion pages.

Obviously, the RUC data has more bias than the WRF results. Of course, these results may be dependent on location or something I did, but I speculate that they imply this: the RUC and WRF MYJ PBL schemes have inherent problems that keep them from capturing reality, but they have been tweaked so they are wrong the least amount of the time. That sounds insulting, but I actually consider it a compliment, considering the complexities of modeling weather.

The bias in the power output prediction on the right has two characteristics to notice. First, its shape resembles a power curve from a turbine, as might be expected. Second, it is even more noticeable that RUC predictions have a more consistent bias across different seasons.

Please note that bias has the opposite sign than what I would intuitively expect. Error is defined as the measured value from the sodar minus the forecast value, so if the forecast value is too slow, the error is positive, not negative. I use average error as the bias, so a negative bias indicates that the forecast speeds are too fast.
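As a tiny illustration of this sign convention (the sample numbers are made up):

```python
# error = measured - forecast; bias = average error.
# Invented values where every forecast is too fast.
measured = [5.0, 6.0, 7.0]
forecast = [6.0, 7.5, 7.5]

errors = [m - f for m, f in zip(measured, forecast)]
bias = sum(errors) / len(errors)
print(bias)  # -1.0: negative bias means the forecasts are too fast
```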



This graph shows the mean absolute error (MAE) of the data, also with respect to the measured wind speed. RUC actually has smaller error for the slower speeds but has larger error for faster speeds. Part of this is due to the fact that RUC has a smaller absolute bias for lower speeds, as can be seen on the above graph. Since power production theoretically increases as the cube of the wind speed, the RUC errors at higher speeds are magnified, as seen on the right.

See the graph after the next one down to understand why the different times of year have such wildly varying errors for the higher wind speeds.


So we have an easily defined bias that seems like it might account for a significant part of the absolute error. But this graph shows how much error remains after the bias is removed from both the RUC and WRF data. In a way, I am cheating, in that I calculate the bias from the same data from which I remove it. This is done on a point by point basis. For each measured wind speed, the average error, which is the bias, is calculated. Then the absolute error is calculated after the wind speed is adjusted for that point's bias. The proper way to handle this would be to calculate the bias from one data set and apply it to a separate test data set. That would be a true test of the adjustment made to the model output. The method used here tells us the maximum possible improvement that bias adjustment based on wind speed could achieve, which is also useful, but it is not representative of an operational implementation.
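The point-by-point bias removal described above can be sketched roughly like this. The data, the bin width, and the binning-by-rounding are all invented for illustration; the real analysis may group the measured speeds differently.

```python
# For each measured-speed bin: compute the mean error (the bias),
# subtract it from each error, then take the MAE of what remains.
from collections import defaultdict

pairs = [  # (measured m/s, forecast m/s) -- made-up values
    (4.2, 5.0), (4.4, 5.4), (6.1, 6.0), (6.3, 7.1), (6.0, 6.8),
]

def speed_bin(speed, width=1.0):
    """Round the measured speed to the nearest bin center (assumed scheme)."""
    return round(speed / width) * width

by_bin = defaultdict(list)
for measured, forecast in pairs:
    by_bin[speed_bin(measured)].append(measured - forecast)  # error convention

residual_abs_errors = []
for errors in by_bin.values():
    bias = sum(errors) / len(errors)          # bias for this measured-speed bin
    residual_abs_errors.extend(abs(e - bias) for e in errors)

mae_after = sum(residual_abs_errors) / len(residual_abs_errors)
print(round(mae_after, 3))  # about 0.28 for this toy data
```

Note the "cheating": the bias is estimated from the very same points it is then subtracted from, which is why this only bounds the improvement from above.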

It is interesting to note that the RUC data actually has less error after its bias is removed. In general, that implies that RUC is wrong in a more consistent way; the higher resolution WRF runs have a component of their error that is not related to the wind speed, so they have less error that is identifiable as bias relative to the measured wind speed.


This graph indicates how many data points go into each of the averages at each speed. Just as an aside, the distribution of the points looks rather like a Rayleigh distribution curve. There are two things about the distribution that are worth pointing out.
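For reference, the Rayleigh density mentioned in the aside is f(v) = (v/σ²)·exp(−v²/(2σ²)), whose mode sits at v = σ. A quick sketch, with σ chosen arbitrarily rather than fit to the actual data:

```python
import math

def rayleigh_pdf(v, sigma):
    """Rayleigh probability density, often used to model wind-speed histograms."""
    return (v / sigma**2) * math.exp(-v**2 / (2 * sigma**2))

sigma = 5.0  # arbitrary scale parameter for illustration
# Density peaks at v = sigma and falls off toward both lower and higher speeds.
print(rayleigh_pdf(5.0, sigma) > rayleigh_pdf(2.0, sigma))   # True
print(rayleigh_pdf(5.0, sigma) > rayleigh_pdf(12.0, sigma))  # True
```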

First, much of the distribution centers around the wind speeds that have very little bias with respect to the measured wind speed. So there will be much less error reduction when the bias is removed from graphs that do not have measured wind speed as the horizontal axis. This will be more noticeable on other pages, which have data graphed against time of day, forecast hour, or forecast speed.

Second, there are very few data points at the higher wind speeds. This contributes to the wild swings in the previous three graphs at high speeds. There were not enough points to average the results into a smoother line.

OK, so where did all these data points come from? 525 runs from Jan 1 to Aug 31, 2009 were created. These were evenly spaced with 11 hours between the start of each 9 hour run. This created runs that covered all 8 months. Four subsets were also created, based on the month in which the forecast started. The systematic 11 hour spacing staggered the runs so that different hours of the day were covered approximately equally. There would have been 531 runs, but several of the chosen times did not have data available to initialize WRF. And since each run included 7 usable forecast hours and each hour has 6 ten minute averages, over 22,000 data points were possible. However, the sodar does not have valid readings at each of those possible times, so the data actually has just over 20,000 total data points. And obviously, each 2 month subset has about a quarter of the points of the consolidated 8 month set.
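The run-count arithmetic above can be reproduced as a quick sanity check. The only assumption beyond the text is treating Jan 1, 2009 00:00 as the first candidate start time:

```python
# Candidate run start times: every 11 hours from Jan 1 through Aug 31, 2009.
from datetime import datetime, timedelta

start = datetime(2009, 1, 1)
end = datetime(2009, 9, 1)  # exclusive bound: end of Aug 31

run_times = []
t = start
while t < end:
    run_times.append(t)
    t += timedelta(hours=11)

possible_runs = len(run_times)      # 531 candidate start times
usable_hours_per_run = 7            # hours 3-9 of each 9-hour forecast
points_per_hour = 6                 # six 10-minute averages per hour
possible_points = 525 * usable_hours_per_run * points_per_hour  # 525 runs completed

print(possible_runs)    # 531
print(possible_points)  # 22050 -- "over 22,000" before sodar filtering
```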

Rather than giving the same kind of graph on the right for the distribution of data used for power output predictions, I have a graph of the predicted power. Because of errors in the prediction, the forecasted power output lines are not straight or smooth. However, there is also a line showing the predicted power if the sodar readings were used. The sodar is used to verify the predicted wind speeds, so right or wrong, we are assuming it is accurate. Therefore, predicted power based on the sodar follows the Suzlon S88 power curve very well, although there are small variations because of graphing/mathematical details. For instance, the power curve indicates a cut-in speed of 4 m/s, but the graph indicates a little power output at that speed. That is because the data for the graph point at 4 m/s includes the data from 3.5 m/s to 4.5 m/s, so the data points above 4 m/s but below 4.5 m/s create a little predicted power output. The opposite thing happens at 14 m/s, where full power output should be shown.

The cut-out speed is 25 m/s, but there are no data points at that speed. There may have been winds at that speed during the January through August time period, but they may not have made it onto the graphs for two reasons: first, I only use sodar data for the times that match forecast runs, which covers only 7 hours out of every 11-hour period. Second, only sodar readings that have a confidence value of 90 or above are included. If a sodar reading's confidence is below 90, it and the corresponding WRF prediction are ignored. And the sodar is more likely to have low confidence values at higher speeds.
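The confidence filtering rule can be sketched like this. The readings and the field names are made up; only the "keep both values or drop both" behavior and the 90 threshold come from the text above:

```python
# Keep a (sodar, WRF) pair only when the sodar confidence is 90 or higher.
readings = [
    {"sodar": 8.1,  "wrf": 7.4,  "confidence": 95},
    {"sodar": 22.5, "wrf": 19.0, "confidence": 72},  # high speed, low confidence
    {"sodar": 5.3,  "wrf": 5.9,  "confidence": 91},
]

kept = [r for r in readings if r["confidence"] >= 90]
print(len(kept))  # 2 -- the low-confidence high-speed reading is dropped
```

Because low confidence is more common at high speeds, this filter preferentially thins out exactly the fast-wind points that are already scarce.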