Understanding Anomalies in Renewable Energy Data

As a data scientist, I crave data and cannot wait to get my hands on modelling. However, the first thing that needs to be done is to clean the data. In this article, I would like to briefly introduce the data anomalies we face most often when modelling renewable energy in the real world.

Wind power forecasting models need to be built on good-quality observations. In the real world, the operation of wind parks is complicated: curtailment, re-dispatching, wind turbine malfunctions and wind park maintenance all show up naturally in the real production data.

Before we get into modelling, our data scientists first massage the data to make sure it is ready.

Anomalies

In general, data can be categorised as good or bad. In our case, we refer to bad data as anomalies. Based on their sources, we break them down into five scenarios:

Outlier Anomalies

Outliers are production values that lie far above or below what we expect. They can be caused by noise in the data collection process or by other factors. For example, icing events at wind parks or snow cover on solar panels lead to unexpectedly low production.
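To make the idea concrete, here is a minimal sketch of one common way to flag outliers: group observations into wind-speed bins and flag values that sit far from the bin's median production. This is an illustration, not the method used in our package. The function name `flag_outliers` and the parameters `bin_width`, `k` and `min_spread` are hypothetical, and the defaults are assumptions that would need tuning per park.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(wind_speed, production, bin_width=1.0, k=5.0, min_spread=1.0):
    """Return a list of booleans, True where production is an outlier.

    Observations are grouped into wind-speed bins. Within each bin, a value
    is flagged when it lies more than k * MAD away from the bin median,
    where MAD is the median absolute deviation.
    min_spread (assumed to be in production units) is a floor on the MAD so
    that bins with nearly identical values do not flag tiny deviations.
    """
    # Collect the production values that fall into each wind-speed bin.
    bins = defaultdict(list)
    for ws, p in zip(wind_speed, production):
        bins[int(ws // bin_width)].append(p)

    # Median and (floored) MAD per bin.
    stats = {}
    for b, values in bins.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        stats[b] = (med, max(mad, min_spread))

    # Flag each observation against the statistics of its own bin.
    flags = []
    for ws, p in zip(wind_speed, production):
        med, spread = stats[int(ws // bin_width)]
        flags.append(abs(p - med) > k * spread)
    return flags


# Example: six observations at roughly 8 m/s, one of them far too low.
wind = [8.1, 8.3, 8.5, 8.7, 8.2, 8.4]
power = [50, 51, 49, 50, 52, 5]
outlier_flags = flag_outliers(wind, power)
```

In this example only the last observation is flagged, since it sits far below the other values recorded at a similar wind speed.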

Flatliner Anomalies

A flatliner is a stretch of data that stays at exactly the same value for a short period, without even minor fluctuations, regardless of changes in wind speed. Depending on how the data is collected, this can be caused by filling gaps with a constant (such as zero) to avoid empty values.
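A simple way to look for flatliners is to search for runs of identical consecutive values. The sketch below assumes a plain list of production values at a regular time step. The function name `flag_flatliners` and the parameters `min_run` and `tolerance` are hypothetical choices for illustration.

```python
def flag_flatliners(production, min_run=4, tolerance=0.0):
    """Return a list of booleans, True where a value belongs to a flat run.

    A run is a stretch of consecutive samples that all stay within
    `tolerance` of the first sample of the run. Runs shorter than
    `min_run` samples are not flagged.
    """
    n = len(production)
    flags = [False] * n
    start = 0  # index where the current run began
    for i in range(1, n + 1):
        # The run ends at the end of the data or when the value moves away
        # from the value at the start of the run.
        run_ended = i == n or abs(production[i] - production[start]) > tolerance
        if run_ended:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


# Example: four consecutive zeros in the middle of the series.
series = [10, 12, 0, 0, 0, 0, 15, 15, 14]
flat_flags = flag_flatliners(series, min_run=4)
```

Here the four zeros are flagged, while the two consecutive values of 15 are too short a run to count.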

Commissioning Anomalies

When a park starts up, not all turbines are put into operation simultaneously. It can take a few months to get the whole park running. Production data from this period is called a commissioning anomaly and cannot be used for modelling.
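One rough heuristic for finding the end of the commissioning period is to look for the first time production reaches a large share of the installed capacity, and to discard everything before it. This is only a sketch under that assumption. The function name `commissioning_end` and the parameter `fraction` are hypothetical, and a real implementation would likely need to be more robust against single spikes.

```python
def commissioning_end(production, installed_capacity, fraction=0.9):
    """Return the index of the first sample at or above
    fraction * installed_capacity, or None if that level is never reached.
    """
    threshold = fraction * installed_capacity
    for i, p in enumerate(production):
        if p >= threshold:
            return i
    return None


# Example: a park with 50 MW installed capacity ramping up over time.
history = [2, 5, 9, 14, 20, 35, 48, 30, 46]
end_index = commissioning_end(history, installed_capacity=50)

# Keep only the data from the end of commissioning onwards.
usable = history[end_index:] if end_index is not None else []
```

In this example the park first exceeds 90% of its capacity at the seventh sample, so the first six samples are treated as commissioning data and dropped.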

Curtailment Anomalies

Curtailment happens quite frequently in wind park operations: some or all turbines within a wind farm are shut down to mitigate issues associated with turbine loading, export to the grid, or certain planning conditions. In this case, the real production data stays at a certain level for a short period.
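Because curtailed production sits at a certain level while the wind keeps changing, one possible check is a sliding window that looks for nearly constant production combined with clearly varying wind speed. The sketch below illustrates that idea only. The function name `flag_curtailment` and the parameters `window`, `power_tol` and `min_wind_range` are hypothetical, and their defaults are assumptions.

```python
def flag_curtailment(wind_speed, production, window=4, power_tol=0.5,
                     min_wind_range=1.0):
    """Return a list of booleans, True where production looks capped.

    A window of `window` consecutive samples is flagged when production
    varies by at most `power_tol` while wind speed varies by at least
    `min_wind_range` within the same window.
    """
    n = len(production)
    flags = [False] * n
    for start in range(n - window + 1):
        p = production[start:start + window]
        w = wind_speed[start:start + window]
        production_is_flat = max(p) - min(p) <= power_tol
        wind_is_changing = max(w) - min(w) >= min_wind_range
        if production_is_flat and wind_is_changing:
            for j in range(start, start + window):
                flags[j] = True
    return flags


# Example: production held near 30 while wind speed moves between 9 and 12.
wind = [6, 7, 9, 11, 10, 12, 8, 6]
power = [20, 28, 30, 30.2, 29.9, 30.1, 25, 18]
curtailment_flags = flag_curtailment(wind, power)
```

In this example the four samples around 30 are flagged, because production barely moves while the wind speed changes by 3 units.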

Availability Anomalies

Availability is the share of a park's capacity that is available. Sometimes a park does not run at full capacity. For example, if the value is 90%, then 90% of the park's capacity is available. Availability changes are scheduled in advance due to operational requirements such as park maintenance. If the availability data is accessible, it can be used to upscale the production data. Otherwise, data with uncertain capacity is removed.
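The upscaling step can be sketched as dividing each production value by the availability fraction for the same time step, and discarding values where availability is unknown. The function name `upscale_by_availability` is hypothetical. Treating an availability of zero as unusable is an assumption made here, since such a value cannot be scaled up.

```python
def upscale_by_availability(production, availability):
    """Scale production to what it would be at full capacity.

    `availability` holds fractions between 0 and 1, or None where unknown.
    Returns a list with None where the value cannot be upscaled
    (unknown availability, or availability of zero or below).
    """
    upscaled = []
    for p, a in zip(production, availability):
        if a is None or a <= 0:
            upscaled.append(None)  # uncertain capacity: remove the sample
        else:
            upscaled.append(p / a)
    return upscaled


# Example: 90% availability, unknown availability, and 50% availability.
measured = [45.0, 36.0, 20.0]
available = [0.9, None, 0.5]
full_capacity_equivalent = upscale_by_availability(measured, available)
```

With 90% availability, a measured 45 corresponds to about 50 at full capacity. The sample with unknown availability is dropped.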

Case studies

For further explanation, you can find some real cases below. The detected anomalies are highlighted in red or green colour.

Case 1, Flatliner and Curtailment Anomalies (red)

Observation of a wind park: wind production versus wind (left), wind production time series (right).

Case 2, Commissioning Anomalies (red)

Observation of a wind park: wind production versus wind (left), wind production time series (right).

Case 3, Availability Anomalies (green) + Flatliner and Curtailment Anomalies (red)

Observation of a wind park: wind production versus wind (left), wind production time series (right).

With these five types of anomalies in mind, our team built a robust data-cleaning package to sweep the noise away for our renewable energy forecasting models.

If you are interested in hearing more about our renewable energy forecasting models, please reach out to us.
