The power of data wrangling — How to save weeks of working hours by data wrangling in Pandas

Real-life example of energy modelling data

Mustafa Adel Amer
2 min readJun 23, 2022
Photo by Milad Fakurian on Unsplash

Energy models are used to predict the future of the energy system in a specific industry, sector, country, region or on a global scale.

The example we will go through here is an example of an energy model that is used to predict global energy demand in +140 countries. The model uses linear programming to calculate energy demand based on existing power plant capacities, sectoral needs, existing industries, availability of infrastructure, and electricity coverage.

The model takes in +500 time series indicators on macroeconomics, population, power generation capacities, emissions, …etc.

To update the model with a new dataset from a different data provider or to spot results between different energy scenarios that you run on the model, you would need to go through a lot of line plots to see the differences and drive insights. This practice is time-consuming, not efficient, difficult to document and challenging to track.

However, using pandas and relying on a few steps allowed us to spot differences between the time series datasets (from different energy scenario results or from different data update sources) in a timely manner.

The logic is simple:

1] restructure the dataset using pandas.

2] calculate dimensionless parameter definition to compare time series.

3] collect the resulting calculation for the country and the sector in a new dataframe.

4] construct a heatmap using seaborn on the transformed results_df.

for a detailed step-by-step code, feel free to contact me.

Spotting differences between large time-series datasets — Image by the author for illustration purposes only

The benefits of the resulting tool are:

  • Cutting the benchmarking/spotting differences from days and weeks into minutes.
  • Allowing for interdisciplinary insights as you can consider many vectors and see changes along all of them.
  • Having the benchmarking/differences between time series datasets well documented to allow traceability and consistency.

To see an example of how we did it in practice, visit our article on energy scenario data analysis on BP Energy Statistics here. The article has a link to the entire code. In addition, if you would like to do the entire exercise from data gathering from Excel to benchmarking you can start with this article on extracting and cleaning data from the multiple Excel sheets in BP Energy Statistics Excel workbook here.

--

--