Skip to content

elliedel/electricty_etl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚡️ Electricity ETL Pipeline

This project automates the extraction, transformation, and loading (ETL) of electricity data from raw .csv and .json files into a MySQL database. It's based on a foundational DataCamp project and expanded as a full end-to-end data engineering pipeline.

📂 Project Structure


.
├── datasets/
│   ├── electricity_sales.csv
│   └── electricity_capability_nested.json
├── main.py
├── nootbook.ipynb
├── .env
├── requirements.txt
└── README.md

📊 Data Sources

  1. electricity_sales.csv

    • Monthly electricity sales data across different states and sectors.
    • Columns: period, stateid, stateDescription, sectorid, sectorName, price, price-units
  2. electricity_capability_nested.json

    • Nested JSON representing generation capacity per energy source per state.
    • Flattened to: period, stateId, stateDescription, energySourceid, energySourceDescription, energySourceCapability, energySourceCapabilityUnits

🧪 ETL Pipeline Overview

  • Extract:

    • Reads CSV and JSON files from the datasets/ directory.
    • JSON is flattened for relational DB compatibility.
  • Transform:

    • For electricity_sales:
      • Filters only for residential and transportation sectors.
      • Parses period into year and month.
      • Drops unused columns and missing values.
    • For electricity_capability:
      • Normalizes nested JSON and converts capability to float.
  • Load:

    • Writes both cleaned DataFrames into a MySQL database using SQLAlchemy.

⚙️ Setup Instructions

  1. Clone the repository
git clone https://github.com/elliedel/electricty_etl.git
cd electricity-etl-project
  1. Create and configure your .env file
USER=your_mysql_username
PASSWORD=your_mysql_password
  1. Install dependencies
pip install -r requirements.txt
  1. Run the ETL pipeline
python etl_script.py

Make sure MySQL is running and the database electricity_db exists.

create schema if not exists electricity_db;

🛠️ Technologies Used

  • Python 3.10+
  • Pandas
  • SQLAlchemy
  • MySQL
  • dotenv

📈 Result

After running the ETL, you will have two tables in your MySQL database:

  • electricity_sales
  • electricity_capability

You can use these cleaned tables for further analytics or dashboarding.

📚 Credits

  • Based on a DataCamp project: “Powering Data for the Department of Energy - Building an ETL Pipeline"

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors