EVChargingEnv#
EVChargingEnv uses ACNSim [1] to simulate the charging of electric vehicles (EVs) based on actual data gathered from EV charging networks between fall 2018 and summer 2021. ACNSim is a “digital twin” of actual EV charging networks at Caltech and JPL, which have \(n=54\) and \(52\) charging stations (EVSEs, short for Electric Vehicle Supply Equipment), respectively. ACNSim accounts for nonlinear EV battery charging dynamics and unbalanced 3-phase AC power flows, making it highly realistic. ACNSim (and therefore EVChargingEnv) can also be extended to model other charging networks. When drivers charge their EVs, they provide an estimated time of departure and the amount of energy requested. Because of network and power constraints, not all EVSEs can simultaneously provide their maximum charging rates (a.k.a. “pilot signals”).
Each episode starts at midnight and runs for 24 hours in 5-minute time steps (\(T = 288\), \(\tau = 5/60\) hours). At every time step, the pilot signal for each EVSE must be decided and is then executed for the duration of that time step. The objective is to maximize charge delivery while minimizing carbon costs and obeying the network and power constraints. In the single-agent setting, a single agent simultaneously controls all \(n\) EVSEs; in the multi-agent setting, each agent decides the charging rate for a single EVSE.
EVChargingEnv supports real historical data as well as data sampled from a 30-component Gaussian Mixture Model (GMM) fit to historical data.
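For illustration, either kind of generator can be passed directly to the environment constructor. The snippet below is a sketch: GMMsTraceGenerator appears later on this page, while the RealTraceGenerator name and its (site, date period) signature are assumptions modeled on that interface and may differ in the actual API.
from sustaingym.envs.evcharging import EVChargingEnv, GMMsTraceGenerator

# Synthetic charging sessions sampled from a GMM fit to Caltech data
env_gmm = EVChargingEnv(GMMsTraceGenerator('caltech', 'Summer 2019'))

# Replaying real historical sessions is assumed to use an analogous generator, e.g.:
# from sustaingym.envs.evcharging import RealTraceGenerator
# env_real = EVChargingEnv(RealTraceGenerator('caltech', 'Summer 2019'))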
Observation Space#
An observation at time \(t\) is \(s(t) = (t, d, e, m_{t-1}, \hat{m}_{t:t+k-1|t})\). \(t \in [0, 1]\) is the fraction of the day elapsed. \(d \in \Z^n\) is the estimated remaining duration of each EV (in # of time steps). \(e \in \R_+^n\) is the remaining energy demand of each EV (in kWh). If no EV is charging at EVSE \(i\), then \(d_i = 0\) and \(e_i = 0\). If an EV charging at EVSE \(i\) has exceeded the user-specified estimated departure time, then \(d_i\) becomes negative, while \(e_i\) may still be nonzero. \(m_{t-1} \in \R_+\) is the marginal operating emissions rate (MOER) at the previous time step, and \(\hat{m}_{t:t+k-1|t} \in \R_+^k\) is a forecast of the MOER over the next \(k\) time steps, given information available at time \(t\).
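To make the layout concrete, the following sketch builds illustrative observation components with NumPy; the forecast horizon k and all numeric values are made-up examples, not values prescribed by the environment.
import numpy as np

n, k = 54, 36  # number of EVSEs; k is an example MOER forecast horizon

t = 0.25                            # fraction of the day elapsed (6:00 am)
d = np.zeros(n, dtype=np.int32)     # estimated remaining duration per EVSE (# of time steps)
e = np.zeros(n)                     # remaining energy demand per EVSE (kWh)
m_prev = 0.35                       # MOER at the previous time step
m_forecast = np.full(k, 0.35)       # MOER forecast for the next k time steps

# An EV arrives at EVSE 3, planning to stay 4 hours and requesting 20 kWh
d[3] = int(4 * 60 / 5)              # 48 five-minute time steps
e[3] = 20.0

s_t = (t, d, e, m_prev, m_forecast)  # the observation tuple s(t)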
Action Space#
EVChargingEnv exposes a choice of discrete actions \(a(t) \in \{0,1,2,3,4\}^n\), representing pilot signals scaled down by a factor of 8, or continuous actions \(a(t) \in [0, 1]^n\), representing the pilot signal normalized by the maximum signal \(M\) (in amps) allowed for each EVSE. Physical infrastructure in a charging network constrains the set \(\mathcal{A}_t\) of feasible actions at each time step \(t\). Furthermore, the EVSEs only support discrete pilot signals, so \(\mathcal{A}_t\) is nonconvex. To satisfy these physical constraints, EVChargingEnv can project an agent’s action \(a(t)\) into the convex hull of \(\mathcal{A}_t\) and round it to the nearest allowed pilot signal, resulting in the final normalized pilot signals \(\tilde{a}(t)\). ACNSim processes \(\tilde{a}(t)\) and returns the remaining demand \(e_i(t+1)\) along with the actual charging rate \(M \bar{a} \in \R_+^n\) (in amps) delivered at each EVSE, where the realized normalized rate \(\bar{a}\) may differ from \(\tilde{a}(t)\) due to battery charging dynamics.
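The mapping between actions and pilot signals can be illustrated as follows; the maximum pilot signal of 32 amps is an example value only, as the actual limit depends on the EVSE.
import numpy as np

n = 54          # number of EVSEs
M = 32          # example maximum pilot signal in amps (EVSE-dependent)

# Continuous action: each entry is a pilot signal normalized by M
a_cont = np.full(n, 0.5)
pilot_cont = M * a_cont          # requested pilot signals in amps

# Discrete action: values in {0, 1, 2, 3, 4} are pilot signals scaled down by 8
a_disc = np.full(n, 4)
pilot_disc = 8 * a_disc          # 32 A requested at every EVSE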
Reward Function#
The reward function is a sum of three components: \(r(t) = p(t) - c_V(t) - c_C(t)\). The profit term \(p(t)\) rewards energy delivered to the EVs. The constraint violation cost \(c_V(t)\) penalizes network and power constraint violations. Finally, the CO2 emissions cost \(c_C(t)\), a function of the MOER \(m_t\) and the charging action, encourages the agent to charge EVs when the MOER is low, thereby reducing emissions.
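As a minimal sketch of this decomposition (not the environment’s internal implementation), the per-step reward could be assembled as below; all inputs and the carbon-price constant are hypothetical placeholders.
def carbon_cost(moer: float, energy_kwh: float, price_per_kg: float = 1.0) -> float:
    # CO2 cost grows with the MOER and the energy charged during this time step
    return price_per_kg * moer * energy_kwh

def reward(profit: float, violation_cost: float, co2_cost: float) -> float:
    # r(t) = p(t) - c_V(t) - c_C(t)
    return profit - violation_cost - co2_cost

r_t = reward(profit=2.5, violation_cost=0.0,
             co2_cost=carbon_cost(moer=0.35, energy_kwh=1.5))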
Getting Started#
Installation#
SustainGym is designed for Linux machines. SustainGym is hosted on PyPI and can be installed with pip:
pip install sustaingym[ev]
Specifically for EVChargingEnv, you also need to have a MOSEK license. You may either request a free personal academic license or a free 30-day commercial trial license. The license file should be placed inside a folder called “mosek” under your home directory. Typically, that will be ~/mosek/mosek.lic.
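A quick way to verify the license is in place before running the environment (the path below is simply the default location mentioned above):
from pathlib import Path

license_path = Path.home() / 'mosek' / 'mosek.lic'
print('MOSEK license found' if license_path.is_file()
      else f'No MOSEK license at {license_path}')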
Custom RL Loop#
from sustaingym.envs.evcharging import EVChargingEnv, GMMsTraceGenerator
# Create events generator which samples events from a GMM trained on Caltech
# data. The 'jpl' site is also supported, along with the periods
# 'Fall 2019', 'Spring 2020', and 'Summer 2021'.
gmmg = GMMsTraceGenerator('caltech', 'Summer 2019')
# Create environment
env = EVChargingEnv(gmmg)
obs, _ = env.reset(seed=123)
terminated = False
while not terminated:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
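To also stop on time-limit truncation and track the episode return, the loop can be extended as in the following sketch, which relies only on the standard Gymnasium step API.
obs, _ = env.reset(seed=456)
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f'Episode return: {total_reward:.2f}')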
If you prefer using gym.make(), you may instead create the environment as follows:
import gymnasium as gym
from sustaingym.envs.evcharging import GMMsTraceGenerator
gmmg = GMMsTraceGenerator('caltech', 'Summer 2019')
env = gym.make('sustaingym/EVCharging-v0', data_generator=gmmg)
Using our training script#
1. Install miniconda3.
2. (Optional, but recommended) If you are using a conda version <=23.9.0, set the conda solver to libmamba for faster dependency solving. Starting from conda version 23.10.0, libmamba is the default solver.
conda config --set solver libmamba
3. Install the libraries necessary for running the EVChargingEnv environment.
conda env update --file env_ev.yml --prune
To train a single-agent PPO model with RLlib, using a learning rate of 5e-4 and the GMM trained on Summer 2021 data from Caltech:
python -m examples.evcharging.train_rllib -a ppo -t "Summer 2021" -s caltech -r 123 --lr 5e-4
More generally, our training script takes the following arguments:
usage: train_rllib.py [-h] [-a {ppo,sac}] [-t {Summer 2019,Fall 2019,Spring 2020,Summer 2021}]
                      [-s {caltech,jpl}] [-d] [-m] [-p PERIODS_DELAY] [-r SEED] [-l LR]

train RLLib models on EVChargingEnv

optional arguments:
  -h, --help            show this help message and exit
  -a {ppo,sac}, --algo {ppo,sac}
                        RL algorithm (default: ppo)
  -t {Summer 2019,Fall 2019,Spring 2020,Summer 2021}, --train_date_period {Summer 2019,Fall 2019,Spring 2020,Summer 2021}
                        Season. (default: Summer 2021)
  -s {caltech,jpl}, --site {caltech,jpl}
                        site of garage. caltech or jpl (default: caltech)
  -d, --discrete
  -m, --multiagent
  -p PERIODS_DELAY, --periods_delay PERIODS_DELAY
                        communication delay in multiagent setting. Ignored for single agent. (default: 0)
  -r SEED, --seed SEED  Random seed (default: 123)
  -l LR, --lr LR        Learning rate (default: 5e-05)
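For example, based on the flags above, a multi-agent run with discrete actions, a 2-step communication delay, and SAC on the JPL site could be launched as follows (an illustrative invocation, not one from the original examples):
python -m examples.evcharging.train_rllib -a sac -s jpl -t "Fall 2019" -d -m -p 2 -r 123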
References#
[1] Z. J. Lee et al., “Adaptive Charging Networks: A Framework for Smart Electric Vehicle Charging,” in IEEE Transactions on Smart Grid, vol. 12, no. 5, pp. 4339-4350, Sept. 2021, doi: 10.1109/TSG.2021.3074437. URL https://ieeexplore.ieee.org/document/9409126.