# ElectricityMarketEnv

ElectricityMarketEnv simulates a real-time electricity market with 5-minute settlements. The default environment consists of 33 dispatchable generators and one 80 MWh battery storage system connected on a congested 24-bus transmission network based on the IEEE Reliability Test System (IEEE RTS-24) [1], with load data from the IEEE Reliability Test System of the Grid Modernization Laboratory Consortium (IEEE RTS-GMLC) [2]. At every time step (representing 5 minutes), all participants submit bids to the market operator (MO). Based on the bids, the MO solves the multi-timestep security-constrained economic dispatch (SCED) problem, which determines the price and amount of electricity purchased from (or sold to) each generator and battery to meet real-time electricity demand. Each episode runs for 1 day, with 5-minute time intervals ($$T = 288$$, $$\tau = 5/60$$ hours). The agent controls the battery system and is rewarded for submitting bids that result in charging (buying) when prices are low and discharging (selling) when prices and CO2 emissions are high, thus performing price arbitrage.
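The episode timing above follows from simple arithmetic, sketched below. The names `STEP_MINUTES`, `EPISODE_HOURS`, and `step_to_time_of_day` are illustrative and not part of SustainGym, and the mapping of step 0 to the start of the day is an assumption.

```python
from datetime import timedelta

STEP_MINUTES = 5     # settlement interval stated in the text
EPISODE_HOURS = 24   # each episode covers one day

# Number of time steps per episode and step length in hours
T = EPISODE_HOURS * 60 // STEP_MINUTES
tau = STEP_MINUTES / 60


def step_to_time_of_day(t: int) -> timedelta:
    """Offset of step t from the start of the episode.

    Assumes (hypothetically) that step 0 is the start of the day.
    """
    return timedelta(minutes=t * STEP_MINUTES)


print(T, tau, step_to_time_of_day(144))
```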

While ElectricityMarketEnv is a simplification of real-world electricity markets, the parameters of the environment’s generators and battery systems, as well as the historical electricity demand, marginal operating emissions rate (MOER) values, and associated forecasts, are obtained from actual systems within California. In this regard, ElectricityMarketEnv serves as a minimal test bed for RL algorithms, since a real-world electricity market is significantly more complex (e.g., featuring complex generator cost models).

## Observation Space

An observation is

$s(t) = (t, e, a(t-1), x_{t-1}, p_{t-1}, l_{t-1}, \hat{l}_{t:t+k-1}, m_{t-1}, \hat{m}_{t:t+k-1 \mid t}).$

## Reward Function

The reward function encourages the agent to maximize profit from charging decisions while minimizing the associated carbon emissions. The reward function is a sum of three components:

$r(t) = r_R(t) + r_C(t) - c_T(t).$

The revenue term $$r_R(t) = p_t x_t$$ is the immediate revenue from the dispatch. The CO2 emissions reward term $$r_C(t) = P_\text{CO2} m_t x_t$$ represents the price of CO2 emissions displaced or incurred by the battery dispatch. The terminal cost $$c_T(t)$$, which is nonzero only when $$t=T$$, encourages the battery to have the same energy level at the end of the day as when it started. We also provide an option to delay all reward signals until the terminal time step (intermediate rewards are set to 0).
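As a rough illustration of how the three terms combine, here is a minimal sketch of the per-step reward and of the delayed-reward option. The function names, argument names, and numeric values are hypothetical and not taken from SustainGym. The terminal cost is passed in as a plain number because its exact form is not given here, and the delayed variant assumes the terminal reward is the sum of all per-step rewards.

```python
def step_reward(price: float, dispatch: float, moer: float,
                co2_price: float, terminal_cost: float = 0.0) -> float:
    """r(t) = r_R(t) + r_C(t) - c_T(t), following the formula in the text."""
    revenue = price * dispatch                # r_R(t) = p_t * x_t
    co2_reward = co2_price * moer * dispatch  # r_C(t) = P_CO2 * m_t * x_t
    return revenue + co2_reward - terminal_cost


def delay_rewards(rewards: list[float]) -> list[float]:
    """Zero out intermediate rewards and report the total at the last step.

    Assumption: the delayed terminal reward is the cumulative sum.
    """
    if not rewards:
        return []
    return [0.0] * (len(rewards) - 1) + [sum(rewards)]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow
r = step_reward(price=50.0, dispatch=2.0, moer=0.5, co2_price=30.0)
print(r, delay_rewards([1.0, 2.0, 3.0]))
```

Under the formula, a positive $$x_t$$ yields positive revenue, so the sketch treats a positive dispatch as selling (discharging). That sign convention is inferred from the revenue term rather than stated explicitly.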

## Distribution Shift

ElectricityMarketEnv considers temporal distribution shifts, specifically shifts in the time of year from which demand and MOER values are drawn.

## Getting Started

### Installation

1. Install miniconda3.

2. (Optional, but recommended) If you are using a conda version <=23.9.0, set the conda solver to libmamba for faster dependency solving. Starting from conda version 23.10.0, libmamba is the default solver.

conda config --set solver libmamba

3. Download SustainGym from GitHub. (In the future, we plan on releasing SustainGym via PyPI.)

git clone --depth 1 https://github.com/chrisyeh96/sustaingym.git

4. Install the libraries necessary for running the ElectricityMarketEnv environment.

conda env update --file env_em.yml --prune


### Using our training script

To train a central, single-agent Soft Actor-Critic (SAC) model with a learning rate of 0.0001 on demand and MOER data from July 2020:

python -m examples.electricitymarket.train_rllib -m 7 -a sac -l 0.0001 -o examples/train_logs


The resulting trained model and its metrics are stored in the directory given by the -o flag, here “examples/train_logs”.

More generally, our training script takes the following arguments:

usage: python -m examples.electricitymarket.train_rllib -m MONTH [-v EVAL_MONTH] [-d] [-i] -a ALGORITHM -l LR [-g GAMMA] [-e EVAL_FREQ] [-n EVAL_EPISODES] [-o LOG_DIR]

train RLLib models on ElectricityMarketEnv

options:
  -m MONTH, --month MONTH
                        month of environment data for training (default: None)
  -v EVAL_MONTH, --eval-month EVAL_MONTH
                        month of environment data for out of distribution
                        evaluation (default: None)
  -d, --discrete        whether to use discretized actions (default: False)
  -i, --intermediate-rewards
                        whether to use intermediate rewards (default: False)
  -a ALGORITHM, --algo ALGORITHM
                        type of model. dqn, sac, ppo, a2c, or ddpg (default: None)
  -l LR, --lr LR        learning rate (default: None)
  -g GAMMA, --gamma GAMMA
                        discount factor, between 0 and 1 (default: 0.9999)
  -e EVAL_FREQ, --eval-freq EVAL_FREQ
                        # of episodes between eval/saving model during training (default: 20)
  -n EVAL_EPISODES, --eval-episodes EVAL_EPISODES
                        # of episodes algorithm evaluated on during training (default: 5)
  -o LOG_DIR, --log-dir LOG_DIR
                        directory for saving logs and models (default: .)
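For readers who want to see how these flags fit together, the following is a minimal `argparse` sketch reconstructed from the help text above. It is not the actual training script: the function name `build_parser`, the argument types (for example, treating the month as an integer), and the use of `choices` for the algorithm are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage string."""
    parser = argparse.ArgumentParser(
        prog="python -m examples.electricitymarket.train_rllib",
        description="train RLLib models on ElectricityMarketEnv",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Required arguments (shown without brackets in the usage line)
    parser.add_argument("-m", "--month", type=int, required=True,
                        help="month of environment data for training")
    parser.add_argument("-a", "--algo", required=True, metavar="ALGORITHM",
                        choices=["dqn", "sac", "ppo", "a2c", "ddpg"],
                        help="type of model. dqn, sac, ppo, a2c, or ddpg")
    parser.add_argument("-l", "--lr", type=float, required=True,
                        help="learning rate")
    # Optional arguments with the documented defaults
    parser.add_argument("-v", "--eval-month", type=int,
                        help="month of environment data for out of "
                             "distribution evaluation")
    parser.add_argument("-d", "--discrete", action="store_true",
                        help="whether to use discretized actions")
    parser.add_argument("-i", "--intermediate-rewards", action="store_true",
                        help="whether to use intermediate rewards")
    parser.add_argument("-g", "--gamma", type=float, default=0.9999,
                        help="discount factor, between 0 and 1")
    parser.add_argument("-e", "--eval-freq", type=int, default=20,
                        help="# of episodes between eval/saving model "
                             "during training")
    parser.add_argument("-n", "--eval-episodes", type=int, default=5,
                        help="# of episodes algorithm evaluated on "
                             "during training")
    parser.add_argument("-o", "--log-dir", default=".",
                        help="directory for saving logs and models")
    return parser


# Parse the example command shown earlier in this section
example = "-m 7 -a sac -l 0.0001 -o examples/train_logs".split()
args = build_parser().parse_args(example)
print(args)
```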


### Custom RL Loop

from sustaingym.envs.electricitymarket import ElectricityMarketEnv

# use May 2021 load and MOER values and only report the cumulative reward at the terminal step
env = ElectricityMarketEnv(month='2021-05', use_intermediate_rewards=False)

# reset() returns (observation, info), matching the five-value step() API below
obs, info = env.reset(seed=123)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random action; replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
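The loop above can be wrapped in a small helper that also tracks the episode return. This is a sketch only: `run_episode` and its `policy` argument are hypothetical names, and it assumes a Gymnasium-style API in which `reset` returns `(observation, info)` and `step` returns five values.

```python
from typing import Any, Callable, Optional


def run_episode(env: Any,
                policy: Optional[Callable[[Any], Any]] = None,
                seed: Optional[int] = None) -> tuple[float, int]:
    """Run one episode and return (total reward, number of steps).

    If no policy is given, actions are sampled uniformly from the
    environment's action space.
    """
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample() if policy is None else policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return total_reward, steps
```

With an environment created as above, `total, steps = run_episode(env, seed=123)` would run one random-action episode. When intermediate rewards are disabled, the total comes entirely from the final step.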


## References

[1] Probability Methods Subcommittee, “IEEE Reliability Test System,” in IEEE Transactions on Power Apparatus and Systems, vol. PAS-98, no. 6, pp. 2047-2054, Nov. 1979, doi: 10.1109/TPAS.1979.319398. URL https://ieeexplore.ieee.org/document/4113721.

[2] C. Barrows et al., “The IEEE Reliability Test System: A Proposed 2019 Update,” in IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 119-127, Jan. 2020, doi: 10.1109/TPWRS.2019.2925557. URL https://ieeexplore.ieee.org/document/8753693.