Commit 4c42e09

Merge branch 'develop'
2 parents 0325e4b + 7f5678a

13 files changed: +410 −32 lines

CHANGELOG.md

Lines changed: 20 additions & 7 deletions
@@ -1,4 +1,17 @@
-## [0.2.0] - 2023-11-..
+## [0.3.0] - 2023-12-05
+### Added:
+- Added `DifferentActions` and `AccountValue` as metrics. Metrics are the main way to evaluate the performance of the agent.
+- `metrics.Metric` objects can now be used to calculate metrics within the trading environment.
+- Included `rockrl==0.0.4` as a dependency, a reinforcement learning package that I created.
+- Added `experiments/training_ppo_sinusoid.py` to train a simple Dense agent with the PPO algorithm on the sinusoid data with discrete actions.
+- Added `experiments/testing_ppo_sinusoid.py` to test the trained agent on the sinusoid data with discrete actions.
+
+### Changed:
+- Renamed and moved `playing.py` to `experiments/playing_random_sinusoid.py`.
+- Upgraded `finrock.render.PygameRender`: rendering can now be paused/resumed with the spacebar, and the account value is rendered along with the actions.
+
+
+## [0.2.0] - 2023-11-29
 ### Added:
 - Created `reward.simpleReward` function to calculate reward based on the action and the difference between the current price and the previous price
 - Created `scalers.MinMaxScaler` object to transform the price data to a range between 0 and 1 and prepare it for the neural network's input
@@ -9,9 +22,9 @@

 ## [0.1.0] - 2023-10-17
 ### Initial Release:
-- created the project
-- created code to create random sinusoidal price data
-- created `state.State` object, which holds the state of the market
-- created `render.PygameRender` object, which renders the state of the market using `pygame` library
-- created `trading_env.TradingEnv` object, which is the environment for the agent to interact with
-- created `data_feeder.PdDataFeeder` object, which feeds the environment with data from a pandas dataframe
+- Created the project
+- Created code to create random sinusoidal price data
+- Created `state.State` object, which holds the state of the market
+- Created `render.PygameRender` object, which renders the state of the market using the `pygame` library
+- Created `trading_env.TradingEnv` object, which is the environment for the agent to interact with
+- Created `data_feeder.PdDataFeeder` object, which feeds the environment with data from a pandas dataframe
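
The new metrics are wired into the environment by the experiment scripts added in this commit. A minimal sketch of that usage, assembled from those scripts rather than taken from a single repo file; it assumes the `info["metrics"]` dict is keyed by each metric's default name (confirmed for `account_value` in the testing script, assumed for `different_actions`):

```
import numpy as np
import pandas as pd

from finrock.data_feeder import PdDataFeeder
from finrock.trading_env import TradingEnv
from finrock.scalers import MinMaxScaler
from finrock.reward import simpleReward
from finrock.metrics import DifferentActions, AccountValue

df = pd.read_csv('Datasets/random_sinusoid.csv')
pd_data_feeder = PdDataFeeder(df)

# metrics are passed to the environment and updated as it steps
env = TradingEnv(
    data_feeder=pd_data_feeder,
    output_transformer=MinMaxScaler(min=pd_data_feeder.min, max=pd_data_feeder.max),
    initial_balance=1000.0,
    max_episode_steps=1000,
    window_size=50,
    reward_function=simpleReward,
    metrics=[DifferentActions(), AccountValue()],
)

state, info = env.reset()
action = np.random.randint(0, env.action_space)  # random action, as in playing_random_sinusoid.py
state, reward, terminated, truncated, info = env.step(action)

# metric results are reported back through the info dict
print(info["metrics"]["different_actions"], info["metrics"]["account_value"])
```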

README.md

Lines changed: 13 additions & 6 deletions
@@ -3,30 +3,37 @@ Reinforcement Learning package for Finance
 
 # Environment Structure:
 <p align="center">
-<img src="Tutorials\Documents\02_FinRock.jpg">
+<img src="Tutorials\Documents\03_FinRock.jpg">
 </p>
 
-### Install requirements
+### Install requirements:
 ```
 pip install -r requirements.txt
 pip install pygame
+pip install .
 ```
 
 ### Create sinusoid data:
 ```
 python bin/create_sinusoid_data.py
 ```
 
-### Run environment:
+### Train RL (PPO) agent on discrete actions:
 ```
-python playing.py
+python experiments/training_ppo_sinusoid.py
+```
+
+### Test trained agent (change the path to the saved model):
+```
+python experiments/testing_ppo_sinusoid.py
 ```
 
 ### Environment Render:
 <p align="center">
-<img src="Tutorials\Documents\02_FinRock_render.png">
+<img src="Tutorials\Documents\03_FinRock_render.png">
 </p>
 
 ## Links to YouTube videos:
 - [Introduction to FinRock package](https://youtu.be/xU_YJB7vilA)
-- [Complete Trading Simulation Backbone](https://youtu.be/1z5geob8Yho)
+- [Complete Trading Simulation Backbone](https://youtu.be/1z5geob8Yho)
+- [Training RL agent on Sinusoid data](https://youtu.be/JkA4BuYvWyE)

Tutorials/03_Trading_with_RL.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Trading with RL
+
+### Environment Structure:
+<p align="center">
+<img src="Documents\03_FinRock.jpg">
+</p>
+
+### Link to YouTube video:
+https://youtu.be/JkA4BuYvWyE
+
+### Link to tutorial code:
+https://github.com/pythonlessons/FinRock/tree/0.3.0
+
+### Download tutorial code:
+https://github.com/pythonlessons/FinRock/archive/refs/tags/0.3.0.zip
+
+
+### Install requirements:
+```
+pip install -r requirements.txt
+pip install pygame
+pip install .
+```
+
+### Create sinusoid data:
+```
+python bin/create_sinusoid_data.py
+```
+
+### Train RL (PPO) agent on discrete actions:
+```
+python experiments/training_ppo_sinusoid.py
+```
+
+### Test trained agent (change the path to the saved model):
+```
+python experiments/testing_ppo_sinusoid.py
+```
+
+### Environment Render:
+<p align="center">
+<img src="Documents\03_FinRock_render.png">
+</p>

Tutorials/Documents/03_FinRock.jpg

66.2 KB (binary file)

Tutorials/Documents/03_FinRock_render.png

173 KB (binary file)

playing.py renamed to experiments/playing_random_sinusoid.py

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@
 while True:
     # simulate model prediction, now use random action
     action = np.random.randint(0, action_space)
-    # action = 0 # always hold
 
     state, reward, terminated, truncated, info = env.step(action)
     rewards += reward
experiments/testing_ppo_sinusoid.py

Lines changed: 58 additions & 0 deletions

@@ -0,0 +1,58 @@
+import numpy as np
+import pandas as pd
+import tensorflow as tf
+tf.get_logger().setLevel('ERROR')
+for gpu in tf.config.experimental.list_physical_devices('GPU'):
+    tf.config.experimental.set_memory_growth(gpu, True)
+
+from finrock.data_feeder import PdDataFeeder
+from finrock.trading_env import TradingEnv
+from finrock.render import PygameRender
+from finrock.scalers import MinMaxScaler
+from finrock.reward import simpleReward
+from finrock.metrics import DifferentActions, AccountValue
+
+
+df = pd.read_csv('Datasets/random_sinusoid.csv')
+df = df[-1000:]
+
+pd_data_feeder = PdDataFeeder(df)
+
+env = TradingEnv(
+    data_feeder = pd_data_feeder,
+    output_transformer = MinMaxScaler(min=pd_data_feeder.min, max=pd_data_feeder.max),
+    initial_balance = 1000.0,
+    max_episode_steps = 1000,
+    window_size = 50,
+    reward_function = simpleReward,
+    metrics = [
+        DifferentActions(),
+        AccountValue(),
+    ]
+)
+
+action_space = env.action_space
+input_shape = env.observation_space.shape
+pygameRender = PygameRender(frame_rate=120)
+
+agent = tf.keras.models.load_model('runs/1701698276/ppo_sinusoid_actor.h5')
+
+state, info = env.reset()
+pygameRender.render(info)
+rewards = 0.0
+while True:
+    # model prediction with the trained actor (random action, as in the playing script, kept for reference)
+    # action = np.random.randint(0, action_space)
+    prob = agent.predict(np.expand_dims(state, axis=0), verbose=False)[0]
+    action = np.argmax(prob)
+
+    state, reward, terminated, truncated, info = env.step(action)
+    rewards += reward
+    pygameRender.render(info)
+
+    if terminated or truncated:
+        print(rewards, info["metrics"]['account_value'])
+        state, info = env.reset()
+        rewards = 0.0
+        pygameRender.reset()
+        pygameRender.render(info)
experiments/training_ppo_sinusoid.py

Lines changed: 103 additions & 0 deletions

@@ -0,0 +1,103 @@
+import numpy as np
+import pandas as pd
+import tensorflow as tf
+tf.get_logger().setLevel('ERROR')
+for gpu in tf.config.experimental.list_physical_devices('GPU'):
+    tf.config.experimental.set_memory_growth(gpu, True)
+
+from keras import layers, models
+
+from finrock.data_feeder import PdDataFeeder
+from finrock.trading_env import TradingEnv
+from finrock.scalers import MinMaxScaler
+from finrock.reward import simpleReward
+from finrock.metrics import DifferentActions, AccountValue
+
+from rockrl.utils.misc import MeanAverage
+from rockrl.utils.memory import Memory
+from rockrl.tensorflow import PPOAgent
+
+df = pd.read_csv('Datasets/random_sinusoid.csv')
+df = df[:-1000] # leave 1000 for testing
+
+pd_data_feeder = PdDataFeeder(df)
+
+
+env = TradingEnv(
+    data_feeder = pd_data_feeder,
+    output_transformer = MinMaxScaler(min=pd_data_feeder.min, max=pd_data_feeder.max),
+    initial_balance = 1000.0,
+    max_episode_steps = 1000,
+    window_size = 50,
+    reward_function = simpleReward,
+    metrics = [
+        DifferentActions(),
+        AccountValue(),
+    ]
+)
+
+action_space = env.action_space
+input_shape = env.observation_space.shape
+
+
+actor_model = models.Sequential([
+    layers.Input(shape=input_shape, dtype=tf.float32),
+    layers.Flatten(),
+    layers.Dense(512, activation='elu'),
+    layers.Dense(256, activation='elu'),
+    layers.Dense(64, activation='elu'),
+    layers.Dropout(0.5),
+    layers.Dense(action_space, activation='softmax')
+])
+
+critic_model = models.Sequential([
+    layers.Input(shape=input_shape, dtype=tf.float32),
+    layers.Flatten(),
+    layers.Dense(512, activation='elu'),
+    layers.Dense(256, activation='elu'),
+    layers.Dense(64, activation='elu'),
+    layers.Dropout(0.5),
+    layers.Dense(1, activation=None)
+])
+
+agent = PPOAgent(
+    actor = actor_model,
+    critic = critic_model,
+    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
+    batch_size=512,
+    lamda=0.95,
+    kl_coeff=0.5,
+    c2=0.01,
+    writer_comment='ppo_sinusoid',
+)
+
+
+memory = Memory()
+meanAverage = MeanAverage(best_mean_score_episode=1000)
+state, info = env.reset()
+rewards = 0.0
+while True:
+    action, prob = agent.act(state)
+
+    next_state, reward, terminated, truncated, info = env.step(action)
+    memory.append(state, action, reward, prob, terminated, truncated, next_state, info)
+    state = next_state
+
+    if memory.done:
+        history = agent.train(memory)
+        mean_reward = meanAverage(np.sum(memory.rewards))
+
+        if meanAverage.is_best(agent.epoch):
+            agent.save_models('ppo_sinusoid')
+
+        if history['kl_div'] > 0.05:
+            agent.reduce_learning_rate(0.99, verbose=False)
+
+        print(agent.epoch, np.sum(memory.rewards), mean_reward, info["metrics"]['account_value'], history['kl_div'])
+        agent.log_to_writer(info['metrics'])
+        memory.reset()
+        state, info = env.reset()
+
+    if agent.epoch >= 10000:
+        break

finrock/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.2.0"
+__version__ = "0.3.0"

finrock/metrics.py

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+from .state import State
+
+""" Metrics are used to track and log information about the environment.
+possible metrics:
+- DifferentActions,
+- AccountValue,
+- MaxDrawdown,
+- SharpeRatio,
+- AverageProfit,
+- AverageLoss,
+- AverageTrade,
+- WinRate,
+- LossRate,
+- AverageWin,
+- AverageLoss,
+- AverageWinLossRatio,
+- AverageTradeDuration,
+- AverageTradeReturn,
+"""
+
+class Metric:
+    def __init__(self, name: str="metric") -> None:
+        self.name = name
+        self.reset()
+
+    def update(self, state: State):
+        assert isinstance(state, State), f'state must be State, received: {type(state)}'
+
+        return state
+
+    @property
+    def result(self):
+        raise NotImplementedError
+
+    def reset(self, prev_state: State=None):
+        assert prev_state is None or isinstance(prev_state, State), f'prev_state must be None or State, received: {type(prev_state)}'
+
+        return prev_state
+
+
+class DifferentActions(Metric):
+    def __init__(self, name: str="different_actions") -> None:
+        super().__init__(name=name)
+
+    def update(self, state: State):
+        super().update(state)
+
+        if not self.prev_state:
+            self.prev_state = state
+        else:
+            if state.allocation_percentage != self.prev_state.allocation_percentage:
+                self.different_actions += 1
+
+            self.prev_state = state
+
+    @property
+    def result(self):
+        return self.different_actions
+
+    def reset(self, prev_state: State=None):
+        super().reset(prev_state)
+
+        self.prev_state = prev_state
+        self.different_actions = 0
+
+
+class AccountValue(Metric):
+    def __init__(self, name: str="account_value") -> None:
+        super().__init__(name=name)
+
+    def update(self, state: State):
+        super().update(state)
+
+        self.account_value = state.account_value
+
+    @property
+    def result(self):
+        return self.account_value
+
+    def reset(self, prev_state: State=None):
+        super().reset(prev_state)
+
+        self.account_value = prev_state.account_value if prev_state else 0.0
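
The module docstring above lists several metrics that are not implemented yet (MaxDrawdown, SharpeRatio, and so on). As a hypothetical illustration of how one could be added by following the `AccountValue` pattern, here is a minimal `MaxDrawdown` sketch; it is not part of this commit and only assumes `state.account_value` behaves as used above:

```
from finrock.metrics import Metric
from finrock.state import State


class MaxDrawdown(Metric):
    """Hypothetical metric (not in this commit): largest peak-to-trough drop
    of the account value within an episode, as a fraction of the peak."""
    def __init__(self, name: str = "max_drawdown") -> None:
        super().__init__(name=name)

    def update(self, state: State):
        super().update(state)

        # track the running peak and the worst relative drop from it
        self.peak = max(self.peak, state.account_value)
        drawdown = (self.peak - state.account_value) / self.peak if self.peak else 0.0
        self.max_drawdown = max(self.max_drawdown, drawdown)

    @property
    def result(self):
        return self.max_drawdown

    def reset(self, prev_state: State = None):
        super().reset(prev_state)

        self.peak = prev_state.account_value if prev_state else 0.0
        self.max_drawdown = 0.0
```

It would then be passed to `TradingEnv` in the `metrics=[...]` list, exactly like `DifferentActions()` and `AccountValue()`.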
