Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mistake #4

Open
L-EARN opened this issue Apr 5, 2021 · 11 comments
Open

Mistake #4

L-EARN opened this issue Apr 5, 2021 · 11 comments

Comments

@L-EARN
Copy link

L-EARN commented Apr 5, 2021

Hi,

Big mistake right here :
https://github.com/pythonlessons/RL-Bitcoin-trading-bot/blob/main/RL-Bitcoin-trading-bot_7/RL-Bitcoin-trading-bot_7.py#L309

Basically you set your reward as the difference in value of the trades. Which is fine.

The value of a trade is amount of the thing multplied by the price of the thing. Fine.

However, in this particular line, you have made a mistake (by copy pasting your code I believe, it happens).
Because you set the amount of the buy trade the same as the amount of the previous sell trade (= bypassing the fees).

It is only one character at line 309 ... and it dramastically changes your results and the way the model learns.

Still, I would like to thank you for providing your lessons and code online. Sincerely.
I am doing other personal things using pytorch and you're inspiring a lot.
It is a great experience for me to try and translate your code and other youtubers work into pytorch.
You have no idea how much I learned faster thanks to you.
I like a lot your blog posts, mathematical explanations.

Keep it up.

@pythonlessons
Copy link
Owner

Hi, thanks for your comment.

I still don't get where is my mistake, maybe you can write how you think it should look like?

You don't even imagine how satisfied I feel hearing such great feedback, thank you!

@L-EARN
Copy link
Author

L-EARN commented Apr 5, 2021

reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

should be
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']

@pythonlessons
Copy link
Owner

reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

should be
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']

Thanks, will test the impact later

@L-EARN
Copy link
Author

L-EARN commented Apr 5, 2021

If you want, here is a little piece of code of my own.
So yeah I'm trying to make a neural network who can hold the most profitable crypto.
(Which is a different mindset from the buy/do nothing/sell. Here, I just hold something. And when I want to hold another thing, a transaction is made)

Semantic :

  • If I want to hold the same thing, return.
  • If I need to do another transaction to hold the new thing, recurse.
  • Get the price ratio
  • Save my previous trade value for later (to have the reward)
  • Apply fees to my current holding
  • Update my previous informations (if I'm not recursing)
  • Update my current currency amount
  • Update my current currency name
  • And basically return the relative percent change of the old trade value against this current trade value.
    (The relative percent change is (v2 - v1) / abs(v1). abs does nothing really. And it never exceeds -1 <-> 1 for my dataset)

With such a function you can easily put any currency in it. Even dollar.
Because 1 dollar is 1 dollar. If I hold x amount of dollar then dollar is just ... dollar.
Or maybe if you want you can set the dollar value as the inflation value in the dataset ... Could be fun.

def trade(self, trade_type: int, update_last: bool = True) -> float:
		"""
		Execute action.
		Exchange our currency to another one, apply fees.
		:param trade_type: The wanted currency index number as defined in the global dataframe
		:return: The ratio of the exchange (relative percent change)
		"""
		wanted_currency: Final[str] = self.df.currencies[trade_type]

		if wanted_currency == self.current_currency:
			return 0

		if wanted_currency in ["etheur", "ltceur"] and self.current_currency in ["etheur", "ltceur"]:
			self.trade(1, update_last=False)  # Trade to bitcoin first because there is no relation between eth and ltc

		my_currency_price = self.df.price(self.current_currency, self.current_step)
		new_currency_price = self.df.price(wanted_currency, self.current_step)
		ratio = my_currency_price / new_currency_price

		last_trade_value = self.last_amount * self.last_price

		if self.fees:
			self.amount *= 0.995  # Apply fee

		if update_last:
			self.last_amount = self.amount
			self.last_price = my_currency_price
			self.last_currency = self.current_currency

		self.amount *= ratio

		self.current_currency = wanted_currency

		current_trade_value = self.amount * new_currency_price

		trade_ratio = data.PctChange(last_trade_value, current_trade_value)

		with open("W:\\trades.log", "a") as outfile:
			outfile.write(f"Order_id:{self.episode_orders}:step{self.current_step}::{self.current_currency}:amount={self.amount}:price={new_currency_price}:Ratio={trade_ratio}\n")

		self.episode_orders += 1

		return trade_ratio

@TanJeremy
Copy link

reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

should be
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']

I also saw this but I wanted to finish the tutorial before making any change, and forgot later on, thx for the reminder

@L-EARN
Copy link
Author

L-EARN commented Apr 5, 2021

I also saw this but I wanted to finish the tutorial before making any change, and forgot later on, thx for the reminder

You'll see how much harder it is to go above average profit-per-episode just by fixing that.

@since4satang
Copy link

since4satang commented Sep 20, 2021

I think that original version is correct.
Let me explain,
this code is for previous selling and buying now

if self.trades[-1]['type'] == "buy" and self.trades[-2]['type'] == "sell":
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

If this code is changed to
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']

there won't be any reward.
example)
previous : sell 1BTC at $40000
now : buy 2BTC at $20000 or 0.5BTC at $80000
than,
reward : 1×40000- 2×20000 = 0 or 1×40000 - 0.5×80000 = 0

reward will always be zero.
(assumed that fee is ignored but it is not significant)

The orignal code is correct because it tells that it was good choice 'selling' at higher price than current price or it was bad choice 'selling' at lower price than current price.

example)
previous : sell 1BTC at $40000
now(price) : $20000 or $80000
than,
reward : 1×40000- 1×20000 = 20000 (Good sold = if i hadn't sold it , I would lose more money)
or
1×40000 - 1×80000 = -20000 (Bad sold = If I hadn't sold it, I would get more money)

sorry for poor english

@HoaxParagon
Copy link

HoaxParagon commented Sep 29, 2021

I think that original version is correct. Let me explain, this code is for previous selling and buying now

if self.trades[-1]['type'] == "buy" and self.trades[-2]['type'] == "sell": reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

If this code is changed to reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']

there won't be any reward. example) previous : sell 1BTC at $40000 now : buy 2BTC at $20000 or 0.5BTC at $80000 than, reward : 1×40000- 2×20000 = 0 or 1×40000 - 0.5×80000 = 0

reward will always be zero. (assumed that fee is ignored but it is not significant)

The orignal code is correct because it tells that it was good choice 'selling' at higher price than current price or it was bad choice 'selling' at lower price than current price.

example) previous : sell 1BTC at $40000 now(price) : $20000 or $80000 than, reward : 1×40000- 1×20000 = 20000 (Good sold = if i hadn't sold it , I would lose more money) or 1×40000 - 1×80000 = -20000 (Bad sold = If I hadn't sold it, I would get more money)

sorry for poor english

I think this is correct as well; it's the price from the (second to last purchase/sale * quantity) ie, the total, compared to the latest at the previous price. What I can't get behind is the use of a negative reward for a sale followed by a purchase. I think that's limiting.

@HoaxParagon
Copy link

some one can helpe me to connect the rl-bitcoin.bot to my exchange crypto.com?i begin whit small amount 50$!i love this bot but i don't know how to connect to trade live,i'm new in python..

Seems like the perfect opportunity to learn more about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
@L-EARN @TanJeremy @pythonlessons @since4satang @HoaxParagon and others