Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong array dimension/index in tdZeroApprox.py #3

Open
vdouet opened this issue May 17, 2020 · 0 comments
Open

Wrong array dimension/index in tdZeroApprox.py #3

vdouet opened this issue May 17, 2020 · 0 comments

Comments

@vdouet
Copy link

vdouet commented May 17, 2020

Hi,

For the code for TD(0) approximation 'tdZeroApprox.py' in Unit-8-The-Mountaincar.

Line 19 when the numpy array 'tiledState' is created:

tiledState = np.zeros(nTiles*nTiles*nTiles)

Shouldn't it be:

tiledState = np.zeros((nTiles - 1)*(nTiles - 1)*nLayers)

Because if the number of layers is different from the number of tiles then the "tiledState" array will not have the right dimension.

Also, here nTiles = nBins = 8. So if we have 8 bins per axis we have 7 actual tiles per axis and np.digitize will return a number between 1 and 7 (because the if condition is strictly inferior/superior) so a total of 49 tiles.

Furthermore, I think idx should be idx = x * y + row * (nTiles-1)**2 - 1 because of the fact that np.digitize returns values between [1, 7] and not between [0, 7]. The values will then be for each row [0, 48], [49, 97], [98, 146], etc. Because with the original idx: idx = (x + 1) * (y + 1) + row * nTiles**2 - 1 there will be some index values that can never be used in tiledState (for example, it will always start at index n°3: (1+1) * (1+1) + 0 - 1 = 3).

I think it is because in the slide when you introduce the index equation during the chapter "Linear methods and tiling" the first possible value for x and y is (x,y) = (0,0). But the first value possible with np.digitize is (x,y) = (1,1). If we want the index equation to still be true we should maybe write:

x = np.digitize(position, pos_bins[row]) - 1
y = np.digitize(velocity, vel_bins[row]) - 1

and then we can write:

 idx = (x + 1) * (y + 1) + (row * n_tiles**2) - 1

But here is the modified code I used:

def tile_state(pos_bins, vel_bins, obs, n_bins=8, n_layers=8):

    position, velocity = obs

    # The number of tiles per axis is the number of bins per axis - 1
    n_tiles = n_bins - 1
    tiled_state = np.zeros(n_tiles * n_tiles * n_layers)

    for row in range(n_layers):

        if position > pos_bins[row][0] and \
                position < pos_bins[row][n_bins - 1]:

            if velocity > vel_bins[row][0] and \
                    velocity < vel_bins[row][n_bins - 1]:

                x = np.digitize(position, pos_bins[row])
                y = np.digitize(velocity, vel_bins[row])
                idx = (x * y) + (row * n_tiles**2) - 1

                tiled_state[idx] = 1.0

            else:
                break
        else:
            break

    return tiled_state

And I find the following result:

Update rule

But maybe there is something I did not understand?

Best regards,
Victor Douet

@vdouet vdouet changed the title Wrong array dimension in tdZeroApprox.py Wrong array dimension/index in tdZeroApprox.py May 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant