Hi,
After a recent update of Python/TensorFlow/Keras, a minimal working example (MWE)
I used to run to produce samples from a target distribution no longer produces such
samples (the output is close, but clearly from a different distribution; see the
attached screenshots below). After more than 24 hours of searching for the needle in
the haystack, I'm still clueless. A colleague ran the MWE on Windows with older
versions of Python/TensorFlow/Keras and obtained the correct samples as we always did,
and so did another colleague on macOS. Our loss functions also produce very similar
values, so we are still unsure whether the problem lies in Keras' fit() or predict().
Here is the full story of what, by now, I consider a bug, in the hope that others may
find this post when they realize their networks no longer train/predict properly. The
biggest issue is that the problem can remain entirely undetected, since the loss values
don't indicate anything wrong... hence this post. It also means that certain R packages
(e.g. 'gnn') currently work for some users (my colleague) but not for others (myself),
without any warning.
The MWE trains a single-hidden-layer neural network (NN) to act as a random
number generator (RNG): I pass iid N(0,1) samples through the NN and compare
the outputs to given dependent multivariate samples from a target distribution
(here: scaled ranks of absolute values of correlated normals) using the MMD
(maximum mean discrepancy) loss function that we implemented (jointly with the
NN, this construction is called a GMMN, a generative moment matching network).
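For reference, what the code below computes: with kernel k (here an average of Gaussian kernels $\exp(-\lVert u - v\rVert^2/(2h^2))$ over several bandwidths h), the (biased) sample estimator of the MMD for two samples of sizes n and m is

$\widehat{\mathrm{MMD}}(x, y) = \sqrt{\tfrac{1}{n^2}\textstyle\sum_{i,j} k(x_i, x_j) + \tfrac{1}{m^2}\sum_{i,j} k(y_i, y_j) - \tfrac{2}{nm}\sum_{i,j} k(x_i, y_j)}$

so two samples from (nearly) the same distribution yield a value near 0.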
The MWE below worked well with R running inside a virtual Python environment
(installed with Miniforge3 on my first-generation M1 14" MacBook Pro) and
TensorFlow installed via "conda install -c apple tensorflow-deps" and
"python -m pip install tensorflow-metal". That was the case until about a year
ago. When I wanted to run the MWE again this week, I received:
Error: Valid installation of TensorFlow not found.
Python environments searched for 'tensorflow' package:
/usr/local/miniforge3/bin/python3.10
...
ModuleNotFoundError: No module named 'tensorflow'
You can install TensorFlow using the install_tensorflow() function.
After reinstalling Python/TensorFlow/Keras in exactly the way I used to, I still
received this error. I then read in t-kalinowski/deep-learning-with-R-2nd-edition-code#3
that the following is the (now) recommended way to install Python/TensorFlow/Keras on
all platforms, so I did:
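(Presumably along these lines; a sketch based on the linked issue, not necessarily the exact commands:)
## Recommended installation route (sketch): let the R packages manage Python
install.packages("keras")      # R interface to Keras, from CRAN
reticulate::install_python()   # reticulate-managed Python installation
keras::install_keras()         # installs TensorFlow/Keras into a dedicated virtualenv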
After that, the MWE ran again. However, it no longer properly generated samples
from the target distribution. I cannot go back to older versions of the R package
'keras', as the above error then appears again.
Here is the MWE with sessionInfo() etc., also with the outputs of my colleague
(on Windows). Again, he obtains very similar loss values, but my generated samples
now look normally distributed rather than asymmetric as they should be (his are fine).
library(tensorflow) # only needed for our custom MMD loss function
library(keras)
## Generate training data U (scaled ranks of absolute values of correlated normals)
d <- 2 # bivariate case
P <- matrix(0.9, nrow = d, ncol = d); diag(P) <- 1 # correlation matrix
A <- t(chol(P)) # Cholesky factor
ntrn <- 50000 # training data sample size
set.seed(271)
Z <- matrix(rnorm(ntrn * d), ncol = d) # generate N(0,1)
X <- abs(Z %*% t(A)) # absolute values of N(0,P) samples
U <- apply(X, 2, rank) / (ntrn + 1) # training data
if(FALSE)
    plot(U, pch = ".") # ... to see the rough sample shape we are aiming for
## Helper for the custom MMD loss function (from 'gnn'): Gaussian (RBF) kernel
## values averaged over several bandwidths
radial_basis_function_kernel <- function(x, y, bandwidth = 10^c(-3/2, -1, -1/2, -1/4, -1/8, -1/16))
{
    x. <- tf$expand_dims(x, axis = 1L) # shape (n, 1, d)
    y. <- tf$expand_dims(y, axis = 0L) # shape (1, m, d)
    dff2 <- tf$square(x. - y.) # squared componentwise differences, shape (n, m, d)
    dst2 <- tf$reduce_sum(dff2, axis = 2L) # squared Euclidean distances, shape (n, m)
    dst2.vec <- tf$reshape(dst2, shape = c(1L, -1L)) # flattened to shape (1, n*m)
    fctr <- tf$convert_to_tensor(as.matrix(1 / (2 * bandwidth^2)), dtype = dst2.vec$dtype) # one row per bandwidth
    kernels <- tf$exp(-tf$matmul(fctr, b = dst2.vec)) # kernel values, one row per bandwidth
    tf$reshape(tf$reduce_mean(kernels, axis = 0L), # average over the bandwidths...
               shape = tf$shape(dst2)) # ... and reshape back to (n, m)
}
## Maximum mean discrepancy (MMD) loss function (from 'gnn')
MMD <- function(x, y, ...)
{
    is.R.x <- !tf$is_tensor(x) # convert plain R objects to tensors if needed
    is.R.y <- !tf$is_tensor(y)
    if(is.R.x) x <- tf$convert_to_tensor(x, dtype = "float64")
    if(is.R.y) y <- tf$convert_to_tensor(y, dtype = "float64")
    res <- tf$sqrt(tf$reduce_mean(radial_basis_function_kernel(x, y = x, ...)) + # biased MMD estimator
                   tf$reduce_mean(radial_basis_function_kernel(y, y = y, ...)) -
                   2 * tf$reduce_mean(radial_basis_function_kernel(x, y = y, ...)))
    if(is.R.x || is.R.y) as.numeric(res) else res # return an R number if the inputs were R objects
}
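Since we are unsure whether fit() or predict() is at fault, here is a quick way to sanity-check the loss itself in isolation (a small check, not part of the original MWE):
## Sanity check of the MMD loss alone (independent of fit()/predict())
set.seed(42)
U1 <- matrix(runif(1000 * d), ncol = d) # independent bivariate uniforms
U2 <- matrix(runif(1000 * d), ncol = d) # a second independent sample
MMD(U1, y = U2) # same distribution => should be close to 0
MMD(U1, y = U[1:1000,]) # different distributions => should be clearly larger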
## Set up the model: one hidden layer, mapping N(0,1) inputs to (0,1)^2 outputs
in.lay <- layer_input(shape = 2)
hid.lay <- layer_dense(in.lay, units = 300, activation = "relu")
out.lay <- layer_dense(hid.lay, units = 2, activation = "sigmoid") # sigmoid keeps outputs in (0,1)
model <- keras_model(in.lay, out.lay)
compile(model, optimizer = "adam", loss = function(x, y) MMD(x, y = y)) # custom loss; MMD is symmetric in x and y
## Notes:
## 1) Even with loss = "mse" I get different sample shapes than before
##    (before they were scattered around (1/2, 1/2); now they look normally distributed around (1/2, 1/2))
## 2) With optimizer = optimizer_adam() instead of optimizer = "adam", I get the following
## (but training seems to remain unaffected):
## WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
## WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
## 3) I also tried optimizer = keras$optimizers$legacy$Adam() but it makes no difference
## Train
fit(model,
    x = matrix(rnorm(ntrn * d), ncol = 2), # prior sample (here: training input)
    y = U, # training data to match (here: target data)
    batch_size = 500, epochs = 10) # small values, but enough that the generated samples should barely differ from the training data
## Generate from the trained model by passing new prior samples through it
N <- matrix(rnorm(2000 * d), ncol = 2)
V <- predict(model, x = N)
## Compare with the training data
layout(t(1:2))
opar <- par(pty = "s", pch = 20, cex = 0.7)
plot(U[1:2000,], xlab = expression(U[1]), ylab = expression(U[2]))
plot(V, xlab = expression(V[1]), ylab = expression(V[2])) # => no longer close!
par(opar)
layout(1)
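Since the loss values give no hint, a numeric comparison of the training and generated samples makes the mismatch visible; a quick sketch (not part of the original MWE):
## Numeric comparison of training vs generated samples (sketch)
cor(U, method = "spearman")[1,2] # rank correlation of the training data...
cor(V, method = "spearman")[1,2] # ... and of the generated sample; compare the two
ks.test(V[,1], "punif") # the margins should be roughly standard uniform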
My colleague saved the weights and the whole model he trained with the above code,
and if I pass 'N' through those, the samples are also off (more mass towards the
corners). The same happens the other way around (if I send him my trained model/weights).
What could have changed to cause such a serious difference?
I saw in t-kalinowski/deep-learning-with-R-2nd-edition-code#6 (comment) that one might
need to tell the optimizer before fit() which variables it will be modifying...
Is this related? But why are the losses so close, yet the samples so different (they
are always symmetric and look normally distributed, but should be asymmetric)?
Below is more information about the two sessions (mine and my colleague's). The only
difference we found is that if we both run class(model), his output starts with
"keras.engine.training.Model" while mine starts with "keras.engine.functional.Functional"
(followed by "keras.engine.training.Model"). But even calling
keras:::predict.keras.engine.training.Model() directly made no difference. Nothing in
the above code was modified since the last time it worked for me, so the cause must be
a change in TensorFlow/Keras (perhaps on macOS only?). Any hunch? I'm happy to provide
(even) more details.
Thanks & cheers,
Marius

Info about my session

Python, TensorFlow, Keras were installed via:
reticulate::py_config() shows:
sessionInfo() shows (note: I also installed the R package tensorflow in version 2.13.0, but it didn't solve the problem):

Info about my colleague's session

His reticulate::py_config() shows:
His sessionInfo() shows:
I cleaned everything (Python, TensorFlow, Keras) and reinstalled Keras the way I used
to (essentially manually). It then ran without errors but still produced wrong samples.
I then realized that the recommended installation route above is essentially doing the
same thing -- and actually ignores whatever I install manually (conda, location of
virtual environments, ...). I then looked into keras::install_keras() and realized that
it uses version = "default" by default, which resolves to TensorFlow 2.13 (but I know
that my colleague used TensorFlow 2.15 and got the code to produce the correct samples).
I then installed with the version pinned to 2.15, and that solved the problem! This is
reproducible: if I call keras::install_keras() again, it fails again. As mentioned
before, note that nothing indicates the failure (very similar loss values, no indication
of wrong training).
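The pinned install was presumably along these lines (a reconstruction; the exact call may have differed):
## Pin TensorFlow/Keras to the version known to work (2.15) instead of
## version = "default" (which resolves to 2.13)
keras::install_keras(version = "2.15")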
Running your code, I can't reproduce the issue. I suspect that this ultimately boils
down to an issue with older builds of tensorflow-metal or tensorflow-macos, the
M1-specific builds provided by Apple. Early versions of them had some bugs related to
random tensor generation, and it's possible the current versions have them too.
Fortunately, beginning with TF 2.16 (available as an RC now, should be released soon),
we'll no longer need to install tensorflow-macos, as the parts needed to make TensorFlow
work on M1 Macs are now included in the official build.
If for some reason you need to run an older version of TensorFlow on an M1 Mac, you can
skip tensorflow-macos and force the tensorflow-cpu package instead.
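For example (a sketch, assuming a recent version of the R tensorflow package, whose version argument accepts a "-cpu" suffix to select the tensorflow-cpu build):
## Sketch: install a pinned, CPU-only TensorFlow build
tensorflow::install_tensorflow(version = "2.15-cpu")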