Sudden deviations for converted neural network #422
Oh, that's a big deviation! Thanks for the very good report. I'll look into it and get back to you here.
Oh, so many changes happened in TensorFlow during the year between version 2.13 and 2.16. But, regarding versions: here is a Dockerfile I used to check:
FROM python:3.12.4
RUN apt-get update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y build-essential cmake
RUN pip3 install tensorflow==2.16.1
RUN apt-get remove --purge -y cmake
RUN pip install cmake --upgrade
RUN git clone -b 'v0.2.24' --single-branch --depth 1 https://github.com/Dobiasd/FunctionalPlus && cd FunctionalPlus && mkdir -p build && cd build && cmake .. && make && make install
RUN git clone -b '3.4.0' --single-branch --depth 1 https://gitlab.com/libeigen/eigen.git && cd eigen && mkdir -p build && cd build && cmake .. && make && make install && ln -s /usr/local/include/eigen3/Eigen /usr/local/include/Eigen
RUN git clone -b 'v3.11.3' --single-branch --depth 1 https://github.com/nlohmann/json && cd json && mkdir -p build && cd build && cmake -DJSON_BuildTests=OFF .. && make && make install
RUN git clone -b 'v0.16.0' --single-branch --depth 1 https://github.com/Dobiasd/frugally-deep && cd frugally-deep && mkdir -p build && cd build && cmake .. && make && make install
WORKDIR /frugally-deep
RUN wget https://syncandshare.lrz.de/dl/fi13NA5BiRsof71omTc8Be/frugally-deep-issue.dir -q -O models.zip
RUN unzip models.zip
RUN echo '#include "fdeep/fdeep.hpp"\n\
#include <iostream>\n\
int main()\n\
{\n\
const auto model = fdeep::load_model("no_deviations.json");\n\
}' >> main.cpp
RUN g++ main.cpp -o main
RUN ./main
Output:
The fact that you don't get this error shows that you are not using the latest frugally-deep version. So I looked up the last frugally-deep release that still supported TensorFlow 2.13 and checked with that one:
FROM python:3.12.4
RUN apt-get update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y build-essential cmake
RUN pip3 install tensorflow==2.16.1
RUN apt-get remove --purge -y cmake
RUN pip install cmake --upgrade
RUN git clone -b 'v0.2.24' --single-branch --depth 1 https://github.com/Dobiasd/FunctionalPlus && cd FunctionalPlus && mkdir -p build && cd build && cmake .. && make && make install
RUN git clone -b '3.4.0' --single-branch --depth 1 https://gitlab.com/libeigen/eigen.git && cd eigen && mkdir -p build && cd build && cmake .. && make && make install && ln -s /usr/local/include/eigen3/Eigen /usr/local/include/Eigen
RUN git clone -b 'v3.11.3' --single-branch --depth 1 https://github.com/nlohmann/json && cd json && mkdir -p build && cd build && cmake -DJSON_BuildTests=OFF .. && make && make install
RUN git clone -b 'v0.15.30' --single-branch --depth 1 https://github.com/Dobiasd/frugally-deep && cd frugally-deep && mkdir -p build && cd build && cmake .. && make && make install
WORKDIR /frugally-deep
RUN wget https://syncandshare.lrz.de/dl/fi13NA5BiRsof71omTc8Be/frugally-deep-issue.dir -q -O models.zip
RUN unzip models.zip
RUN echo '#include "fdeep/fdeep.hpp"\n\
#include <iostream>\n\
int main()\n\
{\n\
const auto model = fdeep::load_model("deviations.json");\n\
}' >> main.cpp
RUN g++ main.cpp -o main
RUN ./main
Output:
So here, I could now try to reproduce the deviations you see with custom inputs, but as long as we're not sure about which versions you use, it would all be too fuzzy. Can you try updating the TensorFlow version on your server (or run it somewhere else), re-train, and make sure you're testing with the latest frugally-deep (plus dependencies)?
I just tried the latest frugally-deep with TensorFlow 2.16, and it works fine there. So instead of updating TensorFlow on your server, you could also use the older frugally-deep version (0.15.30), which still supports TensorFlow 2.13. But I can no longer support TensorFlow 2.13 in the latest frugally-deep releases.
Thank you so much for your effort. Since getting the right modules installed on our training server is currently a bit problematic, and other servers would take too long for the final training result, I will use version 0.15.30 for the conversion and validation, and see whether this fixes it. Until then, thank you again for your quick response, and apologies for mixing the version numbers.
Sounds good! And don't worry.
Hi again :) We also checked again the model that we "updated" to TensorFlow 2.16.
Sure, here is the check:
FROM python:3.11.9
RUN apt-get update
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y build-essential cmake
RUN pip3 install tensorflow==2.13
RUN apt-get remove --purge -y cmake
RUN pip install cmake --upgrade
RUN git clone -b 'v0.2.24' --single-branch --depth 1 https://github.com/Dobiasd/FunctionalPlus && cd FunctionalPlus && mkdir -p build && cd build && cmake .. && make && make install
RUN git clone -b '3.4.0' --single-branch --depth 1 https://gitlab.com/libeigen/eigen.git && cd eigen && mkdir -p build && cd build && cmake .. && make && make install && ln -s /usr/local/include/eigen3/Eigen /usr/local/include/Eigen
RUN git clone -b 'v3.11.3' --single-branch --depth 1 https://github.com/nlohmann/json && cd json && mkdir -p build && cd build && cmake -DJSON_BuildTests=OFF .. && make && make install
RUN git clone -b 'v0.15.30' --single-branch --depth 1 https://github.com/Dobiasd/frugally-deep && cd frugally-deep && mkdir -p build && cd build && cmake .. && make && make install
WORKDIR /frugally-deep
RUN wget https://syncandshare.lrz.de/dl/fi13NA5BiRsof71omTc8Be/frugally-deep-issue.dir -q -O models.zip
RUN unzip models.zip
RUN python3 keras_export/convert_model.py deviations.keras deviations_converted.json
RUN echo 'import tensorflow as tf \n\
import numpy as np \n\
model = tf.keras.models.load_model("deviations.keras", compile=False) \n\
data = [-0.15723465, -0.18722926, -0.14018555, -0.54661024, -0.3598269, -0.13201863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.04876948, -0.10836805, 0.11795691, -0.3889477, -0.08791534, 0.26476863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.6917076, -0.47304294, -3.6160529, -0.9949142, -0.47304294, -4.5883393, -1.7162547, 1.7362174, 0.46023318, -1.0019623, -0.43958497, 0.21765545, 0.716984, 0.2811993, 0.4104652, -0.041849896, 0.2102925, -0.4721365, -0.7124588] \n\
result = model.predict(np.array([data])) \n\
print(result)' >> main.py
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN echo '#include "fdeep/fdeep.hpp"' > main_single.cpp
RUN echo '#include <iostream>' >> main_single.cpp
RUN echo 'int main() \n\
{ \n\
const auto model = fdeep::load_model("deviations_converted.json"); \n\
std::vector<fdeep::float_type> inputs = {-0.15723465, -0.18722926, -0.14018555, -0.54661024, -0.3598269, -0.13201863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.04876948, -0.10836805, 0.11795691, -0.3889477, -0.08791534, 0.26476863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.6917076, -0.47304294, -3.6160529, -0.9949142, -0.47304294, -4.5883393, -1.7162547, 1.7362174, 0.46023318, -1.0019623, -0.43958497, 0.21765545, 0.716984, 0.2811993, 0.4104652, -0.041849896, 0.2102925, -0.4721365, -0.7124588}; \n\
const auto results = model.predict({fdeep::tensor(model.get_dummy_input_shapes()[0], inputs)}); \n\
std::cout << fdeep::show_tensors(results) << std::endl; \n\
}' >> main_single.cpp
RUN g++ main_single.cpp -o main_single
RUN echo '#define FDEEP_FLOAT_TYPE double' > main_double.cpp
RUN cat main_single.cpp >> main_double.cpp
RUN g++ main_double.cpp -o main_double
RUN python3 main.py
RUN ./main_single
RUN ./main_double
And indeed, I get the same results as you do:
TensorFlow:
frugally-deep:
Additionally, switching frugally-deep from single-precision floats to double-precision floats changes the result:
frugally-deep with FDEEP_FLOAT_TYPE double:
In my experience, this can be an indicator that the model might be missing some regularization and thus tended towards very big or very small weights during training, which causes such numerical instability. Of course, on the other hand, it might also simply be a bug in this outdated frugally-deep version.
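Regarding the regularization point, a minimal, purely illustrative sketch (the layer size is a placeholder, not your actual architecture) of L2 weight regularization on a Keras Dense layer:
import tensorflow as tf
# The L2 penalty discourages very large weights during training,
# which helps keep the pre-softmax activations in a sane range.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),
)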
Yeah, same here. Conversion runs fine, and the automated test (during the conversion) passes as well.
That's a bit relieving to me because if different TensorFlow versions are also not aligned on the output of this model-and-input combination, it's another indicator of a lack of stability in the floating-point arithmetic.
That's another indicator that something unwanted is happening during the training. Can you iterate over the weights (and maybe even intermediate tensors during prediction) and check if something looks suspiciously large? Or maybe you can try replacing the PReLU activation with ELU or sigmoid.
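For example, a quick way to scan the weights for suspiciously large values (using the deviations.keras file from above) could be:
import numpy as np
import tensorflow as tf
model = tf.keras.models.load_model("deviations.keras", compile=False)
# Print the largest absolute weight value per layer
for layer in model.layers:
    for w in layer.get_weights():
        print(layer.name, w.shape, np.abs(w).max())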
The numeric instability might happen in the softmax layer. I checked the output of the last dense layer (before the softmax):
from keras.models import Model
import tensorflow as tf
import numpy as np
model = tf.keras.models.load_model("deviations.keras", compile=False)
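# Build a sub-model that outputs the "dense_2" activations, i.e. the values fed into the final softmax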
model2 = Model(inputs=model.input, outputs=model.get_layer("dense_2").output)
#print(model2.predict(np.random.rand(1, 49)))
data = [-0.15723465, -0.18722926, -0.14018555, -0.54661024, -0.3598269, -0.13201863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.04876948, -0.10836805, 0.11795691, -0.3889477, -0.08791534, 0.26476863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.6917076, -0.47304294, -3.6160529, -0.9949142, -0.47304294, -4.5883393, -1.7162547, 1.7362174, 0.46023318, -1.0019623, -0.43958497, 0.21765545, 0.716984, 0.2811993, 0.4104652, -0.041849896, 0.2102925, -0.4721365, -0.7124588]
print(model2.predict(np.array([data])))
Softmax might not be very stable with such inputs:
import tensorflow as tf
tf.nn.softmax([-3056833., -3053894.5, -3058091., -3056993.2, -3053893.5, -3062269.5])
Compare with just a small change (relative to the absolute numbers):
tf.nn.softmax([-3056833., -3053892.5, -3058091., -3056993.2, -3053893.5, -3062269.5])
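To illustrate how sensitive the result is with inputs of this magnitude: a shift of about 2 in a single logit, tiny relative to the ~3e6 absolute values, changes which logit is the largest and flips the output. A small NumPy sketch using the values from above (stable_softmax is just an illustrative helper, not part of any library):
import numpy as np

def stable_softmax(x):
    # Subtract the maximum before exponentiating for numerical stability
    z = np.asarray(x, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

a = [-3056833., -3053894.5, -3058091., -3056993.2, -3053893.5, -3062269.5]
b = [-3056833., -3053892.5, -3058091., -3056993.2, -3053893.5, -3062269.5]
print(stable_softmax(a))  # roughly [0, 0.27, 0, 0, 0.73, 0]
print(stable_softmax(b))  # roughly [0, 0.73, 0, 0, 0.27, 0]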
Maybe the test input used is a "stimulus" the model does not "experience" during training? Is this input vector part of the training set or validation set? Do you get good results on the validation set (loss similarly low as for the training set) during training?
Hi again :) Based on your investigations, we added a batch-normalization layer right before the softmax, and at least after ~700 epochs there are no substantial deviations. So thanks again, cheers
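Schematically (with placeholder layer sizes, not our exact architecture), the change is along these lines:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.Input(shape=(49,)),
    tf.keras.layers.Dense(64),
    tf.keras.layers.PReLU(),
    tf.keras.layers.Dense(6),
    # Normalizing the logits keeps them in a range where softmax behaves well
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Softmax(),
])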
Nice! 🎉
Hello! :)
After using frugally-deep for the last couple of months without any issues, we found some discrepancies when comparing a .keras network and the converted .json file (more details below). Currently, we are running on TensorFlow 2.13 and use convert_model.py for the conversion.
We train a relatively simple network.
The conversion (and implementation in our analysis framework) worked perfectly fine for the first network, with deviations somewhere on the single-precision level (see no_deviations.json and no_deviations.keras). After some updates, we converted the new model (without the dropout layer following the dense layer with 6 nodes). We came across some deviations in the order of 10e-1 (see deviations.json and deviations.keras).
).One strange behavior is that the deviations we observe in our analysis framework become larger the longer we train our model. Retraining the model did not solve the problem.
We tested 3 different inputs:
1. A random input (mean = 0)
2. [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -1.8184667, -1.0182238, 0.15304942, 0.99804157, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
3. [-0.15723465, -0.18722926, -0.14018555, -0.54661024, -0.3598269, -0.13201863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.04876948, -0.10836805, 0.11795691, -0.3889477, -0.08791534, 0.26476863, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -2.6917076, -0.47304294, -3.6160529, -0.9949142, -0.47304294, -4.5883393, -1.7162547, 1.7362174, 0.46023318, -1.0019623, -0.43958497, 0.21765545, 0.716984, 0.2811993, 0.4104652, -0.041849896, 0.2102925, -0.4721365, -0.7124588]
For the first 2 examples, the deviations are small; however, for the last, we see large deviations:
deviations.json output: [0.0, 0.7772999, 0.0, 0.0, 0.22270015, 0.0]
deviations.keras output: [0.0, 0.26894143, 0.0, 0.0, 0.73105854, 0.0]
The 2 different networks in .keras and .json format can be found here: https://syncandshare.lrz.de/getlink/fi13NA5BiRsof71omTc8Be/frugally-deep-issue
For evaluating the .keras network, we use the following code:
For the frugally-deep evaluation, we use:
Any help would be highly appreciated, cheers
Martin