- Add heterogeneous cluster changes
- Fix/ci
- use tensorflow 2.3.1 and add data parallel integ test
- upgrade to sagemaker-training 3.7.1
- include granular buildspecs for dlc and generic cpu and gpu testing
- include sm-data-distributed and upgrade dependencies
- workaround to print stderr when capture_error is True
- propagate log level
- add condition to avoid error when 'model_dir' is None
- add a missing comma
- call entry_point.run with capture_error=True
- tensorflow 2.3 support
- add integration test for MPI env vars propagation
- add single-instance, multi-process Horovod test for local GPU
- Update README.rst
- Make docker folder read only, remove unused tests.
- fix. bump version of sagemaker-training for script entry point fix.
- update image-building instructions
- Bump version of sagemaker-training for typing fix
- Replace sagemaker-containers with sagemaker-training
- Python 3.7 support
- fix typo in release buildspec.
- use tox in buildspecs
- remove extra newlines for consistency
- version bump
- Revert the version of SMdebug in TF2
- version bump
- install sagemaker-tensorflow-toolkit from PyPI.
- Upgrading the pyyaml version
- Allowing arguments for deep_learning_container.py for tf2
- refactor toolkit tests.
- copy tests to test-toolkit folder.
- Adding of deep_learning_container.py in Tf2
- Added skip marker
- smdebug 0.7.1
- Revert "Revert "add py2 fixture for tf2.x test (#310)" (#319)"
- Revert "add py2 fixture for tf2.x test (#310)"
- Update sagemaker-python-sdk in test env
- add missing fixtures
- Add default timeout
- install experiments with python 3
- add py2 fixture for tf2.x test
- Add smdebug to TF 2.x
- remove python 3.6 specific f strings
- Updating LD_LIBRARY_PATH for tf-2.0 Dockerfile.gpu
- install SageMaker Python SDK into Python 3 images
- Add sagemaker-experiments
- update sagemaker-tensorflow version to 2.1
- add pipemode integ tests.
- update: add r2.1 dockerfiles
- add 2.0.1 dockerfiles
- Add release to PyPI. Change package name to sagemaker-tensorflow-training.
- remove sagemaker_experiments from mnist script
- Update package name in the dockerfile for tf-2.0.
- Merge branch 'master' of https://github.com/aws/sagemaker-tensorflow-containers into tf-2
- Merge branch 'master' of https://github.com/aws/sagemaker-tensorflow-containers into merge-master-into-tf-2
- Revert "Merge 'master' branch into 'tf-2' branch. (#279)"
- Merge 'master' branch into 'tf-2' branch.
- update README.rst
- Add twine check during PR.
- properly fail build if has-matching-changes fails
- update no-p2 and no-p3 regions.
- fix horovod mnist script
- bump sagemaker-containers version to 2.4.10
- add hyperparameter tuning test
- fix integ test errors when running with py2
- only run one test during deployment
- resolve pluggy version conflict
- remove non-ascii character in CHANGELOG
- remove extra comma in buildspec-release.yml
- Parameterize processor and py_version for test runs
- use unique name for integration job hyperparameter tuning job
- fix flake8 errors and add flake8 run in buildspec.yml
- skip gpu SageMaker test in regions with limited amount of p2/3 instances
- skip setup on second remote run
- add setup file back
- add branch name to remote gpu test run command
- remove setup file in release build gpu test
- ignore coverage in release build tests
- use tar file name as framework_support_installable in build_all.py
- Add release build
- Explicitly set lower-bound for botocore version
- Pull request to test codebuild trigger on TensorFlow script mode
- Update integ test for checking Python version
- Upgrade to TensorFlow 1.13.1
- Add mpi4py to pip installs
- Add SageMaker integ test for hyperparameter tuning model_dir logic
- Add Horovod benchmark
- Fix model_dir adjustment for hyperparameter tuning jobs
- change model_dir to training job name if it is for tuning.
- Tune test_s3_plugin test
- Skip the s3_plugin test before new binary released
- Add model saving warning at end of training
- Specify region when creating S3 resource in integ tests
- Fix instance_type fixture setup for tests
- Read framework version from Python SDK for integ test default
- Fix SageMaker Session handling in Horovod test
- Configure encoding to be utf-8
- Use the test argement framework_version in all tests
- Fix broken test test_distributed_mnist_no_ps
- Add S3 plugin tests
- Skip horovod local CPU test in GPU instances
- Add Horovod tests
- Skip horovod integration tests
- TensorFlow 1.12 and Horovod support
- Deprecate get_marker. Use get_closest_marker instead
- Force parameter server to run on CPU
- Add python-dev and build-essential to Dockerfiles
- Update script_mode_train_any_tf_script_in_sage_maker.ipynb
- Skip keras local mode test on gpu and use random port for serving in the test
- Fix Keras test
- Create parameter server in different thread
- Add Keras support
- Fix broken unit tests
- Unset CUDA_VISIBLE_DEVICES for worker processes
- Disable GPU for parameter process
- Set parameter process waiting to False
- Update sagemaker containers
- GPU fix
- Set S3 environment variables
- Add CI configuration files
- Add distributed training support
- Edited the tf script mode notebook
- Add benchmarking script
- Add Script Mode example
- Add integration tests to run training jobs with sagemaker
- Add tox.ini and configure coverage and flake runs
- Scriptmode single machine training implementation
- Update region in s3 boto client in serve
- Update readme with instructions for 1.9.0 and above
- Fix deserialization of dicts for json predict requests
- Add dockerfile and update test for tensorflow 1.10.0
- Support tensorflow 1.9.0
- Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
- train on 3 epochs for pipe mode test
- Change error classes used by _default_input_fn() and _default_output_fn()
- Changing assertion to check only existence
- Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
- get most recent saved model to export
- pip install tensorflow 1.8 in 1.8 cpu image
- install tensorflow extensions
- upgrade cpu binaries in docker build
- Force upgrade of the framework binaries to make sure the right binaries are installed.
- Add Pillow to pip install list
- Increase train steps for cifar distributed test to mitigate race condition
- Add TensorFlow 1.8 dockerfiles
- Add TensorFlow 1.7 dockerfiles
- Explain how to download tf binaries from PyPI
- Allow training without S3
- Fix hyperparameter name for detecting a tuning job
- Checkout v1.4.1 tag instead of r1.4 branch
- Move processing of requirements file in.
- Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
- Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
- Add more unit tests for trainer, fix all and rename train.py to avoid import conflict
- Use regional endpoint for S3 client
- Update README.rst
- Pass input_channels to eval_input_fn if defined
- Fix setup.py to refer to renamed README
- Add test and build instructions
- Fix year in license headers
- Add TensorFlow 1.6
- Add test instructions in README
- Add container support to install_requires
- Add Apache license headers
- Use wget to install tensorflow-model-server
- Fix file path for integ test
- Fix s3_prefix path in integ test
- Fix typo in path for integ test
- Add input_channels to train_input_fn interface.
- Update logging and make serving_input_fn optional.
- remove pip install in tensorflow training
- Modify integration tests to run nvidia-docker for gpu
- add h5py for keras models
- Add local integ tests & resources
- Restructure repo to use a directory per TF version for dockerfiles
- Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
- Copying in changes from internal repo:
- Add functional test
- Fix FROM image names for final build dockerfiles
- Add dockerfiles for building our production images (TF 1.4)
- GPU Dockerfile and setup.py fixes
- Add base image Dockerfiles for 1.4
- Merge pull request #1 from aws/mvs-first-commit
- first commit
- Updating initial README.md from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Initial commit
- skip setup on second remote run
- add setup file back
- add branch name to remote gpu test run command
- remove setup file in release build gpu test
- ignore coverage in release build tests
- use tar file name as framework_support_installable in build_all.py
- Add release build
- Explicitly set lower-bound for botocore version
- Pull request to test codebuild trigger on TensorFlow script mode
- Update integ test for checking Python version
- Upgrade to TensorFlow 1.13.1
- Add mpi4py to pip installs
- Add SageMaker integ test for hyperparameter tuning model_dir logic
- Add Horovod benchmark
- Fix model_dir adjustment for hyperparameter tuning jobs
- change model_dir to training job name if it is for tuning.
- Tune test_s3_plugin test
- Skip the s3_plugin test before new binary released
- Add model saving warning at end of training
- Specify region when creating S3 resource in integ tests
- Fix instance_type fixture setup for tests
- Read framework version from Python SDK for integ test default
- Fix SageMaker Session handling in Horovod test
- Configure encoding to be utf-8
- Use the test argement framework_version in all tests
- Fix broken test test_distributed_mnist_no_ps
- Add S3 plugin tests
- Skip horovod local CPU test in GPU instances
- Add Horovod tests
- Skip horovod integration tests
- TensorFlow 1.12 and Horovod support
- Deprecate get_marker. Use get_closest_marker instead
- Force parameter server to run on CPU
- Add python-dev and build-essential to Dockerfiles
- Update script_mode_train_any_tf_script_in_sage_maker.ipynb
- Skip keras local mode test on gpu and use random port for serving in the test
- Fix Keras test
- Create parameter server in different thread
- Add Keras support
- Fix broken unit tests
- Unset CUDA_VISIBLE_DEVICES for worker processes
- Disable GPU for parameter process
- Set parameter process waiting to False
- Update sagemaker containers
- GPU fix
- Set S3 environment variables
- Add CI configuration files
- Add distributed training support
- Edited the tf script mode notebook
- Add benchmarking script
- Add Script Mode example
- Add integration tests to run training jobs with sagemaker
- Add tox.ini and configure coverage and flake runs
- Scriptmode single machine training implementation
- Update region in s3 boto client in serve
- Update readme with instructions for 1.9.0 and above
- Fix deserialization of dicts for json predict requests
- Add dockerfile and update test for tensorflow 1.10.0
- Support tensorflow 1.9.0
- Add integ tests to verify that tensorflow in gpu-image can access gpu-devices.
- train on 3 epochs for pipe mode test
- Change error classes used by _default_input_fn() and _default_output_fn()
- Changing assertion to check only existence
- Install sagemaker-tensorflow from pypi. Add MKL environment variables for TF 1.8
- get most recent saved model to export
- pip install tensorflow 1.8 in 1.8 cpu image
- install tensorflow extensions
- upgrade cpu binaries in docker build
- Force upgrade of the framework binaries to make sure the right binaries are installed.
- Add Pillow to pip install list
- Increase train steps for cifar distributed test to mitigate race condition
- Add TensorFlow 1.8 dockerfiles
- Add TensorFlow 1.7 dockerfiles
- Explain how to download tf binaries from PyPI
- Allow training without S3
- Fix hyperparameter name for detecting a tuning job
- Checkout v1.4.1 tag instead of r1.4 branch
- Move processing of requirements file in.
- Generate checkpoint path using TRAINING_JOB_NAME environment variable if needed
- Wrap user-provided model_fn to pass arguments positionally (maintains compatibility with existing behavior)
- Add more unit tests for trainer, fix all and rename train.py to avoid import conflict
- Use regional endpoint for S3 client
- Update README.rst
- Pass input_channels to eval_input_fn if defined
- Fix setup.py to refer to renamed README
- Add test and build instructions
- Fix year in license headers
- Add TensorFlow 1.6
- Add test instructions in README
- Add container support to install_requires
- Add Apache license headers
- Use wget to install tensorflow-model-server
- Fix file path for integ test
- Fix s3_prefix path in integ test
- Fix typo in path for integ test
- Add input_channels to train_input_fn interface.
- Update logging and make serving_input_fn optional.
- remove pip install in tensorflow training
- Modify integration tests to run nvidia-docker for gpu
- add h5py for keras models
- Add local integ tests & resources
- Restructure repo to use a directory per TF version for dockerfiles
- Rename "feature_map" variables to "feature_dict" to avoid overloading it with the ML term "feature map"
- Copying in changes from internal repo:
- Add functional test
- Fix FROM image names for final build dockerfiles
- Add dockerfiles for building our production images (TF 1.4)
- GPU Dockerfile and setup.py fixes
- Add base image Dockerfiles for 1.4
- Merge pull request #1 from aws/mvs-first-commit
- first commit
- Updating initial README.md from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Creating initial file from template
- Initial commit