Navigate to distributed-training-workshop > notebooks > part-2-sagemaker. You should see the following files:
```
part-2-sagemaker/
├── cifar10-sagemaker-distributed.ipynb
└── code
    ├── cifar10-multi-gpu-horovod-sagemaker.py
    └── model_def.py
```
|File/directory|Description|
|---|---|
|cifar10-sagemaker-distributed.ipynb|This Jupyter notebook contains code to define and kick off a SageMaker training job|
|code|This directory contains the training script and other training script dependencies|
SageMaker is a fully managed service, which means that when you kick off a training job using the SageMaker SDK in the
cifar10-sagemaker-distributed.ipynb notebook, a few different things happen behind the scenes:

- SageMaker copies the `code` directory into the training container
- SageMaker launches an MPI job with the right settings so that workers can communicate with each other
In addition, SageMaker does a lot more to ensure that jobs run optimally and you get the best performance out of the box. As a user, you don't have to worry about managing machine learning infrastructure.
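The launch step described above can be sketched with the SageMaker Python SDK. This is a minimal sketch, not the workshop's exact notebook code: the instance type and count, the TensorFlow framework version, `processes_per_host`, and the S3 bucket placeholder are all illustrative assumptions you should adapt to your own account and the workshop's instructions.

```python
# Sketch of launching the distributed training job with the SageMaker Python SDK.
# Instance type/count, framework version, and MPI settings are illustrative
# assumptions -- adjust them to match the workshop notebook.
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()  # IAM role SageMaker assumes for the job

estimator = TensorFlow(
    entry_point="cifar10-multi-gpu-horovod-sagemaker.py",  # training script
    source_dir="code",   # SageMaker copies this directory into the container
    role=role,
    framework_version="2.1",  # assumed TensorFlow version
    py_version="py3",
    instance_count=2,              # number of worker instances (assumed)
    instance_type="ml.p3.2xlarge", # GPU instance type (assumed)
    # Enabling MPI tells SageMaker to configure inter-worker communication
    # so the Horovod training script can run across all instances.
    distribution={"mpi": {"enabled": True, "processes_per_host": 1}},
)

# Kick off training: SageMaker provisions the instances, copies the code,
# runs the MPI job, and tears everything down when training finishes.
# Replace the placeholder bucket with your own dataset location.
estimator.fit({"training": "s3://<your-bucket>/cifar10-dataset/"})
```

Calling `estimator.fit()` is all it takes to start the distributed job; the infrastructure provisioning and MPI setup listed above happen automatically behind that one call.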