Installing TensorFlow on an AWS EC2 P2 GPU Instance

Amazon recently released the new AWS EC2 P2 instance types featuring K80 GPUs. In this blog post I’ll cover how to configure and install TensorFlow on Ubuntu 16.04 targeting these new GPU instance types. This article builds on a previous blog post by John Ramey on installing TensorFlow for older AWS GPU instance types.

The following components will be installed:

  • Essentials
  • CUDA 8.0
  • cuDNN v5.1 (August 10, 2016), for CUDA 8.0
  • Bazel 0.3.2 (Java 8 is a dependency)
  • TensorFlow 0.11

This tutorial was written for a p2.xlarge instance type; however, all AWS P2 instance types should work. You will need at least 16 GiB of free storage on the root partition to follow this guide.
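
Before continuing, it’s worth checking how much free space the root volume actually has. A quick way to do this:

```shell
# Show free space on the root partition in human-readable units;
# the "Avail" column should be comfortably above the ~16 GiB this guide needs.
df -h /
```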

Installing dependencies & build tools

First we need to install the dependencies and basic build tools required by TensorFlow. To do this, execute the following commands from the Ubuntu command prompt.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev
sudo pip install -U pip

Installing CUDA 8

Second we want to use the latest version of the CUDA library with this instance type, which at the time of writing is 8.0. To install CUDA, download the package from NVIDIA and install it.

wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda

Installing cuDNN

Third we want to download and install the latest version of cuDNN. Downloading cuDNN requires logging in to the NVIDIA developer site, so we can’t use wget to fetch the files. Download the following files from NVIDIA and upload them to your AWS instance.

cuDNN v5.1 Runtime Library for Ubuntu14.04 (Deb)
cuDNN v5.1 Developer Library for Ubuntu14.04 (Deb)
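
One way to upload the files is scp from your local machine; the key path and hostname below are placeholders you need to replace with your own values.

```shell
# Run this from your local machine, not from the instance.
# ~/.ssh/my-key.pem and the EC2 public DNS name are placeholders.
scp -i ~/.ssh/my-key.pem \
    libcudnn5_5.1.5-1+cuda8.0_amd64.deb \
    libcudnn5-dev_5.1.5-1+cuda8.0_amd64.deb \
    ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com:~/
```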

After uploading the files to your server, install them using the following commands.

sudo dpkg -i libcudnn5_5.1.5-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.5-1+cuda8.0_amd64.deb

Configure the Environment

Finally we need to configure the environment to work with CUDA and cuDNN. Add the following lines to your ~/.profile file.

export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
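
After reloading the profile (log out and back in, or run `source ~/.profile`), you can sanity-check the setup. The expected values below assume the default CUDA 8.0 install location from the previous step.

```shell
source ~/.profile
echo "$CUDA_HOME"   # should print /usr/local/cuda
nvcc --version      # should report release 8.0
nvidia-smi          # should list the Tesla K80
```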

Installing Bazel

To build TensorFlow we need to install Bazel. Bazel itself depends on Java 8, so we will begin by installing Java 8.

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
# Hack to silently accept the license agreement
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer

After you have completed installing Java, you can move on to installing Bazel.

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel

Note: You will need Bazel version 0.3.2; the build will fail with Bazel version 0.4. At the time of writing this post, the technique above installs Bazel 0.3.2. To check your Bazel version, run the following command:

bazel version

Building and Installing TensorFlow

Our toolchain for building TensorFlow is now ready. We build TensorFlow from source to be able to use the latest CUDA and cuDNN libraries. The first step is to clone the sources from GitHub.

git clone --recurse-submodules https://github.com/tensorflow/tensorflow

This will download the latest development copy of TensorFlow. If you want a specific release, you can switch to a specific branch.
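
For example, to build a release branch instead of the development copy (TensorFlow release branches follow the rX.Y naming pattern; `git branch -r` lists what’s available):

```shell
cd tensorflow
git checkout r0.11   # switch to the 0.11 release branch
```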

Next we need to configure the TensorFlow build. To do so, execute the following two commands.

cd tensorflow
TF_UNOFFICIAL_SETTING=1 ./configure

Accept the defaults for all questions except the following:

Do you wish to build TensorFlow with GPU support? [y/N] y
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0

Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.1.5

Please specify the location where cuDNN 5.1.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.7

Now that the build is configured, we can move on to building TensorFlow itself. This will take quite a long time.

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Finally we can install TensorFlow. Note that the package filename below depends on the version of TensorFlow you checked out from the Git repository.

sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.11.0rc0-py2-none-any.whl
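
As a quick smoke test, you can check that the package imports and reports its version. Run this outside the TensorFlow source tree, since importing from inside it picks up the source directory instead of the installed package.

```shell
cd ~
python -c "import tensorflow as tf; print(tf.__version__)"
```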

Validating the Installation

You should now have completed the installation process. To validate your install, try running the MNIST convolutional neural network example that ships with TensorFlow.

python ~/tensorflow/tensorflow/models/image/mnist/convolutional.py

If TensorFlow was installed correctly, you should see something like this:

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)
Initialized!

Congratulations! You should now have TensorFlow set up on the new P2 instance. Using a p2.xlarge instance, TensorFlow should be about 4x faster than with the older g2.2xlarge instance.

21 Comments

Dan

about 6 months ago

This was a great help! Thanks for putting this together.

Varun Jain

about 6 months ago

One observation I had was that the validation step would fail with "modprobe: ERROR: could not insert 'nvidia_361_uvm': Unknown symbol in module, or unknown parameter" unless a reboot was done prior to running the step. After rebooting, the issue disappeared. My environment was Ubuntu 14.04 on a P2 EC2 instance.

wfb

about 5 months ago

Thank you so much for sharing this! Two small comments. First, I got an "error: invalid command 'bdist_wheel'" at one point; this was fixed by running "sudo pip install wheel". Second, apparently TensorFlow and CUDA take up quite a lot of space. I had a small root partition but a large mounted partition, and unfortunately the installations go into the root volume by default. I avoided that by symlinking several directories to the other, more spacious volume. Specifically, assuming you've mounted the volume to /lotsa_space/, I did something like:

mkdir /lotsa_space/cuda-repo-8-0-local ; sudo ln -s /lotsa_space/cuda-repo-8-0-local /var/cuda-repo-8-0-local
cd ~
mkdir /lotsa_space/.cache ; ln -s /lotsa_space/.cache .cache
mkdir /lotsa_space/git ; ln -s /lotsa_space/git git

(And then I did cd ~/git before git cloning tensorflow.) All told, that removed about 4 GB from my root volume; for me, that made a difference.

Sundeep

about 5 months ago

I think it works! Thanks so much. I made some minor changes for Python 3.5. To test, I ran:

python3 cifar10_multi_gpu_train.py --num_gpus=1

I am getting 0.160 secs/batch. Is this about what others are getting? It is about 10x faster than the t1.micro instance I was using.

Tomi Maila

about 5 months ago

Wfb, I used a 16 GiB root volume that was sufficient for the installation. 8 GiB might be tight or insufficient. Varun, I didn't see the issue loading the kernel module requiring a reboot. I'm trying to think if I rebooted after installing CUDA, I'm not sure. Sundeep, with Python 2.7 using P2.xlarge I get about 1.60 - 1.80 secs/batch for the cifar10_multi_gpu_train.py example.

Sundeep

about 5 months ago

Tomi, if you are only getting 1.6 to 1.8 secs/batch, it sounds like the GPU was not being used. Hope you can figure it out; no point in spending 90 cents/hour if you can't use the GPU. Best of luck!

Kevin

about 5 months ago

Sundeep, can you point out the changes necessary for Python 3.5?

Tomi Maila

about 5 months ago

Update to the performance test results. I had a decimal point mistake in my previous comment. I get ~0.16 sec/batch on p2.xlarge using TensorFlow 0.11 and 0.47 sec/batch on g2.2xlarge using TensorFlow 0.10. The p2.xlarge is set up according to this blog post and the g2.2xlarge is set up following John Ramey's post.

Anton

about 5 months ago

Any chance you can make an AMI for this?

Tomi Maila

about 5 months ago

Anton, I didn't make an AMI because NVIDIA doesn't allow redistribution of cuDNN so I felt it is kind of meaningless.

Jeremy

about 4 months ago

Hi, Amazon EC2 support just told me they don't support an instance limit increase for my usage (a request to try TensorFlow on p2.xlarge). Has anyone else had this response? I'm using the Europe West / Ireland availability zone, btw. Many thanks, Jeremy

Zach Moshe

about 4 months ago

Thanks so much! Can confirm it worked for me on Ubuntu 16.04 (ami-0d77397e) on p2.xlarge machine.

Shi

about 4 months ago

Thank you SO much for this tutorial. It worked very well. The part that shows us how to answer the ```TF_UNOFFICIAL_SETTING=1 ./configure``` settings is especially useful to me!

Marco Trombetti

about 4 months ago

Thanks! Very well done. I completed it all in about an hour (the TensorFlow compilation was the bottleneck). Before finding this I spent days trying.

Nishant Arora

about 4 months ago

The problem with your config is this: http://stackoverflow.com/questions/33651810/the-minimum-required-cuda-capability-is-3-5 The value for compute capability you chose is wrong. You should make it work with 3.0 to use the GPU, else it just ignores it.

Tomi Maila

about 4 months ago

Nishant, the K80 compute capability is 3.7, which is what I've used in this post. For the older GRID K520 the compute capability is 3.0. I didn't include support for the K520 in the build in this article, as there really isn't any reason currently to use the slower and less cost-effective GPU instances for most people.

Sandeep Samdaria

about 4 months ago

With the latest bazel-0.4 release, the following command fails:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

You might want to update the article to install the bazel-0.3.2 version instead of 0.4. More details over here: https://github.com/tensorflow/tensorflow/issues/5536

Sundeep

about 4 months ago

Kevin: For Python 3, I think the only thing I changed was in the “Installing dependencies and build tools” section, where I replaced the final command (sudo pip install -U pip) with:

sudo apt install python3-pip
sudo pip install -U pip
pip3 install jupyter pandas numpy matplotlib
pip3 install --upgrade pip

That seemed to work for me. I wasn't able to get this to work with Anaconda. If someone figures that out, I would be very interested.

joostb

about 2 weeks ago

Thanks a lot for this guide! a big help

Nabiha Asghar

about 2 weeks ago

Excellent post! Thank you. Just one comment: the Bazel version installed by the command you used is now 0.4. To install Bazel 0.3, run the following commands:

wget -O bazel-3.sh https://github.com/bazelbuild/bazel/releases/download/0.3.0/bazel-0.3.0-installer-linux-x86_64.sh
chmod 755 bazel-3.sh
sudo ./bazel-3.sh
bazel version

