Installing TensorFlow on an AWS EC2 P2 GPU Instance

Amazon recently released the new AWS EC2 P2 instance types featuring NVIDIA K80 GPUs. In this blog post I’ll cover how to configure and install TensorFlow on Ubuntu 16.04 targeting these new GPU instance types. This article builds on an earlier blog post by John Ramey on installing TensorFlow for the older AWS GPU instance types.

The following components will be installed:

  • Essentials (build tools and Python dependencies)
  • CUDA 8.0
  • cuDNN v5.1 (August 10, 2016) for CUDA 8.0
  • Bazel 0.3.2 (Java 8 is a dependency)
  • TensorFlow 0.11

This tutorial was written for a p2.xlarge instance type, but all AWS P2 instance types should work. You will need 16 GiB of storage on the root partition to follow this guide.
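Before launching into the install, it can save a restart to confirm the root volume actually has that headroom. A quick sketch (assumes GNU df and awk; the 16 GiB threshold comes from this guide):

```shell
# Report free space on the root partition; warn if below the ~16 GiB
# this toolchain needs (threshold from this guide, adjust to taste)
avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
avail_gib=$((avail_kb / 1024 / 1024))
echo "available on /: ${avail_gib} GiB"
if [ "$avail_gib" -lt 16 ]; then
  echo "warning: less than 16 GiB free on /"
fi
```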

Installing dependencies & build tools

First we need to install the dependencies and basic build tools required by TensorFlow. To do this, execute the following commands from the Ubuntu shell.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual unzip python-numpy swig python-pandas python-sklearn wget pkg-config zip g++ zlib1g-dev libcurl3-dev
sudo pip install -U pip

Installing CUDA 8

Second we want to use the latest version of the CUDA library, which at the time of writing is 8.0. To install CUDA, download the package from NVIDIA and install it:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
rm cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo apt-get update
sudo apt-get install -y cuda
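Before moving on, a quick sanity check that the toolkit landed can save time. A sketch (the fallback path assumes the default CUDA 8.0 install location; until PATH is configured in a later section, nvcc may not be found by name):

```shell
# Check whether the CUDA compiler is reachable; fall back to the
# default install path (an assumption) if PATH is not set up yet
if command -v nvcc >/dev/null 2>&1; then
  cuda_status="nvcc on PATH: $(nvcc --version | tail -1)"
elif [ -x /usr/local/cuda/bin/nvcc ]; then
  cuda_status="nvcc installed at /usr/local/cuda/bin but not on PATH yet"
else
  cuda_status="nvcc not found - CUDA toolkit may not be installed"
fi
echo "$cuda_status"
```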

Installing cuDNN

Third we want to download and install the latest version of cuDNN. Downloading cuDNN requires logging into the NVIDIA developer site, so we can’t use wget to fetch the files. Download the following files from NVIDIA and upload them to your AWS instance (the Ubuntu 14.04 packages also work on 16.04).

cuDNN v5.1 Runtime Library for Ubuntu 14.04 (Deb)
cuDNN v5.1 Developer Library for Ubuntu 14.04 (Deb)

After uploading the files to your server, install them using the following commands.

sudo dpkg -i libcudnn5_5.1.5-1+cuda8.0_amd64.deb
sudo dpkg -i libcudnn5-dev_5.1.5-1+cuda8.0_amd64.deb
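If you want to confirm the packages unpacked where TensorFlow’s configure step will later look, a quick check (the file paths below are an assumption based on the standard layout of these Ubuntu packages; the library directory matches the answer given later in the configure section):

```shell
# Count the cuDNN files dropped by the runtime and developer .deb
# packages (paths are an assumption based on the usual package layout)
cudnn_found=0
for f in /usr/lib/x86_64-linux-gnu/libcudnn.so.5 /usr/include/cudnn.h; do
  if [ -e "$f" ]; then
    echo "found: $f"
    cudnn_found=$((cudnn_found + 1))
  else
    echo "missing: $f"
  fi
done
echo "$cudnn_found of 2 cuDNN files present"
```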

Configure the Environment

Finally we need to configure the environment to work with CUDA and cuDNN. Add the following lines to your ~/.profile file, then log out and back in (or run source ~/.profile) for them to take effect.

export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin
# (optional) some setups also expect the CUDA libraries on the loader path:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
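Since a typo in ~/.profile only bites later at build time, it can help to verify the new PATH entry resolved. A sketch that mirrors the exports above in the current shell:

```shell
# Mirror the ~/.profile additions and verify the CUDA bin directory
# actually ended up on PATH
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin
if echo "$PATH" | tr ':' '\n' | grep -qx "$CUDA_ROOT/bin"; then
  echo "CUDA bin directory is on PATH"
else
  echo "CUDA bin directory is missing from PATH"
fi
```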

Installing Bazel

To build TensorFlow we need to install Bazel. Bazel itself depends on Java 8, so we will begin by installing Java 8.

sudo add-apt-repository -y ppa:webupd8team/java
sudo apt-get update
# Hack to silently agree license agreement
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer

After you have completed installing Java, you can move on to installing Bazel.

echo "deb [arch=amd64] stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel

Note: You will need Bazel version 0.3.2; the build fails with Bazel 0.4. At the time of writing, the commands above install Bazel 0.3.2. To check the Bazel version, run the following command:

bazel version
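Because the 0.3.2 pin matters, you can fail fast with a small guard. A sketch using GNU sort -V; version_lt is a hypothetical helper, and the hard-coded ver stands in for the number reported by bazel version:

```shell
# version_lt A B: true when version A sorts strictly before version B
version_lt() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ] && [ "$1" != "$2" ]
}

ver="0.3.2"   # substitute the number reported by `bazel version`
if version_lt "$ver" "0.4.0"; then
  echo "Bazel $ver is OK for this build"
else
  echo "Bazel $ver is too new; install 0.3.2 instead"
fi
```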

Building and Installing TensorFlow

Now our toolchain for building TensorFlow is ready. We want to build TensorFlow from source to be able to use the latest CUDA and cuDNN libraries. The first step is to clone the sources from GitHub.

git clone --recurse-submodules

This will download the latest development copy of TensorFlow. If you want a specific release, you can switch to a specific branch.

Next we need to configure the TensorFlow build. To do this, execute the following two lines:

cd tensorflow
TF_UNOFFICIAL_SETTING=1 ./configure

Accept the defaults for all questions except the following:

Do you wish to build TensorFlow with GPU support? [y/N] y
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0

Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5.1.5

Please specify the location where cuDNN 5.1.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/lib/x86_64-linux-gnu

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at:
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.7
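The compute capability answer depends on the GPU behind the instance type. A hypothetical helper sketch, using the values mentioned in this post (the K80 is 3.7; the older GRID K520 is 3.0); on a live instance you could feed it the model name reported by nvidia-smi --query-gpu=name --format=csv,noheader:

```shell
# Map a GPU model name to its CUDA compute capability (hypothetical
# helper; only the two models discussed in this post are covered)
compute_capability() {
  case "$1" in
    *K80*)  echo "3.7" ;;
    *K520*) echo "3.0" ;;
    *)      echo "unknown" ;;
  esac
}

compute_capability "Tesla K80"   # prints 3.7
```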

Now we have configured the TensorFlow build and can move on to building it. This will take quite a long time.

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Finally we can install TensorFlow. Note that the package filename below depends on the version of TensorFlow you checked out from the Git repository.

sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.11.0rc0-py2-none-any.whl
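A quick smoke test of the wheel install before running the full example (a sketch; on a machine without TensorFlow it prints a short note instead of a stack trace):

```shell
# Try importing TensorFlow and printing its version; suppress the
# traceback and report a short note on failure instead
if python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null; then
  tf_ok=yes
else
  tf_ok=no
  echo "TensorFlow import failed - check the wheel install"
fi
```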

Validating the Installation

You should now be done with the installation process. To validate your install, try running the MNIST convolutional neural network example that ships with TensorFlow.

python ~/tensorflow/tensorflow/models/image/mnist/

If TensorFlow was installed correctly, you should see something like this:

I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
I tensorflow/stream_executor/] successfully opened CUDA library locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/] DMA: 0 
I tensorflow/core/common_runtime/gpu/] 0:   Y 
I tensorflow/core/common_runtime/gpu/] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)

Congratulations! You should now have TensorFlow set up on the new P2 instance. On a p2.xlarge instance, TensorFlow should be about 4x faster than on the older g2.2xlarge instance.

Comments

about 2 years ago

This was a great help! Thanks for putting this together.

Varun Jain

about 2 years ago

One observation I had was that validation step would have an error "modprobe: ERROR: could not insert 'nvidia_361_uvm': Unknown symbol in module, or unknown parameter" unless a reboot was done prior to running the step. After rebooting, the issue disappeared. My environment was Ubuntu 14.04 on a P2 EC2 instance


about 2 years ago

Thank you so much for sharing this! Two small comments. First, I got an "error: invalid command 'bdist_wheel'" at one point; this was fixed by running "sudo pip install wheel". Second, apparently TensorFlow and CUDA take up quite a lot of space. I had a small root partition but a large mounted partition, but unfortunately the installations go into the root volume by default. I avoided that by symlinking several directories to the other, more spacious volume. Specifically, assuming you've mounted the volume to /lotsa_space/, then I did something like:

mkdir /lotsa_space/cuda-repo-8-0-local ; sudo ln -s /lotsa_space/cuda-repo-8-0-local /var/cuda-repo-8-0-local
cd ~
mkdir /lotsa_space/.cache ; ln -s .cache /lotsa_space/.cache
mkdir /lotsa_space/git ; ln -s /lotsa_space/git git

(and then I did cd ~/git before git cloning tensorflow.) All told, that removed about 4 GB from my root volume; for me, that made a difference.


about 2 years ago

I think it works! Thanks so much. Did some minor changes for Python 3.5. To test, I ran: python3 --num_gpus=1 I am getting 0.160 secs/batch. Is this about what others are getting? It is about 10x faster than the t1.micro instance I was using.

Tomi Maila

about 2 years ago

Wfb, I used a 16 GiB root volume that was sufficient for the installation. 8 GiB might be tight or insufficient. Varun, I didn't see the issue loading the kernel module requiring a reboot. I'm trying to think if I rebooted after installing CUDA, I'm not sure. Sundeep, with Python 2.7 using P2.xlarge I get about 1.60 - 1.80 secs/batch for the example.


about 2 years ago

Tomi, If you are only getting 1.6 to 1.8 secs/batch, it sounds like the GPU was not running. Hope you can figure it out. no point of spending 90cents / hour if you can't use the GPU. Best of luck!


about 2 years ago

Sundeep, can you point out the changes necessary for Python 3.5?

Tomi Maila

about 2 years ago

Update to the performance test results. I had a decimal point mistake in my previous comment. I get ~0.16 sec/batch on p2.xlarge using TensorFlow 0.11 and 0.47 sec/batch on g2.2xlarge using TensorFlow 0.10. The p2.xlarge is set up according to this blog post and the g2.2xlarge is set up following John Ramey's post.


about 2 years ago

Any chance you can make a AMI for this?

Tomi Maila

about 2 years ago

Anton, I didn't make an AMI because NVIDIA doesn't allow redistribution of cuDNN so I felt it is kind of meaningless.


about 2 years ago

Hi, Amazon EC2 support just told me they don't support an instance limit increase for my usage (a request to try Tensorflow on P2.xlarge) Anyone else had this response? I'm using the Europe West / Ireland availability zone btw. Many thanks Jeremy

Zach Moshe

about 2 years ago

Thanks so much! Can confirm it worked for me on Ubuntu 16.04 (ami-0d77397e) on p2.xlarge machine.


about 2 years ago

Thank you SO much for this tutorial. It worked very well. The part that shows us how to answer the ```TF_UNOFFICIAL_SETTING=1 ./configure``` settings is especially useful to me!

Marco Trombetti

about 2 years ago

Thanks! Very well done, I completed all in about an hour (Tensorflow compilation was bottleneck). Before finding this I spent days trying.

Nishant Arora

about 2 years ago

The problem with your config is this: The value for compute capability you chose is wrong. you should make it work with 3.0 to use gpu else it just ignores it.

Tomi Maila

about 2 years ago

Nishant, the K80 compute capability is 3.7, which is what I've used in this post. For the older GRID K520 the compute capability is 3.0. I didn't include support for the K520 in the build in this article, as there really isn't any reason currently for most people to use the slower and less cost-effective GPU instances.

Sandeep Samdaria

about 2 years ago

With latest bazel-0.4 release, the below command fails 'bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package' You might want to update the article to install bazel-0.3.2 version instead of 0.4. More details over here:


about 2 years ago

Kevin: For Python 3, I think the only thing I changed was: in the “Installing dependencies and build tools” section I replaced the final command sudo pip install -U pip with:

sudo apt install python3-pip
sudo pip install -U pip
pip3 install jupyter pandas numpy matplotlib
pip3 install --upgrade pip

That seemed to work for me. I wasn't able to get this to work with Anaconda. If someone figures that out, I would be very interested.


about 1 year ago

Thanks a lot for this guide! a big help

Nabiha Asghar

about 1 year ago

Excellent post! Thank you. Just one comment. The bazel version installed by the command you used is 0.4 now. To install Bazel 0.3, run the following commands: wget -O chmod 755 sudo ./ bazel version


Jie He

about 1 year ago

Hi, thanks for your guide. I followed your steps in my P2 EC2 instance, but when installing cuda, I got error: ERROR: The kernel header file '/lib/modules/4.4.0-1013-aws/build/include/linux/version.h' does not exist. The most likely reason for this is that the kernel source files in '/lib/modules/4.4.0-1013-aws/build' have not been configured. Should I change or upgrade the kernel to solve this problem? Thanks.



Faruk Mustafic

about 1 year ago

thanks a ton !!!
