Affordable GPUs for Deep Learning Training – rossgritz.com

THE BOTTOM LINE: For personal use, do not use AWS, Google Cloud or Azure. It is best either to build your own machine or to rent 1080 Ti or 2080 Ti GPU instances from alternative cloud providers.

I’ve written extensively on this blog about building cheap deep learning machines. Last year, I built a basic $600 deep learning machine for practice. Then, earlier this year, I built a small cluster with 8 Nvidia GTX 1080 Tis in order to finish a study (see previous post) on 3D deep learning for cancer detection, which has since been published by the journal JAMIA. My extensive documentation of these builds was intended to offer a roadmap for other students trying to do deep learning research on a budget. More specifically, the intention was to present the argument for building a deep learning machine over using AWS. Jeff Chen has recently done an even better job of presenting this case.

However, the purpose of this post is to present an alternative for cheap deep learning that does not require AWS or building your own machine. The reason for the difference in cost between AWS and building your own machine is simple: AWS uses more expensive GPUs. Technically, the GPUs used by AWS are meant for scientific computing while the cards used to build your own machine are meant for consumers. Practically, the scientific cards don’t have a video port while the consumer cards have multiple video ports.

Nvidia’s scientific cards are sold as part of the Tesla product line. The current generation is the V100 and the previous generation is the P100. The equivalent consumer cards are the RTX 2080 Ti and the GTX 1080 Ti, respectively. The performance of these cards is nearly the same, plus or minus 10%. However, the price of the scientific cards is about 10 times higher. By building our own machine we are able to get a steep discount. The scientific cards do have some benefits, primarily in scalable parallelization. These benefits are useful on very large models and for very large batch sizes. If your research involves these elements, then you’re probably best off using AWS or a similar cloud provider. If not, then you’re probably best off building your own machine, right? No, not necessarily.

Recently I’ve become aware of at least one online cloud provider that offers instances of a variety of Nvidia’s consumer GPUs: Vectordash. It lists the 1080 Ti at only $0.64/hr. Leader GPU offers weekly rates for 2 cards that approach $0.74/hr. For my research project I built my cluster for $11,000* and I ran it for 6 weeks. GPU instance pricing has changed since the blog post about building that cluster, so below I compare the system to equivalent alternatives using current pricing.

Had I used Google Cloud, which offers the cheapest P100 instances at present, I would have spent approximately $11,773.44. Because I built my own cluster, I saved some cash and I still had the hardware when the project was finished. However, had I used Leader GPU, I would have spent roughly $3,600.
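To make these totals easier to compare, here is a rough sketch that converts them into an effective price per GPU-hour. It assumes all 8 GPUs were billed around the clock for the full 6 weeks, which the post does not state explicitly, and the function and variable names are mine, purely for illustration.

```python
# Figures quoted in the post (USD).
NUM_GPUS = 8
WEEKS = 6
GOOGLE_CLOUD_TOTAL = 11_773.44  # P100 instances
LEADER_GPU_TOTAL = 3_600.00     # 1080 Ti instances, rough figure

# Assumption: every GPU is billed 24 hours a day, 7 days a week.
HOURS_PER_WEEK = 7 * 24


def gpu_hours(num_gpus: int, weeks: int) -> int:
    """Total GPU-hours for a cluster running nonstop."""
    return num_gpus * weeks * HOURS_PER_WEEK


def implied_rate(total_cost: float, hours: int) -> float:
    """Effective price per GPU-hour implied by a total bill."""
    return total_cost / hours


hours = gpu_hours(NUM_GPUS, WEEKS)
print(f"GPU-hours: {hours}")
print(f"Google Cloud: ${implied_rate(GOOGLE_CLOUD_TOTAL, hours):.2f} per GPU-hour")
print(f"Leader GPU:   ${implied_rate(LEADER_GPU_TOTAL, hours):.2f} per GPU-hour")
```

Under that assumption the project used 8,064 GPU-hours, so the Google Cloud total works out to about $1.46 per GPU-hour and the rough Leader GPU total to about $0.45 per GPU-hour.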

After the research project was completed, I kept the cluster in case I had to run any more cases. Then I kept it for some small jobs until just recently, when I sold it on eBay, recouping $5,000 after fees. Thus, the research project ultimately cost me $6,000. Had I used Leader GPU, I could have saved $2,400. Without the option to rent 1080 Ti instances, it was definitely cheaper to build my own cluster. However, renting 1080 Ti instances would have saved me a chunk of cash. If I had to do it again**, I would definitely rent 1080 Ti instances.

*The cluster cost $11,000 to build in March, but due to diminished demand for GPUs it would cost roughly $1,600 less to build now.

**Even considering today’s cheaper GPUs, I would still save $800, along with the substantial amount of time needed to build, configure and maintain the cluster.
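The build-versus-rent arithmetic above can be checked with a few lines of Python. This is only a sketch using the dollar figures from this post. It assumes the cluster would resell for the same $5,000 even if it were built at today's lower price, which is what the $800 figure implies, and the names are hypothetical.

```python
# Dollar figures from the post (USD).
BUILD_COST_MARCH = 11_000
PRICE_DROP_SINCE_MARCH = 1_600
RESALE_AFTER_FEES = 5_000
RENTAL_COST = 3_600  # rough Leader GPU estimate


def net_build_cost(build_cost: int, resale_value: int) -> int:
    """What building costs once the hardware has been resold."""
    return build_cost - resale_value


def rental_savings(build_cost: int, resale_value: int, rental_cost: int) -> int:
    """How much cheaper renting is than building and reselling."""
    return net_build_cost(build_cost, resale_value) - rental_cost


# Built in March at the original price.
print(net_build_cost(BUILD_COST_MARCH, RESALE_AFTER_FEES))               # 6000
print(rental_savings(BUILD_COST_MARCH, RESALE_AFTER_FEES, RENTAL_COST))  # 2400

# Built today at the lower price (assumes the same resale value).
build_cost_today = BUILD_COST_MARCH - PRICE_DROP_SINCE_MARCH
print(rental_savings(build_cost_today, RESALE_AFTER_FEES, RENTAL_COST))  # 800
```

Note that this counts only cash. It leaves out electricity and the time spent building and maintaining the cluster, both of which would tilt the comparison further toward renting.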
