1080Ti vs P100

When designing a small deep learning cluster for the university last year, I ran into trouble trying to determine whether the P100 or the 1080Ti was more powerful (and, if one was, by how much). Ultimately, I was unable to come to a conclusion, so I decided to get both and find out for myself. This post describes my experience using these cards on a recent project of mine and is a follow-up to the previous post.

Recently I had to revise and resubmit a manuscript I had written for a medical informatics journal on using a two-stage deep learning system to detect lung cancer from CT scans. The system uses a 3D U-Net and a 3D ResNet, so the revision required a lot of compute. I used two mini-clusters, one at the university and one I had built at home (see the previous post), with 15 1080Tis and a single P100 between them. It wound up taking me six weeks to train and optimize the 180 models for the 10 separate 9-fold cross-validations I had to conduct.

OBSERVATIONS

I compared the performance of the two cards on both the 3D U-Net and the 3D ResNet. Over the course of running the cases for the revision, the 1080Tis consistently outperformed the P100 by roughly 10%. This did not surprise me, and it is consistent with another benchmark that was published online after I designed the cluster. I haven't analyzed these results in any detail, but I will try to add comparison plots in the future.

DISCUSSION

I was unable to compare the performance of the P100 and the 1080Tis for half-precision (float16) operations. Based on Nvidia's literature, I suspect the P100 would outperform the 1080Tis by about 30% there. That is substantial, but far from what one would expect given the difference in price.
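If you want to gauge the float16 gap on your own hardware, a quick way is to time a large matrix multiply in both precisions. The sketch below is my own illustration, not anything from the project; it assumes PyTorch on a CUDA-capable card, and the matrix size and iteration count are arbitrary.

    # Rough float32 vs float16 throughput check on a single GPU (assumes PyTorch + CUDA).
    import time
    import torch

    def time_matmul(dtype, size=4096, iters=50):
        a = torch.randn(size, size, device="cuda", dtype=dtype)
        b = torch.randn(size, size, device="cuda", dtype=dtype)
        torch.cuda.synchronize()                  # start timing with an idle GPU
        start = time.time()
        for _ in range(iters):
            a @ b
        torch.cuda.synchronize()                  # wait for all kernels to finish
        return (time.time() - start) / iters

    for dtype in (torch.float32, torch.float16):
        print(f"{dtype}: {time_matmul(dtype) * 1e3:.1f} ms per 4096x4096 matmul")

This only measures raw matrix-multiply throughput; end-to-end training differences can of course look different.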

Because I only had one P100, I was also unable to compare the two GPUs on multi-GPU performance. The 1080Tis are really only suited to task parallelism, i.e. training a separate model on each GPU. That was effective for my project, since I had 180 separate models to train, but it is limiting for tasks involving larger networks (note that I was using a batch size of 2 on single GPUs). The P100 is designed for data parallelism as well as task parallelism, insomuch as it has much more inter-GPU bandwidth via Nvidia's NVLink interconnect, so splitting a single model's training across GPUs should be easier. Of course, we can't know without testing it, and the results could be as surprising as they were in the head-to-head comparison; I leave this for future work.
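To make the task vs. data parallelism distinction concrete, here is a minimal sketch. PyTorch is my assumption here (this is not the project's actual training code), and the tiny model is just a placeholder.

    # Task parallelism needs no special code: launch one training process per GPU, e.g.
    #   CUDA_VISIBLE_DEVICES=0 python train.py --fold 0
    #   CUDA_VISIBLE_DEVICES=1 python train.py --fold 1
    # Data parallelism instead splits each batch of a single model across all visible GPUs:
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
    model = nn.DataParallel(model).cuda()      # replicate the model on every visible GPU

    x = torch.randn(16, 128, device="cuda")    # one batch, scattered across the replicas
    out = model(x)                             # outputs are gathered back on GPU 0
    print(out.shape)                           # torch.Size([16, 2])

The catch is that data parallelism only pays off when inter-GPU bandwidth is high relative to the per-batch compute, which is exactly where NVLink-equipped cards like the P100 should have the edge.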

Assuming the P100 is superior for data parallelism as well as for half-precision operations, one may ask whether it is worth the cost. The answer, of course, depends on the type of research you conduct. For most people, I think the answer would be no. Even for researchers primarily interested in half-precision operations, I don't think the assumed advantage of the P100 justifies a price tag roughly 6x that of the 1080Ti. The only situation in which I envision the P100 being worth the extra cost is when your datasets are large enough to merit multi-GPU training, which typically means non-public datasets.

CONCLUSION

My conclusion is that 1080Tis perform similarly to P100s for most tasks. Many factors I didn't account for contribute to performance, so I'm not confident saying more than that. I was also unable to determine whether this holds for data parallelism or for half-precision floats. Regardless, the performance would still be in the same ballpark, and my recommendation is still the same: unless you're working with very large datasets, typically ones that are not publicly available, there's no need for the features offered by the P100 or any other Tesla-series card.

Looking to the future, I am curious to see whether Nvidia continues to sell consumer-grade GPUs that are as fast as or faster than its research-grade cards. The GTX 1100 series cards are expected toward the end of the summer, and I look forward to testing their performance against the V100.

 

Equivalent of $15k monthly AWS compute from $11k DIY deep learning cluster

UPDATE: AWS has since changed its P3 instances and their pricing. The figures quoted here are from March 2018, when the cluster was built. For an updated analysis, and some new insight, please see the latest post on this topic.

I’ve been working on a system for detecting lung cancer from CT scans for three years. Last fall I submitted a manuscript detailing this research to a premier medical informatics journal. In the second round of revisions, a reviewer questioned the size of my test dataset. I had been conducting 9-fold cross-validation on a dataset split into 10 bins, using the 10th bin as the test set. To appease the reviewer, I chose to repeat the 9-fold cross-validation 10 times, rotating the held-out bin so that the entire dataset could serve as test data. However, for each cross-validation fold I had to train two separate models, one of which contained over 100 million parameters, so the project was going to be very computationally expensive.
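For concreteness, here is a small sketch (my own illustration, not project code) of the bookkeeping behind that design; the 10 repetitions x 9 folds x 2 models are where the 180 training runs mentioned in the post above come from.

    # Illustration only: enumerate the training runs implied by the design described above.
    n_bins = 10                                # dataset split into 10 bins; each takes a turn as the test set
    models_per_fold = ("model_1", "model_2")   # two models per fold (one has >100M parameters)

    runs = []
    for test_bin in range(n_bins):
        train_bins = [b for b in range(n_bins) if b != test_bin]
        for val_bin in train_bins:             # 9-fold cross-validation over the remaining 9 bins
            for model in models_per_fold:
                runs.append((test_bin, val_bin, model))

    print(len(runs))                           # 10 * 9 * 2 = 180 separate training runs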

I have two workstations at the university (with four 1070s, three 1080Tis, and a P100) that I initially began using for this, but they weren’t going to cut it alone. To do this and meet the deadline for the revise-and-resubmit, we were going to need a lot more compute. Doing it on AWS or another IaaS service would have been very expensive. Ultimately, I determined the most cost-effective option was to build my own cluster at home.

The home cluster cost ~$11,000 to build, while equivalent compute on AWS runs ~$15,000 per month.

The resulting cluster is shown above. It’s not pretty, but it gets the job done. To complete the study, I used this cluster for a total of ~1,000 hours. Based on that, we can do a rough calculation of the cost of equivalent computational capacity on AWS. AWS P3 instances run $3.06 per hour. My experience comparing P100s and 1080Tis matches this benchmark exactly, so I will estimate that each 1080Ti delivers 90% of a P3 instance’s compute, which prices each hour of 1080Ti use in my home cluster at roughly $2.75/hr. Using 8 of these GPUs for 1,000 hours each at $2.75/hr gives a total of approximately $22,000. So I saved roughly $11,000 on the revision alone by building my own cluster, and I now have the equivalent of roughly $15,000 per month of AWS compute in my home.
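Spelled out as a quick script (my own sketch of the back-of-the-envelope numbers above; the 90% figure is the benchmark-based assumption):

    # Reproducing the rough cost comparison above.
    p3_hourly = 3.06            # AWS P3 on-demand price, March 2018 ($/hr)
    relative_perf = 0.90        # assumed 1080Ti throughput relative to a P3 instance
    gpu_hourly = p3_hourly * relative_perf            # ~$2.75/hr per 1080Ti

    n_gpus, hours = 8, 1_000
    aws_equiv = n_gpus * hours * gpu_hourly           # ~$22,000 for the revision
    cluster_cost = 11_000
    monthly_equiv = n_gpus * 24 * 30 * gpu_hourly     # monthly AWS-equivalent if the cluster runs 24/7

    print(f"AWS-equivalent cost of the revision: ${aws_equiv:,.0f}")
    print(f"Savings vs. building the cluster:    ${aws_equiv - cluster_cost:,.0f}")
    print(f"AWS-equivalent compute per month:    ${monthly_equiv:,.0f}")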

There certainly are downsides to this alternative. Most prominently, one must be comfortable with Linux scripting in order to run a large number of jobs in parallel across machines connected only by an unmanaged network switch (a rough sketch of what this looks like follows the hardware list below). Then, of course, there is the fact that these clusters run on minimal hardware*:

  • Intel Core i5 (6th gen, 3.0GHz) processor
  • 2TB RAID-10 primary disk
  • 32GB DDR4 RAM
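Here is the rough sketch of the job fan-out mentioned above. The hostnames, paths, and GPU layout are hypothetical placeholders rather than my actual configuration; the point is just that plain SSH plus CUDA_VISIBLE_DEVICES is enough to keep every GPU on the switch busy.

    # Hypothetical sketch: fan training jobs out over SSH, one GPU per job.
    import subprocess

    # (host, gpu index) slots available across the machines on the switch (placeholders)
    slots = [("node01", 0), ("node01", 1), ("node01", 2),
             ("node02", 0), ("node02", 1), ("node02", 2),
             ("node03", 0), ("node03", 1)]

    jobs = [f"python train.py --fold {i}" for i in range(len(slots))]

    for (host, gpu), job in zip(slots, jobs):
        remote = (f"cd ~/project && CUDA_VISIBLE_DEVICES={gpu} "
                  f"nohup {job} > logs/gpu{gpu}.log 2>&1 &")
        subprocess.run(["ssh", host, remote], check=True)   # fire and forget; logs stay on each node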

I ran into trouble trying to load the training data into memory: it simply wasn’t possible when training three separate models on each machine. There was even a little performance degradation once I switched to loading the training data from a directory, but that was just something that had to be dealt with.
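Concretely, "loading from a directory" here just means reading each batch from disk as it is needed instead of holding the whole training set in RAM. A minimal generator-style sketch (my own, assuming the volumes are stored as individual .npy files, which is not necessarily how the project stored them) looks like this:

    # Stream training batches from disk instead of preloading the dataset into memory.
    import os
    import random
    import numpy as np

    def batch_generator(data_dir, batch_size=2):
        """Yield batches by loading only batch_size volumes from disk at a time."""
        files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith(".npy")]
        while True:
            random.shuffle(files)
            for i in range(0, len(files) - batch_size + 1, batch_size):
                batch = [np.load(f) for f in files[i:i + batch_size]]   # per-batch disk reads
                yield np.stack(batch)

    # Usage: x = next(batch_generator("train_volumes/"))

Those per-batch disk reads are where the small slowdown comes from; prefetching the next batch in a background thread would hide much of it.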

CONCLUSION

For the purposes of my project, i.e. the revision, the minimalist cluster was more than satisfactory. However, the project was rather limited: the data directories were relatively small (compared to something like ImageNet), and the work consisted only of straightforward supervised learning.

This summer I’m working on some DeepRL projects, and may work on some unsupervised tasks, too. I’ll try to update this or make a new post regarding the minimalist cluster’s performance on these other applications.

*All of the hardware was the cheapest possible option, with the exception of the power supplies, which were all EVGA Bronze- or Silver-certified. All of the GPUs were purchased from eBay because retailers were out of stock; had they been purchased at list price, the entire cluster would have cost less than $9,000.

A note on budget machines for deep learning

Since last year’s post detailing a cheap deep learning machine, I have some modified advice regarding memory and GPU selection:

  1. Regarding memory, I would definitely recommend 16GB or 32GB. This is useful for working with large datasets, and the time and hassle it saves is likely worth the money. (A larger SSD is also useful, but not as much of a priority.)
  2. Regarding the GPU, I would now recommend the 1060 6GB or the 1070 8GB. The selection of GPU, however, is highly dependent on your specific needs. I’ve been working a lot with medical image segmentation and had to upgrade to a 1070 8GB card in order to train some of the large networks with a batch size of 1.

Based on the first post, and given these considerations, you should be able to build a quality, user-friendly deep learning machine for as little as $700.


Deep Learning on a Budget (i.e. $550)

The GPU power required for deep learning on large datasets does not come cheap, especially on the cloud. Currently, AWS charges $0.90 per hour for a single Nvidia K80. Unless you’re a prodigy at picking hyperparameters, training deep nets on large datasets this way can add up very fast: a month of continuous training at that rate comes to roughly $650, more than the cost of an entry-level DIY deep learning machine.

If you’re like me and would like to build a deep learning machine on a budget, it’s not that difficult. I built a new machine for myself over Christmas for less than $600. There are really only seven parts you need to worry about:

  • processor
  • motherboard
  • power supply
  • memory
  • hard drive
  • graphics card
  • chassis

For deep learning applications you will be doing the vast majority of your computation on the GPU, so forking over a lot of money for a powerful processor is not necessary. The processor we’re going to look at is really fast; the only downside is that it has just two cores. You may want a more powerful processor for other applications, but this post is about building a bare-bones deep learning machine. Thus, a Core i3-7100 will work just fine, and at $120 it’s a steal.

The motherboard is an essential component of any build. For deep learning, there really is only one feature of the motherboard that we’re concerned with – a PCIe 3.0 x16 slot for our GPU. This criterion is critical because it allows for minimal latency between the GPU and the CPU, but it’s standard on virtually all motherboards with the LGA 1151 socket that we want for our i3-7100. In this case I’ll just recommend the motherboard I used, the ASUS H110M-A/M.2. At $59 it might be the cheapest LGA 1151 board out there (it also takes an NVMe SSD if you want to upgrade later).

Moving on, the power supply is absolutely critical. At a minimum, it should cover the maximum draw of all of our components even at only 90% of its rated power. Our machine won’t need more than 350W, so we’ll be safe with a 400W power supply (400W × 0.9 = 360W). We’ll go with the Rosewill RD400-2-SB. It isn’t Bronze, Silver, Gold, Platinum or Titanium certified, but we’re not building a mission-critical machine. We’re on a budget, and at $37 this is exactly what we need.

Now on to memory. We don’t want anything fancy here; in fact, 8GB is plenty for our entry-level deep learning machine. As a rule of thumb, you should have at least as much CPU RAM as GPU RAM, and twice as much is ideal. Our GPU is going to have 3GB of RAM, so 8GB gives us more than double that, with 2GB left over that can always be dedicated to the OS. I suggest getting a single DIMM, because there really isn’t a difference in price. We’ll go with the 8GB G.Skill Ripjaws V Series 2400MHz DDR4 SDRAM. It’s slightly overclocked, which doesn’t really matter for us, but at $53 we’re still doing pretty well.

Now for the hard drive. This choice comes down to three criteria: size, speed and noise. Size is pretty obvious, and since you’re building a machine to work with large datasets, we might need to splurge a little here. The last time I checked, the ImageNet dataset was over 1TB, so I’m going to suggest a 2TB drive. That’s enough to get you started, and you can always add more storage later. Speed is another concern, but 7200RPM drives are now common and affordable, so we’ll definitely opt for one of those. Finally comes noise, and this is where we’ll save some money. A lot of cheap drives work very well but make a lot of noise; you pay extra to avoid the noise, and that’s just not a luxury we can afford (I just turn on my stereo to drown it out). Here we’re going to go with an HGST/Hitachi UltraStar 7200RPM 2TB SATA drive. At $52 this is another steal.

Now on to the GPU. Obviously, this is the most important component of our machine, and that’s why it will also be the most expensive. That said, the price per flop of Nvidia GPUs has dropped drastically in the past year (and as far as we’re concerned, Nvidia is the only game in town). This drop in price came when Nvidia released its much-anticipated line of 10-series cards. These cards use its new Pascal architecture and are far superior to the previous generation; don’t be deceived by discounted 9-series cards, they are obsolete for our purposes. We’re going to focus on the GTX 1060, which I think of as Nvidia’s entry-level VR card. There are many choices from various manufacturers, but under the hood they’re all the same. The one I wound up with is the Gigabyte GeForce GTX 1060. It’s $200, so we’re still doing well. The only downside is that it only has 3GB of RAM. For $40 more we could go to 6GB, but I opted against it. The fact of the matter is that with most datasets you don’t need that much GPU RAM. However, if you have a dataset that does need it, you’ll get a really nasty memory error with the 3GB card. Not to worry though: all you need to do is halve the mini-batch size and you’ll be up and running again (in extreme cases you might have to halve it again). This won’t necessarily be ideal, so if you do plan to work with high-dimensional data, like color images, you may want to spend the extra $40. For now we’ll just go with the 3GB card.
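As a concrete illustration of the "halve the mini-batch size" advice, here is a small sketch. It assumes PyTorch on a CUDA card, and the model and input shapes are arbitrary stand-ins, not a real training pipeline.

    # Halve the mini-batch size until one training step fits in GPU memory.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.Flatten(), nn.Linear(64 * 224 * 224, 10)).cuda()

    def step_fits(batch_size):
        """Try one forward/backward pass; return False on a CUDA out-of-memory error."""
        try:
            x = torch.randn(batch_size, 3, 224, 224, device="cuda")
            model(x).sum().backward()
            return True
        except RuntimeError as e:
            if "out of memory" in str(e):
                torch.cuda.empty_cache()
                return False
            raise

    batch_size = 64
    while batch_size > 1 and not step_fits(batch_size):
        batch_size //= 2          # the nasty memory error -> halve and try again
    print(f"Using a mini-batch size of {batch_size}")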

Last but not least is the chassis, or case. There is certainly no need to spend a lot here unless you’d like the machine to look flashy or hold a bunch of drives. If you’re on a budget, I hope that’s the least of your worries. I went with a $30 Rosewill Micro ATX tower, the FBM-05, which should work just fine. It has an extra internal bay if you want to add another hard drive later, plus three external bays.

So, that’s it. Well, you also need a mouse, keyboard and monitor, but hopefully you’ve got those already. If not, you can look on eBay or ask the IT department at your school to see if they can help you out. You’ll also want an operating system, and I’d recommend Ubuntu: free, of course, and a little easier to use with GPUs. I prefer CentOS, but it isn’t always straightforward to get the X server running with the Nvidia driver there.

All of the prices quoted here include shipping and come from a single online retailer, newegg.com. So it’s pretty simple and straightforward: just go there, order the parts and put them together. I guess I could have made a post showing you how to do that, but there are plenty of videos on YouTube that can help.

OK, so, that’s it. A $550 deep learning machine.