When designing a small deep learning cluster for the university last year, I ran into trouble trying to determine whether the P100 or the 1080Ti was more powerful (and if so, by how much). Ultimately, I was unable to come to a conclusion, so I decided to get both and find out for myself. This post describes my experience using these cards on a recent project of mine and is a follow-up to the previous post.
Recently I had to revise and resubmit a manuscript I had written for a medical informatics journal on using a two-stage deep learning system for detecting lung cancer from CT scans. I was using a 3D U-Net and a 3D ResNet, so this required a lot of compute. I was using two mini clusters, one at the university and one I had built at home (see previous post). In total there were 15 1080Tis and a single P100. It wound up taking me 6 weeks to train and optimize the 180 models for the 10 separate 9-fold cross-validations I had to conduct.
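To make the model count concrete, here is a minimal sketch of how the training jobs could be enumerated. It assumes the 180 models come from one 3D U-Net and one 3D ResNet per fold (10 x 9 x 2), which is my reading of the numbers above. The stage labels and dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

N_CROSS_VALIDATIONS = 10   # separate cross-validation runs
N_FOLDS = 9                # folds per cross-validation
# Assumption: each fold trains both stages of the two-stage system.
STAGES = ("unet3d", "resnet3d")  # hypothetical labels

# One job per (cross-validation, fold, stage) combination.
jobs = [
    {"cv": cv, "fold": fold, "stage": stage}
    for cv, fold, stage in product(range(N_CROSS_VALIDATIONS), range(N_FOLDS), STAGES)
]

n_gpus = 15 + 1  # 15 1080Tis plus the single P100
models_per_gpu = len(jobs) / n_gpus

print(len(jobs), "models,", models_per_gpu, "per GPU on average")
```

Under that assumption, each of the 16 GPUs handles a little over 11 models on average, before counting any reruns for optimization.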
I compared performance between the two cards for both the 3D U-Net and the 3D ResNet. Over the course of running the cases for the revision, the 1080Tis consistently outperformed the P100 by roughly 10%. This was not surprising to me and is consistent with another benchmark that was published online after I had designed the cluster. I didn’t really analyze these results, but I will try to add plots for comparison in the future.
I was unable to compare the performance of the P100 and the 1080Tis for half-precision (float16) operations. Based on Nvidia’s literature, I do suspect that the P100 would outperform the 1080Tis by about 30%. This is substantial, but far from what one would expect given the differences in price.
Because I only had one P100, I was also unable to compare the two GPUs on parallel performance. The 1080Tis are designed only for task parallelism, i.e. training a separate model on each GPU. This was effective for my project because I had to train 180 separate models, but it is limiting for other tasks involving larger networks (note that I was using a batch size of 2 on single GPUs). The P100s are designed for data parallelism (insomuch as they have much more bandwidth with Nvidia’s NVLink interconnect) as well as task parallelism, so spreading a single model’s training across several of them should be easier. Of course, we can’t know without testing this, and the results could be surprising, as they were in the head-to-head comparison. I leave this for future work.
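The task parallelism described above can be sketched as a simple round-robin schedule in which each training process sees exactly one GPU. This is an illustrative sketch, not the scripts I actually ran. The job names and the four-GPU host are hypothetical, and the only real mechanism it relies on is the `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os
from itertools import cycle

def assign_jobs_to_gpus(job_names, gpu_ids):
    """Round-robin: map each GPU id to the jobs it will run one after another."""
    schedule = {gpu: [] for gpu in gpu_ids}
    for job, gpu in zip(job_names, cycle(gpu_ids)):
        schedule[gpu].append(job)
    return schedule

def env_for_gpu(gpu_id):
    """Copy of the current environment that exposes only one GPU to a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env

# Hypothetical example: 180 jobs on a single host with 4 GPUs.
job_names = [f"model_{i:03d}" for i in range(180)]
schedule = assign_jobs_to_gpus(job_names, gpu_ids=[0, 1, 2, 3])

for gpu, gpu_jobs in schedule.items():
    print(f"GPU {gpu}: {len(gpu_jobs)} jobs, first is {gpu_jobs[0]}")
```

Each job would then be launched with something like `subprocess.run(cmd, env=env_for_gpu(gpu))`, where `cmd` is whatever training command you use. Because the models never communicate with each other, no interconnect between the cards is needed.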
Assuming that the P100 is superior for data parallelism as well as for half-precision operations, one may ask whether the P100 is worth the cost. The answer, of course, depends on the type of research you conduct. For most people, I think the answer would be no. Even for researchers only interested in half-precision operations, I still don’t think the assumed advantage of the P100 would be worth a price tag of 6x that of the 1080Ti. The only situation in which I envision the P100 being worth the extra cost is if you have datasets that are large enough to merit multiple GPU use, i.e. non-public datasets.
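The cost argument can be made explicit with a rough back-of-the-envelope calculation. The sketch below uses only the figures from this post: a P100 priced at 6x a 1080Ti, the 1080Ti measured roughly 10% faster in single precision, and the P100 assumed (not measured) to be about 30% faster in half precision. Treating "10% faster" as a 1.10x throughput ratio is a simplifying assumption.

```python
def throughput_per_cost(relative_throughput, relative_price):
    """Throughput obtained per unit of money, in arbitrary relative units."""
    return relative_throughput / relative_price

PRICE_1080TI = 1.0
PRICE_P100 = 6.0  # P100 costs about 6x the 1080Ti (figure from this post)

# Single precision: 1080Ti measured roughly 10% faster than the P100.
fp32_advantage = (
    throughput_per_cost(1.10, PRICE_1080TI) / throughput_per_cost(1.00, PRICE_P100)
)

# Half precision: P100 ASSUMED about 30% faster (untested estimate).
fp16_advantage = (
    throughput_per_cost(1.00, PRICE_1080TI) / throughput_per_cost(1.30, PRICE_P100)
)

print(f"1080Ti value advantage, float32: {fp32_advantage:.1f}x")
print(f"1080Ti value advantage, float16 (assumed): {fp16_advantage:.1f}x")
```

With these numbers the 1080Ti delivers roughly 6.6x the throughput per dollar in single precision, and still roughly 4.6x in half precision even if the P100’s assumed 30% advantage holds.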
My conclusion is that, without a doubt, 1080Tis perform similarly to P100s for most tasks. Many factors that I didn’t account for contribute to performance, so I’m not confident saying more than that. I was also unable to determine whether this would hold up for data parallelism or for half-precision floats. Regardless, I expect the performance would still be in the same ballpark, and my recommendation remains the same: unless you’re working with very large datasets of the kind that are typically not publicly available, there’s no need for the features offered by the P100 or any other Tesla series card.
Looking to the future, I am curious to see whether Nvidia continues to sell consumer-grade GPUs that are as fast as or faster than the research-grade cards. The GTX 1100 series cards are expected toward the end of the summer, and I look forward to testing their performance against the V100.