The R vs. Python language war is dead. This is an observation from Strata + Hadoop World San Jose 2016. There was no discussion among participants about the merits of one over the other, nor did any of the sessions that I attended argue that one is better. In a show of acceptance of either language for Data Science, a full-day tutorial was held for each of the two.
What has emerged instead is acceptance that Python is the more general-purpose of the two while now also being well suited for Data Science, and that R is the more specific to the statistical domain while also being well suited for Data Science.
What has also emerged is that the technical challenges underlying the integration of these languages into Big Data are essentially the same. A key post by software engineer Wes McKinney discusses the commonality. It’s an important post. Read it here.
The language war is dead. A takeaway is that it’s not one or the other but both. Data Scientists will need to know both. Being more fluent in Python is better. Having enough facility in R to get data into and out of the R ecosystem, to use and interpret results from statistical tests, and to use the visualization libraries is probably enough.