Edit: Paula Poundstone’s keynote address is now available here.
Strata + Hadoop World San Jose 2016 did not disappoint attendees. The best presentation was Paula Poundstone’s stand-up comedy routine, which ran two minutes over schedule. She lambasted technology and the industry. Brilliant. I will have to post the YouTube video as an update. In the meantime, here’s a commentary by Paula about the problem of flat-screen addiction:
Expo hall gets it right
The Expo Hall was an integral part of the conference experience. The density of vendors could not have been more satisfying. It was a feast of interactivity between attendees and the really smart people staffing the booths. It also had a city-like feel: the vendor booths were crammed together like apartments along a city street. Kudos to the organizers.
What was the killer software I have to have? DataRobot. It automates the task of building models in a data space that’s too large for a human alone to ever fully explore. The business problem for most companies is the need to monetize data, to innovate and create data products. DataRobot can help with this. It’s the first commercially available Machine Learning, AI bot that is, I won’t say set-and-forget, but rather tune-set-and-forget. And, amazingly, DataRobot is not too expensive. An install on a four-core box would set me back about $20,000 (more, obviously, for an install on a cluster).
The DataRobot company will get bought out. Why? It can be set loose on data to run near continuously, with only periodic human interaction, to model the hell out of your data. On a Mac, with the bot running 24 hours a day, seven days a week, it’d crush. It includes a catalog of models that I have never even heard of. It’s easily 100 times faster at modelling data than a human could ever be.
It is feature rich: you can clone models, supply your own parameter settings or rely on sensible defaults, evaluate model performance with various metrics, and get plots generated automatically in a data exploration step. In a final stroke of ah-ha, the bot can be configured to take the top models returned from a run on a dataset and create an ensemble from them, improving results (probably!) over the single best model. Models can then be deployed. DataRobot is a no-brainer investment for companies serious about monetizing their data.
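To make the ensemble idea concrete, here is a minimal Python sketch of the general technique: fit several candidate models, rank them on held-out data, and average the predictions of the top few. This is not DataRobot’s API or its actual method. The toy data, the three candidate models, and every function name below are assumptions made up for illustration.

```python
import random

def make_data(n, seed):
    """Toy regression data (hypothetical): y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """1-nearest-neighbour: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def top_k_ensemble(fitters, train, valid, k=2):
    """Fit every candidate, rank by validation error, average the best k."""
    fitted = [(name, fit(*train)) for name, fit in fitters.items()]
    ranked = sorted(fitted, key=lambda nm: mse(nm[1], *valid))
    top = ranked[:k]

    def ensemble(x):
        return sum(model(x) for _, model in top) / len(top)

    return [name for name, _ in top], ensemble

# Hypothetical candidate catalog; a real tool would search far more models.
fitters = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
train = make_data(200, seed=1)
valid = make_data(100, seed=2)

names, ensemble = top_k_ensemble(fitters, train, valid, k=2)
print("models in ensemble:", names)
print("ensemble validation MSE:", mse(ensemble, *valid))
```

Whether the averaged ensemble actually beats the single best model depends on the data, which is why the "probably!" above is warranted; comparing both on a separate test set is the honest check.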
Overall themes and key observations, in roughly chronological order from the start of the conference (top) to the end (bottom):
- Real-time analytics
- Tay as an example of AI gone awry; an embarrassment to Tay’s parents
- Peak BI
- Use of Notebooks to share results and for narration
- Keynote speakers’ visualizations were not all that flashy; there were numerous examples of simple line graphs and simple charts, and that was OK.
- I actually saw statistics creeping into Data Science…confidence intervals were included on a bar chart!
- A greater percentage of attendees were women compared to the 2013 edition of the same conference
- noSQL means noMarketShare (no common API and no way to exchange data means no market share)
- Docker; it is worth the time to learn it
- Speaker demos that were executed in Python were in Python3 (RIP Python2)
- No language wars encountered
- Nothing about the Julia programming language
Please feel free to leave comments.