Atlassian provides a freely available command line interface (CLI) for at least some of the tools in their suite. The CLI downloads and documentation are located here. The scripts include a parameter for the password used to log into the server. No one wants to, nor should they ever, hard-code the password in a script.

Here’s a technique to avoid hardcoding the password when using one of the Atlassian shell scripts:

This script is for use with CLI for Bitbucket.

Here’s an example session:

Session showing contents of script and execution

This technique is not ideal: you have to type in the password each time you execute the script. Also, if you use the CLI for two or more Atlassian tools, you have to maintain a separate script for each. A next step would be to write a new script and pass it a parameter for the tool you want to access along with the password.
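A minimal sketch of that next step follows. The password is read from the terminal with echo turned off instead of being hard-coded. The tool script names (`bitbucket.sh`, `jira.sh`) and the `--password` flag are illustrative; check the flags and script names your CLI version actually ships with.

```shell
#!/bin/sh
# Hypothetical wrapper: prompt once for the password instead of
# hard-coding it, then hand everything to the chosen Atlassian CLI.
# Tool script names (bitbucket.sh, jira.sh, ...) and the --password
# flag are illustrative; adjust to your CLI version.

atlassian_cli() {
    tool="$1"; shift                 # e.g. "bitbucket" or "jira"
    printf 'Password: ' >&2
    stty -echo 2>/dev/null           # hide the password while typing
    read -r password
    stty echo 2>/dev/null
    echo >&2
    "${tool}.sh" --password "$password" "$@"
}

# Usage: atlassian_cli bitbucket --action getProjectList
```

One wrapper then serves every Atlassian tool, and the password never lands on disk or in your shell history.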

unixODBC is a mess. I installed pyodbc last week and my configuration got wonky: I couldn’t use ODBC, and I couldn’t connect to Greenplum from Tableau. I was caught in the tar pit and needed help. Apple no longer supports an ODBC manager application, so I decided to try one of the two ODBC manager applications that I could find. I tried ODBC Manager. Here’s the About screenshot:

‘ODBC Manager’ About information

Verdict?

My verdict on ODBC Manager: Don’t use it.

I don’t think it’s a supported application. When I tried to create user DSNs, I expected the odbc.ini and odbcinst.ini files to be written to the directory /Users/user/Library/ODBC. Instead, the files (at least odbcinst.ini) were written to /Library/ODBC.
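For context, an odbcinst.ini entry registers a driver by name and points at its shared library, roughly like this (the driver name and path below are illustrative, not taken from my actual setup):

```ini
; Sketch of an odbcinst.ini driver entry (name and path are illustrative)
[Greenplum]
Description = Example Greenplum ODBC driver registration
Driver      = /usr/local/lib/gpodbc.so
```

A user-level file in /Users/user/Library/ODBC should apply to that user only; a copy in /Library/ODBC applies system-wide, which is why the misplaced file matters.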

The app has a bug: you can’t delete driver information from within it. And if you trash the app, the odbcinst.ini file will remain behind as an artifact. To delete the file, first locate it using this command:

sudo find / -name 'odbc*.ini' 2>/dev/null

Remove the file. Then go back into the app. The driver information is gone.

Recommendations

If you don’t have it installed, don’t install it. If you do have it installed, make sure to remove the ODBC configuration info and then trash the app. (I don’t think the app saves data in /Users/user/Library/Containers, /Users/user/Library/Application Support, or /Users/user/Library/Preferences, but I can’t 100% vouch for this.) I don’t have a recommendation for a GUI for unixODBC management tasks on Mac OS X. The unixODBC project has a GUI for KDE and GNOME as an aim; let’s hope that Apple re-releases one. In the meantime, use the command line. And, bonne chance!

The R vs. Python language war is dead. This is an observation from Strata + Hadoop World San Jose 2016. There was no discussion among participants about the merits of one over the other, nor was there any content about which is better in any of the sessions that I attended. In a show of acceptance of using either language for Data Science, a full-day tutorial was held for each of the two languages.

What has instead emerged is acceptance that Python is the more general-purpose of the two while now also being well suited for Data Science, and that R is the more statistics-specific of the two while also being well suited for Data Science.

What’s emerged is that the technical challenges underlying integration of these languages into Big Data are essentially the same. A key post by software engineer Wes McKinney discusses the commonality. It’s an important post. Read it here.

The language war is dead. A takeaway is that it’s not one or the other but both. Data Scientists will need to know both. Being more fluent in Python is better; having enough facility in R to get data into and out of the R ecosystem, to use and interpret results from statistical tests, and to use the visualization libraries is probably enough.
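As a deliberately minimal illustration of moving data between the two ecosystems (not something from McKinney's post, which deals with richer formats), here is a CSV round trip using only the Python standard library; the R side would read the same file with read.csv("scores.csv"):

```python
# Sketch: round-tripping tabular data through CSV, the lowest common
# denominator for exchanging data frames between Python and R.
import csv

rows = [{"lang": "R", "score": "1.0"}, {"lang": "Python", "score": "2.0"}]

# Write a header row plus data rows that R's read.csv() can load directly.
with open("scores.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["lang", "score"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back; DictReader returns one dict per row, all values as strings.
with open("scores.csv", newline="") as f:
    back = list(csv.DictReader(f))

assert back == rows
```

CSV loses type information (everything comes back as a string), which is precisely the kind of friction the common infrastructure McKinney describes aims to remove.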

Incidentally, search interest in R has been stable for the last three years:

Edit: Paula Poundstone’s keynote address is now available here.

Strata + Hadoop World San Jose 2016 did not fail attendees. The best presentation was Paula Poundstone’s stand-up comedy routine, which ran two minutes over schedule. She lambasted technology and the industry. Brilliant. I will have to post the YouTube video as an update. In the meantime, here’s a commentary by Paula about the problem of flat-screen addiction:

Expo hall gets it right

The Expo Hall was an integral part of the conference experience. The density of vendors could not have been more satisfying. It was a feast of interactivity between attendees and the really smart people staffing the booths. It also gave the floor a city-like feel, the vendor booths crammed in together like apartments along a city street. Kudos to the organizers.

My favorite

What was the killer software I have to have? DataRobot. It automates the task of building models in a data space that’s too large for a human alone to ever fully explore. The business problem for most is the need to monetize data, the need to innovate and create data products. DataRobot can help do this. It’s the first, I won’t say set-and-forget, but rather tune-set-and-forget, Machine Learning AI bot that’s commercially available. And, amazingly, DataRobot is not too expensive: an install on a four-core box would set me back about $20,000 (more, obviously, for an install on a cluster).

The DataRobot company will get bought out. Why? It can be set loose on data to run near continuously, with only periodic human interaction, to model the hell out of your data. On a Mac, with the bot running 24 hours a day, seven days a week, it’d crush. It includes a catalog of models that I have never even heard of, and it’s easily 100x faster at modeling data than a human could ever be. It’s feature rich, too, including the ability to clone models, parameter input settings with sensible defaults, various metrics to evaluate model performance, and automatic generation of plots in a data exploration step. In a final stroke of ah-ha, the bot can be configured to take the top models returned from a run on a dataset and create an ensemble from them, improving results (probably!) over the single best model. Models can then be deployed. DataRobot is a no-brainer investment for companies serious about monetizing their data.

Themes

Overall themes and key observations, in more or less time-sequence order from the beginning of the conference at the top to the end at the bottom:

  1. Streaming
  2. Real-time analytics
  3. Spark
  4. Tay as an example of AI gone awry; an embarrassment to Tay’s parents
  5. Peak BI
  6. In-memory
  7. Streaming
  8. Spark
  9. Kafka
  10. Use of Notebooks to share results and for narration
  11. Visualizations in the keynotes were not all that flashy; numerous examples of simple line graphs and simple charts, and that was OK.
  12. I actually saw statistics creeping into Data Science…confidence intervals were included on a bar chart!
  13. A greater percentage of attendees were women compared to the 2013 edition of the same conference
  14. noSQL means noMarketShare (no common API and no way to exchange data means no market share)
  15. Streaming
  16. Docker; it is worth the time to learn it
  17. Spark
  18. Streaming
  19. Speaker demos that were executed in Python were in Python3 (RIP Python2)
  20. No language wars encountered
  21. Nothing about the Julia programming language
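On the confidence-interval sighting in item 12: those error bars on a bar chart are typically something like the sketch below, a 95% normal-approximation interval for a sample mean (the data values are made up for illustration):

```python
# Sketch: 95% confidence interval for a sample mean, the kind of
# error bar that showed up on that conference bar chart.
import math

def mean_ci(xs, z=1.96):
    """Return (low, high) for a z-based CI on the mean of xs."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)   # sample variance
    half = z * math.sqrt(var / n)                   # half-width of the CI
    return m - half, m + half

lo, hi = mean_ci([10, 12, 9, 11, 13, 10, 12])       # centered on mean 11
```

For small samples a t-based interval would be more appropriate, but the bar-chart version is usually exactly this: mean plus or minus a multiple of the standard error.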

Please feel free to leave comments.