This post shows an example crontab entry that runs a script once per quarter. The entry works as intended: after creating it in my crontab in January, I had to wait until April 1st to confirm that it ran.

Crontab for quarterly execution of cronjob

This example executes my script at 10 AM on the first day of the first, fourth, seventh, and tenth months of the year (January, April, July, and October). This schedule runs the job on the first day after the end of each quarter, which is not necessarily a business day:

################################################################
#  First five fields, followed by the command to be executed:
#          minute (0-59),
#          hour (0-23),
#          day of the month (1-31),
#          month of the year (1-12),
#          day of the week (0-6 with 0=Sunday).
#  Minute is 0, not *: a * would run the job every minute
#  from 10:00 to 10:59.
################################################################
0 10 1 1,4,7,10 * /usr/local/anaconda3/bin/python /home/user/my-code-rocks.py
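If you would rather not wait a whole quarter to find out whether a schedule is right, you can reason about it in a few lines of Python. This is a minimal sketch, not a cron parser: it hard-codes a quarterly entry of the form `0 10 1 1,4,7,10 *` (minute 0, hour 10, day 1, months 1, 4, 7, 10) and lists when it should fire. The helper names `quarterly_runs` and `next_run` are hypothetical.

```python
from datetime import datetime

# Fields of the quarterly crontab entry "0 10 1 1,4,7,10 *".
MINUTE, HOUR, DAY = 0, 10, 1
MONTHS = (1, 4, 7, 10)  # January, April, July, October


def quarterly_runs(year):
    """Return every datetime in `year` at which the entry should fire."""
    return [datetime(year, month, DAY, HOUR, MINUTE) for month in MONTHS]


def next_run(after):
    """Return the first firing time strictly after `after`."""
    for year in (after.year, after.year + 1):
        for run in quarterly_runs(year):
            if run > after:
                return run
    # Unreachable: next year's January run is always later than `after`.
    raise AssertionError("no run found")


if __name__ == "__main__":
    for run in quarterly_runs(2016):
        print(run.strftime("%a %Y-%m-%d %H:%M"))
    # An entry created in mid-January should first fire on April 1.
    print(next_run(datetime(2016, 1, 15)))
```

Printing the weekday for 2016 also shows why the first day of a quarter is not always a business day: October 1, 2016 fell on a Saturday.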

References

I relied on the information in these two posts:

http://stackoverflow.com/questions/21773817/crontab-run-a-cronjob-quarterly
http://alvinalexander.com/linux/unix-linux-crontab-every-minute-hour-day-syntax

Edit: Paula Poundstone’s keynote address is now available here.

Strata + Hadoop World San Jose 2016 did not fail attendees. The best presentation was Paula Poundstone’s stand-up comedy routine, which went two minutes over schedule. She lambasted technology and the industry. Brilliant. I will have to post the YouTube video as an update. In the meantime, here’s a commentary by Paula about the problem of flat-screen addiction:

Expo hall gets it right

The Expo Hall was an integral part of the conference experience. The density of vendors could not have been more satisfying. It was a feast of interactivity between attendees and the really smart people staffing the booths. It also had a city-like feel: the vendor booths were crammed together like apartments along a city street. Kudos to the organizers.

My favorite

What was the killer software I have to have? DataRobot. It automates the task of building models in a data space that’s too large for a human alone to ever fully explore. The business problem for most companies is the need to monetize data: to innovate and create data products. DataRobot can help do this. It’s the first commercially available machine learning, AI bot that is, I won’t say set-and-forget, but rather tune-set-and-forget. And, amazingly, DataRobot is not too expensive. For an install on a four-core box, it would set me back about $20,000 (more, obviously, for an install on a cluster).

The DataRobot company will get bought out. Why? It can be set loose on data to run nearly continuously, with only periodic human interaction, to model the hell out of your data. On a Mac, with the bot running 24 hours a day, seven days a week, it’d crush. It includes a catalog of models that I have never even heard of. It’s easily 100x faster at modeling data than a human could ever be. It is feature rich, including the ability to clone models, supply input parameter settings while falling back on sensible defaults, evaluate model performance with various metrics, and automatically generate plots in a data exploration step. In a final stroke of ah-ha, the bot can be configured to take the top models returned from a run on a dataset and create an ensemble from them, improving results (probably!) over the single best model. Models can then be deployed. DataRobot is a no-brainer investment for companies serious about monetizing their data.
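The "take the top models and ensemble them" idea is simple enough to sketch. This is not DataRobot's implementation, whose internals I don't know; it is a minimal, hypothetical illustration that ranks candidate models by mean squared error on a validation set and averages the predictions of the best `k`. The function names, model names, and toy numbers are all invented for illustration.

```python
from statistics import mean


def mse(pred, actual):
    """Mean squared error between two equal-length sequences."""
    return mean((p - a) ** 2 for p, a in zip(pred, actual))


def top_k_ensemble(predictions, actual, k=3):
    """Blend the k models with the lowest validation MSE by simple averaging.

    predictions: dict mapping a (hypothetical) model name to its validation predictions.
    actual: the true validation targets.
    Returns the chosen model names and the averaged predictions.
    """
    ranked = sorted(predictions, key=lambda name: mse(predictions[name], actual))
    chosen = ranked[:k]
    blended = [mean(column) for column in zip(*(predictions[n] for n in chosen))]
    return chosen, blended


if __name__ == "__main__":
    # Toy numbers, invented only to exercise the code.
    actual = [1.0, 2.0, 3.0, 4.0]
    predictions = {
        "model_a": [1.2, 1.8, 3.3, 3.9],
        "model_b": [0.8, 2.3, 2.8, 4.2],
        "model_c": [2.0, 3.0, 4.0, 5.0],
    }
    chosen, blended = top_k_ensemble(predictions, actual, k=2)
    print(chosen, blended)
    print("best single MSE:", mse(predictions[chosen[0]], actual))
    print("ensemble MSE:   ", mse(blended, actual))
```

With these toy inputs, `model_a` and `model_b` are chosen, and their average beats either one alone because their errors point in opposite directions. When the top models make similar mistakes, averaging gains little, which is one reason the improvement is only "probably."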

Themes

Overall themes and key observations, in roughly chronological order from the beginning of the conference (top) to the end (bottom):

  1. Streaming
  2. Real-time analytics
  3. Spark
  4. Tay as an example of AI gone awry; an embarrassment to Tay’s parents
  5. Peak BI
  6. In-memory
  7. Streaming
  8. Spark
  9. Kafka
  10. Use of Notebooks to share results and for narration
  11. Keynote speakers’ visualizations were not all that flashy: lots of simple line graphs and simple charts, and that was OK.
  12. I actually saw statistics creeping into Data Science…confidence intervals were included on a bar chart!
  13. A greater percentage of attendees were women than at the 2013 edition of the same conference
  14. noSQL means noMarketShare (no common API and no way to exchange data means no market share)
  15. Streaming
  16. Docker; it is worth the time to learn it
  17. Spark
  18. Streaming
  19. Speaker demos that were executed in Python were in Python3 (RIP Python2)
  20. No language wars encountered
  21. Nothing about the Julia programming language

Please feel free to leave comments.

The pandas-datareader package is not included in the Anaconda distribution (at least not as of the most recent release on the date of this post). Because of that, line 6 of the strata_pandas.ipynb notebook used in the pyData tutorial raises an error. If you want to use this module in a script or Jupyter notebook, you first need to install the package. Here’s the conda command to download and install it:

conda install -c https://conda.anaconda.org/anaconda pandas-datareader

If you execute the command conda list at the Bash prompt before and after the install, you’ll see that the package is not there at first, then is present after the install.

For more info including changes: https://anaconda.org/anaconda/pandas-datareader
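You can do the same before/after check that `conda list` gives you from inside a script or notebook. This sketch uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. Note that the conda package is named `pandas-datareader`, but it is imported as `pandas_datareader`. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Package name uses a hyphen; the importable module uses an underscore.
    if is_installed("pandas_datareader"):
        print("pandas_datareader is available")
    else:
        print("Missing; install with: conda install -c "
              "https://conda.anaconda.org/anaconda pandas-datareader")
```

Checking with `find_spec` rather than wrapping `import` in `try/except` avoids actually importing the module just to test for it.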

I created a Note to self category after seeing this one in person and realizing how big a perception or brand-image problem it could create. If you are setting up to make a pitch, get your video hooked up and make sure it’s displaying through the projector…then turn down the brightness until you’re up.

Here’s an example of what I mean. The speaker who is standing is handling Q&A for his startup’s pitch. Behind him on the screen are the logo and opening slide of the startup the next speaker will pitch. Why is this a problem? Well, it can vary. In this case, the speaker wrapping up Q&A is standing next to the logo not of his own startup, but of a startup whose mission is to be the online marketplace for the wholesale distribution of regulated cannabis!


Set up your display, then turn the brightness down until it’s your turn to pitch. I caught this at the March 21, 2016, pyCon Startup Row Pitch Night in Seattle. I’d never noticed this before. It happened at every transition between pitches that evening.

Final note: ReUP was one of two startups selected that evening to represent Seattle at the upcoming pyCon event in Portland. Congrats!