Computing clusters are going to get larger. And it’s not just because Facebook needs room to store photos drunk people take of each other at wedding parties. Why do we need larger clusters?
An article published in The Atlantic discusses how many orders of magnitude faster today’s cameras are than the very first cameras. Measured by shutter speed, the difference is a factor of 10 million billion. In R’s default scientific notation, that’s 1e+16. And the difference is going to get larger, now that we are on the brink of attosecond photography. An attosecond is roughly the time it takes light to travel the width of three hydrogen atoms.
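The hydrogen atom comparison is easy to sanity-check in R. This is only a back-of-the-envelope sketch: it assumes the width of a hydrogen atom is twice the Bohr radius, which is one common convention and not something the article specifies.

```r
# Distance light covers in one attosecond, in hydrogen-atom widths
speed_of_light <- 299792458   # metres per second
attosecond     <- 1e-18       # seconds
bohr_radius    <- 5.29177e-11 # metres

# Assumption: treat the atom's width as twice the Bohr radius
hydrogen_width <- 2 * bohr_radius

distance      <- speed_of_light * attosecond  # metres travelled in 1 as
atoms_crossed <- distance / hydrogen_width

distance
atoms_crossed
```

The distance works out to roughly 0.3 nanometres, a little under three atom widths under this assumption.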
A computing cluster today stores about a million billion times more bytes than, say, a computing device of approximately 70 years ago. Today’s edge case for cluster capacity is in the petabytes (PB); one petabyte is a million billion (10^15) bytes. Exabyte (EB) clusters are next. An exabyte is 1,000 times the size of a petabyte.
An interesting fact about this chart is that the y-axis has to be transformed to a logarithmic scale. Otherwise the largest metric on the chart, whether gigabytes, terabytes, or any other, dwarfs all the smaller metrics, since each one is 1,000 times larger than the one before it. The same thing happens at every step up the scale, indefinitely.
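Here is a small sketch of why the linear axis fails, using the same kB-to-YB ladder as the chart. The variable names are just for illustration.

```r
# Byte counts for kB through YB
number <- 1000^(1:8)
names(number) <- c("kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")

# Linear axis: each point's height as a fraction of the tallest point.
# Everything below the top metric is squashed against zero.
linear_position <- number / max(number)

# Log axis: log10 turns each 1,000-fold jump into the same step
log_position <- log10(number)
step_sizes   <- diff(log_position)

linear_position
log_position
step_sizes
```

On the linear axis the second-largest metric sits at one thousandth of the plot height and the rest are lower still, while on the log axis the points are evenly spaced.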
Why are we going to need larger clusters? One application is the storage and processing of images captured from femto cameras. Watch this TED Talk video from YouTube and start thinking about the next dimension in imaging, femto cameras.
[iframe src="//www.youtube.com/embed/Y_9vd4HWlVA?&w=640&h=360&rel=0" allowfullscreen>]
It’s inevitable that EB-scale clusters will emerge. A use case? Dr. Raskar suggests that smartphones will have a femto camera that can be used to tell the freshness of a fruit at the grocery store. It’s not the camera that will tell the freshness but the processing of the images the camera takes. It’s a classification problem. We are going to build classification systems that determine if a tomato at the food store is fresh or not. The series of images captured from the femto camera will be run through a fruit freshness recognizer app. The algorithm will compare the piece you imaged to known pieces. Instead of a classification system for handwritten digit recognition that uses 28×28 pixel training images, imagine multi-frame, 200×200 pixel images of 200 pieces of fruit in a training set. Half of these, 100, will be fresh fruit that are ripe and should be purchased. The other 100 pieces will not be ripe; some will be underripe and others overripe.
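To make "compare the piece you imaged to known pieces" concrete, here is a minimal nearest-neighbour sketch in base R. Everything in it is an assumption for illustration: the images are simulated noise, single-frame rather than multi-frame, and the idea that fresh fruit has a brighter average pixel intensity is invented so the two classes can be told apart. The function names `simulate_images` and `classify_fruit` are hypothetical, and a real freshness recognizer would likely use a very different method.

```r
set.seed(42)

n_per_class <- 100        # 100 fresh + 100 not fresh = 200 training images
n_pixels    <- 200 * 200  # one 200x200 frame, flattened to a vector

# Simulated stand-in for real images (made-up intensities, one row per image)
simulate_images <- function(n, mean_intensity) {
  matrix(rnorm(n * n_pixels, mean = mean_intensity, sd = 0.1), nrow = n)
}

train_x <- rbind(simulate_images(n_per_class, 0.6),  # "fresh"
                 simulate_images(n_per_class, 0.4))  # "not fresh"
train_y <- factor(rep(c("fresh", "not fresh"), each = n_per_class))

# 1-nearest-neighbour: give the new image the label of the closest known image
classify_fruit <- function(image, train_x, train_y) {
  distances <- sqrt(rowSums(sweep(train_x, 2, image)^2))
  train_y[which.min(distances)]
}

# Classify one new simulated "fresh" image
new_image <- simulate_images(1, 0.6)[1, ]
classify_fruit(new_image, train_x, train_y)
```

Even this toy training set is a 200 by 40,000 matrix, and that is for one frame per piece of fruit and one kind of fruit.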
Now, imagine all the different kinds of fruits and vegetables in the grocery store, and everything else that a photon pulse can be shot through and used to build a classification system for. This is going to be a large amount of data and it’s going to require computer clusters that are larger than today’s clusters. Emerging technologies such as attosecond photography will themselves produce large quantities of data, but it’s the applications we invent on top of these technologies that will create even more data. Exabyte computing clusters seem inevitable.
For the curious, here’s the R code used to produce the above chart:
    library(ggplot2)
    library(scales)  # for y-scale transform

    # metric labels, their powers of ten, and the matching byte counts
    label <- c("kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    value <- c(3, 6, 9, 12, 15, 18, 21, 24)
    number <- c(1000, 1000^2, 1000^3, 1000^4, 1000^5, 1000^6, 1000^7, 1000^8)
    numberLabel <- c(
      "1 000 B",
      "1 000 000 B",
      "1 000 000 000 B",
      "1 000 000 000 000 B",
      "1 000 000 000 000 000 B",
      "1 000 000 000 000 000 000 B",
      "1 000 000 000 000 000 000 000 B",
      "1 000 000 000 000 000 000 000 000 B")

    # value is included so aes() finds every column in the data frame
    df <- data.frame(label = label, value = value, number = number)

    p <- ggplot(df, aes(x = value, y = number))

    # the fifth point (PB) is drawn in a darker colour
    color <- c("#1b9e77", "#1b9e77", "#1b9e77", "#1b9e77",
               "#2a2a2a", "#1b9e77", "#1b9e77", "#1b9e77")

    p + geom_point(colour = color, size = 4, shape = 18) +
      labs(x = "Metric",
           title = "Current edge case for size of\ncomputing cluster is petabyte scale") +
      scale_x_continuous(breaks = value, labels = label) +
      scale_y_continuous(trans = log10_trans(),
                         breaks = number,
                         labels = numberLabel) +
      coord_trans(y = "log10") +  # coord-level log10, applied after the scale transform
      annotate("text",
               x = 18.3,  # will vary depending on aspect ratio
               y = 1000^5 * 0.1,
               label = "today's edge case",
               size = 4.5,
               color = "#2a2a2a",
               fontface = 3) +
      theme(plot.title = element_text(size = rel(1.3), face = 'bold'),
            axis.title.y = element_blank(),
            axis.title.x = element_text(face = 'bold', size = 12),
            panel.grid.minor = element_blank(),
            legend.position = "none")