A primary purpose and use for transposing matrices is to determine if the transposed matrix is equal to the original. If the two matrices are not equal, then the original matrix is asymmetrical. Otherwise, it is symmetrical. That’s the purpose.

Here’s the code. In SQL, tranposing is pretty straightforward. First, let’s get the data for a 3×3 matrix that’s stored in a table in a database:

SELECT * 
FROM tableA;

row_number	column_number	value
1	1	0
1	2	1
1	3	0
2	1	1
2	2	0
2	3	1
3	1	1
3	2	1
3	3	0

Next, create a view. The view is the transpose. Notice that it’s a matter of switching the order of the columns in the SELECT used to create the view:

CREATE VIEW tableA_transpose (row_number, column_number, value) AS 
SELECT column_number
	, row_number
	, value 
FROM tableA;

Now, query the view and confirm that it is the transpose:

SELECT *
FROM tableA_transpose
ORDER BY row_number, column_number;

row_number	column_number	value
1	1	0
1	2	1
1	3	1
2	1	1
2	2	0
2	3	1
3	1	0
3	2	1
3	3	0

Looks good. That’s how to do it in SQL. Here’s the R version.

First, I saved the query results from the original query to a .txt file for use in importing it into R. I saved the file with the column header that I’ll remove while reading. This is a simple trick to overcome problems with unreadable characters that the R scan() function might have trouble reading. Making sure to use UTF-8 character encoding consistently is the best way to avoid read problems later on.

Here’s the R code:

> file.name <- "C:/Users/user/workspace/input_to_r_one_column.txt"
> x <- scan(file.name, skip = 1)
Read 9 items
> x
[1] 0 1 0 1 0 1 1 1 0

Typing y at the RStudio console shows that original original matrix was properly created and loaded:

> y <- matrix(x, 3, 3)
> y
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    1
[3,]    0    1    0

Wait. There’s a problem. The output file saved from the SQL is in row_number, column_number order. R creates column vectors; the values are loaded in column order so the matrix y is the transpose already. This is one of the features of R that I don’t appreciate. If we want to compare the matrices to determine if the original is symmetric or not, we still need the original matrix.

Transposing matrices in R is simple. To transpose matrix y, use the R t() function:

> z <- t(y)
> z
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    1
[3,]    1    1    0

Reviewing the both the original and transposed matrices, the results obtained from both SQL and R corroborate as we expect. Visual inspection in this simple case does show that the original matrix is asymmetrical.

Posted in R.

Is it important for SQL developers to know about relational algebra? If so, why?

Yes, it is important, and here’s why. The key breakthrough with SQL is the idea of data independence, that the data can be manipulated and reasoned about independent of the underlying physical representation of the data. Relational algebra is part of the theoretical foundation that allows this independence.

E.F. Codd describes a model that implements operations from relational algebra together with other theory into a language that can be used to manipulate data. That language ended up being SQL. From Codd’s 1970 paper “A Relational Model of Data for Large Banks of Data”:

Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.

It’s an important paper, having over 1,400 citations, and should be required reading for SQL developers. One section in the paper discusses operations on the data, or what are called primitives. The primitives are selection, projection, the Cartesian product, set union, and set difference. Other operations that are immediately more familiar to SQL developers are the different join operations, the aggregate operations for computations over a domain, and the rename of attribute names using rename.

An interesting note about the SQL language are the names used for implementation of the relational algebra operations selection and projection. The relational algebra selection (σ) is implemented in SQL as a predicate in the where clause of a select query. The attribute listing, or *, in the select statement is a projection (π) in relational algebra. The select is a relational algebra projection of attributes, and a predicate in the where is a selection of rows.

The idea of data independence is key. In Big Data, and in the Hadoop ecosystem with Hive, vendors are working toward implementing SQL on NoSQL database systems. The underlying structure of the data in RDBMS, NoSQL, or a Hadoop implementation, are different, but the language used to manipulate and reason about the data are the same.

Most developers are going to be unfamiliar with Codd’s paper and the theoretical foundation of SQL. The representation of the data as separate from the reasoning and manipulation of the data are possible only because of this theoretical foundation that includes relational algebra.