How PRISM Works, How the FBI and NSA Will Organize Your Private Data

Ever wonder how the NSA organizes all that data of yours it got through PRISM? Wonder no longer. Tom Ewing scores the story. Here’s how PRISM works.

In the now-global debate over the ethics and legalities of the Obama-sanctioned PRISM project, which collects every possible form of data and sifts through it using advanced data mining technologies, there are some intriguing technical issues. How do the intelligence agencies actually organize your data, those pics, videos, Skypes and emails they get off Google, Facebook, Microsoft and other services, so that it can be mined?

In the recent furor over Edward Snowden’s revelation that the FBI and NSA are collecting the information — email, phone data, location data, Internet histories, videos, photos, you name it — of millions upon millions of individuals in the United States and worldwide, everyone is overlooking one question. How do they analyze all of this stuff?

The big data tech wave of the last few years certainly holds part of the answer, but another big part of the story almost certainly lies in a technology that shares a name with an abstruse branch of philosophy — Ontology.

In philosophy, ontology is the study of what things can exist, how to classify those things and how they relate to one another. Computer science has appropriated the term to mean a formal description of what can exist in a particular view of the world and how to classify data according to that view.

For example, the world seen through the eyes of an address book application might include people, names, addresses, telephone numbers and email addresses, along with the structural information that people have one name, people may have many telephone numbers and so forth. That is the address book app’s ontology. If it meets another application that shares the same ontology, say an email client, it can share data with that application because the ontology has annotated the data with meaning – it has made the semantics of the data explicit.
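To make the address book example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ADDRESS_BOOK_ONTOLOGY` layout, the `conforms` and `email_recipients` helpers, and the sample records are hypothetical, and real ontology notations are far richer than a dictionary of cardinality limits.

```python
# Hypothetical ontology: for each kind of thing, the properties it may have
# and how many values each property allows, as (minimum, maximum).
# A maximum of None means "any number".
ADDRESS_BOOK_ONTOLOGY = {
    "Person": {
        "name": (1, 1),          # people have exactly one name
        "address": (0, None),
        "telephone": (0, None),  # people may have many telephone numbers
        "email": (0, None),
    }
}


def conforms(record, ontology):
    """Return True if the record fits the ontology's properties and limits."""
    schema = ontology.get(record["type"])
    if schema is None:
        return False  # a kind of thing this view of the world does not know
    properties = record["properties"]
    if any(prop not in schema for prop in properties):
        return False  # a property the ontology does not describe
    for prop, (minimum, maximum) in schema.items():
        count = len(properties.get(prop, []))
        if count < minimum or (maximum is not None and count > maximum):
            return False
    return True


def email_recipients(records, ontology):
    """What an email client sharing the ontology could do: because the data
    is annotated with meaning, it can pick out email addresses directly."""
    return [
        address
        for record in records
        if conforms(record, ontology)
        for address in record["properties"].get("email", [])
    ]


# Made-up sample data.
alice = {
    "type": "Person",
    "properties": {
        "name": ["Alice Example"],
        "telephone": ["555-0100", "555-0101"],
        "email": ["alice@example.org"],
    },
}
two_names = {"type": "Person", "properties": {"name": ["A", "B"]}}

recipients = email_recipients([alice, two_names], ADDRESS_BOOK_ONTOLOGY)
```

The second record is rejected because it breaks the "one name" rule, so only Alice's address ends up in `recipients`. The point is that both applications rely on the same explicit description rather than on guesses about each other's data formats.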

Special notations have been developed for encoding these “data ontologies,” and because they obey certain mathematical properties they enable machines to make logical deductions about the data they describe, says Leo Zancani of Ontology Systems in London. This turns out to have two very desirable effects: it makes it much, much easier to combine data coming from many different sources, and it makes it possible to identify non-obvious implications of that data through machine reasoning.
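Both effects can be illustrated with a toy sketch in Python. This is not how any intelligence agency or commercial reasoner is known to work. The facts, the predicate names (`hasEmail`, `sentMessageTo`, `communicatesWith`) and the two rules are invented for illustration, and the simple loop stands in for the much more capable reasoning that real ontology languages support.

```python
# Facts as (subject, predicate, object) triples. All data here is made up.
email_provider = {
    ("alice@example.org", "sentMessageTo", "bob@example.org"),
}
social_network = {
    ("Alice", "hasEmail", "alice@example.org"),
    ("Bob", "hasEmail", "bob@example.org"),
}


def infer(facts):
    """Apply two hypothetical rules repeatedly until nothing new is derived.

    Rule 1: X hasEmail A, Y hasEmail B, A sentMessageTo B
            implies X communicatesWith Y.
    Rule 2: X communicatesWith Y implies Y communicatesWith X.
    """
    facts = set(facts)
    while True:
        # Map each email address to the person who owns it.
        owner = {obj: subj for subj, pred, obj in facts if pred == "hasEmail"}
        derived = set()
        for subj, pred, obj in facts:
            if pred == "sentMessageTo" and subj in owner and obj in owner:
                derived.add((owner[subj], "communicatesWith", owner[obj]))
            if pred == "communicatesWith":
                derived.add((obj, "communicatesWith", subj))
        if derived <= facts:
            return facts  # fixed point reached: no new facts
        facts |= derived


# Combining sources is just a set union, because both use the same vocabulary.
merged = email_provider | social_network
new_facts = infer(merged) - merged
```

Neither source on its own says that Alice and Bob are in contact. That conclusion only appears in `new_facts` once the two sets of triples are merged and the rules are applied, which is the kind of non-obvious implication the paragraph above describes.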

Those effects are, not surprisingly, of great interest to the intelligence community. And although ontologies and semantics are making their way into the mainstream, many of the tools enterprises use to integrate data, provide unified customer views and migrate applications started out as government-funded research projects. A lot of vendors of such tools derive significant revenues from government customers to this day.

Indeed, the NSA is still an active researcher in the field, covers it in its internship programs, and is widely cited in the ontology literature as a potential beneficiary of technological advances.

PRISM data about electronic communications seems a perfect candidate for treatment through ontologies and semantics: although it comes from many different sources, it is all about one subject (the “domain of discourse,” as ontologists like to say), and the value in it lies not in the data itself but in the inferences that can be made from the data.

It must be analyzed promptly to be useful, and traditional data-integration approaches based around relational databases simply can’t be flexible enough to keep up.

Somewhere, without a doubt, there is a very complete and rich ontology of electronic communications mechanisms and the kinds of people who use them: you and me.

Clark & Parsia is a company that has produced systems that might be of interest to intelligence agencies. Its Pellet system is rumored to be free of a “stop button” because its main customers never need to stop, an unusual feature for a commercial product. Only an entity feeding it enormous amounts of data would never need to stop, come to think of it.

PRISM needs no stop button if the project is continually collecting the unimaginable volumes of bits and bytes it is capturing from Google (Gmail and YouTube alone would be huge), plus Microsoft, Facebook and the other companies it names among its partners.

Another interesting point. Did you hear Cray is announcing this monster? And so the plot, it thickens.

