Hogan's Alley

Thursday, March 16, 2006

Data Mining and Network Theory

As most of those paying attention to the debate know, what the MSM gives the shorthand appellation of "domestic spying" is actually the use of large-scale computing power to try to find patterns in the vast pile of data available from the telephone system, the internet and other sources. Data mining is the generic term that has emerged for this effort.

In last Sunday's NY Times Magazine, Patrick Radden Keefe provides the background of this effort in the academic arena. Key quote:

In its simplest form, network theory is about connecting the dots. Stanley Milgram's finding that any two Americans are connected by a mere six intermediaries -- or "degrees of separation" -- is one of the animating ideas behind the science of networks; the Notre Dame physicist Albert-Laszlo Barabasi studied one obvious network -- the Internet -- and found that any two unrelated Web pages are separated by only 19 links. After Sept. 11, Valdis Krebs, a Cleveland consultant who produces social network "maps" for corporate and nonprofit clients, decided to map the hijackers. He started with two of the plotters, Khalid al-Midhar and Nawaf Alhazmi, and, using press accounts, produced a chart of the interconnections -- shared addresses, telephone numbers, even frequent-flier numbers -- within the group. All of the 19 hijackers were tied to one another by just a few links, and a disproportionate number of links converged on the leader, Mohamed Atta. Shortly after posting his map online, Krebs was invited to Washington to brief intelligence contractors.
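For readers who want to see what "connecting the dots" looks like in practice, here is a minimal Python sketch of the two ideas in that passage: counting the links that separate two people, and spotting the person on whom the most links converge. The names and connections are invented placeholders, not Krebs's data, and the functions are my own illustration rather than anything the article describes.

```python
from collections import deque

# Hypothetical toy network: every name and link here is invented.
LINKS = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("E", "F"), ("F", "G"),
]


def build_graph(links):
    """Turn a list of pairs into an undirected adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph, start, goal):
    """Number of links on the shortest path, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def busiest_node(graph):
    """The node with the most direct links (the 'hub')."""
    return max(graph, key=lambda name: len(graph[name]))


GRAPH = build_graph(LINKS)
print(degrees_of_separation(GRAPH, "B", "G"))  # 4 links: B-A-E-F-G
print(busiest_node(GRAPH))                     # A, with four direct links
```

The search is a plain breadth-first search, which is the simplest way to find a shortest chain of links; real link-analysis tools presumably weight different kinds of connections, which this sketch ignores.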

Keefe presents a balanced analysis of the possibilities and limits of this technique:

The use of such network-based analysis may explain the administration's decision, shortly after 9/11, to circumvent the Foreign Intelligence Surveillance Court. The court grants warrants on a case-by-case basis, authorizing comprehensive surveillance of specific individuals. The N.S.A. program, which enjoys backdoor access to America's major communications switches, appears to do just the opposite: the surveillance is typically much less intrusive than what a FISA warrant would permit, but it involves vast numbers of people.

In some ways, this is much less alarming than old-fashioned wiretapping. A computer that monitors the metadata of your phone calls and e-mail to see if you talk to terrorists will learn less about you than a government agent listening in to the words you speak. The problem is that most of us are connected by two degrees of separation to thousands of people, and by three degrees to hundreds of thousands. This explains reports that the overwhelming number of leads generated by the N.S.A. program have been false positives -- innocent civilians implicated in an ever-expanding associational web.

This has troubling implications for civil liberties. But it also points to a practical obstacle for using link analysis to discover terror networks: information overload. The National Counterterrorism Center's database of suspected terrorists contains 325,000 names; the Congressional Research Service recently found that the N.S.A. is at risk of being drowned in information. Able Danger analysts produced link charts identifying suspected Qaeda figures, but some charts were 20 feet long and covered in small print. If Atta's name was on one of those network maps, it could just as easily illustrate their ineffectiveness as it could their value, because nobody pursued him at the time.
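The arithmetic behind Keefe's "thousands" and "hundreds of thousands" is easy to reproduce. The sketch below assumes, purely for illustration, that each person has 50 contacts and that nobody's contacts overlap; both numbers are my assumptions, not figures from the article, and the no-overlap assumption makes this an upper bound.

```python
def reach(contacts_per_person, degrees):
    """Upper bound on people within `degrees` links of one starting person.

    Assumes every person has the same number of contacts and that no two
    people share a contact, so real-world reach would be smaller.
    """
    return sum(contacts_per_person ** step for step in range(1, degrees + 1))


def false_lead_share(people_flagged, real_targets):
    """Fraction of flagged people who are not real targets."""
    return 1 - real_targets / people_flagged


CONTACTS = 50  # assumed average, chosen only for illustration

print(reach(CONTACTS, 2))  # 2550
print(reach(CONTACTS, 3))  # 127550

# If, hypothetically, 10 of those third-degree contacts were real targets:
print(false_lead_share(reach(CONTACTS, 3), 10))
```

Even with generous guesses about how many real targets sit inside that web, nearly every name three links out belongs to someone innocent, which is the information-overload problem in a nutshell.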


It seems obvious that this technique, especially as it is refined based on experience, has the potential to produce information that could interrupt an al Qaeda network or plot. I'm not sure that I can afford to be absolutist about my civil rights being violated if calls or internet searches of mine go into the pile of data being mined. As a non-terrorist I do not expect to attract more than cursory government attention.

The crucial issue is to ensure that no entity of government decides to use these potentially powerful tactics to look into even the most extreme political activity or beliefs, as long as no violent actions are planned. The courts should play an oversight role, requiring the government to prove that the dangers at play justify particular search patterns rather than issuing individual warrants, which are impossible and largely irrelevant in data mining.

Hopefully, after all the political shouting has died down, a reasonable consensus can emerge in the Congress that would amend the FISA law to this effect.

Oh, I forgot we are already deep into the 2006 election season...never mind.