
The extent of government data mining of communications

Skeptic Ginger (Nasty Woman) | Joined: Feb 14, 2005 | Messages: 96,955
This is a discussion split off from the two threads currently discussing Bush's actions in the White House that ignored the Constitutional separation of powers. I made the rash claim that Bush had been monitoring all domestic phone conversations, which many people, rightly so, considered one of those extraordinary claims, probably one belonging in the CT threads.

So let me qualify that: the Bush administration was data mining all communications. That presumes all calls are basically screened, those flagged are screened further, and so on down to the few that are actually wiretapped and transcribed by a person.
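To make the "screen, then screen again" idea concrete, here is a minimal sketch of such a funnel. It is purely illustrative: the `Call` record, the watched-number list, the keyword and the assumption that a transcript exists are all hypothetical, not details of the actual program. The point is only that cheap checks run on everything and expensive checks run on what survives.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Call:
    caller: str
    callee: str
    transcript: str  # hypothetical: assumes some speech-to-text output exists

# Each stage is (name, predicate). Cheap metadata checks come first so the
# expensive content checks only run on calls that survived earlier stages.
Stage = Tuple[str, Callable[[Call], bool]]

def run_funnel(calls: List[Call], stages: List[Stage]) -> List[Tuple[str, int]]:
    """Apply the stages in order and report how many calls remain after each."""
    remaining = list(calls)
    counts = [("all traffic", len(remaining))]
    for name, keep in stages:
        remaining = [c for c in remaining if keep(c)]
        counts.append((name, len(remaining)))
    return counts

# Hypothetical stages: whatever survives the last one would go to a person.
watched_numbers = {"555-0100"}
stages: List[Stage] = [
    ("metadata match", lambda c: watched_numbers & {c.caller, c.callee} != set()),
    ("keyword in transcript", lambda c: "shipment" in c.transcript.lower()),
]

calls = [
    Call("555-0001", "555-0100", "The shipment arrives Tuesday"),
    Call("555-0002", "555-0100", "Happy birthday!"),
    Call("555-0003", "555-0200", "Is the shipment late?"),
]
print(run_funnel(calls, stages))
# [('all traffic', 3), ('metadata match', 2), ('keyword in transcript', 1)]
```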


But two things turn that from merely data mining who calls whom into data mining actual conversations.

It will take considerable time here to support what I've said, so if you have a short attention span for this kind of thing, see if you can get the gist from the bolded sections and the last 3 paragraphs. (All the bolding in the following is mine.)

The first is access to the communications, which goes back to a law passed in 1994.
The new technology at the root of the NSA wiretap scandal; Jon Stokes, Directing Editor, ars technica
The domestic electronic surveillance ball really got rolling under the Clinton administration, with the 1994 Communications Assistance for Law Enforcement Act (CALEA). CALEA mandated that the telcos aid wiretapping by installing remote wiretap ports onto their digital switches so that the switch traffic would be available for snooping by law enforcement. After CALEA passed, the FBI no longer had to go on-site with wiretapping equipment in order to tap a line—they could monitor and digitally process voice communications from the comfort of the home office. (The FCC has recently ruled that CALEA covers VOIP services, which means that providers like Vonage will have to find a way to comply.)
It's no secret that the Bush administration obtained access to the phone systems of the major carriers. At first, calls made by Qwest customers were not included, but I believe they are now.

NSA has massive database of Americans' phone calls
According to sources familiar with the events, Qwest's CEO at the time, Joe Nacchio, was deeply troubled by the NSA's assertion that Qwest didn't need a court order — or approval under FISA — to proceed. Adding to the tension, Qwest was unclear about who, exactly, would have access to its customers' information and how that information might be used....

The NSA told Qwest that other government agencies, including the FBI, CIA and DEA, also might have access to the database, the sources said. As a matter of practice, the NSA regularly shares its information — known as "product" in intelligence circles — with other intelligence groups. Even so, Qwest's lawyers were troubled by the expansiveness of the NSA request, the sources said.

The NSA, which needed Qwest's participation to completely cover the country, pushed back hard.

Trying to put pressure on Qwest, NSA representatives pointedly told Qwest that it was the lone holdout among the big telecommunications companies. It also tried appealing to Qwest's patriotic side: In one meeting, an NSA representative suggested that Qwest's refusal to contribute to the database could compromise national security, one person recalled.

In addition, the agency suggested that Qwest's foot-dragging might affect its ability to get future classified work with the government. Like other big telecommunications companies, Qwest already had classified contracts and hoped to get more.

Unable to get comfortable with what NSA was proposing, Qwest's lawyers asked NSA to take its proposal to the FISA court. According to the sources, the agency refused.

The NSA's explanation did little to satisfy Qwest's lawyers. "They told (Qwest) they didn't want to do that because FISA might not agree with them," one person recalled. For similar reasons, this person said, NSA rejected Qwest's suggestion of getting a letter of authorization from the U.S. attorney general's office. A second person confirmed this version of events.

Additional access to Internet communications was installed under Bush.
Room 641A
Room 641A is an alleged intercept facility operated by AT&T for the U.S. National Security Agency, beginning in 2003. Room 641A is located in the SBC Communications building at 611 Folsom Street, San Francisco, three floors of which were occupied by AT&T before SBC purchased AT&T. The room was referred to in internal AT&T documents as the SG3 [Study Group 3] Secure Room. It is fed by fiber optic lines from beam splitters installed in fiber optic trunks carrying Internet backbone traffic and, therefore, presumably has access to all Internet traffic that passes through the building.

The room measures about 24 by 48 feet (7.3 m × 15 m) and contains several racks of equipment, including a Narus STA 6400, a device designed to intercept and analyze Internet communications at very high speeds.[1]

The existence of the room was revealed by a former AT&T technician, Mark Klein, and is the subject of a 2006 class action lawsuit by the Electronic Frontier Foundation against AT&T.[2] Klein claims he was told that similar black rooms are operated at other facilities around the country.

Room 641A and the controversies surrounding it were subjects of an episode of "Frontline", the current affairs documentary program on PBS. It was originally broadcast on May 15, 2007. It was also featured on PBS's NOW on March 14, 2008.



The second is that the technology needed to store and data mine vast quantities of information, including conversations, exists.
The new technology at the root of the NSA wiretap scandal; Jon Stokes, Directing Editor, ars technica
CALEA opened up a huge can of worms, and PGP creator Phil Zimmermann sounded the alarm back in 1999 about where the program was headed: ...

...The only plausible way of processing that amount of traffic is a massive Orwellian application of automated voice recognition technology to sift through it all, searching for interesting keywords or searching for a particular speaker's voice. If the government doesn't find the target in the first 1 percent sample, the wiretaps can be shifted over to a different 1 percent until the target is found, or until everyone's phone line has been checked for subversive traffic. The FBI said they need this capacity to plan for the future. This plan sparked such outrage that it was defeated in Congress. But the mere fact that the FBI even asked for these broad powers is revealing of their agenda. ...

...It is entirely possible that the NSA technology at issue here is some kind of high-volume, automated voice recognition and pattern matching system. Now, I don't at all believe that all international calls are or could be monitored with such a system, or anything like that. Rather, the NSA could very easily narrow down the amount of phone traffic that they'd have to a relatively small fraction of international calls with some smart filtering. First, they'd only monitor calls where one end of the connection is in a country of interest. Then, they'd only need the ability to do a roving random sample of a few seconds from each call in that already greatly narrowed pool of calls. As Zimmermann describes above, you monitor a few seconds of some fraction of the calls looking for "hits," and then you move on to another fraction. If a particular call generates a hit, then you zero in on it for further real-time analysis and possible human interception. All the calls can be recorded, cached, and further examined later for items that may have been overlooked in the real-time analysis. ... And yes, this kind of real-time voice recognition, crude semantic parsing and pattern matching is doable with today's technology, especially when you have a budget like the NSA.
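The rotating sample Zimmermann and Stokes describe can be sketched in a few lines. This is an assumption-laden illustration, not anything the NSA or FBI is known to run: the slices are deterministic residue classes rather than random draws, so every line is covered after 100 rounds of 1 percent each, and the hypothetical `snippet_of` function stands in for "grab a few seconds of audio and run a word spotter on it".

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

def rotating_sample_search(
    lines: List[str],
    snippet_of: Callable[[str], str],
    keywords: Set[str],
    slices: int = 100,
) -> Optional[Tuple[int, str]]:
    """Check one slice (1/slices of all lines) per round, moving to a different
    slice each round, until a sampled snippet contains a keyword.

    Returns (round, line) for the first hit, or None once every slice has been
    checked without a hit.
    """
    for round_no in range(slices):
        # This round's slice: every line whose index leaves remainder round_no.
        for line in lines[round_no::slices]:
            # snippet_of is hypothetical: "a few seconds, run through a word spotter".
            words = set(snippet_of(line).lower().split())
            if words & keywords:
                return round_no, line  # a hit: zero in on this line
    return None

# Hypothetical data: 500 lines, only one of which says anything "interesting".
lines = [f"line-{i}" for i in range(500)]
snippets: Dict[str, str] = {line: "nothing to report" for line in lines}
snippets["line-237"] = "meet at the bridge"

print(rotating_sample_search(lines, snippets.__getitem__, {"bridge"}))
# (37, 'line-237')
```

A real system would also record and cache calls for later re-examination, as Stokes notes; the sketch only shows the sampling loop.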

Schneier on Security - A blog covering security and security technology; September 17, 2008; NSA Snooping on Cell Phone Calls - From CNet:
In a Web demo (PDF) (mirrored here) to potential customers back in May, ThorpeGlen's vice president of global sales showed off the company's tools by mining a dataset of a single week's worth of call data from 50 million users in Indonesia, which it has crunched in order to try and discover small anti-social groups that only call each other.
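ThorpeGlen's actual method isn't described in the quote. One simple way to find "small groups that only call each other" is to treat every call as an undirected edge: a group that never calls outside itself is then exactly a connected component of the call graph, and you keep the small components. A sketch of that swapped-in technique, using union-find on hypothetical subscriber IDs:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def closed_calling_groups(calls: Iterable[Tuple[str, str]], max_size: int) -> List[Set[str]]:
    """Return groups of subscribers who only ever call one another,
    keeping only groups with at most max_size members."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in calls:
        union(a, b)  # a call links caller and callee into one component

    groups: Dict[str, Set[str]] = defaultdict(set)
    for member in list(parent):
        groups[find(member)].add(member)
    return [g for g in groups.values() if len(g) <= max_size]

# Hypothetical call records: A, B, C only call each other; D..I form a bigger web.
calls = [("A", "B"), ("B", "C"), ("C", "A"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "D"), ("D", "I")]
print(closed_calling_groups(calls, max_size=4))
```

Real analysis on 50 million users would likely need weighted or community-detection methods, since most groups do call outsiders occasionally; this only shows the idea.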

Hepting v. AT&T
A former AT&T engineer, Mark Klein, attested that a supercomputer built by Narus was installed for the purpose, and that similar systems were also installed in at least Seattle, San Jose, Los Angeles and San Diego. Wired News states Klein said he came forward "because he does not believe that the Bush administration is being truthful about the extent of its extrajudicial monitoring of Americans' communications":

"Despite what we are hearing, and considering the public track record of this administration, I simply do not believe their claims that the NSA's spying program is really limited to foreign communications or is otherwise consistent with the NSA's charter or with FISA [...] And unlike the controversy over targeted wiretaps of individuals' phone calls, this potential spying appears to be applied wholesale to all sorts of Internet communications of countless citizens."

Narus product, NarusInsight
Carrier-class scalability and reliability with over 2.7 petabytes of IP traffic processed at a single customer, driving 100 billion packet records per day (greater than 7 terabytes) to upstream security applications
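A quick back-of-envelope check on the Narus figures, assuming decimal units (1 TB = 10^12 bytes). Because the quote says "greater than 7 terabytes", the per-record size below is a lower bound on the average.

```python
records_per_day = 100e9   # "100 billion packet records per day"
bytes_per_day = 7e12      # "greater than 7 terabytes", assuming 1 TB = 10**12 bytes
seconds_per_day = 24 * 60 * 60

bytes_per_record = bytes_per_day / records_per_day
records_per_second = records_per_day / seconds_per_day

print(f"{bytes_per_record:.0f} bytes per record (at least)")
print(f"{records_per_second:,.0f} records per second")
```

So a single installation would be emitting on the order of a million summary records every second, each a few dozen bytes, which is summary data about packets rather than the full traffic.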


Speech recognition data mining:
Introduction to the Special Issue on Data Mining of Speech, Audio, and Dialog; 09/05
With the advent of inexpensive storage space and faster processing over the past decade or so, data mining research has started to penetrate new grounds in areas of speech and audio processing as well as spoken language dialog. It has been fueled by the influx of audio data that are becoming more widely available from a variety of multimedia sources including webcasts, conversations, music, meetings, voice messages, lectures, television, and radio. Algorithmic advances in automatic speech recognition have also been a major, enabling technology behind the growth in data mining. Current state-of-the-art, large-vocabulary, continuous speech recognizers are now trained on a record amount of data—several hundreds of millions of words and thousands of hours of speech. Pioneering research in robust speech processing, large-scale discriminative training, finite state automata, and statistical hidden Markov modeling have resulted in real-time recognizers that are able to transcribe spontaneous speech with a word accuracy exceeding 85%. With this level of accuracy, the technology is now highly attractive for a variety of speech mining applications.
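To put "word accuracy exceeding 85%" in perspective: word accuracy is commonly taken as 1 minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. The special issue doesn't say exactly how it scored accuracy, so treat this as the standard textbook definition rather than their method. The example sentences are made up.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Minimum substitutions + deletions + insertions needed to turn the
    hypothesis into the reference, divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)

ref = "the meeting is moved to the north entrance at noon tomorrow"
hyp = "the meeting is moved to a north entrance at new tomorrow"
wer = word_error_rate(ref, hyp)
print(f"WER {wer:.0%}, word accuracy {1 - wer:.0%}")
```

Two wrong words out of eleven already drops accuracy to about 82%, and note that one of them ("noon" heard as "new") is exactly the kind of detail that matters. An 85% recognizer is good enough for keyword hits, not for trustworthy verbatim transcripts.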

TECHNOLOGY NEWS - Let's Hear It for Audio Mining
Users want to make the most of this material by searching and indexing the digitized audio content. In the past, companies had to create and manually analyze written transcripts of audio content because using computers to recognize, interpret, and analyze digitized speech was difficult. However, the development of faster microprocessors, larger storage capacities, and better speech-recognition algorithms has made audio mining easier.
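One common way to make recognized audio searchable is an inverted index over the recognizer's word-level output, so a search becomes a lookup rather than re-listening. A minimal sketch, assuming a hypothetical recognizer that emits (recording id, start time, word) triples:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical recognizer output: (recording id, start time in seconds, word).
Segment = Tuple[str, float, str]

def build_index(segments: List[Segment]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each lower-cased word to every (recording, time) where it was recognized."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for recording, start, word in segments:
        index[word.lower()].append((recording, start))
    return index

segments = [
    ("call-17", 3.2, "invoice"),
    ("call-17", 3.9, "overdue"),
    ("lecture-2", 41.0, "Invoice"),
]
index = build_index(segments)
print(index.get("invoice", []))  # [('call-17', 3.2), ('lecture-2', 41.0)]
```

The timestamps let an analyst jump straight to the spot in the recording, which is what replaces the manually analyzed written transcripts the article mentions.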


Then there is the matter of storing the vast amount of data.
Data Warehouses
The drop in price of data storage has given companies willing to make the investment a tremendous resource: Data about their customers and potential customers stored in "Data Warehouses." Data warehouses are becoming part of the technology. ...



And finally, what about evidence that this is what Bush was doing? I posted some of this already in the other threads. There is evidence that news reporters were targeted in an effort to stop whistle-blowers, not national security leakers. I don't want to debate that here, because people cannot get past their belief or disbelief that Bush purposefully lied to gain public support for his planned war with Iraq.

But there is more that can be discussed.
Secret Spy Court Repeatedly Questions FBI Wiretap Network
Among other things, the declassified documents reveal that lawyers in the FBI's Office of General Counsel and the Justice Department's Office of Intelligence Policy Review queried FBI technology officials in late July 2006 about cellphone tracking. The attorneys asked whether the FBI was obtaining and storing real-time cellphone-location data from carriers under a "pen register" court order that's normally limited to records of who a person called or was called by.

...According to the documents, which the EFF obtained in a Freedom of Information Act lawsuit, an FBI general counsel lawyer asked on July 21, 2006: "Can we at the collection end tell the equipment NOT to receive the cell site location information?"

The lawyer added a note of concern that phone companies might be sending along cell-site data even when they aren't asked for it. "Do we get it all or can we, when required, tell the equipment to not collect the cell-site location data?," the lawyer asked.

Separately, the secret court questioned if the FBI was using pen register orders to collect digits dialed after a call is made, potentially including voicemail passwords and account numbers entered into bank-by-phone applications.

Using a pen register order, the FBI can force a phone company to turn over records of who a person calls, or is called by, simply by asserting the information would be relevant to an investigation. But existing case law holds that those so-called "post-cut-through dialed digits" count as the content of a communication, and thus to collect that information, the FBI would need to get a full-blown wiretapping warrant based on probable cause....

...The documents (.pdf) show that the majority of FBI offices surveyed internally were collecting that information without full-blown wiretap orders, especially in classified investigations. The documents also indicate that the information was being uploaded to the FBI's central repository for wiretap recordings and phone records, where analysts can data-mine the records for decades.

EFF's Bankston says it's clear that FBI offices had configured their digit-recording software, DCS 3000, to collect more than the law allows.
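The distinction the court and the FBI lawyers were worried about can be shown as a simple "minimization" step. DCS 3000's internals aren't described in the article, so the record format here is entirely hypothetical, including the `connected_after` field, which assumes the equipment knows how many digits were pressed before the call connected. What a pen register order allows is routing information only; post-cut-through digits (voicemail PINs, bank account numbers) and the cell-site location the lawyers asked about are dropped.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallRecord:
    caller: str
    dialed: str               # every digit the caller pressed, in order
    connected_after: int      # hypothetical: digits pressed before the call connected
    cell_site: Optional[str]  # tower location, if the carrier sent it

@dataclass
class PenRegisterRecord:
    caller: str
    number_dialed: str

def minimize_for_pen_register(record: CallRecord) -> PenRegisterRecord:
    """Keep only call-routing information.

    Post-cut-through digits count as content under the case law quoted above,
    and cell-site location is what the FBI lawyers asked the equipment not to
    collect, so neither field is carried over.
    """
    return PenRegisterRecord(
        caller=record.caller,
        number_dialed=record.dialed[: record.connected_after],
    )

# Hypothetical call: an 11-digit number, then a 4-digit PIN entered after connecting.
raw = CallRecord(caller="555-0142", dialed="18005550199" + "4471",
                 connected_after=11, cell_site="tower-88")
print(minimize_for_pen_register(raw))
# PenRegisterRecord(caller='555-0142', number_dialed='18005550199')
```

The documents suggest the problem was the opposite configuration: software set to keep everything, with the minimization step effectively missing.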

NSA has massive database of Americans' phone calls; Updated 5/11/2006
Last month, U.S. Attorney General Alberto Gonzales alluded to that possibility. Appearing at a House Judiciary Committee hearing, Gonzales was asked whether he thought the White House has the legal authority to monitor domestic traffic without a warrant. Gonzales' reply: "I wouldn't rule it out." His comment marked the first time a Bush appointee publicly asserted that the White House might have that authority.



And that little matter of only tapping calls between foreign and domestic sources...
Blog - Greg Downey, associate professor [tenured, summer 2006] at the University of Wisconsin-Madison; December 24, 2005; The geography of "wiretapping" labor
Further, when mining this data at the level of the "telecom switch," even within one particular corporation's control at a time, it is not always apparent whether the geography of the telephone call or Internet transaction is intended to involve the US, or whether it simply reaches US-based equipment as a consequence of decisions made about the topological efficiency and market cost of global corporate communication networks: "The switches are some of the main arteries for moving voice and some Internet traffic into and out of the United States, and, with the globalization of the telecommunications industry in recent years, many international-to-international calls are also routed through such American switches."...

..."One outside expert on communications privacy who previously worked at the N.S.A. said that to exploit its technological capabilities, the American government had in the last few years been quietly encouraging the telecommunications industry to increase the amount of international traffic that is routed through American-based switches." In other words, actions taken unilaterally by the US government in pursuit of its own definition of a "War on Terror" are helping to change the geography of information flow, information surveillance, and information profit on a global scale.
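Downey's point is that "does this call involve the US?" has two different answers depending on whether you look at the endpoints or at the path. A toy illustration with a hypothetical route given as the country of each switch along the way:

```python
from typing import List

def involves_us_by_endpoints(origin: str, destination: str) -> bool:
    """True if either party to the call is in the US."""
    return "US" in (origin, destination)

def touches_us_equipment(route: List[str]) -> bool:
    """True if any switch along the path is US-based, i.e. visible to a US tap."""
    return "US" in route

# Hypothetical international-to-international call routed through a US switch.
route = ["BR", "US", "DE"]
print(involves_us_by_endpoints(route[0], route[-1]))  # False
print(touches_us_equipment(route))                    # True
```

A tap on the US switch sees this call even though neither party is American, which is why routing more international traffic through US switches expands what can be collected.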



Is this proof? No. Is it absolutely certain that no one could make a legal case for this data mining? No; a number of neocon-leaning lawyers and judges exist within the system. It comes down to this: do you trust the government not to spy on citizens for political gain? I find the scenario in which they would not spy far less probable than the one in which they would. The little stuff is overlooked, but history reveals the bigger stuff. Nixon was caught spying for political gain in a big way, and the FBI spied on antiwar protesters in a big way at the behest of the Nixon government. Only a fraction of this spying involved radical groups. Daniel Ellsberg's disclosure of the Pentagon Papers was politically damaging more than anything else.

Bush has sent undercover operatives into political activist groups where no terrorism was suspected. He used the DoJ to initiate politically motivated prosecutions. That is not in question; only whether anyone will be prosecuted for it is. And if you believe the evidence, Bush manipulated the press to gain political support. His administration said publicly that they wanted to stop White House leaks. These were politically damaging leaks, not leaks that threatened national security. Spying on reporters' phone calls to catch the leakers is strongly suspected and has been testified to by one whistle-blower.

Obviously there is not yet any reason to fear an Orwellian crackdown on political dissent. The point of discussing these events is to prevent them from going any further. The ability to track everything we do and say is technologically inevitable. Vigilance is critical.
 
So data mining isn't "spying", since the data collected is an impersonal and incoherent stream of information? It only becomes spying once the communications have been through multiple screening processes and the people involved are already suspected of belonging to terror groups?
 

Data mining is the process of pulling inferences out of a mass of data that the original databases were not designed to furnish. It is, for example, used by companies with large credit card purchase databases to identify the customers who would be particularly amenable to, say, an auto parts sale, or a sales campaign offered only to customers who have used their in-store credit card, as a way of promoting use of those cards. That is the sense in which data mining was used 15 years ago; I imagine it has become much more highly automated than it was when I worked in that field. And the databases, perhaps, a lot more interesting.
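The auto-parts example is the classic "market basket" kind of inference. A minimal sketch using association-rule confidence, which is one standard technique (the post doesn't say which methods were actually used), over hypothetical purchase histories stored as sets of product categories per card holder:

```python
from typing import Dict, List, Set

def rule_confidence(baskets: Dict[str, Set[str]], antecedent: str, consequent: str) -> float:
    """Confidence of 'bought antecedent -> bought consequent': the share of
    customers with the antecedent in their history who also have the consequent."""
    having = [items for items in baskets.values() if antecedent in items]
    if not having:
        return 0.0
    return sum(consequent in items for items in having) / len(having)

def sale_targets(baskets: Dict[str, Set[str]], antecedent: str, consequent: str) -> List[str]:
    """Customers who bought the antecedent but not (yet) the consequent."""
    return sorted(c for c, items in baskets.items()
                  if antecedent in items and consequent not in items)

# Hypothetical purchase histories, one set of categories per card holder.
baskets = {
    "c1": {"motor oil", "wiper blades"},
    "c2": {"motor oil", "wiper blades", "socks"},
    "c3": {"motor oil"},
    "c4": {"socks"},
}
print(f"{rule_confidence(baskets, 'motor oil', 'wiper blades'):.2f}")  # 0.67
print(sale_targets(baskets, "motor oil", "wiper blades"))              # ['c3']
```

Nothing in the database says "c3 wants wiper blades"; the inference comes from everyone else's behavior, which is exactly what makes the same technique unsettling when the records are phone calls instead of receipts.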

Data mining, as the name implies, was done after the fact on a statically held database, but that is almost certainly no longer the only way it is done. It could be used, for example, to single out particular phone lines for special attention, or to sift for unusual transactions that may have more meaning than the surface transaction would suggest. It is, in many ways, the epitome of the data "fishing expedition", because you never know what an inference might come up with.

You may not consider it spying for Sears to keep track of what you buy, but suppose they link your purchases in the children's clothing department to inferences about how much your kids weigh, and sell that data to an insurance company, which makes decisions based on it twenty years later and, of course, can never admit to what individually flimsy, but statistically useful, data it used (and perhaps doesn't even remember or track). What do you think of that? If you think that sort of thing is far-fetched, you need to educate yourself. In the national political arena, calls you made to suspect phone numbers in college, even in error, can follow you to the grave. Take a look at Roma's problem for just such a possibility: http://www.internationalskeptics.com/forums/showthread.php?t=136869
 
How significantly different is this from ECHELON, a program that the NSA reportedly used during the Clinton years?

Echelon reportedly is the code name for an automated global interception and relay system operated by intelligence agencies in five nations, led by the U.S. National Security Agency.

Nations reportedly involved: United States, Britain, Canada, Australia, New Zealand.

Some reports say the system may intercept as many as 3 billion communications each day, including phone calls, e-mail messages and satellite transmissions. This is how Echelon works, according to the American Civil Liberties Union:

Did the Clinton Administration shred the Constitution? Did the Clinton Administration violate separation of powers?
 
So data mining isn't "spying", since the data collected is an impersonal and incoherent stream of information? It only becomes spying once the communications have been through multiple screening processes and the people involved are already suspected of belonging to terror groups?
It depends on what happens to the information. If it gets you on the no-fly list because you attended anti-war protests, then what would you think?

Unlikely Terrorists On No Fly List
"We've been told by a number of different people that what happened under the tight deadlines was that the CIA and various agencies just took all the names that they had floating around for one reason or another and just dumped 'em into your computer," Kroft says.

"And that's why we are undergoing the record by record review," Bucella states.

Jack Cloonan says in the headlong rush to get a list, they forgot quality control.

The problem with these kinds of mass sweeps is the rate of false positives that get put into the system.
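Why false positives dominate is easy to show with a base-rate calculation. Every number below is hypothetical, chosen only to illustrate the effect: even a screen that catches 99% of real targets and wrongly flags just 0.1% of everyone else produces mostly innocent hits when real targets are rare.

```python
population = 100_000_000      # hypothetical number of people screened
actual_targets = 1_000        # hypothetical number of real targets among them
detection_rate = 0.99         # hypothetical share of real targets the screen flags
false_positive_rate = 0.001   # hypothetical share of innocent people wrongly flagged

true_hits = actual_targets * detection_rate
false_hits = (population - actual_targets) * false_positive_rate
precision = true_hits / (true_hits + false_hits)

print(f"people flagged: {true_hits + false_hits:,.0f}")
print(f"innocent people flagged: {false_hits:,.0f}")
print(f"share of flagged who are real targets: {precision:.1%}")
```

Roughly a hundred innocent people end up on the list for every real target, which is the no-fly list problem in miniature.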
 
How significantly different is this from ECHELON, a program that the NSA reportedly used during the Clinton years?



Did the Clinton Administration shred the Constitution? Did the Clinton Administration violate separation of powers?
The technology is evolving, and there is no doubt any politician could be tempted to use information gathering such as this for the wrong purposes. Nixon used the FBI to investigate political enemies, including anyone in the country who opposed the Vietnam War.

Clinton signed into law the bill that required the telecoms to install the wiretapping capabilities. This could have been for the benefit of legitimate FBI investigations. Nothing has come to light suggesting that Clinton expanded widespread data mining to look specifically for politically damaging information, such as what whistle-blowers might be giving reporters, or used data mining to threaten whistle-blowers into keeping quiet.

Do you have evidence this is something Clinton did? If so, then it was wrong and should be disclosed. If it is just a matter of Clinton being involved with technology advances, then it is not the issue.
 
That Echelon link is pretty interesting, Brainster. If a number of countries are cooperating, then Britain could spy on US citizens and the US could reciprocate by spying on the Brits and both could claim they were not spying on their own citizens.

OTOH, it's a bit harder to get away with misusing the system for personal political gain when other security agencies are involved in the activities. Since the plan to thwart the system for a day in protest involved
The paranoia may have peaked on Oct. 21, known on the Internet as "Jam Echelon Day," when organizers urged e-mail users around the world to send as many messages as possible containing words such as "bomb" and "assassinate" in an attempt to overload NSA supercomputers that sort through millions of intercepted communications looking for threats to national security
it would seem this was legitimate data mining.

If it is used to control one's political enemies, perceived or real, or to stifle free speech by harassing protesters against the government, that is when it becomes dangerous.
 
You mean like Echelon?
So was Echelon determined to be real, or was it more of a shadow system? It does appear that it could have been a precursor to Bush's use of data mining.

As for CTs, this is all coming out in Congressional hearings. That is politics, not CTs.
 
Brainster,

Did the Clinton Administration shred the Constitution? Did the Clinton Administration violate separation of powers?

Yep. Bush finished the job.


Skeptigirl,

It isn't a theory. It is an evidence-based political discussion. You are welcome to point out anything that is not supported by evidence.

From what I've read I agree


INRM
 
Looking at all the sources in the related CT thread, they seem to support what I've posted here. Travis posted a link to an in-depth article on the events surrounding the surveillance program, and I've commented on it here.

Two key things from the article are that the people who would have been most involved in counterterrorism were kept out of the surveillance program loop, and that all but a few very loyal insiders felt so strongly that the program crossed the line that they were ready to resign.

That just doesn't sound like a national security surveillance program.
 
It isn't a theory. It is an evidence-based political discussion. You are welcome to point out anything that is not supported by evidence.

Evidence, however, does not show that it is possible to capture ALL traffic. In fact, the conspiracy claim is itself about the technology available for such a task.

And it can be argued that surveillance requires capturing all traffic. That is not possible without the collusion of more parties than the American telecoms. And even then it would be necessary to combine all the packets/fragments captured on different nodes, meaning you would have to know which fragments/packets belong together.

Technically, overall, not possible. Not even supercomputers are that powerful, because various encodings and encryption systems are often in use, and even the simplest one adds computing overhead.
 
Evidence, however, does not show that it is possible to capture ALL traffic. In fact, the conspiracy claim is itself about the technology available for such a task.

And it can be argued that surveillance requires capturing all traffic. That is not possible without the collusion of more parties than the American telecoms. And even then it would be necessary to combine all the packets/fragments captured on different nodes, meaning you would have to know which fragments/packets belong together.

Technically, overall, not possible. Not even supercomputers are that powerful, because various encodings and encryption systems are often in use, and even the simplest one adds computing overhead.
I already clarified this. You data mine all traffic with the basic program, single out a smaller set for more detailed data mining, then use voice recognition data mining on an even smaller subset, and actually wiretap the smallest subset.
 
Ok, I'm NOT in the know here, but based on what I knew about things like word spotting in 1990, there's nothing at all to prevent this system from working on some random-sampling basis at the very least. Word spotting isn't a whole lot more work than running present low-bit-rate speech codecs, really; of course, the spotted data would then have to be forwarded for further evaluation.

The question is more one of infrastructure than anything else, and is far from technically impossible.

But that's purely an engineering evaluation, mind you, I have no evidence for anything beyond having heard the word Echelon somewhere or other, no idea where at this point.
 
And it can be argued that surveillance requires capturing all traffic. That is not possible without the collusion of more parties than the American telecoms. And even then it would be necessary to combine all the packets/fragments captured on different nodes, meaning you would have to know which fragments/packets belong together.

Technically, overall, not possible. Not even supercomputers are that powerful, because various encodings and encryption systems are often in use, and even the simplest one adds computing overhead.

I already clarified this. You data mine all traffic with the basic program, single out a smaller set for more detailed data mining, then use voice recognition data mining on an even smaller subset, and actually wiretap the smallest subset.

I bolded the part you forgot! The Internet is not a simple network where you can tap a single central node (or several) and get the entire traffic of the network.

It is often forgotten that part of the Internet is routing and the various interconnections. Another part is fragmentation. As soon as two fragments of the same IP packet travel different routes, there is a problem for surveillance.
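The fragmentation point can be illustrated with a simplified reassembly routine. IPv4 identifies the fragments of one packet by source, destination, protocol and the identification field (RFC 791); the sketch groups on that key and rebuilds a packet only if every byte range is present. Simplifications, all assumptions: offsets are in bytes rather than 8-byte units, and overlapping fragments are treated as missing. A tap that saw only some of the fragments gets nothing usable for that packet.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Fragment:
    src: str
    dst: str
    protocol: int
    ident: int            # IP identification field shared by one packet's fragments
    offset: int           # byte offset of this payload in the original (simplified)
    more_fragments: bool  # False only on the last fragment
    payload: bytes

Key = Tuple[str, str, int, int]

def reassemble(fragments: List[Fragment]) -> Dict[Key, Optional[bytes]]:
    """Rebuild each packet from the fragments a tap saw, or None if incomplete."""
    groups: Dict[Key, List[Fragment]] = {}
    for frag in fragments:
        groups.setdefault((frag.src, frag.dst, frag.protocol, frag.ident), []).append(frag)

    result: Dict[Key, Optional[bytes]] = {}
    for key, frags in groups.items():
        frags.sort(key=lambda fr: fr.offset)
        data = b""
        complete = True
        for frag in frags:
            if frag.offset != len(data):  # gap (or overlap): a piece went elsewhere
                complete = False
                break
            data += frag.payload
        if complete and frags[-1].more_fragments:  # the final piece never arrived
            complete = False
        result[key] = data if complete else None
    return result

# Hypothetical packet split in two; one tap sees both halves, another sees one.
f1 = Fragment("10.0.0.1", "192.0.2.9", 17, 42, 0, True, b"hello ")
f2 = Fragment("10.0.0.1", "192.0.2.9", 17, 42, 6, False, b"world")
key = ("10.0.0.1", "192.0.2.9", 17, 42)
print(reassemble([f2, f1])[key])  # b'hello world'
print(reassemble([f2])[key])      # None
```

The same logic applies one level up: a tap that sees only some packets of a TCP stream faces the same gap problem when rebuilding the conversation.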

And third, even with simple screening, it will take a long time or will miss the target. Encoded/encrypted transfer is also more and more popular, and it is quite a large chunk of traffic to analyze. (You have to store the entire encoded/encrypted traffic, and it is large.)
 
Ok, I'm NOT in the know here, but based on what I knew about things like word spotting in 1990, there's nothing at all to prevent this system from working on some random-sampling basis at the very least. Word spotting isn't a whole lot more work than running present low-bit-rate speech codecs, really; of course, the spotted data would then have to be forwarded for further evaluation.
Correct, but random spotting is not that useful. Capturing and analysing everything is quite a different problem, and quite a big one. (And storage is still needed!)
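How useful random spotting is depends a lot on how the samples are chosen. A rough comparison, assuming each round checks a fraction f of lines and the target is talking during the sampled window (itself a big assumption): with independent random picks, the chance a given line has been checked after k rounds is 1 - (1 - f)^k; with the non-overlapping rotation Zimmermann describes, it is min(1, k * f).

```python
def p_random(f: float, rounds: int) -> float:
    """Chance a given line is sampled at least once when each round
    independently picks a random fraction f of all lines."""
    return 1 - (1 - f) ** rounds

def p_rotating(f: float, rounds: int) -> float:
    """Chance a given line has been checked when each round covers a new,
    non-overlapping slice of size f."""
    return min(1.0, rounds * f)

for rounds in (10, 50, 100):
    print(rounds, f"random {p_random(0.01, rounds):.0%}",
          f"rotating {p_rotating(0.01, rounds):.0%}")
```

After 100 rounds at 1 percent, pure random sampling has still only touched about 63% of lines, while the rotation has covered all of them, so both posters have a point: naive random spotting is weak, but systematic sampling is much less so.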
The question is more one of infrastructure than anything else, and is far from technically impossible.

But that's purely an engineering evaluation, mind you, I have no evidence for anything beyond having heard the word Echelon somewhere or other, no idea where at this point.

I think Echelon's effectiveness was a bit overrated, but I suspect it was at least of some use to the secret services.
I "miswrote", as you pointed out:
Technically possible, but unlikely, as the infrastructure would be too costly even for the NSA, CIA and others together. Or else effectiveness would go down.

I suspect they are surveilling only those who were already found by other means (police and FBI work included).
 