Scorched America

NSA’s storage capabilities for voice data could be of unprecedented scale

Glenn Greenwald teased a new story he’s been working on for The Guardian about new technology that allows the National Security Agency to “redirect” up to one billion cell phone calls per day to its servers, and then store them for an unknown length of time:

What we are really talking about here is a globalized system that prevents any form of electronic communication from taking place without its being stored and monitored by the National Security Agency. It doesn’t mean they’re listening to every call, it means they’re storing every call and have the capability to listen to them at any time, and it does mean that they’re collecting millions upon millions upon millions of our phone and email records.

That leaves a lot of room for interpretation and there’s not much to go on until the story is published. These claims raise serious questions that need answering.

Do domestic and global telecommunications companies willingly participate in the gathering of this information, or is it being collected discreetly without their knowledge and/or consent?

All good questions, and the public doesn’t currently have any answers.

One good question, raised by several people who commented on the story I wrote Friday on this topic, is whether the NSA even has the technical ability to store that much information.

Only someone with an appropriate security clearance knows the data storage capacity of the NSA (this would make a good question for Edward Snowden, if anyone can get access) and large information providers like Google aren’t always forthcoming about their storage solutions, either. But I believe we can make some educated guesses by looking at the capabilities of the private sector, which can then be extrapolated to government intelligence services worldwide.

Your entire life and then some

A paper published by Jeffrey Dean and Sanjay Ghemawat in 2008 revealed that Google was processing 20 petabytes (20,971,520 gigabytes) of data per day as of the fall of 2007. Processing data isn’t the same thing as storing it, but I think that gets us in the ballpark, and we’re talking about technical capabilities that are already six years old.

A note written by Peter Vajgel from the Facebook Engineering group in the spring of 2009 revealed that his company was storing 1.5 petabytes (1,572,864 gigabytes) worth of photos and was adding 25 terabytes (25,600 gigabytes) worth every week. He also shared that Facebook’s photo storage servers were only using a series of 1 terabyte drives at that time, and we’ve now got 4 terabyte drives on the market for $179.99. And that’s a consumer rate; we’ve got to assume that governments, like Facebook and Google, get discounts for buying in bulk.

A story by Sebastian Anthony on Extreme Tech found that Facebook’s IPO documentation claimed storage of 100 petabytes of data between photos and video, Microsoft was over 100 petabytes for Hotmail (given its advantages, it’s probably significantly more for Gmail), and even Dropbox is storing in excess of 40 petabytes of data.

Netflix stores its video library on Amazon’s S3 service. That library at one time consisted of 1 petabyte of data, and S3 was thought to store a total of 566 exabytes of data as of late 2011. 566 exabytes is 579,584 petabytes (607,737,872,384 gigabytes).
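For anyone who wants to check the unit conversions quoted in this section, here is a minimal sketch. It assumes the 1024-based (binary) steps that the gigabyte figures above imply; the function name `to_gigabytes` is just an illustrative choice.

```python
# Binary (1024-based) unit ladder, starting at gigabytes.
# Assumption: every figure in this section uses 1024x steps, which is
# what the quoted gigabyte totals imply.
UNITS = ["GB", "TB", "PB", "EB"]


def to_gigabytes(value, unit):
    """Convert a size expressed in `unit` to gigabytes."""
    steps = UNITS.index(unit)  # how many 1024x steps above a gigabyte
    return value * 1024 ** steps


if __name__ == "__main__":
    for value, unit in [(20, "PB"), (1.5, "PB"), (25, "TB"), (566, "EB")]:
        print(f"{value} {unit} = {to_gigabytes(value, unit):,.0f} GB")
```

Using decimal (1000-based) units instead would shrink each figure a little, but not enough to change the picture.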

Even if all of these estimates are off, you could cut them by large factors and still be talking about incredible amounts of data storage.

The government doesn’t know the meaning of thrift

We’ll probably never know the current and maximum storage capacity of the National Security Agency or Central Intelligence Agency (and even less will ever be known about their processing capabilities), but we do have some information to go on.

The NSA is constructing a 1.5 million square foot data center at Camp Williams, Utah, with a projected completion date of September of this year. That data center is rumored to have a storage capacity measured in yottabytes, with one yottabyte equaling 1,125,899,906,842,624 gigabytes.

Compared to some of the major information providers that I listed above, at one yottabyte, the NSA conceivably could store:

That’s a lot of storage, and obviously the NSA won’t be using it only to store telephone intelligence, but there’s plenty of room for that when it comes to recording and storing the content of phone calls for an extended length of time.
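To put a rumored yottabyte next to the private sector figures cited earlier, a rough sketch like the following works out how many copies of each would fit. The provider sizes are the estimates quoted above, not measurements; the names `providers_pb` and `multiples_in_yottabyte` are illustrative, and binary units are assumed throughout.

```python
PETABYTE = 1024 ** 5   # bytes, binary units assumed
YOTTABYTE = 1024 ** 8  # bytes, binary units assumed

# Sizes in petabytes, taken from the estimates quoted earlier in this story.
providers_pb = {
    "Google, one day of processing (2007)": 20,
    "Facebook photos and video": 100,
    "Hotmail": 100,
    "Dropbox": 40,
    "Netflix library": 1,
    "Amazon S3 (rumored)": 566 * 1024,
}


def multiples_in_yottabyte(size_pb):
    """Whole number of copies of a `size_pb` petabyte store in one yottabyte."""
    return YOTTABYTE // (size_pb * PETABYTE)


if __name__ == "__main__":
    for name, size_pb in providers_pb.items():
        print(f"{name}: {multiples_in_yottabyte(size_pb):,} times over")
```

Even the rumored S3 total would fit into a yottabyte well over a thousand times, which is the scale the rest of this story assumes.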

Hours of speech per megabyte

Many people are making assumptions about the storage requirements of audio based on their personal experience with the MP3 codec when used to store music. The frequency range of the human voice is much narrower than that of musical instruments, and speech can be clearly understood at a much lower quality level than we might prefer just to satisfy personal taste. MP3 would be significant overkill if the government’s priority were efficiency and quality only mattered in discerning words.

My old Samsung Reality cellphone (2010) uses the QCELP codec developed by Qualcomm in 1994 to record voice notes. A quick test using that codec found that 32 seconds of audio only required 22.5 KiB of storage space, which works out to about 24 minutes of audio per mebibyte of storage. The 4 GiB SD card in my phone could store roughly 1,650 hours of audio using QCELP, and that codec is ancient.

(Note: A second test of constant talking stored 30 seconds in 45.6 KiB of data. These codecs apparently are super efficient at not storing silence.)

Forget about five megabytes of storage for five minutes of audio with MP3. The NSA and any other intelligence agency has had the capability to store about 35 minutes of audio on a 1.44 MiB floppy disk for nearly two decades, if not more.

QCELP has been replaced several times over the years. Selectable Mode Vocoder (which is in the process of being replaced by Enhanced Variable Rate Codec B) can store audio with a bitrate as low as 800 bits per second at its lowest quality. At that bitrate, an hour’s worth of the human voice can be stored in as little as 351.5 KiB, or nearly three hours of speech per megabyte.
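The bitrate arithmetic is easy to reproduce. This is a minimal sketch, assuming a constant 800 bits per second with no container or file system overhead; the function names are illustrative.

```python
def kib_per_hour(bits_per_second):
    """Storage needed for one hour of audio at a constant bitrate, in KiB."""
    bytes_per_hour = bits_per_second * 3600 / 8  # 3600 seconds, 8 bits per byte
    return bytes_per_hour / 1024


def hours_per_mib(bits_per_second):
    """Hours of audio that fit in one mebibyte at a constant bitrate."""
    return 1024 / kib_per_hour(bits_per_second)


if __name__ == "__main__":
    print(f"{kib_per_hour(800):.1f} KiB per hour")
    print(f"{hours_per_mib(800):.2f} hours per MiB")
```

Real recordings would vary with silence suppression and framing, so treat these as ballpark figures.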

A single four terabyte hard drive from Newegg ($179.99) could store 11.1 million hours of audio using SMV. If rumors are accurate, then the NSA’s new data center could conceivably store 383,347,862,637,820 years of audio, or about 40,297,527 trillion five-minute phone calls in total, using SMV. With that kind of storage capacity possibly coming online within half a year, it’s not unreasonable to believe that the NSA could already intercept and store one billion calls per day.
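Here is a sketch of how those totals fall out of the 800 bits per second figure. Two assumptions are baked in and worth flagging: the hard drive is counted in decimal terabytes (as drives are marketed), while the yottabyte is counted in binary units, and a year is taken as 365 days.

```python
SMV_BYTES_PER_HOUR = 800 * 3600 // 8  # 360,000 bytes at 800 bits per second
HOURS_PER_YEAR = 24 * 365             # assumption: 365-day year

DRIVE_BYTES = 4 * 10 ** 12            # assumption: 4 TB drive, decimal units
YOTTABYTE_BYTES = 1024 ** 8           # assumption: binary yottabyte


def hours_of_audio(capacity_bytes):
    """Hours of lowest-bitrate SMV audio that fit in `capacity_bytes`."""
    return capacity_bytes / SMV_BYTES_PER_HOUR


if __name__ == "__main__":
    drive_hours = hours_of_audio(DRIVE_BYTES)
    yotta_hours = hours_of_audio(YOTTABYTE_BYTES)
    print(f"4 TB drive: {drive_hours / 1e6:.1f} million hours")
    print(f"Yottabyte: {yotta_hours / HOURS_PER_YEAR:,.0f} years")
    print(f"Yottabyte: {yotta_hours * 12 / 1e12:,.0f} trillion five-minute calls")
```

Swapping in a higher bitrate only divides these totals by the same factor, so the orders of magnitude are what matter here.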

With little more than rumors to work from, there are of course many caveats to these estimates. The NSA may not have anything close to that kind of storage in reality, and even if it does have that capacity once the Utah data center goes online, it may not be storing that much data in perpetuity. It’s also likely that a great deal of that storage capacity would be lost to backups, live data redundancy, regular hardware loss, and an incentive not to fill the storage system to the brim.

It’s also unlikely that the NSA would use the SMV codec, much less at its lowest quality setting. A higher quality bitrate setting could easily increase storage requirements by a factor of 10, SMV isn’t the most advanced codec in existence for storing voice information, and the NSA may even have more advanced codecs than what the private sector has. Yet with such astronomical numbers to begin with, a 10 times reduction of 383,347,862,637,820 years worth of voice storage isn’t very meaningful.

Another thing to consider is that since more than one phone call can be intercepted at a time, a hundred hours worth of calls might be recorded within a single hour of real-time, or even a few minutes, depending on how extensive the surveillance is. One million hours worth of calls could end up being captured within a surprisingly short amount of time.

Private sector spooks

The information providers that I mentioned earlier can once again give us an idea of what the NSA might reasonably be capable of storing today.

The servers storing Netflix’s one petabyte of data could hold 357,020 years worth of audio with the SMV codec at its lowest bitrate, or 35,702 years at a much higher quality. Microsoft could store 35,702,051 years on the servers that make up Hotmail.
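The same arithmetic can be wrapped in a small helper for any capacity quoted in petabytes. This is a sketch that assumes binary petabytes, a 365-day year, and a constant bitrate; `years_of_audio` is an illustrative name.

```python
def years_of_audio(capacity_pb, bits_per_second=800):
    """Years of continuous audio that fit in `capacity_pb` binary petabytes."""
    bytes_per_year = bits_per_second / 8 * 3600 * 24 * 365
    return capacity_pb * 1024 ** 5 / bytes_per_year


if __name__ == "__main__":
    print(f"Netflix (1 PB), 800 bps:   {int(years_of_audio(1)):,} years")
    print(f"Netflix (1 PB), 8,000 bps: {int(years_of_audio(1, 8000)):,} years")
    print(f"Hotmail (100 PB), 800 bps: {int(years_of_audio(100)):,} years")
```

The second line uses ten times the bitrate as a stand-in for "a much higher quality," which is my assumption rather than a specific SMV mode.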

Amazon may well have the largest storage system in the world. If every human on Earth (call it seven billion) made a five-minute phone call in a 24-hour period, that’d be 35 billion minutes worth of voice data. That’s only 190.9 terabytes worth of data per day using the SMV codec. Amazon’s rumored 566 exabytes of storage capacity could hold 207,025,133,181 years worth of continuous audio, or roughly 8,500 years of those worldwide daily calls.
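The worldwide scenario can be sketched the same way. The population, call length, and 566 exabyte figure are the round numbers used above; binary units and a 365-day year are assumed, and the variable names are illustrative.

```python
BYTES_PER_MINUTE = 800 * 60 // 8    # 6,000 bytes per minute at 800 bits per second
PEOPLE = 7_000_000_000              # "call it seven billion"
MINUTES_PER_CALL = 5

S3_BYTES = 566 * 1024 ** 6          # rumored 566 exabytes, binary units assumed

daily_bytes = PEOPLE * MINUTES_PER_CALL * BYTES_PER_MINUTE
daily_tib = daily_bytes / 1024 ** 4             # terabytes per day (binary)
years_of_world_calls = S3_BYTES / daily_bytes / 365

if __name__ == "__main__":
    print(f"Daily volume: {daily_tib:.1f} TB")
    print(f"Years of worldwide calls in 566 EB: {years_of_world_calls:,.0f}")
```

Multiply the bitrate by 10 and the capacity still covers centuries of calls at this volume.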

In other words, even using an audio quality at 10 times my estimates, Amazon could easily store every phone call in the world indefinitely.

Even if the NSA falls well short of Amazon or the yottabyte mark, a start-up like Dropbox could store more than 14 million years worth of audio given its current storage capabilities of 40 petabytes or more, and that’s a company that’s run on less than $260 million in funding since 2008, staffed by just 221 people as of January of 2013.

Given the current state of technology and what we already believe has been deployed in the private sector, it’s very likely that the National Security Agency could store far more than one billion phone calls per day if it really wanted to.

* * *

A few notes about this story:
