About the Blog

The thoughts and theories of a guy who basically should have gone to bed hours ago.

I know, I know - what's the point? But look at it this way - I stayed up late writing it, but you're reading it...

Let's call ourselves even & move on, OK?



Friday, May 12, 2006

By the Way, Is Data Mining Unconstitutional?


I have a question: Is data-mining by the U.S. Government unconstitutional?

By my count, we've had this discussion at least three times now (the TSA profiling debate after 9/11, the request for search results from Google/Microsoft/Yahoo to fight online porn, and now the NSA getting phone numbers), but the politics of the specific example always get in the way of the more generic question.

Here's my thinking: if the government wants to spy on one person, they need to get a warrant. Without a warrant, clearly it's illegal.

If they want to spy on a group of people (defined by a characteristic, not a list of names - e.g., all people who pay cash for airline tickets, or all Arab-Americans, or all people who call Afghanistan more than five times per month), warrants are less applicable (what do you do? Get a list of names and obtain warrants for each person?), but there should be some check/balance to make sure that they have a legitimate reason to investigate the group - a "group warrant," if you will (this, by the way, strikes me as protection against prejudice more than protection of privacy).

Now, if they want data on everyone in the country (i.e., all phone calls), both the warrant and the "group warrant" seem less relevant. Clearly, the point of retrieving all the records is to look for reasons to investigate. So what checks and balances are required in this case?

In both the second and third cases, I think the missing check is some assurance that the government will use the data they collect in the aggregate only. Today, even if that's all they're doing, people worry about the possibility that they may do more.

What if we could allow aggregated data searches, but require the government to obtain warrants if patterns are identified, or if certain individuals meet the criteria of a search? In other words, they get the data mining for free, but if it turns up anything, they should get approval to check it out. This scenario would take some technical development (not too much, I think) and some advancements in the law to catch up with the technology (to define, for instance, when the search is specific enough to cross the line from data mining into spying on individuals).

Such advances, I believe, would provide the government the opportunity to take advantage of modern surveillance techniques while giving people the peace of mind that their privacy is not being unduly invaded.

posted by Brian at 5:23 PM


14 Comments:

  • I'll be brief so as not to recapitulate the things I'm saying in the -- no exaggeration -- dozen or so posts I'm working on for my site, but here's the essential problem: you're thinking like a marketer.

    That is, in most commercial data gathering, you generate aggregate data because the aggregate itself has value for you. You don't care whether you sell Joe Smith a refrigerator. John Brown will do just as well. All you want is to know that out of these 10,000 targets, you'll sell 100 appliances.

    The government doesn't work like that. For one thing, there are far fewer terrorists than people who want fridges. For another, actual terrorists are a tiny subset of people who are wannabes -- that is, maybe they have the same thoughts and sympathies, but they don't have the will to act on them. American law doesn't recognize a thought crime, so these people are not terrorists. But they serve as noise to mask the signal.

    And then there's the much larger set of false positives, people who inadvertently match the same patterns and are completely innocent in both thought and deed. As I keep telling you, I've associated with known terrorists. I can think of at least five other vectors that I'd match nicely. Not only do I serve as more noise to mask signal, but there's also the personal risk that my life will be ruined in the process of accidentally making a DHS official's job harder.

    NSA doesn't want 100 random customers. NSA needs to find Joe Smith.

    So -- yes, group warrants are unconstitutional, period. Why? Because let's say you've got your 10,000 targets, and you narrow that down to your 100 probables, and you think Joe is in there. You get your warrant -- whoops, you don't only have a warrant for Joe, you also have a warrant for the other 99 people. Those 99 people don't meet the standard of probable cause -- and at this point, neither does Joe, even if he's guilty as sin.

    But that's only the legal argument -- the technical argument stems from the law of large numbers. We don't have 10,000 targets, we have 300,000,000. We have a starting data set in the hundreds of billions. It's like picking out ET phoning home with a transistor radio while standing in front of a jet engine. Every time you boil signal out from noise, you're moving away from the raw data; every time you do that, you distance yourself from the notion of probable cause. Ergo, this doesn't work for a process that needs to find Joe -- especially considering that in many cases, Joe hasn't committed a crime yet.

    To answer your other technical question, yes, you can do a one-way hash to provide an aggregate data set which completely masks individual data (presuming large sets of data; the set of all Americans who are Jewish clarinetists who joined KA and graduated from Penn in 1990 has one member). But there are mathematical difficulties because no two source databases have identical fields. A secure system of aggregation might be possible, but the NSA doesn't want that, because they know it won't do much to help them find Joe.

    Unfortunately, what they're doing now is highly unlikely to help them find Joe, either. Too much noise, too little signal.

    By Anonymous Jeff Porten, at 4:43 AM, May 15, 2006  
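The one-way-hash idea raised in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any real system: the record fields, the `SECRET_KEY`, and the `MIN_GROUP_SIZE` threshold are all made up for the example. It shows the two points from the comment: individual names are replaced by a keyed one-way hash before counting, and any group small enough to identify a person (the "set with one member" problem) is suppressed from the output.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key, held by the data provider and never by the analyst.
SECRET_KEY = b"example-key-not-for-real-use"
# Hypothetical threshold: groups smaller than this are left out of the results.
MIN_GROUP_SIZE = 3


def pseudonym(identifier: str) -> str:
    """Keyed one-way hash of an identifier (HMAC-SHA256, hex-encoded)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(records, attribute):
    """Count distinct pseudonymized people per attribute value, hiding small groups."""
    seen = set()
    counts = Counter()
    for rec in records:
        key = (pseudonym(rec["name"]), rec[attribute])
        if key not in seen:
            seen.add(key)
            counts[rec[attribute]] += 1
    # Drop any group that is small enough to point at an individual.
    return {value: n for value, n in counts.items() if n >= MIN_GROUP_SIZE}


records = [
    {"name": "Alice", "instrument": "piano"},
    {"name": "Bob", "instrument": "piano"},
    {"name": "Carol", "instrument": "piano"},
    {"name": "Dave", "instrument": "clarinet"},  # a group of one
]

print(aggregate(records, "instrument"))
```

With these made-up records, only the piano group is reported; the single clarinetist never appears. The sketch sidesteps the harder problem the comment mentions, which is reconciling source databases whose fields do not line up.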


  • OK, I can buy the signal-to-noise problem, the false positives problem, and the very, very large dataset problem.

    But here's my question: if the data really is as unattainable as you suggest, why are they doing this? The cost, time, and effort involved in obtaining billions of phone records is so much higher than simply getting a warrant for Joe Smith, it strains belief that they'd go through these motions just to randomly spy on individuals.

    Also, the only way to solve the problems you mention is to develop more effective algorithms. And to do that, you need access to the data for testing purposes.

    If the one way hash could be genericized to the point of becoming a feasible way to transfer sensitive data to the government, could we not draw a line somewhere to allow data mining to a point, and then require warrants to go after individuals who fit the pattern? False positives, just to take one example from above, may make it through the first gate, but fail to justify the warrant step, improving the overall system.

    By Blogger Brian, at 1:59 PM, May 15, 2006  


  • My thinking on why they're doing this: aggregation of power. They're buying into Vernor Vinge's theory: this is "the ultimate policy response of those dissatisfied by the unruliness of the natural world." Cf. my article on CFP2006 at: http://db.tidbits.com/getbits.acgi?tbart=08519

    There are mathematical proofs which show that you can never remove all noise from signal; the less signal you have to begin with, the worse it gets. Ergo, "better algorithms" are not necessarily an answer to this (although I stipulate that we have not played out the complete cycle of diminishing returns here). See my link at: http://jeffporten.com/2006/05/15/the-signal-to-noise-problem/

    By Anonymous Jeff Porten, at 4:20 PM, May 15, 2006  


  • The fact that you can't remove all the noise doesn't mean you stop trying. Data mining today is miles ahead of where it was 5 years ago, and it wouldn't be if folks back then just wrote the whole thing off based on some theoretical study of probabilities based on (then) current technology.

    The impossible will one day be possible...

    By Blogger Brian, at 5:48 PM, May 15, 2006  


  • Greenberg, meet Gödel. Gödel, Greenberg. The reason you might want to meet is that Gödel developed some fascinating mathematical proofs early last century demonstrating that all self-consistent mathematical systems have insoluble problems.

    Relevance to this discussion is that the problem of separating signal from noise is one of those problems. Yes, we can (and should) make our data-mining technologies better. Noise cancelling headphones are good things. So are environmental models that will hopefully teach us how not to drown the planet.

    But both of those problems can be solved with answers that don't reach a granular level -- and in fact, it's provable that weather and sound waves are sufficiently stochastic that granularity is impossible. But you need granular answers for anti-terrorism, and the mathematics is fairly strong (although not, to my knowledge, proven) that such is similarly impossible with a TIA-like program such as what's being done.

    By Anonymous Jeff Porten, at 12:08 AM, May 18, 2006  


  • Assuming Gödel is dead, I'll pass on the meeting. Thanks for the setup, though.

    As to the proofs, why are you assuming a data-mining system needs to reach the granular level? What if data-mining identified 1,000 potential suspects out of 300 million? Could we not have a legal procedure in place to do additional research on those 1,000 (without presuming guilt for any of them)? If that analysis narrowed it down to 50, could we not pursue warrants on those 50?

    More to the point, if this type of technology is available (and I believe, although I can't prove, that it is), how bad would it be to learn that we didn't use it when we had the chance?

    Is it really just the stigma of being investigated that we're trying to avoid? I, for one, don't mind being investigated if my name comes out of a computer model, and the investigation is painless (or even invisible) to me, and I'm immediately discarded once it's determined that I'm not the guy they're looking for.

    (by the way, please don't misinterpret the above as the "Nothing to Hide" argument. I'm not saying I'm OK with it because I have nothing to hide. I'm recognizing that some processes include false positives, and saying that I'm OK with that, as long as being a false positive doesn't impact me in any significant way.)

    By Blogger Brian, at 5:44 PM, May 18, 2006  
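Both sides of this exchange turn on how many innocent people a screening pass would flag, so a small worked calculation may help. It is a sketch under loudly stated assumptions: only the 300 million population figure comes from the thread. The number of real targets, the hit rate, and the false-positive rate are invented purely to show the arithmetic, and `screening_outcome` is a hypothetical helper.

```python
def screening_outcome(population, true_targets, hit_rate, false_positive_rate):
    """Return (number of people flagged, fraction of flagged people who are real targets)."""
    true_hits = true_targets * hit_rate
    false_hits = (population - true_targets) * false_positive_rate
    flagged = true_hits + false_hits
    return flagged, true_hits / flagged


# Assumed inputs: 1,000 real targets, a model that catches 99% of them,
# and wrongly flags 0.1% of everyone else.
flagged, precision = screening_outcome(
    population=300_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_positive_rate=0.001,
)

print(f"{flagged:,.0f} people flagged; {precision:.2%} of them are real targets")
```

Under these assumed rates, roughly 301,000 people are flagged and about one in three hundred of them is a real target. Different rates give very different answers, which is really the point of contention: the argument is over whether a second, warrant-gated stage can cheaply and harmlessly clear the rest.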


  • (Ampersand ouml semicolon, Brian. Copy and paste doesn't work with Unicode. I'm seeing your Gödel as Gödel.)

    You're reversing cause and effect. First you need probable cause, then you get a warrant, which is the "legal procedure in place to do additional research, without presuming guilt for any of them". The datasuck on the 300 million bypasses the Constitutional protection on their data with a vengeance.

    If you're going to propose classes of data that don't require warrants, then why stop at communications? Or at pen traces? Let's just grab everyone's email, or all phone content, or the positional data from cell phones. Hey, in 2007 we'll start getting RFID data from REAL ID cards. Where exactly is the line you draw to say, "whoa, for that you need a warrant, bub"?

    The way we draw that line is to make it universal. If you don't, what's the point?

    More to the point, if this type of technology is available (and I believe, although I can't prove, that it is)

    You're behind on your research. Not only is it available, but the NSA is documented as having it. Cf. the EU investigations into ECHELON.

    Is it really just the stigma of being investigated that we're trying to avoid? I, for one, don't mind being investigated if my name comes out of a computer model, and the investigation is painless (or even invisible) to me, and I'm immediately discarded once it's determined that I'm not the guy they're looking for.

    Sigh. How many times do I need to say this? Your rights are your rights, regardless of whether they're being abrogated right now. By this theory, you wouldn't care if your right to a jury trial was waived on alternate Thursdays, since right now you're not expecting to stand trial -- and even then you'd have a 92% chance of getting one.

    Fact is, that's a lot of ifs, Brian. It's not about stigma, because the investigations are supposed to be secret. Until of course, someone finds it useful to leak it to damage you. The investigation is painless, until somebody gets their wires crossed and puts you (or any other Brian Greenberg) on a no-fly list.

    Discarded? Your data, or the analysis making you a suspect? <insert sound effect of stifled, maniacal laughter> You do know better than that, right?

    By Anonymous Jeff Porten, at 3:49 PM, May 21, 2006  


  • Where exactly is the line you draw to say, "whoa, for that you need a warrant, bub"? The way we draw that line is to make it universal. If you don't, what's the point?

    Exactly. Clearly, we haven't drawn that universal line at all. Hence all the argument.

    Sigh. How many times do I need to say this? Your rights are your rights, regardless of whether they're being abrogated right now. By this theory, you wouldn't care if your right to a jury trial was waived on alternate Thursdays, since right now you're not expecting to stand trial -- and even then you'd have a 92% chance of getting one.

    Sigh. You've made this argument a thousand times, and I agree each time. But that's not at all what I was saying. I'm not abrogating my rights at all. I'm asking if, given the current state of technology, there's a point at which my data makes up the model but I'm not personally being investigated.

    Let's use a Google search as an example: I type "Brian Greenberg" into Google, and get 32,700 hits back.

    First question: have the billions of web pages that Google just scanned been "investigated" by my search? Technically yes, but practically speaking, I'd say no. I never saw them, never really had a chance of seeing them. Their only purpose was to hit up against the search algorithm to produce this initial list of 32,700.

    Second question: the 32,700 hits contain many false positives. The first one says, "Brian Greenberg - Your Realtor for Life." I (playing the role of NSA grunt, in this example) take one look at that and say, "Nope - not the guy I'm looking for" and move on. Has that guy been investigated? That's a little closer now (I may have even clicked on his link if the blurb wasn't so clear), but I rejected him almost immediately. So I'm on the fence on this one.

    The next three hits are that @#$^! guy who acts with Uma Thurman. He clearly deserves no civil rights at all (but I digress...)

    #5 is me, at which point I stop. But let's say Google had a "search within results" function (does it? It must, right?) and I kept searching (-realtor -uma -hollywood) until I got a list of folks who were probably me. At that point, I'm going to click on each page and read through some stuff to determine if it's the guy I'm looking for. These folks are clearly being investigated, and I'd expect the NSA grunt would need a warrant at that level.

    Obviously, we need to draw a line somewhere/somehow. But all the hysterical talk about "mass spying" is a little like accusing Google of reading your personal web page billions of times a day, all in the name of finding naked pictures of some supermodel...

    By Blogger Brian, at 10:42 AM, May 22, 2006  
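The staged search described in the comment above can be sketched in a few lines. This is a toy model of the analogy only, with everything in it assumed for illustration: the `Page` type, the sample pages, and the `warrant` argument are hypothetical, and nothing here reflects how Google or the NSA actually works. The automated passes run freely over all the data, while the step where a person reads an individual result is refused unless a warrant is supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    text: str


def search(pages, query, exclude=()):
    """Automated pass: pages that mention the query and none of the excluded terms."""
    query = query.lower()
    return [
        p for p in pages
        if query in p.text.lower()
        and not any(term.lower() in p.text.lower() for term in exclude)
    ]


def read_in_detail(page, warrant=None):
    """Human review of one result; in this sketch it is refused without a warrant."""
    if warrant is None:
        raise PermissionError(f"warrant required to review {page.title!r}")
    return page.text


pages = [
    Page("Realtor", "Brian Greenberg - Your Realtor for Life"),
    Page("Actor", "Brian Greenberg stars in a Hollywood film with Uma Thurman"),
    Page("Blog", "Brian Greenberg stayed up late writing this"),
    Page("Other", "Someone else entirely"),
]

# First pass: everything matching the name.
hits = search(pages, "Brian Greenberg")
# "Search within results": drop the obvious false positives.
narrowed = search(hits, "Brian Greenberg", exclude=("realtor", "uma", "hollywood"))
print(len(hits), len(narrowed))
```

Where exactly the `PermissionError` belongs, at the first pass, at the narrowing step, or only at the detailed read, is the legal question the thread keeps circling; the code just makes the three stages explicit.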


  • Exactly. Clearly, we haven't drawn that universal line at all. Hence all the argument.

    No, sorry, you're just wrong here. The Fourth Amendment is the universal line. The argument is because public opinion is not sufficient to force the courts or the Congress to push the administration back within those lines.

    Basically, unless there is case law interpreting the 4th that I'm unaware of, the case law and Congressional legislation basically both say, "what the NSA is doing is in breach of Constitutional protections." Only the Supreme Court is allowed to interpret otherwise; the executive does not have the unilateral right to say, "we're going to do this anyway because we think we're allowed to." The reason they're doing it regardless is because of a failure of checks and balances -- it's the role of Congress and the courts to rope them back in.

    I think we're both in agreement that this failure is not due to universal acceptance of the legal brilliance of Alberto Gonzales, but due to political issues that make many people who are supposedly in an oversight position very friendly, and reluctant to exercise their Constitutional obligations.

    given the current state of technology, there's a point at which my data makes up the model but I'm not personally being investigated. Let's use a Google search as an example: I type "Brian Greenberg" into Google, and get 32,700 hits back.

    Can't respond to this because your analogy is flawed. Everything in Google is public information, and AFAIK, law enforcement and security agencies can data-mine these as much as they please without any warrants. (There are some protections in place on who they can open investigations on in the first place; NSA can't start searching for Brian Greenberg and create a file on you for the hell of it. But since you've left the country from time to time, that's sufficient.)

    The issue is that the dataset in question is information that requires a warrant to begin with. Under the letter and standing interpretation of the law, the NSA can't say, "give us all that data", and it's a federal crime to ask for it and for any company to comply. This is known to be true because that's what they were doing under Project Shamrock, and what Congress specifically put a stop to. So your analogy doesn't stand up, because it's the raw dataset in the first place that NSA can't compile without warrants.

    You're right that, again AFAIK, there is no legal mechanism that would allow NSA to say, "give us all that", without essentially getting an individual warrant on every American with a phone line. So if there's a value to this kind of data-mining that we do want to enable, that would require new law. It's not just what they're doing, it's how they went about it, that is particularly destructive of individual rights.

    At that point, I'm going to click on each page and read through some stuff to determine if it's the guy I'm looking for. These folks are clearly being investigated, and I'd expect the NSA grunt would need a warrant at that level.

    Again, you make completely unfounded assumptions about how NSA goes about this. First, as it stands, the NSA grunt can do whatever he damn well pleases because he has no oversight and no legal standards -- his program is completely off the legal books and if he wants to spend his week trying to figure out if you're having an affair with your secretary, that's completely kosher. No one will ever know the difference.

    Second, let's say that at the point they have a list of probables -- and you're in there -- that list is simply set aside. TSA decides to use that list for the May no-fly list. Or it gets fed into the nationwide police database. You'd never know the difference -- unless you tried to fly somewhere, or you got pulled over for a traffic stop and found yourself spending four hours in a north Jersey lockup for reasons that were entirely unknown to you.

    The point is, they can do anything they want with this data, and apply it in ways that can affect your life in egregious ways, and you're arguing against exactly the kinds of checks and balances that normally provide you with the legal protections that you've come to expect by carrying one of those nifty blue passports. Let's at least meet halfway on doing all of this in the standard legal way.

    By Anonymous Jeff Porten, at 8:09 PM, May 24, 2006  


  • Sigh....

    You've completely missed my point, for which I blame myself, because I'm obviously not expressing myself clearly here.

    So I'm not going to try again. Suffice to say this much:

    I don't agree that all of this is definitely illegal, because most of the people stating it as fact have no legal training or constitutional law background, and because there's been no investigation or trial. You yourself have simultaneously argued that we don't know much about this program and that it's clearly against the law, practically in the same sentence.

    The fact of the matter is we'll probably never know, because there probably won't be an investigation, and if there is one, it'll either a) go nowhere, or b) result in an impeachment of the President, which will be immediately interpreted by way too many people as the convenient excuse "The Left" used to finally "get" George W. Bush. Mark my words - the House speeches will be all about missing WMD's and torture chambers, not about data mining.

    But I digress...

    Regardless of whether it's legal today or not, I've been trying to have a rational discussion about whether our intelligence & law enforcement communities should have access to modern surveillance technology like data-mining. But it's impossible to do, because folks (not just you - I'm getting beaten up about this over on Scalzi's blog too) keep explaining to me about how my civil rights are/could be violated.

    At the end of the day, it's all just a big cluster%@#&. We argue about the stuff we can't prove, and we can't discuss the legitimate issues that argument raises because we're paranoid about the END OF DEMOCRACY IN AMERICA.

    Even a straightforward analogy (web data is to call lists as Google Searcher is to NSA worker) gets shot down because the terms in the analogy aren't exactly the same.

    </grumpy mood>

    By Blogger Brian, at 12:21 AM, May 25, 2006  


  • There's a difference between not making your point and not being convincing. I think I get your point -- I'm blowing this out of proportion, this might be bad in some ways, the system still works, I can still write whatever I like without expectation of being shipped to Gitmo.

    Yes, we are in agreement, this is not China or Stalinist Russia.

    But what puts me in a grumpy mood is that you seem to disengage your usual critical thinking beyond that assessment, and as I keep telling you, we canaries aren't going to be successful (or survive, for that matter) without the ability to occasionally get people like you to notice we're chirping.

    To wit:

    1) What we do know about the program is at least illegal under FISA, according to the letter and spirit of that law. Are we at least in agreement that the government, when they want to break a law, should get it changed, rather than break it first and hope to clean up afterwards?

    2) The reason why this leads to discussion of the END OF DEMOCRACY IN AMERICA is because the above standard removes all checks from government power. If the government can break any law it wishes until they're stopped, and if the government has a political system where they're unlikely to go through oversight, then you have a natural feedback system where government powers will expand indefinitely.

    (And this is where the conversation branches out beyond the NSA -- if you want evidence of the extension of executive power and the intentions of the administration going forward, it's quite rational to review their actions in other areas.)

    3) Finally, you keep referring to how little we know and how complicated the situation is -- when the people running the program are in charge of revealing little information and making it sound very complicated. You can't make that argument without taking that side, even though in most cases that's the rational way to go. You ignore the possibility that this program is secret not to protect the national interest, but to protect political interest.

    By Anonymous Jeff Porten, at 6:48 PM, May 26, 2006  


  • 1) What we do know about the program is at least illegal under FISA, according to the letter and spirit of that law.

    And what we don't know might change what we do know. And neither you nor I is qualified to say what the "letter and spirit" of the FISA law mean. And we both know enough lawyers to know that when one law is inconvenient, there are a few other laws they can pull out that work better. "Right and Wrong" are rarely black and white issues. That's why we have trials.

    2) The reason why this leads to discussion of the END OF DEMOCRACY IN AMERICA is because the above standard removes all checks from government power. If the government can break any law it wishes....[blah, blah, blah]

    This discussion we're having now (and the corresponding one in the press) is the check on government power. And I have no evidence to suggest that it won't continue the next time around.

    There's also Congress. It seems we both agree that the Democrats have no spine when talking in the aggregate, but when we get to a specific case, they get a pass. These folks (and their Republican counterparts, by the way) should be providing oversight, meaning they should be bringing pressure on the administration to reject ideas they disagree with. This can happen within the walls of secrecy, but requires a desire to help without scoring public, political points. I wonder if we've lost that...

    3) You ignore the possibility that this program is secret not to protect the national interest, but to protect political interest.

    No, I don't. I have in my head, at the same time, two possibilities. You seem to be ignoring the second one, though...

    By Blogger Brian, at 10:39 AM, May 27, 2006  


  • This one needs a point-by-point.

    And what we don't know might change what we do know.

    By this logic, we are required to keep our mouths shut about anything with any classified components to it, or even about topics for which we are not omniscient.

    And neither you nor I is qualified to say what the "letter and spirit" of the FISA law mean.

    I'm the first to say that uneducated people are entitled to their own opinions in the privacy of their own homes only. But whatever the threshold is to have an informed opinion on a wide variety of subjects, you and I have passed it. I don't have a law degree, no. But I do have more formal training in American Civilization than you do -- do you therefore stipulate my essential rightness on all things American to me? I'll do the same for you on anything involving money.

    Don't worry, we can still argue about Mac and Windows.

    And we both know enough lawyers to know that when one law is inconvenient, there are a few other laws they can pull out that work better. "Right and Wrong" are rarely black and white issues. That's why we have trials.

    Cf. my other comment where I essentially accuse you of abdicating your responsibility as an American citizen. This statement gets awfully close to that. If you think of law as a matter of convenience to be dredged and manipulated to support the whims of the day, and you think it's only up to appointed judges to decide whether that's okay, you're missing the most essential check in the check and balance process.

    This discussion we're having now (and the corresponding one in the press) is the check on government power.

    That would be the press that sat on this story for a year? Fascinating comment at CFP: "What's scandalous isn't that the NYT kept this for a year and didn't talk about it during a presidential campaign, although that's of note. What's scandalous is that during that embargo period, no one else in the press uncovered the story for themselves."

    Yes, this discussion is a good start. But government power is not checked unless you and I actually, you know, do stuff. To debate it and then think we're "done" would be the classic example of "good men doing nothing."

    It seems we both agree that the Democrats have no spine when talking in the aggregate, but when we get to a specific case, they get a pass.

    That's because we both have general-interest blogs. Let me quote myself in just the last two months, to various Democratic activists:

    "Leahy sure knows how to make the right noises. Has he been sleeping or incompetent when he was actually doing his work as a senator?"

    "We're going to get creamed in November if you don't get your act together." (This to someone at the DNC.)

    And my favorite: "Quick, who's your favorite for 2008? [two-second silence] That's why we're going to lose."

    For the record, I've frequently said that "Joe Lieberman" will mean the same thing in the 21st century that "Neville Chamberlain" did in the 20th, and it's also my opinion that the more likely it becomes that GWB goes down as the worst president in history, the more likely it becomes that every sitting member of Congress will be sucked into wretched obscurity; their vacillating incompetence for the past six years will overshadow anything else they might have done. Hillary included.

    No, I don't. I have in my head, at the same time, two possibilities. You seem to be ignoring the second one, though...

    Everything I do know about wiretapping disproves the notion that this promotes national security. Cf. Wired News commentary for excellent summaries, especially by Bruce Schneier and Jennifer Granick.

    By Anonymous Jeff Porten, at 4:09 PM, May 28, 2006  


  • By this logic, we are required to keep our mouths shut about anything with any classified components to it, or even about topics for which we are not omniscient.

    Admitting we don't know all the facts is not at all the same thing as requiring us to keep our mouths shut. It does inconveniently rule out knee-jerk assumptions that 100% of what the Bush administration does is corrupt. Funny thing, that...

    Cf. my other comment where I essentially accuse you of abdicating your responsibility as an American citizen. This statement gets awfully close to that. If you think of law as a matter of convenience to be dredged and manipulated to support the whims of the day, and you think it's only up to appointed judges to decide whether that's okay, you're missing the most essential check in the check and balance process.

    Whoa...I didn't say the law was a matter of convenience. I said the law is sufficiently complex that there is rarely a black & white decision. There are real legal scholars and real constitutional lawyers out there defending what the NSA has done. If you're going to write them off as "the FOX News crowd," then you're guilty of exactly what you've accused me of: only hearing one side.

    Everything I do know about wiretapping disproves the notion that this promotes national security.

    Not true. Everything you're willing to believe disproves that notion. You've written off anything else as government propaganda.

    Again, because now you've got me completely paranoid: I'm not saying it does promote national security. I'm saying I don't know. I'm parroting what you've told your DNC buddies, but in a specific instance:

    Democrats in Congress - get off your &$*#&@ butts and lobby long & hard for a bi-partisan (or non-partisan) investigation. We've all declared the press the only available oversight body because the one in the constitution (Congress) seems to have completely abdicated the post.

    By Blogger Brian, at 10:26 PM, May 28, 2006  

