msmemory_archive wrote 2008-11-12 09:54 am

Brilliant use of search data

The NY Times reports that Google is now charting the spread of the flu using aggregated search data superimposed on zip/state data. If there's a spike in searches on "flu symptoms" in New England, for example, the hypothesis is that there is an uptick in the incidence of flu there. The data support the notion, too. That is SO COOL.

http://www.nytimes.com/2008/11/12/technology/internet/12flu.html

alexx-kay.livejournal.com 2008-11-12 04:12 pm (UTC)
"it can be spoofed."

So? Some small percentage of the population are griefers. How does the existence of this tool change that in any way?

You might as well say, "Great, now that we know how to make fire, some people will burn other people's huts down." Does that make the invention of fire a net loss?

goldsquare.livejournal.com 2008-11-12 04:17 pm (UTC)
I think you are catastrophizing.

There are many measures that become much less valuable or sensitive once the subject knows they exist. I venture to say that this is one of them.

alexx-kay.livejournal.com 2008-11-12 04:23 pm (UTC)
*I'm* catastrophizing?

Surely you realize how much of an outlier you are about your sensitivity to privacy issues. Likewise, spoofers are rare.

Yes, knowledge of this measure does, in some very small degree, make it less accurate. But knowledge of it *also* makes it much more accessible, greatly increasing the benefits. Seems like a clear win to me.

goldsquare.livejournal.com 2008-11-12 04:34 pm (UTC)
Yes, I think you are. Truly.

Yes, I am an outlier on how important such things are to me. And yet: I blog. :-) I contain multitudes.

I think it would be trivial for someone who has access to some of the spam botnets to use them to drive false data, should they choose to. I can think of several ways to do so without botnets, but they are trickier.

My job, my professional expertise, involves understanding how such measures are vulnerable to skew, and how to stop or track that skew. It is what I do. I may be, in your eyes, ultra-paranoid. At the same time, such techniques of data mining represent rather dangerous intrusions into personal privacy. THIS USE may be innocent. But it is a model for others that might not be.

And if I were a sophisticated terrorist, knowing that I could spoof CDC and law enforcement in this way would be a powerful tool.

Frankly, if I wanted CDC and others to react to a prevalence of flu, I would not use indirect methods to get them to do so. If I were "The Man In The White Hat", I'd give them and local boards of health a phone call.

So, knowing this exists does not help the average person. Knowing it exists helps the bad guy. And seeing if this sort of profiling works can hurt the average person, in the long run.