msmemory_archive


NY Times reports that Google is now charting the spread of the flu using aggregated search data superimposed on zip/state data. If there's a spike in searches on "flu symptoms" for example, in New England, then the hypothesis is that there is an uptick in the incidence of flu. The data are supporting the notion, too. That is SO COOL.

http://www.nytimes.com/2008/11/12/technology/internet/12flu.html
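As a rough illustration of the idea in the post (and not Google's actual method, which the article does not spell out here), the sketch below counts flu-related searches per region per week and flags weeks that jump well above that region's own history. The term list, the `(region, week, query)` event format, and the threshold are all assumptions made for the example; a real system would presumably also normalize by total search volume.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical list of flu-related search terms (an assumption for this sketch).
FLU_TERMS = ("flu symptoms", "influenza", "fever")


def is_flu_query(query):
    """Return True if the query text contains any of the assumed flu terms."""
    q = query.lower()
    return any(term in q for term in FLU_TERMS)


def count_by_region_week(events):
    """Aggregate (region, week, query) events into {region: {week: count}}.

    Only flu-related queries are counted; everything else is ignored.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for region, week, query in events:
        if is_flu_query(query):
            counts[region][week] += 1
    return counts


def spike_weeks(series, threshold=2.0, min_history=3):
    """Return weeks whose count exceeds the mean of all earlier weeks
    by more than `threshold` standard deviations.

    The deviation is floored at 1.0 so a perfectly flat history does not
    turn every tiny wobble into a "spike".
    """
    weeks = sorted(series)
    flagged = []
    for i, week in enumerate(weeks):
        history = [series[w] for w in weeks[:i]]
        if len(history) < min_history:
            continue
        baseline = mean(history)
        spread = max(pstdev(history), 1.0)
        if series[week] > baseline + threshold * spread:
            flagged.append(week)
    return flagged
```

Feeding it a few quiet weeks followed by a busy one for a single region should flag only the busy week; cross-checking the flagged weeks against reported case counts is what would actually test the hypothesis.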

From:

goldsquare.livejournal.com

And yet not.

It is slightly invasive of privacy.

And since people are now aware of it, it can be spoofed.

From:

baron-steffan.livejournal.com

Invasive of privacy? Only very vaguely, I think.

But spoofable? Well, yeah. There is that.

Still, Hari Seldon would be proud %^).

From:

goldsquare.livejournal.com

I am somewhat familiar with the issues of privacy online.

After all, my company just introduced a product line that does advertising based upon behavioral targeting... And I've worked in this arena for a while.

Suffice it to say that someone I know is considering bankruptcy, and I thought quite a bit before I did web searches on the topic...

From:

msmemory.livejournal.com

There is a reason why I will not accept a CVS customer card: I do not want them tracking my health conditions vs. my other purchases. (It's already intrusive that they put flyers for support groups in with certain of my prescriptions.) But that's a lot more detailed than Google is.

From:

goldsquare.livejournal.com

I don't have one.

Do you ever use the same credit card for prescriptions and regular purchases?

Common identifiers are common. :-)

From:

baron-steffan.livejournal.com

As a pharmacist [and former (twice!) CVS employee], I'm wondering what you mean by "tracking my health conditions vs. my other purchases". I assume you're referring to OTC purchases, since integrating Rx history into the keytag-thingie would set off some pretty high-decibel HIPAA sirens.

From:

alexx-kay.livejournal.com

"it can be spoofed."

So? Some small percentage of the population are griefers. How does the existence of this tool change that in any way?

You might as well say, "Great, now that we know how to make fire, some people will burn other people's huts down." Does that make the invention of fire a net loss?

From:

goldsquare.livejournal.com

I think you are catastrophizing.

There are many measures that become much less valuable or sensitive once the subject knows they exist. I venture to say that this is one of them.

From:

alexx-kay.livejournal.com

*I'm* catastrophizing?

Surely you realize how much of an outlier you are about your sensitivity to privacy issues. Likewise, spoofers are rare.

Yes, knowledge of this measure does, in some very small degree, make it less accurate. But knowledge of it *also* makes it much more accessible, greatly increasing the benefits. Seems like a clear win to me.

From:

goldsquare.livejournal.com

Yes, I think you are. Truly.

Yes, I am an outlier on how important such things are to me. And yet: I blog. :-) I contain multitudes.

I think it would be trivial for someone who has access to some of the spam botnets to use them to drive false data, should they choose to. I can think of several ways to do so without botnets, but they are trickier.

My job, my professional expertise, involves understanding how such measures are vulnerable to skew, and how to stop or track that skew. It is what I do. I may be, in your eyes, ultra-paranoid. At the same time, such techniques of data mining represent rather dangerous intrusions into personal privacy. THIS USE may be innocent. But it is a model for others that might not be.

And if I were a sophisticated terrorist, knowing that I could spoof CDC and law enforcement in this way would be a powerful tool.

Frankly, if I were "The Man In The White Hat" and wanted CDC and others to react to a prevalence of flu, I would not use indirect methods to get them to do so. I'd give them and local boards of health a phone call.

So, knowing this exists does not help the average person. Knowing it exists helps the bad guy. And seeing if this sort of profiling works can hurt the average person, in the long run.

From:

jducoeur

"It is slightly invasive of privacy."

On the one hand, this is true. OTOH, it is no truer of this than it is of Google Trends in general. (Which I assume it's based on.) This particular horse is long since out of the barn, and this is just a minor instance of a more general matter.

"And since people are now aware of it, it can be spoofed."

Again, precisely as true as any of the rest of Google Trends. There are a fair number of obvious ways to at least partially counter the effect, and if Google is using even the most basic ones, it is pretty unlikely that anyone is going to be able to skew the statistics without being horribly obvious about it. It's important to note that there are several ways to drive up the traffic on a particular subject, but I don't see any that are likely to succeed in doing so in geographically-controlled ways.

So seriously: I disbelieve. I'm sure it's hypothetically possible, but I don't see a practical way to manage the geographic balance of the spoofing. And that geographic balance is the point of the exercise. (As opposed to most Google Trend spoofing, which is all about simply increasing traffic on a subject...)

From:

goldsquare.livejournal.com

"I'm sure it's hypothetically possible, but I don't see a practical way to manage the geographic balance of the spoofing."

And yet, while not trivial (it is some work), it is not particularly hard.

It depends on exactly how it is that Google does geographic plotting of requests. I know how my employer does it, and a few of our competition, and all of them can be spoofed with little at-home effort.

From:

goldsquare.livejournal.com

I suppose I should give two examples.

The question is: how does Google perform geolocation of the user, when the user is using an IP address that is not allocated geographically? (Remember, if I have a static IP on my laptop, I can plug it in anywhere in the world....)

There are two methods that might be used - one of which is very inefficient but obvious. Using a PC or Unix box, hack your IP packets that go to Google so they contain a false reply address, presuming that Google uses the reply address to perform traceroute pinging to determine your location. For spoofing purposes, you don't care if your answer gets lost. I doubt like heck that they do this, because it is expensive as a way to perform geo-location of an IP address.

The other is to use the same technique they use to load balance: when you contact Google.com, a DNS lookup is performed and an address is returned to you. Many large-scale server systems (such as Google's) load-balance by assigning unique or varied DNS replies based upon information they have about the DNS server you use, or other information related to that query. There are about 4-5 algorithms in use, some of which are patented and therefore easy to find.
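To make the DNS-based load balancing described above concrete, here is a minimal sketch of one way such a scheme could work: the answer to a lookup depends on which DNS resolver asked. Every table, region name, and address below is hypothetical (the addresses come from the reserved documentation ranges), and nothing here is known to be how Google does it.

```python
import ipaddress

# Hypothetical mapping from resolver address ranges to regions.
# All networks below are reserved documentation ranges, not real resolvers.
RESOLVER_REGIONS = {
    ipaddress.ip_network("192.0.2.0/24"): "us-east",
    ipaddress.ip_network("198.51.100.0/24"): "eu-west",
}

# Hypothetical front-end server address handed out for each region.
REGION_SERVERS = {
    "us-east": "203.0.113.10",
    "eu-west": "203.0.113.20",
}

# Region used when the resolver is not in the table (an assumption).
DEFAULT_REGION = "us-east"


def region_for_resolver(resolver_ip):
    """Guess a region from the address of the DNS resolver that sent the query."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, region in RESOLVER_REGIONS.items():
        if addr in network:
            return region
    return DEFAULT_REGION


def answer_for_query(resolver_ip):
    """Return the server address this sketch would give to that resolver."""
    return REGION_SERVERS[region_for_resolver(resolver_ip)]
```

The point that matters for this thread is that the reply is keyed on the resolver, not on where the person typing actually sits, so whoever picks the resolver also picks the regional server they end up talking to.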

To create fraudulent requests, either consistently use a novel DNS server, or first collect a number of Google server addresses that correspond to a given region, and use those repeatedly.

Combine the two techniques, and the results are likely to be wildly successful. Use a server farm, or bot-net, or some distributed tool, and you can deeply amplify the result.

Is that hard? Not really, beyond startup costs for programming tools and some data collection.

(To go beyond that, you can do some truly exciting work with authorities and BGP....)

From:

baronessmartha.livejournal.com

the cdc has a weekly histogram you can cross reference with that.

wow. group of geeks are we? YAY GEEKS!

From:

alexx-kay.livejournal.com

Yes, that's amazingly cool.

Even to the incidental detail that the head of Google.org is named "Dr. Brilliant" :-)

Brilliant use of search data
