Data Isn’t So Private
Gawker and other media outlets were able to find out how much various celebrities tipped for taxi rides across New York via a public database of New York taxis. The database itself didn’t supply the identities of the celebrities, but with a little digging the media was able to piece them together. This shows how every single piece of data you put out there brings someone one step closer to identifying you.
My Take
This really shows how it is getting harder and harder to keep your data private, since businesses are incentivized to use it and draw insights from it. After all, the insights are what make data useful. As Paul Ohm (a professor of law at Georgetown University Law Center) puts it, “Data can either be useful or perfectly anonymous but never both.” Finding that balance is hard, and maintaining it is harder still.
Summary of Article
In an article published by Open Source Data (based on a talk by Steve Touw; the article can be found here), the authors discuss a possible balance between these two extremes and one idea for how it could play out in the future. More specifically, they detail why this new era of data requires new privacy measures and why the old ones won’t cut it anymore. They also go over some ways to mask data, the business perspective on handling data, and what businesses should consider when working with it.
The article details three main questions that businesses should ask when dealing with data:
- Are you using personal data beyond the scope of what consumers expect?
- Are you confining your data crawl only to what’s necessary for decision making?
- Do you understand what data was used to train your models?
My Take
These are good starting questions if you are just getting into collecting and analyzing data for business purposes. Together they cover the big points you should be weighing: the consumer, your interests, and your judgment.
Moving Forward
So how do we move forward into the next age of data privacy and please both the business and the consumer? Well, some people say we should mask data while still keeping it usable. One way to do this is differential privacy. Differential privacy adds noise to the data, which makes it easier for someone to deny that the information was about them while still keeping the data usable for the business. And if the query is too sensitive to be deemed “private enough”, then you do nothing.
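To make the "add noise" idea a bit more concrete, here is a minimal sketch of one common way to do it, the Laplace mechanism applied to a counting query. This is my own illustration, not code from the article: the function names, the `tips` list, and the choice of `epsilon` are all hypothetical, and a real system would need to handle much more (privacy budgets, repeated queries, and so on).

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Assumption: adding or removing one person's record changes the true
    count by at most 1, so the noise scale is 1 / epsilon.
    A smaller epsilon means more noise and more privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: tip amounts (in dollars) for a handful of taxi rides.
tips = [0.0, 2.5, 0.0, 5.0, 1.0, 0.0, 3.0]

# Ask "how many riders left no tip?" without revealing the exact answer.
noisy = private_count(tips, lambda tip: tip == 0.0, epsilon=0.5)
print(round(noisy, 2))
```

Any single answer is off by a random amount, which is what gives an individual rider room to deny being in the data, while the average over many such queries still lands close to the real count, which is what keeps the data usable for the business.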
My Take
I think this is a good idea because it maintains the middle ground that we all need when it comes to data privacy without favoring one side too much. It is also a very elegant solution, in my opinion, because it doesn’t involve tampering with the data so much that you risk skewing it or distorting the insights it has to offer. It just masks the person or people who gave the data.
The art of data privacy is like a game of tug of war. On one side, you have the business that wants to use the data and get the most out of it. On the other, you have the consumer who, in most cases, wants to remain anonymous, doesn’t want their data used, and especially doesn’t want their identity to be accessible to the business. So you are constantly trying to keep the rope in the zone where both parties are satisfied. Pull too hard on one side and the whole thing collapses, ending the game.
Conclusion
All in all, this was a great article to read. It was really interesting, and it taught me the technical side of data privacy while also putting it in context and showing me all the angles of data privacy, from the business to the consumer to the industry. I definitely recommend you read it (the article can be found here).