Part 2: Algorithms and Big Data’s Impact on Our Privacy–What Can We Do?


My interest in the phrase “tyranny of algorithms” led me to write a recent blog post on the privacy risks we face in the onrush of algorithms and big data. The exploration had a humorous consequence: I nearly drowned in information on big data and privacy. I needed an algorithm to make sense of it all!

I had promised to tell you “if there was anything we could do as citizens, internet users, and organizational leaders to protect ourselves and others from becoming information chattel.” However, the deeper I went, the darker and more complex the topic became. I finally resurfaced, recognizing that I could not make this research my life’s work…and had to give you something to consider!

In this blog post I will offer three kinds of insight. The first concerns what we can do to change our online behaviors to limit our vulnerability to privacy encroachments. The second concerns the effectiveness of de-identifying big data and the threats posed by releasing “anonymized” data collections. The third concerns bad actors, that is, data brokers who “have made us their business model” and need to be better regulated by the Federal Trade Commission to stop the worst abuses.

Consumer Actions Online

The first “online privacy” technical solutions I found were directed at individual behaviors online. They included browsing the web safely, using a virtual private network (VPN), shopping safely online with your credit card, and choosing the right privacy settings on Facebook. Jack Schofield at The Guardian also offered ideas, including NOT using the Google search engine (a huge user of algorithms, both to satisfy our curiosity and need-to-know inquiries and to track our interests), since they “save your searches and send data to websites.” Forbes writer Kashmir Hill also offered ten worthwhile tips for maintaining one’s privacy online.

As savvy digital citizens, we should consider adopting these behaviors. We also must recognize how we contribute to the mountain of personal identifiers online. For instance, we may share our contact information, birth dates, zip codes, the movies we watch, and the books we read without thinking twice about it. Tweeting, putting our resumes on LinkedIn, posting tagged pictures of family members and events on Facebook and Instagram, and publishing blog posts like this one all broaden and deepen our discoverability, and with it our potential value to others, whether they appreciate us or exploit us.

I agree with Scott Goodson’s assessment in Forbes:

in this digital age we have sacrificed our privacy in order to access all manner of free stuff on the web. It’s a movement that most of us have come to accept. Or have we?

…‘If you’re not paying for it; you are the product’….the next time you’re browsing the web or enjoying a video on YouTube, remember that Google is watching your every move; because that’s the price you pay.

Let me point out that even if one abstains from using smartphones and other connections to the internet, and pays for all purchases in cash, others still maintain digital records about us. These might include doctors’ visits, hospitalizations, medical diagnoses, drivers’ licenses (lists of which are sold to data brokers and groups recruiting new members–has anyone heard of AARP?), birth certificates, home ownership, turnpike tolls paid, and on and on. These databases may be public from day one, or they may be well protected and private, made available to researchers or purchasers only after being “anonymized”.

Effectiveness of “De-identifying” Data

In spite of my caution above, I use Twitter to learn and share what I learn. Tweeting led me to Daniel Barth-Jones, a professor at the Columbia Mailman School of Public Health. He does research using medical data and is concerned that if “society concludes that de-identification cannot be achieved … then we have come to a conclusion that starts to impede our ability to do basic science.”

One of his tweets linked to a 65-minute video on YouTube featuring him and Felix Wu, a professor at the Benjamin Cardozo School of Law at Yeshiva University, discussing How Anonymous Is Anonymity? Open Data Releases and Re-identification. This discussion happened on July 15, 2015, just days ago, and includes the quote from Barth-Jones above.

A slide from the video, narrated by Dr. Barth-Jones, covers the tradeoff between information quality and privacy protection. It shows the impossibility of ensuring both perfect protection and perfect information. He believes that if data files are too heavily “de-identified” or “anonymized,” then the “data utility is destroyed.” He wishes for a “cogent risk analysis” of the small threat posed by open data to individuals’ privacy, and does not want the “fear of what ‘could happen’ to an extremely small fraction of people” to lead to a “general doubt” and research-dampening prescriptions in law and policy.
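To make that tradeoff a little more concrete for myself, here is a minimal Python sketch. It is my own illustration, not anything Dr. Barth-Jones presented: the records are made up, and the function names and the "smallest group" measure (a rough stand-in for the idea usually called k-anonymity) are assumptions chosen for this example. It coarsens zip codes and birth years and then counts how many people share each combination. The smaller the smallest group, the easier it is to single someone out; the coarser the data, the less a researcher can learn from it.

```python
from collections import Counter

# Hypothetical toy records: (zip code, birth year, diagnosis). Not real data.
RECORDS = [
    ("53703", 1948, "flu"),
    ("53704", 1949, "asthma"),
    ("53705", 1951, "flu"),
    ("53711", 1952, "diabetes"),
    ("53713", 1958, "asthma"),
    ("53715", 1959, "flu"),
]


def generalize(record, zip_digits, year_bucket):
    """Coarsen the identifying fields: keep only the first `zip_digits`
    digits of the zip code and round the birth year down to a bucket."""
    zip_code, birth_year, _diagnosis = record
    coarse_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    coarse_year = birth_year - birth_year % year_bucket
    return (coarse_zip, coarse_year)


def smallest_group(records, zip_digits, year_bucket):
    """Size of the smallest set of records that look identical after
    coarsening. A value of 1 means at least one person stands alone."""
    groups = Counter(generalize(r, zip_digits, year_bucket) for r in records)
    return min(groups.values())


if __name__ == "__main__":
    # From fully detailed to heavily coarsened.
    for zip_digits, year_bucket in [(5, 1), (3, 10), (3, 100)]:
        k = smallest_group(RECORDS, zip_digits, year_bucket)
        print(f"zip digits kept={zip_digits}, year bucket={year_bucket}: "
              f"smallest group has {k} record(s)")
```

With full zip codes and exact birth years, every person in this toy file stands alone. Keeping three zip digits and rounding to the decade puts everyone in a group of at least two. Rounding to the century hides everyone in a single group of six, but by then the birth year tells a researcher almost nothing, which is the "data utility is destroyed" end of the tradeoff.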

Picture of Felix Wu from interview in Cardozo Law School newsletter

Wu’s assessment was also pragmatic.  Wu said that

Once we take a threat modeling approach, which is drawn from data security, and brought into the privacy context, then you can begin to think about how one goes about addressing privacy threats.  One possibility is scrubbing the data or using technical solutions.  Another possibility is using legal solutions.  Sometimes legal solutions are the most efficient…Sometimes technical solutions are the most efficient.  In any given context, it is probably a combination of the two that’s likely to be the optimal [approach].

Barth-Jones indicated that “re-identification of individuals’ medical data is particularly harmful. There is a loophole in HIPAA (Health Insurance Portability and Accountability Act), that once data is de-identified, it falls outside HIPAA.  If the data becomes re-identified, there are no consequences” for the person or organization who did it. He supports the Personal Data Deidentification Act with changes that would allow him and other researchers to “continue their work under their Institutional Research Board’s approval with applicable oversight.”

In Wu’s Big Data Threats article, he details “three broad categories of big data threats: surveillance, disclosure, and discrimination.”  Surveillance is “the feeling of being watched, which can result from the collection, aggregation, and/or use of one’s information.” Disclosure is the release of “information outside the context in which it was collected.”  And discrimination is “people being treated differently on the basis of information collected about them.” Discrimination might lead to “personalized persuasion” which is one of the problems identified in the next section.

Consumer Data Brokers

What was the cost of the loss of privacy for a retired librarian in Wisconsin with early-stage Alzheimer’s, a police officer, and a mother in Texas? Pam Dixon, Executive Director of the World Privacy Forum, testified on the scope and depth of the consumer data broker problem before the US Senate Committee on Commerce, Science, and Transportation on December 18, 2013. Dixon explained that

Data brokers collect, compile, buy and sell personally identifiable information about who we are, what we do, and much of our “digital exhaust.”

We are their business models. The police officer was “uncovered” by a data broker who revealed his family information online, jeopardizing his safety. The mother was a victim of domestic violence who was deeply concerned about people finder web sites that published and sold her home address online. The librarian lost her life savings and retirement because a data broker put her on an eager elderly buyer and frequent donor list. She was deluged with predatory offers.

Dixon also focused in depth on the consumer scoring problem in her testimony, showing how algorithms that use seemingly objective criteria may mix together data to produce credit ratings that lead to higher costs for loans or health insurance than may be justifiable. She recommended specific measures to “bring fairness, accuracy, and transparency to consumers regarding data broker activities,” including a national data broker list maintained by the Federal Trade Commission to force the data brokers’ activities out of the shadows and into the sunshine.

My foray into the tyranny of the algorithm revealed much more than I could share with you now.  It was exhausting yet fun.  I hope this post opens your eyes to risks we face as well as our responsibility to understand how computer technologies are changing our world, and how we handle ourselves and our information online.

What might you start doing and stop doing as a result of this post? We have rights and responsibilities as digital citizens. I urge you to alert members of Congress and the FTC to your big data privacy concerns, especially the sometimes questionable role played by data brokers and the lack of consequences for those who re-identify data, thereby putting our cost of living, personal safety, and quality of life at risk.


 Featured image courtesy of TCB at Pixabay


