Learning from Facebook’s Data Harvesting Mistakes

Apr 12, 2018

At the moment, Mark Zuckerberg is testifying in front of Congress. Facebook is under the microscope because they allowed an app to take data that Facebook had collected and use it in ways that did not align with company policies. Zuckerberg openly admitted that they made mistakes. Let's look at the mistakes, Facebook's remedies, and what anyone dealing with vast amounts of data can do to avoid making the same mistakes.

What happened:

  1. In 2007, Facebook wanted their apps to be more social, so they collected information about people's locations, their addresses, their pictures, etc. They then built an API that would allow people to log into Facebook apps with their Facebook login and share information with the app about who their friends were, as well as information about those friends.
  2. In 2013, Aleksandr Kogan, a Cambridge University researcher, created a Facebook personality quiz app which was installed by 300K people. Through the Facebook App API, this provided Kogan with information about those 300K people and their friends (and ultimately some information on 87 million users).
  3. In 2014, Facebook limited the information that was available to Facebook App developers.
  4. In 2015, Facebook learned that Kogan had shared the data from his app with Cambridge Analytica (which was in direct violation of Facebook policies).
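
The jump from 300K installs to tens of millions of profiles follows from the friends-sharing design described in step 1. The sketch below is only an illustration of that fan-out: the names `exposed_profiles` and `friends_of` and the toy friend lists are hypothetical, and it does not reproduce Facebook's actual API.

```python
def exposed_profiles(installers, friends_of):
    """Return every profile visible to an app when each installer
    also shares information about their friends (hypothetical model)."""
    exposed = set(installers)
    for user in installers:
        # Friends never installed the app, but their data is shared anyway.
        exposed |= friends_of.get(user, set())
    return exposed


# Toy data: two people install the app.
friends_of = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"dave", "frank"},
}
installers = {"alice", "erin"}

reach = exposed_profiles(installers, friends_of)
print(len(installers), "installs expose", len(reach), "profiles")
```

Even in this tiny example, two installs expose six profiles, which is the pattern that played out at a much larger scale.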

Facebook's response:

  • They limited information available to Facebook App Developers
  • They remove a developer's access to your data if you haven't used the app in 3 months
  • Developers now have stricter requirements on how they may use the data
  • They have built better controls that let Facebook users see which apps they have allowed to access their data
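
The inactivity rule in the list above is easy to reason about in code. What follows is a minimal sketch, not Facebook's implementation: the function `apps_to_revoke`, the `last_used` mapping, and the choice to treat "3 months" as 90 days are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 90 days.
INACTIVITY_LIMIT = timedelta(days=90)


def apps_to_revoke(last_used, now):
    """Return the apps whose last use by this user is older than the limit."""
    return sorted(
        app for app, used_at in last_used.items()
        if now - used_at > INACTIVITY_LIMIT
    )


# Hypothetical record of when one user last opened each connected app.
now = datetime(2018, 4, 12)
last_used = {
    "quiz_app": datetime(2017, 11, 1),
    "photo_app": datetime(2018, 3, 30),
}

stale = apps_to_revoke(last_used, now)
print("Revoke access for:", stale)
```

A scheduled job running a check like this per user would keep long-forgotten apps from holding on to data indefinitely.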

Companies everywhere are watching this testimony and waiting to see if new laws and regulations come into play in this information age. Zuckerberg has said that Facebook will comply with GDPR in the EU but will not be taking steps toward compliance in other parts of the world.

Even though this data was "legally" acquired by these Facebook App Developers, Facebook has now found themselves in a "data harvest" situation that is equivalent to a "data breach" situation. 

What else could companies like Facebook do:

  • Define Business Processes (using a tool like ER/Studio Business Architect) that will allow you to clearly see who in your company has access to your data and where the data that you have collected is leaving your system
    • If Facebook had done this, they would have known exactly what information was being used by these Facebook Apps and could have limited that information before it left their system.
  • Trace Data Lineage (using a tool like ER/Studio Data Architect) that will allow you to know where the data has traveled through your system
    • If Facebook had done this, they would have been able to quickly see the places where information was being accessed and shut down that access where necessary.
  • Know which APIs are accessing your data (using a tool like SQL Compliance Manager)
    • If Facebook had done this, they would have been able to see that the Facebook App calling the API and selecting data out of the Facebook system was gathering information at high volume.
  • Identify vulnerabilities in advance and put strong security policies in place (using a tool like SQL Secure)
    • If Facebook had done this, they could have locked down the information that was being accessed by the Facebook App Developers.
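
To make the "know which APIs are accessing your data" point concrete, here is a minimal sketch of flagging clients that pull data at high volume. It assumes an access log of `(app_id, records_returned)` pairs and an arbitrary threshold; both are hypothetical, and the sketch is not how SQL Compliance Manager or any other product works internally.

```python
from collections import Counter


def flag_high_volume(access_log, threshold):
    """Sum records returned per app and keep the apps above the threshold."""
    totals = Counter()
    for app_id, records_returned in access_log:
        totals[app_id] += records_returned
    return {app: count for app, count in totals.items() if count > threshold}


# Hypothetical audit log entries: (app_id, records returned by one API call).
access_log = [
    ("quiz_app", 5000),
    ("calendar_app", 40),
    ("quiz_app", 7000),
    ("calendar_app", 25),
]

flagged = flag_high_volume(access_log, threshold=10000)
print("Apps to review:", flagged)
```

In practice the threshold would likely be set relative to what each app needs in order to function, so that an app reading far more than its stated purpose requires stands out.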

All of the steps that you would normally take to secure your data against a data breach should also be used when you allow data that you have collected to leave your system. Once it has left your system, you have little control over how that data is used. While you may not have been involved in the malicious activity, you certainly don't want to be the catalyst that allowed your data to be used in violation of your own internal policies.

Whenever data stories hit the news, it's a good time for all of us to look back and reflect on the ways that we can track and secure our information better. It's a good time to review Data Governance policies and Data Breach protocols. We should all strive to be good data stewards of our customers' and users' information.

Facebook is also under scrutiny over Russian interference in the recent US election, but that's a different topic altogether.