A contract for your data
A Personal Data Store (‘pod’) is a web server that you can run to store data about yourself. The idea is that web services store data about you not on their servers but in your pod. The concept has become popular with online privacy advocates because you, as the owner of a pod, have complete control over whether online services have access to your data, and you can view, modify, or delete the data that services have stored in your pod. There are several initiatives to productize pod solutions. Each product has its own set of capabilities, but none of them focuses on social media.
I have started the Bring Your Own Data & Algorithms (BYODA) project to target social media. The goal is to stop the surveillance economy and foster competition between social media services by reducing the benefits of the network effect.
The unique feature of the BYODA open-source software is that social media services cannot store arbitrary data in your pod. Instead, each service must describe all the data it stores about you in a data contract. When you sign up for a social media service that supports the BYODA software, you’ll have to accept the data contract offered by the service. The contract is an appendix to the privacy policy of the service.
The data contract is a technical specification and must include a detailed semantic description of each data element. This description gives you insight into the data the service stores in your pod. As you start using the service, you can decide what data to keep and what to delete from your pod. The data contract must use standardized data constructs, such as home addresses or what schools you have attended, wherever possible, to make the data easy to interpret and exchangeable with other services.
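To make this concrete, here is a minimal sketch of what a fragment of such a contract might look like, and how a pod could list the description of every data element for its owner. The property names (`given_name`, `home_address`) and the `describe_elements` helper are assumptions made for illustration; they are not taken from the BYODA software.

```python
import json

# Hypothetical contract fragment in JSON Schema form. The property names and
# descriptions are illustrative and not taken from the BYODA software.
CONTRACT_JSON = """
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "example-social-service",
  "type": "object",
  "properties": {
    "given_name": {
      "type": "string",
      "description": "The first name you show to other members"
    },
    "home_address": {
      "type": "object",
      "description": "Your postal address, as a standardized construct",
      "properties": {
        "street": {"type": "string", "description": "Street name and house number"},
        "city": {"type": "string", "description": "City or town"}
      }
    }
  }
}
"""


def describe_elements(schema, prefix=""):
    """Yield (path, description) for every data element, including nested ones."""
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        yield path, spec.get("description", "(no description)")
        if spec.get("type") == "object":
            yield from describe_elements(spec, prefix=path + ".")


contract = json.loads(CONTRACT_JSON)
for path, description in describe_elements(contract):
    print(f"{path}: {description}")
```

A listing like this is the kind of overview that could help you decide which data elements to keep and which to delete.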
In the BYODA project, the social media service is not the only party that can access the data stored in your pod. The contract can specify that members of the service who are in your social network as friends, family, or colleagues are permitted to query specific data directly from your pod. For example, suppose you and a friend have established a ‘friend’ relationship in a social network. Your friend’s pod stores a unique identifier for you and can then query your pod for, say, your phone number. Your friend’s pod does not have to store any other data about you; it can request this information from your pod for as long as your ‘friend’ relationship remains. This capability also opens up Peer-to-Peer (P2P) scenarios for traversing your social network graph. For example, you could ask your friend’s friends whether they have ‘liked’ a movie.
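The relationship-based access described above can be sketched roughly as follows. This is a simplified in-memory model, not the BYODA implementation: the `Pod` class, its fields, and the identifiers are all hypothetical, and a real pod would answer authenticated network requests rather than local method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy model of a pod (hypothetical; not the BYODA implementation)."""
    owner_id: str
    data: dict
    # relationship label -> identifiers of members holding that relationship
    relations: dict = field(default_factory=dict)
    # data element -> relationship labels that are allowed to read it
    read_access: dict = field(default_factory=dict)

    def query(self, requester_id: str, element: str):
        """Return the element if the requester holds a permitted relationship."""
        for relation in self.read_access.get(element, set()):
            if requester_id in self.relations.get(relation, set()):
                return self.data[element]
        raise PermissionError(f"{requester_id} may not read {element!r}")


# Your pod stores the data; your friend's pod only needs your identifier.
my_pod = Pod(
    owner_id="me-1234",
    data={"phone": "+1-555-0100", "email": "me@example.com"},
    relations={"friend": {"friend-5678"}},
    read_access={"phone": {"friend"}},
)

print(my_pod.query("friend-5678", "phone"))
```

Removing `friend-5678` from `relations["friend"]` makes the same query raise `PermissionError`, which mirrors the rule that access lasts only as long as the relationship does.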
The data contract is written in a standard called ‘JSON Schema’, which the BYODA software can parse. The BYODA software interprets the contract and translates it into a GraphQL API. With this API, services and other pods can use GraphQL (developed by Facebook, ironically) to query the data in the pod. The BYODA software extends JSON Schema with a way to specify detailed access rights for each data element in the pod. The pod uses these access rights to decide whether to grant or reject a request for data from an online service or from your friend’s pod. Because the GraphQL API is generated from the data contract, the BYODA pod can support many social media services and their data models without requiring software development for the pod.
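As an illustration of the idea, the sketch below turns a flat JSON Schema object into a GraphQL type definition and checks a per-element access annotation. It is a minimal sketch under stated assumptions, not the conversion the BYODA software performs: the `x-read-access` keyword, the `to_graphql` and `may_read` helpers, and the `Person` schema are all hypothetical, and nested objects and arrays are not handled.

```python
# Map JSON Schema scalar types to GraphQL scalar types.
SCALARS = {"string": "String", "integer": "Int", "number": "Float", "boolean": "Boolean"}


def to_graphql(type_name, schema):
    """Translate a flat JSON Schema object into a GraphQL type definition."""
    required = set(schema.get("required", []))
    lines = [f"type {type_name} {{"]
    for name, spec in schema["properties"].items():
        gql_type = SCALARS[spec["type"]]
        if name in required:
            gql_type += "!"  # required properties become non-null fields
        lines.append(f"  {name}: {gql_type}")
    lines.append("}")
    return "\n".join(lines)


def may_read(schema, element, relation):
    """Check the hypothetical `x-read-access` annotation of a data element."""
    return relation in schema["properties"][element].get("x-read-access", [])


# Hypothetical schema; `x-read-access` is an assumed extension keyword.
PERSON = {
    "type": "object",
    "required": ["member_id"],
    "properties": {
        "member_id": {"type": "string", "x-read-access": ["service", "friend"]},
        "phone": {"type": "string", "x-read-access": ["friend"]},
        "birth_year": {"type": "integer", "x-read-access": []},
    },
}

print(to_graphql("Person", PERSON))
# type Person {
#   member_id: String!
#   phone: String
#   birth_year: Int
# }
```

With this schema, `may_read(PERSON, "phone", "friend")` is true while `may_read(PERSON, "phone", "service")` is false, so a pod built along these lines would answer a friend's request for the phone number and reject the same request from the service.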
In its data contract, a social media service can request read-only access to some of the data that other services have stored in your pod. This section of the data contract is always optional; you can accept the data contract for a service without granting it access to the data of your other services. This capability will make it easier for new social media services to develop a user base and thus compete better with existing social media services. Here are two examples of how a service could use this access:
- A new service can discover people in your existing social network that also have an account in the social network of the new service. The new service can suggest establishing those same links in its social network so you can develop your social network in the new service faster.
- The new service can use your other social network data to recommend content to you, enabling it to enrich your content feed immediately without having to learn your preferences from scratch.
The BYODA network requires any organization that wants to deliver services leveraging the BYODA pod to include several provisions in its privacy policy:
- Apart from your unique identifier, the service will not store any Personally Identifiable Information (PII) about you on servers in its data centers, with one exception: the data contract may list specific PII items that the service caches there so that features like search stay fast. The service must purge each cached item within 24 hours of storing it in the cache.
- The service must specify that it will not share your data with third parties, except where the law requires it or where it subcontracts specific administrative processes.
- When a service wants to store additional data in your pod, it must publish a new version of its data contract. It must provide you with a period of three months to accept the new data contract. Within this period, existing features must continue to work as per the now obsolete version of the data contract.
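The caching provision in the first bullet lends itself to a small sketch. The `ServiceCache` class below is hypothetical and only illustrates the two rules stated there: a service may cache only the elements its data contract lists, and every cached entry must be gone within 24 hours.

```python
import time

MAX_CACHE_AGE = 24 * 60 * 60  # seconds; the 24-hour limit from the provision


class ServiceCache:
    """Hypothetical service-side cache limited to contract-approved elements."""

    def __init__(self, cacheable, clock=time.time):
        self.cacheable = set(cacheable)  # elements the data contract lists
        self.clock = clock               # injectable so expiry can be tested
        self._entries = {}               # (member_id, element) -> (stored_at, value)

    def put(self, member_id, element, value):
        if element not in self.cacheable:
            raise ValueError(f"contract does not permit caching {element!r}")
        self._entries[(member_id, element)] = (self.clock(), value)

    def get(self, member_id, element):
        self.purge()
        entry = self._entries.get((member_id, element))
        return entry[1] if entry else None

    def purge(self):
        """Drop every entry that was stored 24 hours ago or earlier."""
        now = self.clock()
        self._entries = {
            key: (stored_at, value)
            for key, (stored_at, value) in self._entries.items()
            if now - stored_at < MAX_CACHE_AGE
        }
```

Purging only on reads would leave unread entries in memory past the limit, so a service following this approach would also need to call `purge()` on a schedule.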
In summary, the concept of the data contract is a valuable addition to the idea of the pod as it gives meaning to the data stored in the pod, enables you to control the access to that data, and enables P2P scenarios. The data contract conversion to the GraphQL API provides a technology stack to access data in the pod that is well known in the web development community and allows the pod to support many different services and their data models without requiring changes to the pod software. Finally, the improved privacy of the data in your pod and the reuse of existing data by new services should increase competition, forcing existing social media networks to focus more on the needs of the people that use their service.