Introducing a P2P Graph database

A P2P Graph Database

In this post, I'm introducing BYODA, a P2P graph database I developed. In this P2P architecture, individual pods store data about nodes and about edges to other pods. For example, two siblings named Suzie and Mike each have their own pod, and both pods store that their owners are family of each other. If you want to know where Suzie's family lives, you query her pod to get the home addresses from all pods to which she has a relation of type family. The choice of personal information for this example is intentional; I'll explain why later in this post.

This example raises some questions:

How do the pods know that they should store data about family relations and addresses?
How do the pods authenticate clients and authorize requests to query the information?
How do you connect to the pods of Suzie and Mike?
How do you send a request for data to a pod?
How are relations to other pods stored?

I'll answer these questions briefly below. Each topic deserves a more in-depth discussion, but I'll leave that for another post.

The service schema

In a distributed database, the nodes in the network must use a common schema that describes in detail what data to store. In BYODA, a service defines its schema, and each pod that wants to become a member of that service must:

download the schema from the service
create a namespace implementing the schema
host data as defined by the schema for the service in the namespace
make APIs available to access the data

Each pod can support multiple services and their schemas and has one namespace per service it is a member of.

The service schema is versioned. If changes to the data model are required, the service owner creates an updated schema with its version number incremented. Until there is support for ETL, the only modification permitted to a schema is the addition of data elements to ensure schemas remain backward compatible.
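The additive-only rule can be checked mechanically. The sketch below is a minimal illustration, not BYODA's actual validation code: it assumes a schema is a dict with a hypothetical `version` integer and a `fields` mapping from data element name to type name. The real schema format is not shown in this post.

```python
def is_backward_compatible(old: dict, new: dict) -> bool:
    """Return True if `new` has a higher version and only adds elements.

    Both arguments use a hypothetical layout:
    {"version": int, "fields": {element_name: type_name}}
    """
    # The service owner must increment the version number.
    if new["version"] <= old["version"]:
        return False
    # Every existing data element must survive with the same type.
    for name, type_name in old["fields"].items():
        if new["fields"].get(name) != type_name:
            return False
    return True


v1 = {"version": 1, "fields": {"member_id": "string", "email": "string"}}
v2 = {"version": 2, "fields": {"member_id": "string", "email": "string",
                               "home_address": "string"}}
v3 = {"version": 3, "fields": {"member_id": "string"}}  # drops existing elements

print(is_backward_compatible(v1, v2))  # True: only adds home_address
print(is_backward_compatible(v2, v3))  # False: removes existing elements
```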

Authentication

To authenticate requests, BYODA uses a private CA architecture: each service has a CA that signs PKI certificates for its members. The member certificate specifies the membership ID for the pod. When a pod connects to another pod, it presents its client certificate in the TLS handshake. The receiving pod now knows the requesting pod is a service member and uses that information to evaluate the access rights. To check whether the requesting pod is in its network, the receiving pod looks up the membership ID of the requesting pod in a table called network links to see if there is a relation and, if so, of what type.
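As a rough sketch of the network links lookup, assume the membership ID has already been extracted from the verified client certificate and that the table is a simple list of records. The `NetworkLink` class, its field names, and the string IDs are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkLink:
    member_id: str  # membership ID of the remote pod
    relation: str   # relation type, e.g. "family" or "colleague"


def relations_for(member_id: str, network_links: list) -> set:
    """Return every relation type the receiving pod has with `member_id`."""
    return {link.relation for link in network_links
            if link.member_id == member_id}


# Suzie's pod has one link: Mike is family.
suzie_links = [NetworkLink("mike-member-id", "family")]

print(relations_for("mike-member-id", suzie_links))      # {'family'}
print(relations_for("stranger-member-id", suzie_links))  # set()
```

An empty result means the requester is a service member but not in the pod's network, so only the access rights for other service members would apply.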

The owner of a pod can use a web app to communicate with their pod. The pod supports an authentication API that uses username/password and, soon, an optional one-time code to authenticate a request for a JSON Web Token (JWT). The owner can then use this JWT to authenticate other requests to the pod. Only the owner of the pod can use this JWT to communicate with the pod. If the owner wants to retrieve data from other pods, they send the request to their pod, which will proxy the request to other pods.

Request authorization

Request authentication and authorization are critical in an architecture where different parties manage the infrastructure. To implement these functionalities, the schema does not just describe the data elements but also the access rights for that data. Because of this capability, we call the schema a service contract; it specifies what data is stored and who can access it. Each pod must comply with this contract.

You can define access rights in a schema for different entities:

Your membership of the pod: this allows you to use a web browser or an app to access the data for the service in your pod
The service, operated by the service owner
Other service members that are in your (extended) network, optionally with a specific type of relation (e.g., family or colleague)
Other members of the service, who may or may not be in your network
Anonymous users
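To make the entity types above concrete, here is a minimal sketch of how a pod might work out which permissions a requester has on one data element. The layout of the `rights` dict, the `Requester` class, and the rule that rights granted to anonymous users apply to everyone are all assumptions for this illustration, not the format of a real BYODA service contract.

```python
from dataclasses import dataclass, field


@dataclass
class Requester:
    kind: str  # "owner", "service", "member", or "anonymous"
    relations: set = field(default_factory=set)  # relation types with the pod owner


def granted_permissions(rights: dict, requester: Requester) -> set:
    """Collect the permissions `requester` has on one data element."""
    # Assumption: what anonymous users may do, everyone may do.
    granted = set(rights.get("anonymous", set()))
    if requester.kind in ("owner", "service"):
        granted |= rights.get(requester.kind, set())
    elif requester.kind == "member":
        granted |= rights.get("any_member", set())
        # In-network rights, keyed by relation type.
        for relation, permissions in rights.get("network", {}).items():
            if relation in requester.relations:
                granted |= permissions
    return granted


home_address_rights = {
    "owner": {"read", "update", "delete"},
    "network": {"family": {"read"}},
}

mike = Requester("member", {"family"})
other_member = Requester("member")

print("read" in granted_permissions(home_address_rights, mike))          # True
print("read" in granted_permissions(home_address_rights, other_member))  # False
```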

There are the usual access permissions, such as read, update, append, and delete, but there are a couple of unusual ones:

First, there is persist: A pod (or a service) may only store a data element retrieved from another pod if that data element has the persist access right. In the P2P Graph architecture, each pod owns its data, and other pods do not replicate this data. For example, a service contract specifies the persist access right for the membership ID data element. A pod has to persist this ID if it adds an edge to another pod and is allowed to do so with this persist permission.

Second, there is the search access permission; a service can collect data with this permission from pods and make this data searchable in an API. While the P2P paradigm is powerful, there are certainly use cases where centralized APIs provide additional capabilities or lower latency and do not impair the design goals of BYODA. An example of this is a service with an API to discover other pods based on the email address of the people running pods. The service schema would specify the search permission for the email address and the persist permission for the membership ID.

Finally, the read permission supports a cache property; pods and services may cache data fetched from a remote pod for the duration specified by that property. The pod or service must automatically, without manual intervention, delete this data from its cache once its lifetime has expired.
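The expiry requirement could be met with a cache that purges stale entries on its own. The following is a minimal sketch under the assumption that the cache lifetime is given in seconds; the `ExpiringCache` class is hypothetical and not part of BYODA.

```python
import time


class ExpiringCache:
    """Cache that deletes entries once their lifetime has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def put(self, key, value, lifetime_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + lifetime_seconds)

    def get(self, key):
        self.purge()
        entry = self._entries.get(key)
        return entry[0] if entry else None

    def purge(self) -> None:
        """Delete every expired entry."""
        now = self._clock()
        expired = [key for key, (_, expires_at) in self._entries.items()
                   if expires_at <= now]
        for key in expired:
            del self._entries[key]


# A fake clock makes the expiry visible without waiting.
fake_now = [0.0]
cache = ExpiringCache(clock=lambda: fake_now[0])
cache.put("mike/home_address", "1 Main Street", lifetime_seconds=10)

print(cache.get("mike/home_address"))  # 1 Main Street
fake_now[0] = 10.0
print(cache.get("mike/home_address"))  # None
```

This sketch only purges when the cache is read. To guarantee deletion of data that is never requested again, `purge()` would also have to run on a timer.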

Connectivity

The BYODA architecture maintains DNS records for all pods and all their memberships; if you know the membership ID of the remote pod, your pod can connect to it. The pod has two TCP ports accessible:

Port 443 for connections from web apps. This port uses JWTs for authentication.
Port 444 for connections between pods, using mutual TLS authentication with certificates signed by the private CA hierarchy.

In the BYODA architecture, you connect your web app to your pod, and your pod connects to other pods on your behalf to proxy your requests. There are two options for connecting your browser to your pod:

You can configure your pod with a custom DNS record in your DNS domain. In this case, the pod will request a Let's Encrypt certificate on your behalf, and your browser or app can connect to port 443 without issues.
If you don’t have a DNS domain available, you can’t connect your browser directly to your pod, as your browser will not recognize the private CA used to sign the server certificate of the pod. With this option, you must use a BYODA proxy to connect to your pod.

While the proxy does not log any data, it does decrypt and re-encrypt the data transmitted between your browser and your pod, so using a custom DNS record is the more secure option.

Sending requests

Once your web app or another pod connects to your pod, requests for data use the Graph Query Language (GraphQL). Facebook developed this technology and has open-sourced it. It provides a simple and well-known capability of querying and mutating complex data structures. The pod auto-generates GraphQL APIs based on the data elements in the service schema.

A pod can proxy a request to the pods of other service members. There are two use cases. First, you can give your pod the membership ID of the pod to which it should proxy the request. Second, you can perform a recursive query: your pod proxies the query to all pods it has an edge to or, if you specify a filter, only to the pods with which it has an edge that matches the filter. Queries support a depth parameter that specifies how many times pods should proxy the query. The service contract can set the maximum depth for queries to prevent them from being proxied to too many pods and consuming too many resources on the network of pods.
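The recursive case can be simulated in memory. In the sketch below, direct method calls stand in for network requests between pods, and a shared `seen` set stands in for whatever loop prevention a real network would need. The `Pod` class, its `query` method, and the choice to include the answering pod's own data are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Pod:
    member_id: str
    data: dict
    edges: list = field(default_factory=list)  # (relation, Pod) pairs

    def query(self, key, depth=0, relation=None, max_depth=2, _seen=None):
        """Return {member_id: value}, proxying up to `depth` hops away."""
        seen = _seen if _seen is not None else set()
        seen.add(self.member_id)
        results = {self.member_id: self.data.get(key)}
        depth = min(depth, max_depth)  # cap set by the service contract
        if depth > 0:
            for edge_relation, pod in self.edges:
                if relation is not None and edge_relation != relation:
                    continue  # edge does not match the filter
                if pod.member_id in seen:
                    continue  # already answered; avoid loops
                results.update(
                    pod.query(key, depth - 1, relation, max_depth, seen))
        return results


suzie = Pod("suzie", {"home_address": "Paris"})
mike = Pod("mike", {"home_address": "Oslo"})
bob = Pod("bob", {"home_address": "Lima"})
suzie.edges = [("family", mike)]
mike.edges = [("family", suzie), ("colleague", bob)]

# One hop, family edges only: Suzie and Mike answer.
print(suzie.query("home_address", depth=1, relation="family"))
# Two hops, no filter: Mike proxies the query on to Bob as well.
print(suzie.query("home_address", depth=2))
```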

Data verification

In the distributed P2P architecture, you don’t know whether any data your pod requests from another pod is valid. The pod storing the data is the authority over it; if it wants to respond with incorrect data, that is its prerogative. However, the data may make specific assertions. For example, it may claim that someone (person A) follows the pod owner (person B). When you receive that information about person A from pod B by asking for its list of followers, you need to know whether A indeed follows B or whether person B is making a false claim.

To resolve this issue, BYODA supports the digital signing of data. The service contract can include a list of follower objects and allow other service members to append to this list. When A starts to follow B, A can append an object to the follower list in pod B. That object includes a digital signature by person A covering the membership ID of A and the fact that the relation from A to B is of type follower. Each pod has a data certificate signed by the CA hierarchy, and you can download that certificate from pod A. So when you query pod B for its list of followers, you get a data object saying that the membership ID of person A has a relation of type follower to pod B. You can then download the data certificate from pod A to validate the signature, and you then know that the person owning pod A has at some time created a follower edge to the person owning pod B.
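The sign-and-verify flow for a follower object can be sketched with only the standard library. To stay runnable without dependencies, this example swaps in toy textbook RSA with a small key built from two Mersenne primes; it is not secure and is not how BYODA signs data. A real pod would use the key from its data certificate, a vetted cryptography library, and would validate that certificate against the CA hierarchy. The field names and the exact set of signed fields are assumptions.

```python
import hashlib

# Toy key pair for pod A. Far too small to be secure; it only makes the
# flow executable. (E, N) is public, D is known only to pod A.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+


def digest(obj: dict, n: int) -> int:
    """Hash the signed fields to an integer smaller than n."""
    payload = "|".join([obj["member_id"], obj["relation"], obj["target_id"]])
    return int.from_bytes(hashlib.sha256(payload.encode()).digest(), "big") % n


def sign(obj: dict) -> int:
    """Pod A signs the object with its private exponent."""
    return pow(digest(obj, N), D, N)


def verify(obj: dict, public_key: tuple) -> bool:
    """Anyone can check an object from pod B against pod A's public key."""
    e, n = public_key
    return pow(obj["signature"], e, n) == digest(obj, n)


# A appends a signed follower object to the list in pod B.
follower = {"member_id": "pod-a", "relation": "follower", "target_id": "pod-b"}
follower["signature"] = sign(follower)

print(verify(follower, (E, N)))  # True

# If B alters the object, the signature no longer matches.
forged = {**follower, "relation": "family"}
print(verify(forged, (E, N)))  # False
```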

Another example could be a third-party organization that provides a content review service. Person B produces content and submits the content to the third party for review, which returns a digital signature to confirm the content is suitable for a specific audience. When pod A asks pod B for content, pod B can return the content plus the digital signature. If person A has previously accepted that it trusts the content reviews by the third party, then person A can consume the content.

A final example is about identity. Suppose a third party provides identity confirmation services, and person B has proven his identity to the third party and has received a digital signature in return. When pod A asks pod B for the identity of the owner of pod B, pod B can return that data with the digital signature, and person A now knows that the third party has confirmed the identity of person B.

Why BYODA?

As you may have guessed from the examples, I developed BYODA as a technology platform for creating social media services. I believe that a fully distributed social media platform can start to address some of the problems with the current social media landscape. I'll describe my ideas around this in a future post. BYODA delivers a storage platform, a security architecture, and logic for predefined data elements that services can include in their service contract. Services can develop web apps that leverage these capabilities to implement their features. Because BYODA does not prescribe a specific data model, you can implement different types of social media services with it. BYODA is thus not a Twitter or YouTube clone but can provide the storage infrastructure for both and many other services.

If you want to know more about the BYODA implementation, you can visit its GitHub page.