
Interlay Overview

Published on Oct 28, 2019

A few different conversations have independently arrived at the need for “Interlays” in our conceptual architecture. This is Joel’s quick sketch of what the three layers Underlay, Interlay, and Overlay mean to him.

  • The Underlay is the immutable, append-only collection of graph data. The data is expressed as canonicalized RDF datasets, addressed by content-hash, and retrieved over IPFS. The addressing scheme is well-known and dataset authors are encouraged to link directly into datasets when appropriate (i.e. developer tooling should make this happen automatically, magically).

    • IPFS is a retrieval network - it says nothing about discoverability, etc. So it solves the problem of getting the content of a dataset given its hash, but not the problem of publishing datasets or finding the ones you care about given your interests or whatever.

    • Anyone can run an IPFS node and pin whatever datasets they want to help host.

    • Like the Web, the Underlay has very loose boundaries. If it’s an RDF dataset and someone is pinning it to IPFS, then it’s “in” the Underlay.

    • The Underlay is primordial ooze. The vision is of this universal densely-linked ocean of graph data from all sources that needs to be molded into specific schemas before it’s usable, but the underlying data is free and decentralized, so that anyone can project it onto another shape.

  • An Interlay is a domain-specific server that keeps a materialized view into the Underlay or can materialize views on-demand. Interlays might also generate datasets, communicate with other Interlays, and/or respond to queries.

    • “Domain-specific” means that these views typically follow a strict schema and are often tightly coupled with a specific namespace of predicates.

    • However, there can also be “general Interlays” that don’t have any schema for the data they store, but instead have other heuristics (like provenance) for inclusion.

    • Interlays will typically run an IPFS node pinning all of the datasets they “care” about - at least the minimal set required to reproduce their materialized view.

    • Interlays will often use traditional relational databases to store views. This means developing idioms and best practices for relating data in traditional databases to the Underlay datasets it is derived from.

    • Interlays are the “state accumulators/reducers” where mutability and versions are implemented.

    • Interlays are where distribution and discovery happen. This means coming up with systems/protocols/networks for different use cases. Pubsub is a good place to start.

    • The “open-world” model of only positive statements isn’t very useful - lots of queries need to be made against some total, closed model (e.g. “Find me actors that haven’t worked with Michael Bay”). Interlays provide this closure.

    • Interlays can be “input”, “output”, or both; in fact it’s hard to tell the difference between them. This is because “input” is so loose (see the bullet in Underlay about boundaries). Data producers - that have some non-Underlay data source like the real world - will run Interlays that first generate a dataset for every new piece of data (even if it’s a mutation or a transaction), and then share it with their peers, or integrate it into their local view, or whatever. From the perspective of the other Interlays, this is indistinguishable from just forwarding data that was “found” in the Underlay.

  • An Overlay is a user-facing application that renders data from one or more Interlays.

    • This is actually the least concrete term of the three. Based on our internal usage I think it’s fair to say that we expect Overlays to be relatively pure/functional “render” layers over the views in various Interlays.

    • Calling something an Overlay is more a qualitative description of purpose than a technical specification of behavior.
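
To make the Underlay/Interlay split a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Underlay` class is an in-memory stand-in for retrieval over IPFS, `dataset_address` hashes a sorted list of triples instead of doing real RDF dataset canonicalization, and `FilmInterlay`, the `ex:` predicates, and the people and films are hypothetical. It only aims to show the shape of the idea: datasets are immutable and addressed by hash, while an Interlay folds the datasets it cares about into a schema-bound view and answers a closed-world query against that view.

```python
import hashlib

Triple = tuple[str, str, str]  # (subject, predicate, object)


def dataset_address(triples: frozenset[Triple]) -> str:
    # Stand-in for canonicalization plus content hashing (NOT real RDF
    # canonicalization): sorting makes the same set of statements always
    # serialize to the same bytes, and therefore to the same address.
    canonical = "\n".join(" ".join(t) for t in sorted(triples))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Underlay:
    """Hypothetical in-memory stand-in for retrieving datasets by hash."""

    def __init__(self) -> None:
        self._datasets: dict[str, frozenset[Triple]] = {}

    def publish(self, triples) -> str:
        dataset = frozenset(triples)
        address = dataset_address(dataset)
        # Append-only: an existing address is never overwritten.
        self._datasets.setdefault(address, dataset)
        return address

    def get(self, address: str) -> frozenset[Triple]:
        return self._datasets[address]


class FilmInterlay:
    """Hypothetical domain-specific Interlay with a strict two-predicate schema."""

    def __init__(self, underlay: Underlay) -> None:
        self.underlay = underlay
        self.pinned: list[str] = []  # datasets needed to reproduce the view
        self.films_by_actor: dict[str, set[str]] = {}
        self.films_by_director: dict[str, set[str]] = {}

    def ingest(self, address: str) -> None:
        # The "reducer" step: fold one immutable dataset into the mutable view.
        if address in self.pinned:
            return
        self.pinned.append(address)
        for subject, predicate, obj in self.underlay.get(address):
            if predicate == "ex:actedIn":
                self.films_by_actor.setdefault(subject, set()).add(obj)
            elif predicate == "ex:directedBy":
                self.films_by_director.setdefault(obj, set()).add(subject)
            # Statements outside the schema are ignored by this Interlay.

    def actors_without(self, director: str) -> set[str]:
        # Closed-world query: "hasn't worked with" only has an answer
        # relative to the datasets this Interlay has chosen to include.
        films = self.films_by_director.get(director, set())
        return {a for a, fs in self.films_by_actor.items() if not (fs & films)}


underlay = Underlay()
first = underlay.publish([
    ("ex:alice", "ex:actedIn", "ex:film1"),
    ("ex:film1", "ex:directedBy", "ex:dana"),
])
second = underlay.publish([
    ("ex:bob", "ex:actedIn", "ex:film2"),
    ("ex:bob", "ex:likes", "ex:tea"),  # outside the schema, so ignored
])

interlay = FilmInterlay(underlay)
interlay.ingest(first)
interlay.ingest(second)
print(interlay.actors_without("ex:dana"))  # {'ex:bob'}
```

The answer to `actors_without` changes as more datasets are ingested, which is the sense in which closure belongs to the Interlay rather than to the Underlay. The `pinned` list plays the role of the minimal set of datasets required to reproduce the view.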

Samuel J. Klein: Closure can be approximate; up to the interlay. Some will strongly privilege minimizing either false positives or false negatives.
Joel Gustafson: They’re up to each Interlay, but (practically, technically) are exact at any point in time. Each piece of data either will or won’t be considered.
Samuel J. Klein: ++
Samuel J. Klein: Claiming to be the original source is a special case. 1 step removed vs. 10 steps removed is fairly similar — if you want to check a data point you can traverse its provenance and ask the source (or if the source is offline, ask others who got information from the same source). But being the interface b/t the UL and other parts of the world means no one can check your data w/o going outside the network.
Joel Gustafson: Absolutely - but it needs to be made explicit in the data, not implicit in the mechanics of who is sharing data with whom.