Patterns for Crystallizing Knowledge

Samuel Klein

🔼☴ The Underlay Project

v.0.24

An underlay: a broad layer of knowledge that can be extended or completed, and combined + referenced by higher-level descriptions or overlays, with deep provenance extending through trees of observation and synthesis. Exs: a 4D map of Earth, properties of elementary particles, a detailed census.

The underlay project: tools, a platform, and templates for building and connecting underlays, including annotation, attribution, and contextual reputation layers, and a first implementation.

The Underlay: the stellation of all underlays.

Our knowledge of the world develops as we observe it more closely over time, identify patterns in it, and separate one pattern from another. Describing things concisely and precisely lets us map and align different observations along some shared dimension, so that we can learn from the comparison or interaction. Identifying different dimensions lets us map out the space of combinations. The Underlay Project is building tools to construct and refine knowledge in this fashion, and a global Underlay network connecting all known instances.

Concepts

A piece of knowledge is an observation, assertion, or a synthesis / interpolation of other pieces.

A layer of knowledge is a collection united by shared properties (type, detail, origin, other scope)

An overlay is a lens: a piece or layer that summarizes, synthesizes, and otherwise organizes knowledge from other layers for an audience (view, map, essay, journal)

An underlay is a granular layer, that can support a large number of overlays. (taxonomy, language, reference, database, collection, normalized dataset)

An interlay is a connective layer mapping two or more layers onto one another.

A mesh, applied to a layer, is the space of all possible combinations of layer properties. This often involves a measure of distance along each dimension and a characteristic grain size or smallest unit of distance between two points. A layer with knowledge about every point in a mesh is completed with respect to that mesh.

These terms depend on context: an overlay in one context can be an underlay in another. Synthetic overlays can stack, each adding synthesis and interpretation to the next. An underlay is itself an overlay of its components, grouped for clarity into a layer.
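The definitions of mesh and completion above can be made concrete with a small sketch. It assumes a layer is a list of records and that each dimension of the mesh is given as an explicit list of grid values; the names `mesh_points`, `gaps`, and `is_completed`, and the toy data, are illustrative and not part of any Underlay tooling.

```python
from itertools import product


def mesh_points(dimensions):
    """Return every combination of values across the given dimensions.

    `dimensions` maps a dimension name to its grid values; the spacing
    between adjacent values plays the role of the grain size.
    """
    names = sorted(dimensions)
    combos = product(*(dimensions[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def gaps(layer, dimensions):
    """List the mesh points for which the layer holds no piece of knowledge."""
    names = sorted(dimensions)
    covered = {tuple(piece[name] for name in names) for piece in layer}
    return [
        point
        for point in mesh_points(dimensions)
        if tuple(point[name] for name in names) not in covered
    ]


def is_completed(layer, dimensions):
    """A layer is completed w.r.t. a mesh when no mesh point is missing."""
    return not gaps(layer, dimensions)


# Hypothetical toy layer: one observation per (month, region) cell.
dimensions = {"month": [1, 2, 3], "region": ["north", "south"]}
layer = [
    {"month": 1, "region": "north", "value": 0.4},
    {"month": 1, "region": "south", "value": 0.7},
    {"month": 2, "region": "north", "value": 0.5},
    {"month": 2, "region": "south", "value": 0.6},
    {"month": 3, "region": "north", "value": 0.3},
]

missing = gaps(layer, dimensions)  # the single cell still to observe or interpolate
```

Listing the gaps, rather than returning only a yes/no answer, matches the idea that a mesh makes clear where interpolation or future observation is needed.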

Underlays can be easy to describe but hard to sift through; overlays make them relevant to different audiences. An underlay helps increase the precision or reliability of a statement, by capturing the complex details underlying simple statements. A common underlay for a single observation is the knowledge used to create it, such as its axioms + method + a bibliography + standard references assumed shared by its audience.

We can define methods to complete or close a set of knowledge, leading to general-purpose underlays that are closed under that method, or completed to the granularity of a given mesh. The underlay of a statement, in which every claim is traced to its first originators, might be closed under [provenance]. A set of images of Earth from L1 might be closed under [hourly repetition for a year].
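Closure under a method can be sketched as a fixed-point computation: keep applying the method until it yields nothing new. The sketch below assumes provenance is available as a simple mapping from each piece to the pieces it draws on; the `sources` records and the helper names are hypothetical.

```python
def close_under(seed, expand):
    """Apply `expand` repeatedly until no new pieces appear (a fixed point)."""
    closed = set(seed)
    frontier = list(seed)
    while frontier:
        piece = frontier.pop()
        for found in expand(piece):
            if found not in closed:  # also guards against cycles
                closed.add(found)
                frontier.append(found)
    return closed


# Hypothetical provenance records: each piece maps to the pieces it draws on.
sources = {
    "summary": ["paper-A", "paper-B"],
    "paper-A": ["dataset-1"],
    "paper-B": ["dataset-1", "field-notes"],
    "dataset-1": [],
    "field-notes": [],
}


def provenance(piece):
    """Return the direct sources of a piece (empty if none are recorded)."""
    return sources.get(piece, [])


# The underlay of "summary", closed under provenance.
underlay = close_under({"summary"}, provenance)
```

Applying `close_under` to an already closed set returns the same set, which is one way to check that a collection really is closed under the chosen method.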

Some ways to think about the effectiveness of an underlay of pieces:

the compression it allows for descriptions. [identifying widely-used elements]
its precision in describing dependencies [references, assumptions, methods]
the cleanliness of clustered components [standardizing + referencing components and arguments]
the clarity of completion of the dataspace defined by the layer [existing data sources that map onto its mesh, clear gaps for interpolation + future observation]
the ease of reorganizing, updating, and recompiling those pieces when elements change.

Usage

Casual use: an “underlay for X” is the significant data and analysis related to X, both available and used. This may need further articulation. For instance, articulating subparts and dimensions of composite knowledge, sharpening the granularity of knowledge along known dimensions, and deepening source-provenance all create new or expanded underlays for that knowledge.

Just as reports and summaries can include many levels of detail and specificity, an underlay can be more or less granular, far-ranging or narrowly focused. No underlay is complete - you can always extend back through history or forward in time, down to finer details, or retain more discarded data. However, we can talk about the completion of an underlay w.r.t. a given mesh.

An underlay project in a research area should catalog, cluster, + layer knowledge supporting research in that area, at the level of granularity and abstraction already being measured. Over time this can be enriched by adding and interpolating both more and less granular elements.

: An underlay for an experiment might include background material, sensor + other data, notes + papers. A good one would include all related material + data for any other work on that project or equipment.

: An underlay for an event might include all observations, estimates, discussions + writeups. A good one would include all observations at that time [that could have seen or been influenced by the event].

: An underlay for a paper might include all data + sources cited, and their data, cites, and critical reception. A good one would do this recursively (to closure?), clustering alignable data, works in a series, replications, &c.

: An underlay for Earth’s structure might include all maps, images, soundings, and other data. A good one would do this over time, identify gaps + interpolation attempts, and distinguish independent data from copies.

Users of the Underlay can contribute by adding to it: creating and annotating assertions, tracing or confirming the context + provenance of assertions, tweaking the parameters of overlays or creating new ones. At its heart, the Underlay is a public knowledge graph which users can repurpose for any use, from building applications and services to training machine learning algorithms to designing urban landscapes.

Example: A Research Underlay

Reproducibility initiatives can describe their efforts in part as: expanding the scope of the known underlay of a project, until all questions about how to replicate are answered, and the # and outcome of replications + comparative trials are tracked.

Preprint servers offer an underlay of [all] work shared by authors as ready for publication

PubPub & similar publishing spaces gather versioned writing and research, making it available for multiple editorial and review overlays.

WikiCite is the underlay of citation [m]data needed to form overlays such as cite affect and IF.

Refigure connects the underlay of images visualizing any implementation of an experiment.

LD4P is a toolchain for creating and linking metadata about library resources, starting w/ 6 institutions.

Overlays can be read-only (static journals, the patent database), read-write (dynamic notebooks, websites, pubpub), or write-only (submission point for the latest v. of a dataset).

Other examples

Science:
{the elements}, later {particles, elements, compounds}, provides a surprisingly compact underlay for matter. Elements turned out to be enumerable and transmutable, based on an even smaller set of particles & forces, with individual & bulk properties.

Astronomy & cosmology name very low-level layers of raw sensor data, as astronomers regularly revisit & bootstrap past assumptions about how higher layers should interpret them.

Spacetime is the ultimate shared metric space.

Language:
Wordnik is an underlay for {dictionaries + word work}. It offers “what strings are used where when in which corpora”, an iterative collection, as an underlay for publishers of words and definitions.

Ancient linguistics (stone carvings) & palimpsest analysis name low-level raw scans, layered analysis, and conflicting interpretations, with feedback from guesses as to meaning.

TVtropes was expressly designed as an underlay for critical analysis of media. It names, clusters, & cross-references tropes + styles, expanding scope & granularity with time.

Crowdvoice (Majal) provides overlays of news in historical context, expressly building an underlay as reporters research + report. Enables visual geo- and time- templates for context.

General data:
QDR, an annotated repository of qualitative data. Ex: article + data from the British Journal of Political Science.

Dashboards are data-summarizing overlays.

Discussion

The metaphor of vertical orientation seems to collide with a lot of epistemological / philosophy of science consensuses. One man's overlay is another man's underlay, and a central question is about the foundationality or supervenience of specific underlays and overlays. Does the stuff any given overlay leaves out (deliberately, tacitly, or subconsciously) also belong to its underlay?

: Yes. An underlay is a [type and format of knowledge] which can be generalized, to include both true and false statements; observed, ignored, and inferred.

For p-hacking and other deliberate omission: can others add to an underlay w/o wiki-style edit wars? Could that be handled by forking an underlay? Or by having overlays that compare competing underlays and their flaws?

: We need an example of this in practice. Including the author’s omission, reviewer’s flagging, statistician’s estimate of what was left out, former colleague’s posting of actual left-out data, curator’s creation of an ‘all data’ cluster that includes both author’s and additional data, and query-overlay’s decision to use ‘all data’ rather than ‘data’ as the default source of data for that experiment. This touches on limits/open problems of decentralization and reputation.

Is underlay theory compatible with death of the author assumptions? Can data points "exist on their own" w/o a minimal overlay of context (intentions, background, hypotheses, funders...), or does context-free data provoke a lot of uncertainty?

: Data currently exist on their own; a rich UL makes it easier to find and trace context to one’s desired level of detail.

How is the quality of underlay data related to the quality of its overlays? Can competing underlays come from an overlay?

: They bootstrap one another. A layer can be expanded through introspection and self-similarity, or through additional analysis or measurement. Underlays can be extracted from overlays; a hasty or lossy overlay may hide the existence of elements that contributed to it. Likewise different extraction mechanisms could produce competing underlays, to a point. (There may be a sense in which a ‘complete’ or lossless extraction contains all possible underlays; needs review.)

Can underlays be "forked" and "merged"? If so, by whom?

: Forking: anyone can change their client software to only recognize new data from certain nodes, or to exclude certain data. Anyone can make their subnet private, optionally hearing from (but not posting to) the public.

: Merging: needs examples. Functionally I see merges happening at the overlay/search/interface/authority-file layer: an overlay that clusters two different sets of sources together.
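As a rough illustration of the two answers above, forking can be modeled as a client-side filter over incoming assertions, and merging as an overlay that clusters assertions from several source sets. This is only a sketch under stated assumptions: the `Assertion` record, the node names, and the function signatures are hypothetical and do not describe an existing Underlay client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    node: str   # hypothetical identifier of the node that published it
    claim: str  # the asserted statement


def fork(feed, trusted_nodes=None, excluded_claims=frozenset()):
    """Client-side fork: keep assertions from trusted nodes, minus exclusions.

    With `trusted_nodes=None`, every node is accepted.
    """
    return [
        a
        for a in feed
        if (trusted_nodes is None or a.node in trusted_nodes)
        and a.claim not in excluded_claims
    ]


def merge_overlay(*source_sets):
    """Overlay-level merge: cluster assertions by claim across source sets."""
    clusters = {}
    for source in source_sets:
        for a in source:
            clusters.setdefault(a.claim, set()).add(a.node)
    return clusters


feed = [
    Assertion("lab-1", "sample melts at 41C"),
    Assertion("lab-2", "sample melts at 41C"),
    Assertion("lab-3", "sample melts at 45C"),
]

mine = fork(feed, trusted_nodes={"lab-1", "lab-2"})
theirs = fork(feed, trusted_nodes={"lab-3"})
merged = merge_overlay(mine, theirs)  # both forks visible side by side
```

In this sketch the merge does not pick a winner: conflicting claims stay as separate clusters with their supporting nodes, leaving the choice to whoever reads the overlay.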

Are there other terms to describe the overall project than a cohesive Underlay?

: At some scale it’s more like a multidimensional mesh; starting from one point and level of granularity, one can imagine levels of granularity/abstraction (more or less fine), or circles of sourcing and provenance (outward). The overall project as a networked collection of [completions] could be seen as an underverse... ground layer? something echoing the 'raisin bread' analogy for expansion of the universe? The name should capture nesting/reliance of some layers on others.

Is there a lowest-level layer?

: If you accept absolute reality, that is a foundational underlay. Within that, spacetime is a particularly simple + general one: an extractable component of ~everything else, with useful implications for clustering, description, comparison.

:: For the sake of the tradition of Western thought, I would keep ontological and epistemological arguments mostly separate. Otherwise you'd introduce a lot of uncertainty on the level of foundational underlays: is Schrödinger's cat dead or alive on the base underlay? You could render things like spacetime, object identity and permanence etc. as foundational to human perception of the world.

::: Different theoretical interpretations of the universe certainly map to different ways of describing fundamental observations; and different clustering into layers. Physics is an example of how discovering 'cleaner' or 'simpler' measures of complex systems can lead to more compact layers (for some measure of precision of prediction, fewer data points are needed to reach that precision)

:::: "foundational to human perception" could be applied to low-level sensory perception (vision, audition). Incoming sense data (e.g. photons incident on the retina) form an underlay; interpretations of those into experience of the world form various overlays. Patterns of activation on the retina are foundational to the brain's synthesis of physical objects, for instance. This can be mentioned separately from claims about ontological status of the raw data. The question of provenance here is interesting. Some steps by which visual cortex organizes that data into interpretations have not yet been clearly articulated in computational neuroscience. And even the more clear articulations are lenses/models themselves.

Tentative concepts

We might define a topical underlay as a compact comprehensive underlay for a body of research (a sort of closure under 'things used by similar research').
A lowercase 'underlay network' : a place to keep many topical underlays - a federation, without expecting shared standards, uniform granularity, or comprehensiveness.
Discarded data (layers): Large sensornets [geo, astro, particle detectors] may analyze streams on the fly rather than storing them. We need a name for the raw data left behind.
Types of alignment: Data gathered by a named standard technique from a model {organism, substance, sample} can be aligned; similar schemas can be clustered together. Many different methods deserve a range of names; alignment and compression of layers is one of the fundamental advantages of organizing knowledge this way.
Types of comprehensions or completions of a category or layer: This can include asymptotic coverage of infinite sets, and complete coverage of finite sets. For infinite categories, we can approach the ideal in a way that balances depth/precision and breadth/range of included observations, for instance by steadily increasing the fineness of the mesh. For example:

"Images of Earth: Monthly surface images at 2m resolution in 3 spectra, below 85°; dailies at 10m resolution in 1 spectrum. <Summary of future forecast + halving/doubling times>
Overlay: visual of similar datasets, including those for the poles, sea floors, and underground."