Matadisco

An open, decentralized network for data discovery. Publish metadata about any dataset to ATProto. Build community portals. Find what matters.


Open data is only as useful as it is discoverable

Petabytes of satellite imagery, climate models, and genomic sequences sit in public repositories — yet finding the right data means navigating dozens of siloed portals, each with different interfaces, APIs, and blind spots.

If you generate a derived dataset or clean up an existing one, there's often no way to make it findable. Government portals decide what gets published. Aggregators are centralized. Community contributions get lost.


How Matadisco works

Matadisco separates data discovery from data storage. The schema is intentionally minimal: Matadisco is about discovery, not about prescribing how metadata should be structured. Whatever standard your community already uses, Matadisco can point to it.

Three pieces work together:

Records

Lightweight pointers to metadata. A record contains a link to the actual metadata, an optional preview, and a timestamp. That's it — this minimal schema works with any metadata standard: STAC, DataCite, IIIF, RSS, and more.

Network

Records are published to ATProto, where relay nodes aggregate and redistribute them across the ecosystem. Every record is cryptographically signed. No single entity controls the network.
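
As a rough illustration of the publishing step, the sketch below builds (but does not send) an HTTP request for ATProto's generic `com.atproto.repo.createRecord` endpoint. The PDS host, access token, and DID are placeholders, and using `cx.vmx.matadisco` as the collection name is an assumption taken from the label shown with the lexicon on this page; the helper name `build_create_record_request` is hypothetical and not part of any Matadisco tooling.

```python
import json
import urllib.request

# Assumption: the collection NSID matches the label shown with the lexicon.
COLLECTION = "cx.vmx.matadisco"


def build_create_record_request(pds_url, access_jwt, repo_did, record):
    """Build a POST request that would write `record` into the repo `repo_did`.

    pds_url, access_jwt and repo_did are supplied by the caller; obtaining
    a session token is out of scope for this sketch.
    """
    body = {"repo": repo_did, "collection": COLLECTION, "record": record}
    return urllib.request.Request(
        url=pds_url.rstrip("/") + "/xrpc/com.atproto.repo.createRecord",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + access_jwt,
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_create_record_request(
    "https://pds.example",
    "ACCESS_TOKEN",
    "did:plc:example",
    {
        "$type": COLLECTION,
        "created": "2024-05-01T12:00:00Z",
        "metadata": "https://example.org/stac/item.json",
    },
)
# urllib.request.urlopen(request) would actually send it.
```

Keeping request construction separate from sending makes the payload easy to inspect before anything is written to a repository.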

Portals

Subscribe to a relay, filter for records relevant to your community, and display them. A satellite imagery portal, a scientific data hub, a cultural heritage archive — each built in about 100 lines of code.


The schema

The Matadisco record is defined as an ATProto Lexicon. In MLF syntax:

cx.vmx.matadisco
/// A Matadisco record
record matadisco {
    /// The time the metadata record was created
    created!: Datetime,
    /// A URI containing metadata
    metadata!: Uri,
    /// Preview of the data
    preview: {
        /// The media type the preview has
        mimeType!: string,
        /// The URL to the preview
        url: Uri,
    },
}

Only metadata and created are required. The preview is optional; when present, it must declare its mimeType. For satellite imagery the preview is a thumbnail, for articles a summary, for podcasts an audio snippet.
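
To make the shape of a record concrete, here is a minimal sketch that builds and checks one as a plain Python dictionary. The helper names (`make_record`, `validate_record`) are hypothetical, the `$type` value assumes the NSID is the `cx.vmx.matadisco` label shown above, and the URI check is deliberately loose rather than a full Lexicon validator.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# Assumption: the record type matches the label shown with the lexicon.
COLLECTION = "cx.vmx.matadisco"


def _is_uri(value):
    """Loose check: a string with a scheme and something after it."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def validate_record(record):
    """Return a list of problems; an empty list means the record looks valid."""
    errors = []
    for field in ("created", "metadata"):
        if field not in record:
            errors.append(f"missing required field: {field}")
    if "metadata" in record and not _is_uri(record["metadata"]):
        errors.append("metadata must be a URI")
    if "created" in record:
        try:
            datetime.fromisoformat(record["created"].replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            errors.append("created must be an ISO 8601 datetime")
    preview = record.get("preview")
    if preview is not None:
        if not isinstance(preview, dict):
            errors.append("preview must be an object")
        else:
            if not isinstance(preview.get("mimeType"), str):
                errors.append("preview.mimeType is required")
            if "url" in preview and not _is_uri(preview["url"]):
                errors.append("preview.url must be a URI")
    return errors


def make_record(metadata, preview=None, created=None):
    """Build a record, defaulting `created` to the current UTC time."""
    if created is None:
        now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
        created = now.replace("+00:00", "Z")
    record = {"$type": COLLECTION, "created": created, "metadata": metadata}
    if preview is not None:
        record["preview"] = preview
    errors = validate_record(record)
    if errors:
        raise ValueError("; ".join(errors))
    return record


# Example with placeholder URLs: a STAC item with a thumbnail preview.
example = make_record(
    "https://example.org/stac/item.json",
    preview={"mimeType": "image/png", "url": "https://example.org/thumb.png"},
)
```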

Browse records · View published lexicon


See it in action

The matadisco-viewer streams new ATProto records in real time and renders them. Currently showing Copernicus Sentinel-2 satellite imagery:

[Image: Sentinel-2 satellite image preview from the Matadisco viewer. Caption: Sentinel-2 L2A scene, with links to its metadata and the full-resolution image (253 MiB).]

Building portals

A portal subscribes to an ATProto relay, filters for relevant records, and presents them. The prototype demonstrates this with two components:

Because records flow through an open network, institutions manage their catalogues independently while participating in shared discovery.
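
The filtering step of a portal can be sketched as a small generator. This is not the prototype's code: it assumes events arrive as JSON text, one per message, shaped like the commit events emitted by Jetstream (a JSON view of the ATProto firehose), with `kind`, `did`, and a `commit` object carrying `operation`, `collection`, and `record`. The function name `matadisco_records` and the `uri_contains` filter are hypothetical, and the collection name is assumed to be `cx.vmx.matadisco`.

```python
import json

# Assumption: the collection NSID matches the label shown with the lexicon.
COLLECTION = "cx.vmx.matadisco"


def matadisco_records(lines, uri_contains=None):
    """Yield (did, record) for newly created Matadisco records.

    `lines` is any iterable of JSON strings, for example messages read
    from a websocket. `uri_contains` optionally keeps only records whose
    metadata URI contains the given substring.
    """
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed messages
        if not isinstance(event, dict) or event.get("kind") != "commit":
            continue
        commit = event.get("commit")
        if not isinstance(commit, dict):
            continue
        if commit.get("operation") != "create":
            continue
        if commit.get("collection") != COLLECTION:
            continue
        record = commit.get("record")
        if not isinstance(record, dict):
            continue
        if uri_contains and uri_contains not in str(record.get("metadata", "")):
            continue
        yield event.get("did"), record
```

A community portal would feed this generator from its relay connection and render each yielded record, for instance by fetching the linked metadata and showing the preview when one is present.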


Prior art & influences


Get started

Matadisco is experimental — things may break or change. That also means there's room to shape it. Here's how to get involved:


What's next

  • Additional geodata sources, such as the German geodata catalogue
  • Image-based sources like GLAM collections using IIIF
  • Non-image sources — podcasts, research datasets, publications
  • Schema evolution informed by real-world use across different domains

Publish records under your own namespace, build a portal for your community, or propose changes to the schema. We'd love to hear from anyone working in open data, metadata standards, or scientific infrastructure.