
Beacon Aggregations for Data Summaries

WiP

Beacon aggregations are currently in preview and may change without deprecation. Use with caution and follow related PRs.

Overview and Use Cases

The Beacon API provides several ways to discover and potentially retrieve data in biomedical genomics resources. In version 2.n, however, responses were limited either to global content (a boolean or an overall count of matched data, plus static collection information) or to full record-level access, which most resources cannot offer in a public context. Responses at the new aggregated granularity level make it possible to:

  • provide granular data overviews of the content of resources and their collections, e.g. the number of samples with individual features or combinations of features
  • profile query responses for multiple (single or intersected) parameters

Response Format

Aggregations are provided in the responseAggregation property of the response and consist of an array of objects with the following structure:

  • a required, ordered list (concepts) of one or more concept objects describing the parameters for which the aggregation is provided
  • summaries for the single or intersected concepts
    • a count of the distinct values for single or intersected concepts and/or
    • a count for all records with existing values and/or
    • a distribution of all distinct values/combinations with the count of their occurrence
  • an optional scope parameter to indicate the entity the results refer to (usually the current entry type but might be variable for collection and overview aggregations)
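
The structure listed above can be sketched as a small plausibility check. This is an illustration only and not a normative validator: the function name `check_aggregation` is hypothetical, and the rule that at least one summary must be present is an assumption based on the "and/or" wording of the list.

```python
from typing import Any

# Summary properties named in the structure description above.
SUMMARY_KEYS = ("distinctValuesCount", "anyValueCount", "distribution")


def check_aggregation(agg: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the object looks plausible."""
    problems = []

    # "concepts" is required and must contain at least one concept object.
    concepts = agg.get("concepts")
    if not isinstance(concepts, list) or not concepts:
        problems.append("concepts must be a non-empty list")
        concepts = []

    # Assumption: at least one of the three summaries should be provided.
    if not any(key in agg for key in SUMMARY_KEYS):
        problems.append("no summary (count or distribution) present")

    # Each distribution item carries one value per concept, in the same order.
    for i, item in enumerate(agg.get("distribution", [])):
        values = item.get("conceptValues", [])
        if concepts and len(values) != len(concepts):
            problems.append(
                f"distribution[{i}]: expected {len(concepts)} conceptValues, "
                f"got {len(values)}"
            )
        count = item.get("count")
        if not isinstance(count, int) or count < 0:
            problems.append(f"distribution[{i}]: count must be a non-negative integer")

    return problems
```

Returning a list of problems rather than raising on the first one lets a client report everything that looks wrong with a response item in a single pass.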

The following examples display different aggregation objects (which would be items in the responseAggregation array). Note that id values are for demonstration only and do not have a normative function.

Aggregations and Queries: For most of the example cases one can envision both a use in "data overview context" (e.g. to profile the content of a resource or collection) and in "query context" (e.g. to profile the response for a specific query).

Example: Minimal Representation of Distinct Values Count distinctValuesCount

How many different diseases are represented in the data?

{
  "concepts": [
    {"id": "disease", "label": "Disease"}
  ],
  "distinctValuesCount": 89
}

Example: Informative Values anyValueCount

How many individuals in the data have a follow-up time?

{
  "scope": "individual",
  "concepts": [
    {"id": "followUpTime", "label": "Follow-up time"}
  ],
  "anyValueCount": 1200
}
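
To make the difference between the two counts concrete, the following sketch derives both from a handful of made-up records. The record layout, the field names and the helper functions are assumptions for illustration; a real Beacon would compute these summaries in its own backend.

```python
# Hypothetical individual records; field names are assumptions for this sketch.
records = [
    {"id": "ind-1", "disease": "NCIT:C2919", "followUpTime": "P12M"},
    {"id": "ind-2", "disease": "NCIT:C4017", "followUpTime": None},
    {"id": "ind-3", "disease": "NCIT:C2919"},
    {"id": "ind-4", "disease": None, "followUpTime": "P3Y"},
]


def has_value(record, key):
    """A record counts only if the key is present and not null."""
    return record.get(key) is not None


def distinct_values_count(records, key):
    """Number of different values observed for the key."""
    return len({r[key] for r in records if has_value(r, key)})


def any_value_count(records, key):
    """Number of records that have any value for the key."""
    return sum(1 for r in records if has_value(r, key))


disease_summary = {
    "scope": "individual",
    "concepts": [{"id": "disease", "label": "Disease"}],
    "distinctValuesCount": distinct_values_count(records, "disease"),
}

follow_up_summary = {
    "scope": "individual",
    "concepts": [{"id": "followUpTime", "label": "Follow-up time"}],
    "anyValueCount": any_value_count(records, "followUpTime"),
}
```

In this toy data three individuals have a disease value but only two different diseases occur, so anyValueCount and distinctValuesCount for the same concept can differ.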

Example: Value Distribution distribution, Single Concept

What is the distribution of diseases in the samples?

{
  "scope": "biosample",
  "concepts": [
    {
      "id": "sampleDiagnoses",
      "label": "Diagnoses of selected carcinoma types",
      "property": "biosample.histologicalDiagnosis.id"
    }
  ],
  "distribution": [
    {
      "conceptValues": [
        {"id": "NCIT:C2919", "label": "Prostate Adenocarcinoma"}
      ],
      "count": 426
    },
    {
      "conceptValues": [
        {"id": "NCIT:C4017", "label": "Breast Ductal Carcinoma"}
      ],
      "count": 423
    },
    {
      "conceptValues": [
        {"id": "NCIT:C3512", "label": "Lung Adenocarcinoma"}
      ],
      "count": 317
    }
  ]
}
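
A single-concept distribution like the one above could be assembled as follows. This is a minimal sketch under stated assumptions: the input list, the label lookup table and the function name `build_distribution` are hypothetical, and sorting by descending count is a presentation choice rather than something the format requires.

```python
from collections import Counter

# Hypothetical lookup from term id to label.
LABELS = {
    "NCIT:C2919": "Prostate Adenocarcinoma",
    "NCIT:C4017": "Breast Ductal Carcinoma",
    "NCIT:C3512": "Lung Adenocarcinoma",
}

# Hypothetical diagnosis ids, one per biosample; None marks a missing value.
diagnoses = [
    "NCIT:C2919", "NCIT:C4017", "NCIT:C2919", None,
    "NCIT:C3512", "NCIT:C2919", "NCIT:C4017",
]


def build_distribution(values, labels):
    """Count each distinct non-null value and wrap it as a distribution item."""
    counts = Counter(v for v in values if v is not None)
    return [
        {"conceptValues": [{"id": v, "label": labels.get(v, v)}], "count": n}
        for v, n in counts.most_common()  # most frequent value first
    ]


aggregation = {
    "scope": "biosample",
    "concepts": [{"id": "sampleDiagnoses", "label": "Diagnoses of selected carcinoma types"}],
    "distribution": build_distribution(diagnoses, LABELS),
}
```

Records without a value are skipped here, so the distribution only describes samples that actually carry a diagnosis.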

Example: Value Distribution distribution, Intersecting Concepts

What is the distribution of diseases in the samples, separately by sex? Please note:

  • there are now 2 concepts in the concepts list and the conceptValues in the distribution are observed combinations of values for both concepts, in the same order
  • the count indicates the number of times this combination was observed (e.g. 426 cases of "male" & "Prostate Adenocarcinoma" but 0 cases for "female" & "Prostate Adenocarcinoma")

{
  "scope": "individual",
  "concepts": [
    {"id": "diseases", "label": "Selected carcinoma types"},
    {"id": "sexAtBirth", "label": "Sex at birth"}
  ],
  "distribution": [
    {
      "conceptValues": [
        {"id": "NCIT:C2919", "label": "Prostate Adenocarcinoma"},
        {"id": "NCIT:C20197", "label": "male"}
      ],
      "count": 426
    },
    {
      "conceptValues": [
        {"id": "NCIT:C2919", "label": "Prostate Adenocarcinoma"},
        {"id": "NCIT:C16576", "label": "female"}
      ],
      "count": 0
    },
    {
      "conceptValues": [
        {"id": "NCIT:C4017", "label": "Breast Ductal Carcinoma"},
        {"id": "NCIT:C16576", "label": "female"}
      ],
      "count": 423
    },
    {
      "conceptValues": [
        {"id": "NCIT:C4017", "label": "Breast Ductal Carcinoma"},
        {"id": "NCIT:C20197", "label": "male"}
      ],
      "count": 2
    },
    {
      "conceptValues": [
        {"id": "NCIT:C3512", "label": "Lung Adenocarcinoma"},
        {"id": "NCIT:C16576", "label": "female"}
      ],
      "count": 132
    },
    {
      "conceptValues": [
        {"id": "NCIT:C3512", "label": "Lung Adenocarcinoma"},
        {"id": "NCIT:C20197", "label": "male"}
      ],
      "count": 411
    }
  ]
}
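
On the consuming side, the ordering rule (conceptValues follow the order of concepts) is what allows a client to collapse an intersected distribution back to one concept. The sketch below sums counts per value of a chosen concept. The function name `marginal_counts` is hypothetical, the data is a shortened copy of the example above, and the sums are only meaningful under the assumption that each record contributes to exactly one combination.

```python
from collections import defaultdict


def marginal_counts(aggregation, concept_id):
    """Sum the counts of an intersected distribution per value of one concept."""
    concept_ids = [c["id"] for c in aggregation["concepts"]]
    # conceptValues are listed in the same order as concepts.
    position = concept_ids.index(concept_id)
    totals = defaultdict(int)
    for item in aggregation["distribution"]:
        value_id = item["conceptValues"][position]["id"]
        totals[value_id] += item["count"]
    return dict(totals)


# Shortened copy of the intersected example (labels omitted for brevity).
example = {
    "concepts": [{"id": "diseases"}, {"id": "sexAtBirth"}],
    "distribution": [
        {"conceptValues": [{"id": "NCIT:C2919"}, {"id": "NCIT:C20197"}], "count": 426},
        {"conceptValues": [{"id": "NCIT:C2919"}, {"id": "NCIT:C16576"}], "count": 0},
        {"conceptValues": [{"id": "NCIT:C4017"}, {"id": "NCIT:C16576"}], "count": 423},
        {"conceptValues": [{"id": "NCIT:C4017"}, {"id": "NCIT:C20197"}], "count": 2},
    ],
}

by_disease = marginal_counts(example, "diseases")
by_sex = marginal_counts(example, "sexAtBirth")
```

With this subset, `by_disease` holds 426 for NCIT:C2919 and 425 for NCIT:C4017, while `by_sex` holds 428 for NCIT:C20197 and 423 for NCIT:C16576.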