Formats, Standards and Integrations

Data Formats and Standards

Coding and naming conventions

For historical reasons, the names of entities, parameters and URLs follow these conventions:

  • Entity names: PascalCase
  • Parameters: camelCase
  • URI path elements: snake_case

The only exception is service-info, which is required by a GA4GH standard and uses a different word separation convention.
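The conventions above can be checked mechanically. The following is a minimal sketch, not part of the Beacon specification: the regular expressions and function names are assumptions made for illustration, and the sample names in the usage note are only example inputs.

```python
import re

# Hypothetical patterns, one per convention listed above.
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# The one documented exception among URI path elements.
URI_EXCEPTIONS = {"service-info"}


def is_entity_name(name: str) -> bool:
    """True if the name looks like PascalCase."""
    return bool(PASCAL_CASE.match(name))


def is_parameter_name(name: str) -> bool:
    """True if the name looks like camelCase."""
    return bool(CAMEL_CASE.match(name))


def is_uri_path_element(name: str) -> bool:
    """True if the name is snake_case or the documented exception."""
    return name in URI_EXCEPTIONS or bool(SNAKE_CASE.match(name))
```

For instance, is_entity_name("GenomicVariation") and is_parameter_name("updateTime") both hold, while is_uri_path_element("service-info") holds only because of the explicit exception set.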

Schema Language and Conventions

The Beacon v2 API follows the OpenAPI 3.0.2 specification for the endpoints, in conjunction with JSON Schema (2020-12) to define the Framework and the Models components. The specification uses JSON references ($ref) to point to internal concepts (e.g., definitions) or external concepts and terms (e.g., VRS).
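To illustrate how an internal $ref points into the same document, here is a minimal sketch of a resolver for references of the form "#/definitions/...". The function name and the toy schema are hypothetical and not taken from the Beacon files. External references and array indices are deliberately not handled.

```python
from typing import Any


def resolve_internal_ref(document: dict, ref: str) -> Any:
    """Follow an internal JSON reference such as '#/definitions/Age'."""
    if not ref.startswith("#/"):
        raise ValueError(f"not an internal reference: {ref}")
    node: Any = document
    for token in ref[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 -> '/', ~0 -> '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


# Toy schema for illustration only (not a real Beacon schema).
schema = {
    "definitions": {"Age": {"type": "object"}},
    "properties": {"age": {"$ref": "#/definitions/Age"}},
}

target = resolve_internal_ref(schema, schema["properties"]["age"]["$ref"])
```

A production setup would normally rely on a JSON Schema library for reference resolution; the sketch only shows what the pointer syntax means.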

The Beacon v2 specification is written in YAML. The original files are located under the src directory (see below). For technical purposes, we also provide a copy of the original YAML in JSON format (see the json directory below). Changes to the specification must be made in the YAML version and are then converted to the JSON version.

framework
|-- json
|   |-- common
|   |   `-- examples
|   |-- configuration
|   |   `-- examples
|   |-- requests
|   |   |-- examples-fullDocuments
|   |   `-- examples-sections
|   `-- responses
|       |-- sections
|       |-- examples-fullDocuments
|       `-- examples-sections
`-- src
    |-- common
    |   `-- examples
    |-- configuration
    |   `-- examples
    |-- requests
    |   |-- examples-fullDocuments
    |   `-- examples-sections
    `-- responses
        |-- sections
        |-- examples-fullDocuments
        `-- examples-sections
models
|-- json
|   `-- beacon-v2-default-model
|       |-- analyses
|       |   `-- examples
|       |-- biosamples
|       |   `-- examples
|       |-- cohorts
|       |   `-- examples
|       |-- common
|       |-- datasets
|       |   `-- examples
|       |-- genomicVariations
|       |   `-- examples
|       |-- individuals
|       |   `-- examples
|       `-- runs
|           `-- examples
`-- src
    `-- beacon-v2-default-model
        |-- analyses
        |   `-- examples
        |-- biosamples
        |   `-- examples
        |-- cohorts
        |   `-- examples
        |-- common
        |-- datasets
        |   `-- examples
        |-- genomicVariations
        |   `-- examples
        |-- individuals
        |   `-- examples
        `-- runs
            `-- examples

Genome Coordinates

GA4GH Genome Coordinate Use Recommendation [1]

  • We recommend the use of the "0-start, half-open" (interbase) coordinate system in all systems
  • "1-start, fully-closed" should be used when displaying coordinates through a GUI or report
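The relationship between the two coordinate systems can be sketched as a pair of conversion helpers. This is a minimal illustration with hypothetical function names: only the start coordinate shifts by one, while the end coordinate keeps its numeric value.

```python
from typing import Tuple


def interbase_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert 0-start, half-open [start, end) to 1-start, fully-closed."""
    if start < 0 or end <= start:
        # Empty interbase intervals (start == end) have no
        # 1-start, fully-closed equivalent.
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def one_based_to_interbase(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-start, fully-closed [start, end] to 0-start, half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end
```

For example, the interbase interval (7577120, 7577121) covers a single base, which a GUI or report would display as position 7577121.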

Dates and Times

Date and time formats are specified as ISO8601 compatible strings, both for time points and for durations. Some of the ISO8601 compatible formats have not (yet) been used in the Beacon v2 default model.

Examples

  • timestamp with millisecond precision in YYYY-MM-DDTHH:MM:SS.SSS
    • 2015-02-10T00:03:42.123Z
      • schema specification in JSON Schema is "type": "string", "format": "date-time"
      • Timepoints with millisecond granularity are typical for computer-generated entries, e.g. the time of a record's update ("updateTime").
  • age in years and months in PnYnM
    • P43Y08M
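The two example formats can be parsed with the Python standard library alone. The sketch below uses hypothetical helper names. It handles only the PnYnM subset of ISO8601 durations shown above, not the full duration grammar.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Matches only the years/months subset of ISO8601 durations (PnYnM).
_AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?$")


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2015-02-10T00:03:42.123Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so replace it with an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_age_years_months(value: str) -> Tuple[int, int]:
    """Parse an age such as P43Y08M into (years, months)."""
    match = _AGE_PATTERN.match(value)
    if match is None or match.group(1) is None and match.group(2) is None:
        raise ValueError(f"not a PnYnM duration: {value}")
    return int(match.group(1) or 0), int(match.group(2) or 0)
```

With these helpers, "P43Y08M" yields 43 years and 8 months, and the example timestamp yields a timezone-aware datetime in UTC.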

Integration with External Standards

The development of the Beacon v2 framework and default model closely follows and widely adopts concepts and schemas from approved GA4GH products such as Phenopackets and the Variant Representation Standard (VRS).

Variant Representation Standard (VRS)

The GA4GH Variant Representation Standard (VRS) constitutes the reference one should use when implementing representations of genomic variations. The current version 1.2 has been approved and covers a set of use cases and requirements, especially with respect to genomic (including cytogenetic or feature based) locations. However, it is not yet suitable for a number of practical use cases, especially the representation of some structural variations.

The Beacon v2 default model for GenomicVariation makes use of the VRS standard to represent the variation part, i.e. the location and the sequence or copy number changes of the genomic variation. While a "legacy" alternative is still allowed, it too has been adjusted to make use of the VRS Location format.

Examples

The examples show different forms of the location property inside a genomicVariation.

"variation": {
    "type": "Allele",
    "state": {
        "sequence": "G",
        "type": "LiteralSequenceExpression"
    },
    "location": {
        "type": "SequenceLocation",
        "sequence_id": "refseq:NC_000017.11",
        "interval": {
            "type": "SequenceInterval",
            "start": {
                "type": "Number",
                "value": 7577120
            },
            "end": {
                "type": "Number",
                "value": 7577121
            }
        }
    }
}

"variation": {
    "type": "RelativeCopyNumber",
    "relative_copy_class": "partial loss",
    "location": {
        "type": "SequenceLocation",
        "sequence_id": "refseq:NC_000018.10",
        "interval": {
            "start": {
                "type": "Number",
                "value": 23029501
            },
            "end": {
                "type": "Number",
                "value": 62947165
            }
        }
    }
}

"variation": {
    "variantType": "SNP",
    "referenceBases": "C",
    "alternateBases": "G",
    "location": {
        "type": "SequenceLocation",
        "sequence_id": "refseq:NC_000017.11",
        "interval": {
            "type": "SequenceInterval",
            "start": {
                "type": "Number",
                "value": 7577120
            },
            "end": {
                "type": "Number",
                "value": 7577121
            }
        }
    }
}

"variation": {
    "variantType": "DEL",
    "location": {
        "type": "SequenceLocation",
        "sequence_id": "refseq:NC_000018.10",
        "interval": {
            "start": {
                "type": "Number",
                "value": 23029501
            },
            "end": {
                "type": "Number",
                "value": 62947165
            }
        }
    }
}
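The location objects in the examples above share one shape, which can be sketched as a small builder. The helper names are hypothetical. The length calculation assumes interbase coordinates, as recommended in the Genome Coordinates section.

```python
def sequence_location(sequence_id: str, start: int, end: int) -> dict:
    """Build a location dict shaped like the examples above."""
    return {
        "type": "SequenceLocation",
        "sequence_id": sequence_id,
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": start},
            "end": {"type": "Number", "value": end},
        },
    }


def interval_length(location: dict) -> int:
    """Number of bases covered, assuming interbase coordinates."""
    interval = location["interval"]
    return interval["end"]["value"] - interval["start"]["value"]


snv = sequence_location("refseq:NC_000017.11", 7577120, 7577121)
deletion = sequence_location("refseq:NC_000018.10", 23029501, 62947165)
```

Under that assumption, the single-base example spans one base and the chromosome 18 example spans 39917664 bases.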

Phenopackets

In the Beacon v2 default data model, many schemas are either directly compatible with Phenopackets v2 (PXF) building blocks or reflect them with some adjustments. While the Beacon v2 default model's schemas do not have to mirror PXF schemas, we aim for the closest possible alignment to promote and leverage GA4GH-wide standardization.

Top-level differences

The Phenopackets model is centered around the Phenopacket, which is the collector and integrator of all sub-schemas (with the addition of the external Family and Cohort schemas). A Phenopacket usually describes information related to a subject, which is defined in an Individual, and its top-level elements relate to a specific proband (e.g. measurements as "Measurements performed in the proband"). However, the phenopacket itself does not explicitly represent an individual.

In contrast, the Beacon v2 default model uses a hierarchy in which biosamples reference individuals directly (where these exist). For most purposes one can equate Beacon's Individual with a merge of Phenopacket's core Phenopacket and Individual parameters.

Building block comparisons: Beacon v2 == PXF v2

Age
AgeRange
Evidence
KaryotypicSex
ReferenceRange

While unit in Beacon points to a Unit definition, that definition is itself an OntologyTerm, i.e. structurally the same.

Value

Beacon v2 =~ PXF v2 (e.g. renamed or additional parameters)

ComplexValue

ComplexValue.TypedQuantity.type in GA4GH Phenopackets v2 is renamed to ComplexValue.TypedQuantity.quantityType in Beacon, because type is problematic as a parameter name.

ExternalReference

ExternalReference.description in GA4GH Phenopackets v2 is renamed to ExternalReference.notes in Beacon, because description is problematic as a parameter name.

Measurement

Added notes and date.

PhenotypicFeature

| Beacon | Phenopackets |
|---|---|
| featureType | type |
| severity (re-used definition reflecting an ontology term) | severity (ontology class) |
| notes | |

Procedure

| Beacon | Phenopackets |
|---|---|
| procedureCode | code |
| ageAtProcedure (TimeElement) | performed (TimeElement) |
| dateOfProcedure (ISO date) | |

TimeElement

The specific parameters have been aligned, with minimal differences in naming or in the use of general parameters.

| Beacon | Phenopackets |
|---|---|
| ontologyTerm | ontology_class |
| age | age (Age) |
| ageRange | age_range (AgeRange) |
| gestationalAge | gestational_age (GestationalAge) |
| ...Timestamp | timestamp (TimeStamp) |
| timeInterval | interval (TimeInterval) |

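The TimeElement correspondences listed above amount to a key renaming, which can be sketched as a lookup table. This is an illustrative sketch with hypothetical names. It renames top-level keys only and leaves nested values untouched. The timestamp row is omitted because its Beacon name is abbreviated in the listing.

```python
# Phenopackets name -> Beacon name, taken from the listing above.
PXF_TO_BEACON_TIME_ELEMENT = {
    "ontology_class": "ontologyTerm",
    "age": "age",
    "age_range": "ageRange",
    "gestational_age": "gestationalAge",
    "interval": "timeInterval",
}


def rename_time_element_keys(pxf_time_element: dict) -> dict:
    """Rename top-level keys; unknown keys pass through unchanged."""
    return {
        PXF_TO_BEACON_TIME_ELEMENT.get(key, key): value
        for key, value in pxf_time_element.items()
    }


# Placeholder value, for illustration only.
renamed = rename_time_element_keys({"age_range": "placeholder"})
```

A full conversion would also need to translate the nested objects, which this sketch does not attempt.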
Treatment

Beacon still has an ageOfOnset parameter (?). Also, the PXF agent parameter has been renamed to the more general treatmentCode.

Beacon v2 ~ PXF v2 (e.g. multiple/complex differences)

Disease
Pedigree

While the Beacon and Phenopackets schemas for "pedigree" representation are not aligned, they may be superseded by the GA4GH pedigree standard currently under development.

Sex

Beacon directly uses a representation through an ontology term (arguably the preferable option), while PXF uses an ordinal mapping.
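The difference between the two representations can be sketched as a translation from an ordinal value to an ontology term. Everything in this block is an assumption made for illustration: the enum values follow the Phenopackets v2 Sex enumeration as we understand it, and the NCIT identifiers are commonly used terms for "female" and "male", not values mandated by this document.

```python
from typing import Optional

# Assumed Phenopackets v2 ordinal values (illustrative).
PXF_SEX_NAMES = {0: "UNKNOWN_SEX", 1: "FEMALE", 2: "MALE", 3: "OTHER_SEX"}

# Assumed ontology terms; only the two unambiguous cases are mapped.
SEX_ONTOLOGY_TERMS = {
    "FEMALE": {"id": "NCIT:C16576", "label": "female"},
    "MALE": {"id": "NCIT:C20197", "label": "male"},
}


def pxf_sex_to_ontology_term(ordinal: int) -> Optional[dict]:
    """Return an ontology term for a PXF ordinal, or None if unmapped."""
    name = PXF_SEX_NAMES.get(ordinal)
    return SEX_ONTOLOGY_TERMS.get(name)
```

Unmapped values return None here, leaving the choice of term for unknown or other sex to the implementer.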