Formats, Standards and Integrations¶
Data Formats and Standards¶
Coding and naming conventions¶
For historical reasons, in the names of entities, parameters and URLs we are following these conventions:
- Entity names: PascalCase
- parameters: camelCase
- URI path elements: snake_case
The only exception is service-info, which is a required GA4GH standard and has a different word-separation convention.
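As an illustration, the conventions above could be checked with a few regular expressions. This is only a sketch: the patterns, the function name `follows_convention`, and the kind labels ("entity", "parameter", "path") are assumptions made for this example and are not part of the Beacon specification.

```python
import re

# Assumed patterns for the three conventions; not taken from the specification.
PATTERNS = {
    "entity": re.compile(r"^[A-Z][A-Za-z0-9]*$"),            # PascalCase
    "parameter": re.compile(r"^[a-z][A-Za-z0-9]*$"),         # camelCase
    "path": re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$"),       # snake_case
}

# The documented exception: service-info keeps its hyphen.
PATH_EXCEPTIONS = {"service-info"}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if `name` matches the convention for `kind`."""
    if kind == "path" and name in PATH_EXCEPTIONS:
        return True
    return PATTERNS[kind].match(name) is not None


print(follows_convention("entity", "GenomicVariation"))
print(follows_convention("parameter", "updateTime"))
print(follows_convention("path", "service-info"))
```

Note that a single lowercase word such as `runs` satisfies both the camelCase and snake_case patterns, so the check has to be told which kind of name it is looking at.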
Schema Language and Conventions¶
The Beacon API follows the OpenAPI 3.0.2 specification for the endpoints, in conjunction with JSON Schema (2020-12) to define the Framework and the Models components. The specification uses JSON references ($ref) to reference internal (e.g., definitions) or external concepts/terms (e.g., VRS).
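To make the role of internal `$ref` references concrete, here is a minimal sketch of how a reference such as `#/definitions/Age` can be resolved inside a single schema document. The schema content and the function names are hypothetical. The sketch ignores external references, array indices in pointers, sibling keywords next to `$ref`, and cyclic references, all of which a real JSON Schema tool has to handle.

```python
def resolve_pointer(document: dict, ref: str):
    """Follow an internal reference like '#/definitions/Age' through `document`."""
    if not ref.startswith("#/"):
        raise ValueError("this sketch only handles internal references")
    node = document
    for token in ref[2:].split("/"):
        # JSON Pointer escaping (RFC 6901): '~1' stands for '/', '~0' for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


def inline_refs(node, root: dict):
    """Return a copy of `node` with internal $ref objects replaced by their targets."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            return inline_refs(resolve_pointer(root, ref), root)
        return {key: inline_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, root) for item in node]
    return node


# Hypothetical schema fragment, for illustration only.
schema = {
    "definitions": {"Age": {"type": "string"}},
    "properties": {"age": {"$ref": "#/definitions/Age"}},
}
print(inline_refs(schema, schema)["properties"]["age"])
```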
The Beacon specification is written in YAML. The original files are located under the src directory (see below). For technical purposes, we also provide a copy of the original YAML in JSON format (see the json directory below). Changes in the specification must be made in the YAML version and are then rewritten to the JSON version.
framework
|-- json
|   |-- common
|   |   `-- examples
|   |-- configuration
|   |   `-- examples
|   |-- requests
|   |   |-- examples-fullDocuments
|   |   `-- examples-sections
|   `-- responses
|       |-- sections
|       |-- examples-fullDocuments
|       `-- examples-sections
`-- src
    |-- common
    |   `-- examples
    |-- configuration
    |   `-- examples
    |-- requests
    |   |-- examples-fullDocuments
    |   `-- examples-sections
    `-- responses
        |-- sections
        |-- examples-fullDocuments
        `-- examples-sections
models
|-- json
|   `-- beacon-v2-default-model
|       |-- analyses
|       |   `-- examples
|       |-- biosamples
|       |   `-- examples
|       |-- cohorts
|       |   `-- examples
|       |-- common
|       |-- datasets
|       |   `-- examples
|       |-- genomicVariations
|       |   `-- examples
|       |-- individuals
|       |   `-- examples
|       `-- runs
|           `-- examples
`-- src
    `-- beacon-v2-default-model
        |-- analyses
        |   `-- examples
        |-- biosamples
        |   `-- examples
        |-- cohorts
        |   `-- examples
        |-- common
        |-- datasets
        |   `-- examples
        |-- genomicVariations
        |   `-- examples
        |-- individuals
        |   `-- examples
        `-- runs
            `-- examples
Genome Coordinates¶
GA4GH Genome Coordinate Use Recommendation [1]
- We recommend the use of the "0-start, half-open" (interbase) coordinate system in all systems
- "1-start, fully-closed" should be used when displaying coordinates through a GUI or report
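The two coordinate systems differ only in the start position: a 0-start, half-open interval [start, end) covers the same bases as the 1-start, fully-closed interval [start + 1, end]. The conversion can be sketched as follows; the function names are hypothetical and chosen for this example.

```python
def interbase_to_display(start: int, end: int) -> tuple[int, int]:
    """Convert 0-start, half-open coordinates to 1-start, fully-closed ones."""
    return start + 1, end


def display_to_interbase(start: int, end: int) -> tuple[int, int]:
    """Convert 1-start, fully-closed coordinates to 0-start, half-open ones."""
    return start - 1, end


def interbase_length(start: int, end: int) -> int:
    """Number of bases covered by a 0-start, half-open interval."""
    return end - start


# A single-base interval stored as (7577120, 7577121) is shown as position 7577121.
print(interbase_to_display(7577120, 7577121))
```

One practical advantage of the interbase form is that the length of an interval is simply `end - start`, with no off-by-one correction.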
Dates and Times¶
Date and time formats are specified as ISO8601 compatible strings, both for time points as well as for durations.
Examples¶
- time stamp in milliseconds in YYYY-MM-DDTHH:MM:SS.SSS
  - 2015-02-10T00:03:42.123Z
  - schema specification in JSON Schema is "type": "string", "format": "date-time"
  - Timepoints with millisecond granularity are typical use cases for timing computer generated entries, e.g. the time of a record's update ("updateTime").
- age in years and months in PnYnM
  - P43Y08M
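The two example formats can be parsed with the Python standard library alone. The sketch below is deliberately narrow: it accepts only the millisecond UTC time stamp form and the PnYnM age form shown above, not the full ISO 8601 grammar, and the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Only the years/months duration form (PnYnM) from the examples is covered.
AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?$")


def parse_timestamp(value: str) -> datetime:
    """Parse a time stamp such as 2015-02-10T00:03:42.123Z as a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_age(value: str) -> tuple[int, int]:
    """Parse an age such as P43Y08M into (years, months)."""
    match = AGE_PATTERN.match(value)
    if match is None or value == "P":
        raise ValueError(f"not a PnYnM duration: {value!r}")
    years, months = match.groups()
    return int(years or 0), int(months or 0)


print(parse_timestamp("2015-02-10T00:03:42.123Z").isoformat())
print(parse_age("P43Y08M"))
```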
LINK: W3C ISO8601¶
LINK: ISO8601 documentation at GA4GH SchemaBlocks¶
Integration with External Standards¶
The development of the Beacon framework and default model closely follows and widely adopts concepts and schemas from approved GA4GH products such as Phenopackets and the Variant Representation Standard (VRS).
Variant Representation Standard (VRS)¶
The GA4GH Variant Representation Standard (VRS) constitutes the reference one should use when implementing representations of genomic variations. The current version 1.2 has been approved and covers a set of use cases and requirements, especially with respect to genomic (including cytogenetic or feature based) locations. However, it is not yet suitable for a number of practical use cases, especially the representation of some structural variations.
The Beacon default model for GenomicVariation makes use of the VRS standard to represent
the variation part, i.e. the location and sequence or copy number changes of the
genomic variation. While a "legacy" alternative is still allowed, this one too has been adjusted
to make use of the VRS Location format.
Examples¶
The examples show different forms of the location property inside a genomicVariation.
"variation": {
  "type": "Allele",
  "state": {
    "sequence": "G",
    "type": "LiteralSequenceExpression"
  },
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000017.11",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 7577120
      },
      "end": {
        "type": "Number",
        "value": 7577121
      }
    }
  }
}
"variation": {
  "type": "RelativeCopyNumber",
  "relative_copy_class": "partial loss",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000018.10",
    "interval": {
      "start": {
        "type": "Number",
        "value": 23029501
      },
      "end": {
        "type": "Number",
        "value": 62947165
      }
    }
  }
}
"variation": {
  "variantType": "SNP",
  "referenceBases": "C",
  "alternateBases": "G",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000017.11",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 7577120
      },
      "end": {
        "type": "Number",
        "value": 7577121
      }
    }
  }
}
"variation": {
  "variantType": "DEL",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000018.10",
    "interval": {
      "start": {
        "type": "Number",
        "value": 23029501
      },
      "end": {
        "type": "Number",
        "value": 62947165
      }
    }
  }
}
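Since all four examples share the same location structure, the objects can be assembled programmatically. The following sketch builds the first example (the Allele with a literal sequence state) from plain values. The helper names `sequence_location` and `literal_allele` are hypothetical; the field names are copied from the examples above, and no validation against the VRS schema is attempted.

```python
def sequence_location(sequence_id: str, start: int, end: int) -> dict:
    """Build a SequenceLocation using 0-start, half-open (interbase) coordinates."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return {
        "type": "SequenceLocation",
        "sequence_id": sequence_id,
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": start},
            "end": {"type": "Number", "value": end},
        },
    }


def literal_allele(sequence_id: str, start: int, end: int, sequence: str) -> dict:
    """Build an Allele whose state is a literal replacement sequence."""
    return {
        "type": "Allele",
        "state": {"type": "LiteralSequenceExpression", "sequence": sequence},
        "location": sequence_location(sequence_id, start, end),
    }


variation = literal_allele("refseq:NC_000017.11", 7577120, 7577121, "G")
print(variation["location"]["interval"]["start"]["value"])
```

The same `sequence_location` helper could supply the location of a "legacy" variation, because that alternative uses the VRS Location format as well.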
LINK: VRS Documentation¶
Phenopackets¶
In the Beacon default data model, many schemas are either directly compatible with Phenopackets v2 (PXF) building blocks or at least reflect them with some adjustments. While the Beacon v2 default model's schemas do not per se have to reflect PXF schemas, we aim for the closest possible alignment to promote and leverage GA4GH-wide standardization.
Top-level differences¶
The Phenopackets model is centered around the Phenopacket, which is the collector
and integrator of all sub-schemas (with the addition of the external Family and
Cohort schemas). While Phenopacket usually describes information related to a
subject - which is defined in an Individual - and the top level elements in
Phenopacket relate to a specific proband (measurements as "Measurements performed
in the proband"), the phenopacket itself does not explicitly represent an individual.
In contrast, the Beacon v2 default model uses a hierarchy in which biosamples
reference individuals directly (if existing). For most purposes one can equate Beacon's
Individual with a merge of Phenopacket's core Phenopacket and Individual parameters.
Building block comparisons: Beacon v2 == PXF v2¶
Age¶
AgeRange¶
Evidence¶
KaryotypicSex¶
ReferenceRange¶
While unit in Beacon points to a Unit definition, this is itself an OntologyTerm i.e. structurally the same.
Value¶
Beacon v2 =~ PXF v2 (e.g. renamed or additional parameters)¶
ComplexValue¶
Renamed ComplexValue.TypedQuantity.quantityType compared to GA4GH Phenopackets v2 ComplexValue.TypedQuantity.type due to problematic use of type as parameter
ExternalReference¶
Renamed ExternalReference.notes compared to GA4GH Phenopackets v2 ExternalReference.description due to problematic use of description as parameter
Measurement¶
Added notes and date.
PhenotypicFeature¶
| Beacon | Phenopackets |
|---|---|
| featureType | type |
| severity (re-used definition reflecting an ontology term) | severity (ontology class) |
| notes | |
Procedure¶
| Beacon | Phenopackets |
|---|---|
| procedureCode | code |
| ageAtProcedure (TimeElement) | performed (TimeElement) |
| dateOfProcedure (ISO date) | |
TimeElement¶
The specific parameters have been aligned with minimal differences in naming or use of general parameters.
| Beacon | Phenopackets |
|---|---|
| ontologyTerm | ontology_class |
| age | age (Age) |
| ageRange | age_range (AgeRange) |
| gestationalAge | gestational_age (GestationalAge) |
| ...Timestamp | timestamp (TimeStamp) |
| timeInterval | interval (TimeInterval) |
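Because the TimeElement differences are mostly renames, the correspondence can be expressed as a simple lookup table. The sketch below renames only the top-level key of a TimeElement and leaves the nested value untouched. The mapping and function names are hypothetical, the payload is a placeholder rather than a valid instance, and the `...Timestamp` row is left out because its Beacon-side name is elided in the table.

```python
# Beacon parameter name -> Phenopackets parameter name, as listed in the table.
TIME_ELEMENT_BEACON_TO_PXF = {
    "ontologyTerm": "ontology_class",
    "age": "age",
    "ageRange": "age_range",
    "gestationalAge": "gestational_age",
    "timeInterval": "interval",
}


def time_element_to_pxf(element: dict) -> dict:
    """Rename the top-level keys of a Beacon TimeElement to their PXF names.

    Keys without an entry in the mapping are passed through unchanged.
    """
    return {
        TIME_ELEMENT_BEACON_TO_PXF.get(key, key): value
        for key, value in element.items()
    }


# Placeholder payload for illustration only.
beacon_element = {"timeInterval": {"start": "...", "end": "..."}}
print(time_element_to_pxf(beacon_element))
```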
Treatment¶
Beacon still has an ageOfOnset parameter (?). Also, PXF agent has been renamed to a more general treatmentCode.
Beacon v2 ~ PXF v2 (e.g. multiple/complex differences)¶
Disease¶
Pedigree¶
While the Beacon & Phenopackets schemas for "pedigree" representation are not aligned, they may become superseded by the GA4GH pedigree standard currently under development.
Sex¶
Beacon directly uses the (IMO preferable) representation through an ontology term, while PXF uses an ordinal mapping
LINK: Phenopackets Documentation¶
[1] Source: @andrewyatz at GenomeStandards
