Formats, Standards and Integrations¶
Data Formats and Standards¶
Coding and naming conventions¶
For historical reasons, we follow these conventions for the names of entities, parameters and URLs:
- Entity names: PascalCase
- Parameters: camelCase
- URI path elements: snake_case
The only exception is service-info, which is a required GA4GH standard and uses a hyphen as its word separator.
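The conventions above can be checked mechanically. The following Python sketch uses regular expressions that encode an assumed reading of PascalCase, camelCase and snake_case (digits allowed, acronyms not special-cased). The function names are hypothetical and not part of any Beacon tooling.

```python
import re

# Assumed patterns for the three naming conventions.
PASCAL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")

# GA4GH-mandated path element that uses a hyphen instead of an underscore.
PATH_EXCEPTIONS = {"service-info"}


def is_entity_name(name: str) -> bool:
    """Entity names such as GenomicVariation use PascalCase."""
    return bool(PASCAL_CASE.match(name))


def is_parameter_name(name: str) -> bool:
    """Parameters such as referenceBases use camelCase."""
    return bool(CAMEL_CASE.match(name))


def is_path_element(element: str) -> bool:
    """URI path elements use snake_case, except for service-info."""
    return element in PATH_EXCEPTIONS or bool(SNAKE_CASE.match(element))


if __name__ == "__main__":
    print(is_entity_name("GenomicVariation"))
    print(is_parameter_name("referenceBases"))
    print(is_path_element("g_variants"))
    print(is_path_element("service-info"))
    print(is_path_element("serviceInfo"))
```

Treating service-info as an explicit allow-list entry keeps the snake_case rule strict for everything else.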
Schema Language and Conventions¶
The Beacon v2 API follows the OpenAPI 3.0.2 specification for the endpoints, in conjunction with JSON Schema (2020-12) to define the Framework and the Models components. The specification uses JSON references ($ref) to reference internal (e.g., definitions) or external concepts/terms (e.g., VRS).
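To illustrate how an internal $ref is followed, here is a minimal Python sketch that resolves same-document references of the form "#/definitions/Name" using JSON Pointer rules. It assumes the schema has already been loaded into a dictionary; resolving external references (e.g. to VRS) would additionally require fetching and loading the referenced file, which is not shown. The function name and the small schema in the example are hypothetical.

```python
from typing import Any


def resolve_internal_ref(document: dict, ref: str) -> Any:
    """Follow a local JSON reference such as "#/definitions/Age".

    Only same-document references (starting with "#/") are handled;
    external references would need a loader for the referenced file.
    """
    if not ref.startswith("#/"):
        raise ValueError(f"not an internal reference: {ref}")
    node: Any = document
    for token in ref[2:].split("/"):
        # JSON Pointer unescaping (RFC 6901): "~1" -> "/" first, then "~0" -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


if __name__ == "__main__":
    # Hypothetical schema fragment with one internal definition.
    schema = {
        "definitions": {"Age": {"type": "object"}},
        "properties": {"age": {"$ref": "#/definitions/Age"}},
    }
    print(resolve_internal_ref(schema, schema["properties"]["age"]["$ref"]))
```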
The Beacon v2 specification is written in YAML. The original files are located under the src directory (see below). For technical purposes, we also provide a copy of the original YAML in JSON format (see the json directory below). Changes to the specification must be made in the YAML version and are then propagated to the JSON version.
framework
|-- json
|   |-- common
|   |   `-- examples
|   |-- configuration
|   |   `-- examples
|   |-- requests
|   |   |-- examples-fullDocuments
|   |   `-- examples-sections
|   `-- responses
|       |-- sections
|       |-- examples-fullDocuments
|       `-- examples-sections
`-- src
    |-- common
    |   `-- examples
    |-- configuration
    |   `-- examples
    |-- requests
    |   |-- examples-fullDocuments
    |   `-- examples-sections
    `-- responses
        |-- sections
        |-- examples-fullDocuments
        `-- examples-sections
models
|-- json
|   `-- beacon-v2-default-model
|       |-- analyses
|       |   `-- examples
|       |-- biosamples
|       |   `-- examples
|       |-- cohorts
|       |   `-- examples
|       |-- common
|       |-- datasets
|       |   `-- examples
|       |-- genomicVariations
|       |   `-- examples
|       |-- individuals
|       |   `-- examples
|       `-- runs
|           `-- examples
`-- src
    `-- beacon-v2-default-model
        |-- analyses
        |   `-- examples
        |-- biosamples
        |   `-- examples
        |-- cohorts
        |   `-- examples
        |-- common
        |-- datasets
        |   `-- examples
        |-- genomicVariations
        |   `-- examples
        |-- individuals
        |   `-- examples
        `-- runs
            `-- examples
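Because edits are made only in the YAML files under src and then carried over to the mirrored json directories, a conversion script has to map each source file to its JSON counterpart. The following is a minimal Python sketch of that path mapping, assuming the layout shown above and .yaml file extensions. The YAML-to-JSON conversion itself is not shown and would need a YAML parser (e.g. PyYAML). The function name and the file names in the example are hypothetical.

```python
from pathlib import PurePosixPath


def json_counterpart(yaml_path: str) -> str:
    """Map a file under a src/ tree to its mirrored location under json/.

    Assumes the first "src" path component is replaced by "json" and the
    file suffix by ".json"; all other components stay unchanged.
    """
    parts = list(PurePosixPath(yaml_path).parts)
    if "src" not in parts:
        raise ValueError(f"not under a src directory: {yaml_path}")
    parts[parts.index("src")] = "json"
    return str(PurePosixPath(*parts).with_suffix(".json"))


if __name__ == "__main__":
    # Hypothetical file names, used only to show the mapping.
    print(json_counterpart("framework/src/requests/exampleRequest.yaml"))
    print(json_counterpart("models/src/beacon-v2-default-model/runs/examples/exampleRun.yaml"))
```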
Genome Coordinates¶
GA4GH Genome Coordinate Use Recommendation[1]
- We recommend the use of the "0-start, half-open" (interbase) coordinate system in all systems
- The "1-start, fully-closed" system should be used when displaying coordinates through a GUI or report
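The two conventions differ only in how the start is counted: an interbase interval [start, end) becomes [start + 1, end] in 1-start, fully-closed notation, while the end value stays the same. A small Python sketch of the conversion in both directions (function names are hypothetical):

```python
from __future__ import annotations


def interbase_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-start, half-open interval to 1-start, fully-closed."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return start + 1, end


def one_based_to_interbase(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-start, fully-closed interval to 0-start, half-open."""
    if end < start - 1:
        raise ValueError("interval is inverted")
    return start - 1, end


if __name__ == "__main__":
    # Interbase interval from the VRS Allele example on this page.
    print(interbase_to_one_based(7577120, 7577121))
```

For the VRS Allele example on this page, the interbase interval 7577120 to 7577121 covers exactly one base, which a GUI would display as position 7577121.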
Dates and Times¶
Date and time formats are specified as ISO8601-compatible strings, both for time points and for durations. Some ISO8601-compatible formats have not (yet) been used in the Beacon v2 default model.
Examples¶
- time stamp in milliseconds in YYYY-MM-DDTHH:MM:SS.SSS
  - example: 2015-02-10T00:03:42.123Z
  - schema specification in JSON Schema: "type": "string", "format": "date-time"
  - Timepoints with millisecond granularity are typical for timing computer-generated entries, e.g. the time of a record's update ("updateTime").
- age in years and months in PnYnM
  - example: P43Y08M
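Both example formats can be parsed with the Python standard library and a regular expression. This sketch assumes UTC time stamps with a trailing Z and ages containing only years and/or months (the PnYnM pattern above); other ISO8601 duration components (days, weeks, time parts) are rejected rather than handled. Function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime

# PnYnM: optional years, optional months, at least one of them present.
AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?$")


def parse_timestamp(value: str) -> datetime:
    """Parse a millisecond time stamp such as 2015-02-10T00:03:42.123Z."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_age(value: str) -> tuple[int, int]:
    """Parse a PnYnM age such as P43Y08M into (years, months)."""
    match = AGE_PATTERN.match(value)
    if match is None or value == "P":
        raise ValueError(f"not a PnYnM duration: {value}")
    years, months = match.groups()
    return int(years or 0), int(months or 0)


if __name__ == "__main__":
    print(parse_timestamp("2015-02-10T00:03:42.123Z").isoformat())
    print(parse_age("P43Y08M"))
```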
LINK: W3C ISO8601¶
LINK: ISO8601 documentation at GA4GH SchemaBlocks¶
Integration with External Standards¶
The development of the Beacon v2 framework and default model closely follows and widely adopts concepts and schemas from approved GA4GH products such as Phenopackets and the Variant Representation Standard (VRS).
Variant Representation Standard (VRS)¶
The GA4GH Variant Representation Standard (VRS) constitutes the reference one should use when implementing representations of genomic variations. The current version 1.2 has been approved and covers a set of use cases and requirements, especially with respect to genomic (including cytogenetic or feature based) locations. However, it is not yet suitable for a number of practical use cases, especially the representation of some structural variations.
The Beacon v2 default model for GenomicVariation makes use of the VRS standard to represent the variation part, i.e. the location and the sequence or copy number changes of the genomic variation. While a "legacy" alternative is still allowed, it too has been adjusted to use the VRS Location format.
Examples¶
The examples show different forms of the variation property inside a genomicVariation, all using the VRS SequenceLocation format for location.

"variation": {
  "type": "Allele",
  "state": {
    "sequence": "G",
    "type": "LiteralSequenceExpression"
  },
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000017.11",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 7577120
      },
      "end": {
        "type": "Number",
        "value": 7577121
      }
    }
  }
}

"variation": {
  "type": "RelativeCopyNumber",
  "relative_copy_class": "partial loss",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000018.10",
    "interval": {
      "start": {
        "type": "Number",
        "value": 23029501
      },
      "end": {
        "type": "Number",
        "value": 62947165
      }
    }
  }
}

"variation": {
  "variantType": "SNP",
  "referenceBases": "C",
  "alternateBases": "G",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000017.11",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 7577120
      },
      "end": {
        "type": "Number",
        "value": 7577121
      }
    }
  }
}

"variation": {
  "variantType": "DEL",
  "location": {
    "type": "SequenceLocation",
    "sequence_id": "refseq:NC_000018.10",
    "interval": {
      "start": {
        "type": "Number",
        "value": 23029501
      },
      "end": {
        "type": "Number",
        "value": 62947165
      }
    }
  }
}
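When consuming genomicVariation records, one practical question is whether a variation uses the VRS form or the legacy form. In the four examples above, VRS objects carry a type (Allele, RelativeCopyNumber) while legacy objects carry a variantType. The following Python sketch uses that difference as a discriminator, which is an assumption drawn from these examples rather than a documented rule, and returns the interbase start and end of the shared SequenceLocation. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def describe_variation(variation: dict[str, Any]) -> tuple[str, str, int, int]:
    """Return (form, sequence_id, start, end) for a variation object.

    form is "vrs" when the object has a VRS "type" and "legacy" when it
    has a "variantType"; start and end are the interbase interval values.
    """
    if "type" in variation:
        form = "vrs"
    elif "variantType" in variation:
        form = "legacy"
    else:
        raise ValueError("neither a VRS 'type' nor a legacy 'variantType'")
    location = variation["location"]
    interval = location["interval"]
    return (
        form,
        location["sequence_id"],
        interval["start"]["value"],
        interval["end"]["value"],
    )


if __name__ == "__main__":
    # The legacy DEL example from above, as a Python dictionary.
    deletion = {
        "variantType": "DEL",
        "location": {
            "type": "SequenceLocation",
            "sequence_id": "refseq:NC_000018.10",
            "interval": {
                "start": {"type": "Number", "value": 23029501},
                "end": {"type": "Number", "value": 62947165},
            },
        },
    }
    print(describe_variation(deletion))
```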
LINK: VRS Documentation¶
Phenopackets¶
In the Beacon v2 default data model, many schemas are either directly compatible with Phenopackets v2 (PXF) building blocks or at least reflect them with some adjustments. While the Beacon v2 default model's schemas do not per se have to reflect PXF schemas, we aim for the closest possible alignment to promote GA4GH-wide standardization.
Top-level differences¶
The Phenopackets model is centered around the Phenopacket, which collects and integrates all sub-schemas (with the addition of the external Family and Cohort schemas). While a Phenopacket usually describes information related to a subject (defined in an Individual), and the top-level elements in Phenopacket relate to a specific proband (e.g. measurements as "Measurements performed in the proband"), the phenopacket itself does not explicitly represent an individual.
In contrast, the Beacon v2 default model uses a hierarchy in which biosamples reference individuals directly (where these exist). For most purposes, Beacon's Individual can be equated with a merge of the parameters of Phenopacket's core Phenopacket and Individual.
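As a rough illustration of this merge, the following Python sketch copies the fields of a Phenopacket's subject and a few top-level elements into a single Beacon-style individual. The selection of copied keys is a hypothetical simplification: it does not rename nested fields (e.g. PhenotypicFeature.type versus Beacon's featureType) or convert values such as sex, which would need their own mappings. The function name and example identifiers are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Top-level Phenopacket elements copied onto the individual in this sketch;
# the selection is illustrative, not a complete mapping.
COPIED_TOP_LEVEL_KEYS = ("phenotypicFeatures", "diseases")


def phenopacket_to_individual(phenopacket: dict[str, Any]) -> dict[str, Any]:
    """Flatten a Phenopacket's subject and selected top-level elements."""
    # Start from a shallow copy of the subject (the PXF Individual).
    individual = dict(phenopacket.get("subject", {}))
    for key in COPIED_TOP_LEVEL_KEYS:
        if key in phenopacket:
            individual[key] = phenopacket[key]
    return individual


if __name__ == "__main__":
    example = {
        "id": "pp-1",
        "subject": {"id": "ind-1"},
        "phenotypicFeatures": [{"type": {"id": "HP:0001250", "label": "Seizure"}}],
    }
    print(phenopacket_to_individual(example))
```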
Building block comparisons: Beacon v2 == PXF v2¶
Age¶
AgeRange¶
Evidence¶
KaryotypicSex¶
ReferenceRange¶
While unit in Beacon points to a Unit definition, this is itself an OntologyTerm, i.e. structurally the same.
Value¶
Beacon v2 =~ PXF v2 (e.g. renamed or additional parameters)¶
ComplexValue¶
Renamed GA4GH Phenopackets v2 ComplexValue.TypedQuantity.type to ComplexValue.TypedQuantity.quantityType, since type is problematic as a parameter name.
ExternalReference¶
Renamed GA4GH Phenopackets v2 ExternalReference.description to ExternalReference.notes, since description is problematic as a parameter name.
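Both renamings are simple key changes. The following Python sketch converts PXF v2 JSON objects into the Beacon naming, assuming the PXF JSON uses a typedQuantities list for ComplexValue and a description field for ExternalReference. Function names and example values are hypothetical.

```python
from __future__ import annotations

from typing import Any


def rename_key(obj: dict[str, Any], old: str, new: str) -> dict[str, Any]:
    """Return a shallow copy of obj with key `old` renamed to `new`."""
    copy = dict(obj)
    if old in copy:
        copy[new] = copy.pop(old)
    return copy


def complex_value_from_pxf(value: dict[str, Any]) -> dict[str, Any]:
    """TypedQuantity.type (PXF) becomes TypedQuantity.quantityType (Beacon)."""
    converted = dict(value)
    if "typedQuantities" in value:
        converted["typedQuantities"] = [
            rename_key(tq, "type", "quantityType") for tq in value["typedQuantities"]
        ]
    return converted


def external_reference_from_pxf(ref: dict[str, Any]) -> dict[str, Any]:
    """ExternalReference.description (PXF) becomes ExternalReference.notes (Beacon)."""
    return rename_key(ref, "description", "notes")


if __name__ == "__main__":
    pxf_value = {"typedQuantities": [{"type": {"id": "EX:0001"}, "quantity": {"value": 1}}]}
    print(complex_value_from_pxf(pxf_value))
    print(external_reference_from_pxf({"id": "EX:ref1", "description": "free text"}))
```

Returning copies rather than mutating the input keeps the original PXF objects usable for comparison.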
Measurement¶
Added notes and date.
PhenotypicFeature¶
Beacon | Phenopackets |
---|---|
featureType | type |
severity (re-used definition reflecting an ontology term) | severity (ontology class) |
notes | |
Procedure¶
Beacon | Phenopackets |
---|---|
procedureCode | code |
ageAtProcedure (TimeElement) | performed (TimeElement) |
dateOfProcedure (ISO date) | |
TimeElement¶
The specific parameters have been aligned with minimal differences in naming or in the use of general parameters.
Beacon | Phenopackets |
---|---|
ontologyTerm | ontology_class |
age | age (Age) |
ageRange | age_range (AgeRange) |
gestationalAge | gestational_age (GestationalAge) |
...Timestamp | timestamp (TimeStamp) |
timeInterval | interval (TimeInterval) |
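Since a TimeElement holds exactly one of several alternatives, converting one mainly means renaming its single key. The following Python sketch applies the mapping from the table above to a PXF TimeElement given as a dictionary. It assumes the snake_case field names shown in the table (PXF JSON produced by protobuf tooling may use lowerCamelCase instead, e.g. ageRange) and leaves out the timestamp variant, whose Beacon name is given only in abbreviated form. Names are hypothetical.

```python
from __future__ import annotations

from typing import Any

# PXF v2 -> Beacon v2 keys from the table above; the timestamp variant is
# deliberately omitted because its Beacon name is abbreviated there.
TIME_ELEMENT_KEYS = {
    "ontology_class": "ontologyTerm",
    "age": "age",
    "age_range": "ageRange",
    "gestational_age": "gestationalAge",
    "interval": "timeInterval",
}


def time_element_from_pxf(element: dict[str, Any]) -> dict[str, Any]:
    """Translate the single populated TimeElement variant to Beacon naming."""
    if len(element) != 1:
        raise ValueError("a TimeElement carries exactly one variant")
    key, value = next(iter(element.items()))
    if key not in TIME_ELEMENT_KEYS:
        raise KeyError(f"no Beacon mapping for TimeElement variant: {key}")
    return {TIME_ELEMENT_KEYS[key]: value}


if __name__ == "__main__":
    print(time_element_from_pxf({"age_range": {
        "start": {"iso8601duration": "P40Y"},
        "end": {"iso8601duration": "P45Y"},
    }}))
```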
Treatment¶
Beacon still has an ageOfOnset parameter (?). Also, the PXF agent has been renamed to the more general treatmentCode.
Beacon v2 ~ PXF v2 (e.g. multiple/complex differences)¶
Disease¶
Pedigree¶
The Beacon and Phenopackets schemas for "pedigree" representation are not aligned; both may be superseded by the GA4GH pedigree standard currently under development.
Sex¶
Beacon directly uses the (IMO preferable) representation through an ontology term, while PXF uses an ordinal (enumerated) mapping.
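To make the difference concrete, this Python sketch maps PXF Sex enumeration values to ontology terms. The NCIT codes used for female (NCIT:C16576) and male (NCIT:C20197) are an assumption based on their common use in Beacon examples; UNKNOWN_SEX and OTHER_SEX are deliberately left unmapped rather than guessed. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Assumed NCIT terms for the two PXF values with an obvious counterpart;
# UNKNOWN_SEX and OTHER_SEX are left unmapped in this sketch.
PXF_SEX_TO_ONTOLOGY = {
    "FEMALE": {"id": "NCIT:C16576", "label": "female"},
    "MALE": {"id": "NCIT:C20197", "label": "male"},
}


def sex_from_pxf(value: str) -> Optional[dict[str, str]]:
    """Map a PXF v2 Sex enum value to an ontology term, or None if unmapped."""
    term = PXF_SEX_TO_ONTOLOGY.get(value)
    # Return a copy so callers cannot modify the shared mapping table.
    return dict(term) if term is not None else None


if __name__ == "__main__":
    print(sex_from_pxf("FEMALE"))
    print(sex_from_pxf("OTHER_SEX"))
```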
LINK: Phenopackets Documentation¶
[1] Source: @andrewyatz at GenomeStandards