phenopacket_mapper package

A package to map data from a tabular format to the GA4GH Phenopacket schema (v2).

class phenopacket_mapper.DataModel(name: str, fields: Tuple[DataField | DataSection | OrGroup, ...], id: str = None, resources: Tuple[CodeSystem, ...] = <factory>)

Bases: object

This class defines a data model for medical data using DataField objects.

A data model can be used to import data and map it to the Phenopacket schema. It is made up of a collection of DataField objects.

Given that all DataField objects in a DataModel have unique names, the id field is generated from the name. E.g.: DataField(name='Date of Birth', …) will have an id of 'date_of_birth'. The DataField objects can be accessed using the id as an attribute of the DataModel object. E.g.: data_model.date_of_birth. This is useful in the data reading and mapping processes.
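The naming rule above can be illustrated with a small, self-contained sketch. It does not use the package itself: name_to_id, SketchField and SketchModel are hypothetical stand-ins, and the exact normalisation rules the library applies to names are an assumption here (lower-casing and replacing runs of non-alphanumeric characters with underscores).

```python
import re


def name_to_id(name: str) -> str:
    """Hypothetical helper: lower-case the name and join word runs with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


class SketchField:
    """Stand-in for DataField; it only carries a name and the derived id."""

    def __init__(self, name: str):
        self.name = name
        self.id = name_to_id(name)


class SketchModel:
    """Stand-in for DataModel that exposes its fields as attributes by id."""

    def __init__(self, fields):
        self.fields = tuple(fields)

    def __getattr__(self, item):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real attributes such as `fields` are never shadowed.
        for field in self.__dict__.get("fields", ()):
            if field.id == item:
                return field
        raise AttributeError(item)


model = SketchModel([SketchField("Date of Birth"), SketchField("Sex")])
print(name_to_id("Date of Birth"))
print(model.date_of_birth.name)
```

Because the lookup is keyed on the derived id, two fields whose names normalise to the same id would collide, which is presumably why unique names are required.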

Variables:
  • name – Name of the data model

  • fields – Tuple of DataField, DataSection, or OrGroup objects

  • resources – Tuple of CodeSystem objects

name: str
fields: Tuple[DataField | DataSection | OrGroup, ...]
id: str
resources: Tuple[CodeSystem, ...]
property is_hierarchical: bool
get_field(field_id: str, default: Optional = None) → DataField | None

Returns a DataField object by its id

Parameters:
  • field_id – The id of the field

  • default – The default value to return if the field is not found

Returns:

The DataField object with the given id, or default if no such field is found

get_field_ids() → List[str]

Returns a list of the ids of the DataFields in the DataModel
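The lookup behaviour documented for get_field and get_field_ids can be sketched as follows. This is a toy model, not the library's implementation: SketchField and SketchModel are hypothetical stand-ins, and a flat tuple of fields is assumed (nested DataSection or OrGroup items are ignored).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SketchField:
    """Stand-in for DataField with just an id and a name."""
    id: str
    name: str


@dataclass(frozen=True)
class SketchModel:
    """Stand-in for DataModel holding a flat tuple of fields."""
    fields: Tuple[SketchField, ...]

    def get_field(self, field_id: str,
                  default: Optional[SketchField] = None) -> Optional[SketchField]:
        # Return the first field whose id matches, else the caller's default.
        for field in self.fields:
            if field.id == field_id:
                return field
        return default

    def get_field_ids(self) -> List[str]:
        # Ids are returned in the order the fields were defined.
        return [field.id for field in self.fields]


toy = SketchModel((SketchField("date_of_birth", "Date of Birth"),
                   SketchField("sex", "Sex")))
print(toy.get_field_ids())
print(toy.get_field("missing"))
```

Returning a default instead of raising lets callers probe for optional fields without wrapping each lookup in a try block.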

load_data(path: str | Path, compliance: Literal['lenient', 'strict'] = 'lenient', **kwargs) → DataSet

Loads data from a file using a DataModel definition

To call this method, pass the column name for each field in the DataModel as a keyword argument. This is done by passing the field id followed by '_column'. E.g. if the DataModel has a field with id 'date_of_birth', the column name in the file should be passed as 'date_of_birth_column'. The method will raise an error if any of the fields are missing.

E.g.:

```python
data_model = DataModel("Test data model", [DataField(name="Field 1", value_set=ValueSet())])
data_model.load_data("data.csv", field_1_column="column_name_in_file")
```

Parameters:
  • path – Path to the file containing the data

  • compliance – Compliance level to use when loading the data.

  • kwargs – Dynamically passed parameters that match {id}_column for each item

Returns:

A DataSet containing the DataModelInstance objects read from the file
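The `{id}_column` keyword convention described above can be sketched in isolation. The helper resolve_columns below is hypothetical and is not part of the package. The documentation only says that "an error" is raised when a field is missing, so the use of ValueError here is an assumption.

```python
from typing import Dict, Iterable


def resolve_columns(field_ids: Iterable[str], **kwargs: str) -> Dict[str, str]:
    """Hypothetical helper: map each field id to the column named by `{id}_column`."""
    columns: Dict[str, str] = {}
    missing = []
    for field_id in field_ids:
        key = f"{field_id}_column"
        if key in kwargs:
            columns[field_id] = kwargs[key]
        else:
            missing.append(key)
    if missing:
        # Assumption: the real method may raise a different exception type.
        raise ValueError(f"Missing column arguments: {', '.join(missing)}")
    return columns


print(resolve_columns(["date_of_birth"], date_of_birth_column="DOB"))
```

Collecting every missing key before raising reports all omissions in one error message, rather than one per call.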

class phenopacket_mapper.PhenopacketMapper(data_model: DataModel, **kwargs)

Bases: object

check_data_fields_in_model(element)

map(data: DataSet) → List[Phenopacket]

Map data from the DataModel to Phenopackets

The mapping is based on the definition of the DataModel and the parameters passed to the constructor.

If successful, a list of Phenopackets will be returned

Parameters:

data – DataSet of DataModelInstances created from the data using the DataModel, e.g. as returned by DataModel.load_data

Returns:

List of Phenopackets

Subpackages